Contained in kiln/default and tip

kiln/default largefiles -> largefiles-kiln

Changeset d5591f6230a1

Parents 877b088b0985

Parents bc714da66961

by Andrew Pritchard <andrewp@fogcreek.com>

Changes to 7 files · Showing diff from parent 877b088b0985 bc714da66961

__init__.py
@@ -7,24 +7,22 @@
 
 '''track large binary files
 
-Large binary files tend to be not very compressible, not very "diffable",
-and not at all mergeable. Such files are not handled well by Mercurial\'s
-storage format (revlog), which is based on compressed binary deltas.
-lfiles solves this problem by adding a centralized client-server layer on
-top of Mercurial: big files live in a *central store* out on the network
-somewhere, and you only fetch the big files that you need when you need
-them.
+Large binary files tend to be not very compressible, not very "diffable", and
+not at all mergeable. Such files are not handled well by Mercurial\'s storage
+format (revlog), which is based on compressed binary deltas. largefiles solves
+this problem by adding a centralized client-server layer on top of Mercurial:
+largefiles live in a *central store* out on the network somewhere, and you only
+fetch the ones that you need when you need them.
 
-lfiles works by maintaining a *standin* in .hglf/ for each big file.
-The standins are small (41 bytes: an SHA-1 hash plus newline) and are
-tracked by Mercurial. Big file revisions are identified by the SHA-1 hash
-of their contents, which is written to the standin. lfiles uses that
-revision ID to get/put big file revisions from/to the central store.
+largefiles works by maintaining a *standin* in .hglf/ for each largefile. The
+standins are small (41 bytes: an SHA-1 hash plus newline) and are tracked by
+Mercurial. Largefile revisions are identified by the SHA-1 hash of their
+contents, which is written to the standin. largefiles uses that revision ID to
+get/put largefile revisions from/to the central store.
 
-A complete tutorial for using lfiles is included in ``usage.txt`` in the
-lfiles source distribution. See
-http://vc.gerg.ca/hg/hg-lfiles/raw-file/tip/usage.txt for the latest
-version.
+A complete tutorial for using lfiles is included in ``usage.txt`` in the lfiles
+source distribution. See
+https://developers.kilnhg.com/Repo/Kiln/largefiles/largefiles/File/usage.txt
 '''
 
 from mercurial import commands
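
The docstring in this hunk says a standin is 41 bytes: an SHA-1 hash plus a newline. A minimal Python 3 sketch of that arithmetic follows; `standin_contents` is a hypothetical helper name used only for illustration and is not a function from the extension.

```python
import hashlib


def standin_contents(data: bytes) -> bytes:
    """Return what a standin would hold for `data` (hypothetical helper).

    A SHA-1 digest is 20 bytes, i.e. 40 hex characters; adding the
    trailing newline gives the 41 bytes mentioned in the docstring.
    """
    return hashlib.sha1(data).hexdigest().encode("ascii") + b"\n"


if __name__ == "__main__":
    contents = standin_contents(b"example largefile payload")
    print(len(contents))  # 41
```

The size does not depend on the size of the largefile, which is why the standins stay cheap to track.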
design.txt
@@ -1,42 +1,49 @@
-= bfiles - manage large binary files =
+= largefiles - manage large binary files =
 This extension is based off of Greg Ward's bfiles extension which can be found
 at http://mercurial.selenic.com/wiki/BfilesExtension.
 
-== The Bfile Store ==
+== The largefile store ==
 
-Bfile stores are simply directories where each file is a bfile. The filename
-is the sha1 hash of the bfile. The path is not necessary because all interactions
-with the store have one of these forms:
+largefile stores are, in the typical use case, centralized servers that have
+every past revision of a given binary file. Each largefile is identified by
+its sha1 hash, and all interactions with the store take one of the following
+forms.
 
 -Download a bfile with this hash
 -Upload a bfile with this hash
 -Check if the store has a bfile with this hash
 
+largefiles stores can take one of two forms:
+
+-Directories on a network file share
+-Mercurial wireproto servers, either via ssh or http (hgweb)
+
 == The Local Repository ==
 
-The local repository has a bfile store in .hg/bfiles which holds a subset of the
-bfiles needed. On a clone only the bfiles at tip are downloaded. When bfiles are
-downloaded from the central store a copy is saved in this store.
+The local repository has a largefile cache in .hg/largefiles which holds a
+subset of the largefiles needed. On a clone only the largefiles at tip are
+downloaded. When largefiles are downloaded from the central store, a copy is
+saved in this store.
 
 == The Global Cache ==
 
-Bfiles in a local repository store are hard linked to files in the global cache. Before
-a file is downloaded we check if it is in the global cache.
+largefiles in a local repository cache are hardlinked to files in the global
+cache. Before a file is downloaded we check if it is in the global cache.
 
 == Implementation Details ==
 
-Each bfile has a standin which is in .hgbfiles. The standin is tracked by Mercurial.
-The contents of the standin is the SHA1 hash of the bfile. When a bfile is added/removed/
-copied/renamed/etc the same operation is applied to the standin. Thus the history of the
-standin is the history of the bfile.
+Each largefile has a standin which is in .hglf. The standin is tracked by
+Mercurial. The standin contains the SHA1 hash of the largefile. When a
+largefile is added/removed/copied/renamed/etc the same operation is applied to
+the standin. Thus the history of the standin is the history of the largefile.
 
-For performance reasons the contents of a standin is only updated before a commit.
-Standins are added/removed/copied/renamed from add/remove/copy/rename Mercurial
-commands but their contents will not be updated. The contents of a standin will always
-be the hash of the bfile as of the last commit. To support some commands (revert) some
-standins are temporarily updated but will be changed back after the command is finished.
+For performance reasons, the contents of a standin are only updated before a
+commit. Standins are added/removed/copied/renamed from add/remove/copy/rename
+Mercurial commands but their contents will not be updated. The contents of a
+standin will always be the hash of the largefile as of the last commit. To
+support some commands (revert) some standins are temporarily updated but will
+be changed back after the command is finished.
 
-A Mercurial dirstate object tracks the state of the bfiles. The dirstate uses the
-last modified time and current size to detect if a file has changed (without reading
-the entire contents of the file).
-
+A Mercurial dirstate object tracks the state of the largefiles. The dirstate
+uses the last modified time and current size to detect if a file has changed
+(without reading the entire contents of the file).
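
design.txt lists three store interactions (download, upload, and existence check, all keyed by hash) and names a directory on a network file share as one store form. The sketch below illustrates that shape in Python 3 under stated assumptions: `DirectoryStore`, `sha1_of_file`, and the method names are hypothetical and are not the extension's classes, and the flat one-file-per-hash layout is assumed from the description rather than taken from the code.

```python
import hashlib
import os
import shutil
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large file is never fully loaded in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DirectoryStore:
    """Hypothetical directory-backed store: each entry is named by its SHA-1."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, sha1hex):
        return os.path.join(self.root, sha1hex)

    def has(self, sha1hex):
        """Check if the store has a largefile with this hash."""
        return os.path.isfile(self._path(sha1hex))

    def put(self, source_path):
        """Upload: copy the file into the store under its hash; return the hash."""
        sha1hex = sha1_of_file(source_path)
        if not self.has(sha1hex):
            shutil.copyfile(source_path, self._path(sha1hex))
        return sha1hex

    def get(self, sha1hex, dest_path):
        """Download: copy the stored file with this hash to dest_path."""
        shutil.copyfile(self._path(sha1hex), dest_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "big.bin")
        with open(source, "wb") as f:
            f.write(b"example payload")
        store = DirectoryStore(os.path.join(tmp, "store"))
        print(store.has(store.put(source)))
```

Because entries are addressed only by hash, uploading the same content twice stores it once, and no path information is needed to find it again.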
lfcommands.py
@@ -19,16 +19,14 @@
 # -- Commands ----------------------------------------------------------
 
 def lfconvert(ui, src, dest, *pats, **opts):
-    '''Convert a repository to a repository using largefiles
+    '''Convert a normal repository to a largefiles repository
 
-    Convert source repository creating an identical
-    repository, except that all files that match the
-    patterns given, or are over a given size will
-    be added as largefiles. The size of a file is the size of the
-    first version of the file. After running this command you
-    will need to set the store then run lfput on the new
-    repository to upload the largefiles to the central store.
-    '''
+    Convert source repository creating an identical repository, except that all
+    files that match the patterns given, or are over the given size will be
+    added as largefiles. The size used to determine whether or not to track a
+    file as a largefile is the size of the first version of the file. After
+    running this command you will need to make sure that largefiles is enabled
+    anywhere you intend to push the new repository.'''
 
     if opts['tonormal']:
         tolfile = False
lfutil.py
@@ -431,21 +431,15 @@
     util.makedirs(os.path.dirname(filename))
     if os.path.exists(filename):
         os.unlink(filename)
-    if os.name == 'posix':
-        # Yuck: on Unix, go through open(2) to ensure that the caller's mode is
-        # filtered by umask() in the kernel, where it's supposed to be done.
-        wfile = os.fdopen(os.open(filename, os.O_WRONLY|os.O_CREAT,
-                                  getmode(executable)), 'wb')
-    else:
-        # But on Windows, use open() directly, since passing mode='wb' to
-        # os.fdopen() does not work. (Python bug?)
-        wfile = open(filename, 'wb')
+    wfile = open(filename, 'wb')
 
     try:
         wfile.write(hash)
         wfile.write('\n')
     finally:
         wfile.close()
+    if os.path.exists(filename):
+        os.chmod(filename, getmode(executable))
 
 def getexecutable(filename):
     mode = os.stat(filename).st_mode
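
This hunk replaces the platform-specific `os.open()` path with a plain `open()` followed by `os.chmod()`. The Python 3 sketch below shows that write-then-chmod pattern in isolation. The body of `getmode` is an assumption (the diff calls it but does not show it), and `write_standin` is a hypothetical name for the surrounding function.

```python
import os
import stat
import tempfile


def getmode(executable):
    # Hypothetical body: the diff calls getmode() but does not show it.
    return 0o755 if executable else 0o644


def write_standin(filename, sha1hex, executable):
    """Write the hash plus a newline, then set the file mode explicitly."""
    with open(filename, "w") as wfile:
        wfile.write(sha1hex)
        wfile.write("\n")
    # os.chmod() applies the mode as given; the process umask is not
    # consulted, unlike the mode argument of os.open() at creation time.
    os.chmod(filename, getmode(executable))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "standin")
        write_standin(path, "0" * 40, executable=True)
        print(oct(stat.S_IMODE(os.stat(path).st_mode)))
```

One consequence worth noting: the removed code relied on the kernel filtering the requested mode through the umask, whereas `os.chmod()` sets exactly the bits it is given.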
proto.py
@@ -149,14 +149,14 @@
 
 def sshrepo_callstream(self, cmd, **args):
     if cmd == 'heads' and self.capable('largefiles'):
-        cmd = 'k' + cmd
+        cmd = 'lheads'
     if cmd == 'batch' and self.capable('largefiles'):
-        args['cmds'] = args['cmds'].replace('heads ', 'kheads ')
+        args['cmds'] = args['cmds'].replace('heads ', 'lheads ')
     return ssh_oldcallstream(self, cmd, **args)
 
 def httprepo_callstream(self, cmd, **args):
     if cmd == 'heads' and self.capable('largefiles'):
-        cmd = 'k' + cmd
+        cmd = 'lheads'
     if cmd == 'batch' and self.capable('largefiles'):
-        args['cmds'] = args['cmds'].replace('heads ', 'kheads ')
+        args['cmds'] = args['cmds'].replace('heads ', 'lheads ')
     return http_oldcallstream(self, cmd, **args)
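
Both wrappers in this hunk apply the same rewrite before delegating to the original call: a direct `heads` command becomes `lheads`, and inside a `batch` request the substring `heads ` is replaced with `lheads `. The Python 3 sketch below isolates that rewrite so its effect can be inspected. `rewrite_command` and the `capable` flag are hypothetical stand-ins for the wrapper methods and for `self.capable('largefiles')`, and the sample batch string is made up for illustration rather than captured from a real session.

```python
def rewrite_command(cmd, args, capable):
    """Return (cmd, args) with 'heads' redirected to 'lheads' when capable.

    `capable` stands in for self.capable('largefiles') in the diff.
    """
    args = dict(args)  # copy so the caller's dict is left untouched
    if cmd == 'heads' and capable:
        cmd = 'lheads'
    if cmd == 'batch' and capable:
        # str.replace rewrites every occurrence of the substring.
        args['cmds'] = args['cmds'].replace('heads ', 'lheads ')
    return cmd, args


if __name__ == "__main__":
    print(rewrite_command('heads', {}, True))
    print(rewrite_command('batch', {'cmds': 'heads ;known nodes='}, True))
```

When the remote does not advertise the capability, both the command and its arguments pass through unchanged.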
uisetup.py
@@ -83,7 +83,7 @@
     # ... and wrap some existing ones
     wireproto.commands['capabilities'] = (proto.capabilities, '')
     wireproto.commands['heads'] = (proto.heads, '')
-    wireproto.commands['kheads'] = (wireproto.heads, '')
+    wireproto.commands['lheads'] = (wireproto.heads, '')
 
     # make putlfile behave the same as push and {get,stat}lfile behave the same
     # as pull w.r.t. permissions checks
usage.txt
@@ -0,0 +1,51 @@
+Largefiles allows for tracking large, incompressible binary files in Mercurial
+without requiring excessive bandwidth for clones and pulls. Files added as
+largefiles are not tracked directly by Mercurial; rather, their revisions are
+identified by a checksum, and Mercurial tracks these checksums. This way, when
+you clone a repository or pull in changesets, the large files in older
+revisions of the repository are not needed, and only the ones needed to update
+to the current version are downloaded. This saves both disk space and
+bandwidth.
+
+If you are starting a new repository or adding new large binary files, using
+largefiles for them is as easy as adding '--large' to your hg add command. For
+example:
+
+$ dd if=/dev/urandom of=thisfileislarge count=2000
+$ hg add --large thisfileislarge
+$ hg commit -m 'add thisfileislarge, which is large, as a largefile'
+
+When you push a changeset that affects largefiles to a remote repository, its
+largefile revisions will be uploaded along with it. Note that the remote
+Mercurial must also have the largefiles extension enabled for this to work.
+
+When you pull a changeset that affects largefiles from a remote repository,
+nothing different from Mercurial's normal behavior happens. However, when you
+update to such a revision, any largefiles needed by that revision are
+downloaded and cached if they have never been downloaded before. This means
+that network access is required to update to a revision you have not yet updated
+to.
+
+If you already have large files tracked by Mercurial without the largefiles
+extension, you will need to convert your repository in order to benefit from
+largefiles. This is done with the 'hg lfconvert' command:
+
+$ hg lfconvert --size 10 oldrepo newrepo
+
+By default, in repositories that already have largefiles in them, any new file
+over 10MB will automatically be added as a largefile. To change this
+threshold, set [largefiles].size in your Mercurial config file to the minimum
+size in megabytes to track as a largefile, or use the --lfsize option to the
+add command (also in megabytes):
+
+[largefiles]
+size = 2
+
+$ hg add --lfsize 2
+
+The [largefiles].patterns config option allows you to specify specific
+space-separated filename patterns (in shell glob syntax) that should always be
+tracked as largefiles:
+
+[largefiles]
+patterns = *.jpg *.{png,bmp} library.zip content/audio/*
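
usage.txt describes two ways a new file ends up tracked as a largefile: it meets a minimum size in megabytes, or its name matches a configured pattern. The Python 3 sketch below models that decision under several assumptions. `is_largefile` is a hypothetical function, not the extension's matcher; a megabyte is taken as 1024 * 1024 bytes; the threshold is treated as inclusive; and matching uses `fnmatch.fnmatchcase`, which does not expand brace patterns such as `*.{png,bmp}`, so those would need to be listed separately here.

```python
import fnmatch


def is_largefile(path, size_bytes, min_size_mb=10, patterns=()):
    """Decide whether a new file would be added as a largefile (sketch).

    Assumptions: 1 MB = 1024 * 1024 bytes, the size threshold is
    inclusive, and patterns are plain fnmatch globs without brace
    expansion.
    """
    if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns):
        return True
    return size_bytes >= min_size_mb * 1024 * 1024


if __name__ == "__main__":
    patterns = ["*.jpg", "*.png", "*.bmp", "library.zip", "content/audio/*"]
    print(is_largefile("photos/cover.jpg", 2048, patterns=patterns))
    print(is_largefile("notes.txt", 2048, patterns=patterns))
    print(is_largefile("dump.bin", 3 * 1024 * 1024, min_size_mb=2))
```

A pattern match takes precedence over size in this sketch, which mirrors the statement that matching files should always be tracked as largefiles.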