Changeset 89f711a5cb05…
Parent 1408ea59a63a…
by Andrew Pritchard <andrewp@fogcreek.com>
Changes to 3 files
@@ -7,24 +7,22 @@
'''track large binary files
-Large binary files tend to be not very compressible, not very "diffable",
-and not at all mergeable. Such files are not handled well by Mercurial\'s
-storage format (revlog), which is based on compressed binary deltas.
-lfiles solves this problem by adding a centralized client-server layer on
-top of Mercurial: big files live in a *central store* out on the network
-somewhere, and you only fetch the big files that you need when you need
-them.
+Large binary files tend to be not very compressible, not very "diffable", and
+not at all mergeable. Such files are not handled well by Mercurial\'s storage
+format (revlog), which is based on compressed binary deltas. largefiles solves
+this problem by adding a centralized client-server layer on top of Mercurial:
+largefiles live in a *central store* out on the network somewhere, and you only
+fetch the ones that you need when you need them.
-lfiles works by maintaining a *standin* in .hglf/ for each big file.
-The standins are small (41 bytes: an SHA-1 hash plus newline) and are
-tracked by Mercurial. Big file revisions are identified by the SHA-1 hash
-of their contents, which is written to the standin. lfiles uses that
-revision ID to get/put big file revisions from/to the central store.
+largefiles works by maintaining a *standin* in .hglf/ for each largefile. The
+standins are small (41 bytes: an SHA-1 hash plus newline) and are tracked by
+Mercurial. Largefile revisions are identified by the SHA-1 hash of their
+contents, which is written to the standin. largefiles uses that revision ID to
+get/put largefile revisions from/to the central store.
-A complete tutorial for using lfiles is included in ``usage.txt`` in the
-lfiles source distribution. See
-http://vc.gerg.ca/hg/hg-lfiles/raw-file/tip/usage.txt for the latest
-version.
+A complete tutorial for using largefiles is included in ``usage.txt`` in the largefiles
+source distribution. See
+http://developers.kilnhg.com/Repo/Kiln/largefiles/largefiles/ TODO version.
'''
from mercurial import commands
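The docstring above says a standin is 41 bytes: an SHA-1 hash plus a newline. The following is a minimal sketch of that layout, assuming the hash is stored as 40 lowercase hex characters. The function names `standin_contents` and `hash_file` are hypothetical and are not taken from the extension's source.

```python
import hashlib


def standin_contents(data: bytes) -> bytes:
    """Return hypothetical standin contents: hex SHA-1 of data plus a newline."""
    # 40 hex characters + 1 newline byte = 41 bytes, matching the docstring.
    return hashlib.sha1(data).hexdigest().encode("ascii") + b"\n"


def hash_file(path: str, chunk_size: int = 128 * 1024) -> str:
    """Hash a file in chunks so a large file is never held fully in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the revision ID depends only on the file contents, two identical largefiles map to the same entry in a store, whatever their paths.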
@@ -1,42 +1,49 @@
-= bfiles - manage large binary files =
+= largefiles - manage large binary files =
This extension is based off of Greg Ward's bfiles extension which can be found
at http://mercurial.selenic.com/wiki/BfilesExtension.
-== The Bfile Store ==
+== The largefile store ==
-Bfile stores are simply directories where each file is a bfile. The filename
-is the sha1 hash of the bfile. The path is not necessary because all interactions
-with the store have one of these forms:
+largefile stores are, in the typical use case, centralized servers that have
+every past revision of a given binary file. Each largefile is identified by
+its sha1 hash, and all interactions with the store take one of the following
+forms.
-Download a bfile with this hash
-Upload a bfile with this hash
-Check if the store has a bfile with this hash
+largefiles stores can take one of two forms:
+
+-Directories on a network file share
+-Mercurial wireproto servers, either via ssh or http (hgweb)
+
== The Local Repository ==
-The local repository has a bfile store in .hg/bfiles which holds a subset of the
-bfiles needed. On a clone only the bfiles at tip are downloaded. When bfiles are
-downloaded from the central store a copy is saved in this store.
+The local repository has a largefile cache in .hg/largefiles which holds a
+subset of the largefiles needed. On a clone only the largefiles at tip are
+downloaded. When largefiles are downloaded from the central store, a copy is
+saved in this store.
== The Global Cache ==
-Bfiles in a local repository store are hard linked to files in the global cache. Before
-a file is downloaded we check if it is in the global cache.
+largefiles in a local repository cache are hardlinked to files in the global
+cache. Before a file is downloaded we check if it is in the global cache.
== Implementation Details ==
-Each bfile has a standin which is in .hgbfiles. The standin is tracked by Mercurial.
-The contents of the standin is the SHA1 hash of the bfile. When a bfile is added/removed/
-copied/renamed/etc the same operation is applied to the standin. Thus the history of the
-standin is the history of the bfile.
+Each largefile has a standin which is in .hglf. The standin is tracked by
+Mercurial. The standin contains the SHA1 hash of the largefile. When a
+largefile is added/removed/copied/renamed/etc the same operation is applied to
+the standin. Thus the history of the standin is the history of the largefile.
-For performance reasons the contents of a standin is only updated before a commit.
-Standins are added/removed/copied/renamed from add/remove/copy/rename Mercurial
-commands but their contents will not be updated. The contents of a standin will always
-be the hash of the bfile as of the last commit. To support some commands (revert) some
-standins are temporarily updated but will be changed back after the command is finished.
+For performance reasons, the contents of a standin are only updated before a
+commit. Standins are added/removed/copied/renamed from add/remove/copy/rename
+Mercurial commands but their contents will not be updated. The contents of a
+standin will always be the hash of the largefile as of the last commit. To
+support some commands (revert) some standins are temporarily updated but will
+be changed back after the command is finished.
-A Mercurial dirstate object tracks the state of the bfiles. The dirstate uses the
-last modified time and current size to detect if a file has changed (without reading
-the entire contents of the file).
-
+A Mercurial dirstate object tracks the state of the largefiles. The dirstate
+uses the last modified time and current size to detect if a file has changed
+(without reading the entire contents of the file).
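The last paragraph of this hunk describes detecting changes from the last-modified time and size rather than from the file contents. The sketch below shows only that general idea, using `os.stat`. It is not Mercurial's dirstate code, and the names `fingerprint` and `maybe_changed` are hypothetical.

```python
import os


def fingerprint(path):
    """Record the cheap metadata used for change detection: (size, mtime)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def maybe_changed(path, recorded):
    """Return True if the size or mtime differs from the recorded fingerprint.

    No file contents are read, so the check is cheap even for very large
    files. A True result only means the file needs a closer look (for
    example, rehashing it).
    """
    return fingerprint(path) != recorded
```

The trade-off is that an edit which preserves both size and modification time would go unnoticed by a check like this one.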
@@ -0,0 +1,51 @@
+Largefiles allows for tracking large, incompressible binary files in Mercurial
+without requiring excessive bandwidth for clones and pulls. Files added as
+largefiles are not tracked directly by Mercurial; rather, their revisions are
+identified by a checksum, and Mercurial tracks these checksums. This way, when
+you clone a repository or pull in changesets, the large files in older
+revisions of the repository are not needed, and only the ones needed to update
+to the current version are downloaded. This saves both disk space and
+bandwidth.
+
+If you are starting a new repository or adding new large binary files, using
+largefiles for them is as easy as adding '--large' to your hg add command. For
+example:
+
+$ dd if=/dev/urandom of=thisfileislarge count=2000
+$ hg add --large thisfileislarge
+$ hg commit -m 'add thisfileislarge, which is large, as a largefile'
+
+When you push a changeset that affects largefiles to a remote repository, its
+largefile revisions will be uploaded along with it. Note that the remote
+Mercurial must also have the largefiles extension enabled for this to work.
+
+When you pull a changeset that affects largefiles from a remote repository,
+nothing different from Mercurial's normal behavior happens. However, when you
+update to such a revision, any largefiles needed by that revision are
+downloaded and cached if they have never been downloaded before. This means
+that network access is required to update to a revision you have not yet
+updated to.
+
+If you already have large files tracked by Mercurial without the largefiles
+extension, you will need to convert your repository in order to benefit from
+largefiles. This is done with the 'hg lfconvert' command:
+
+$ hg lfconvert --size 10 oldrepo newrepo
+
+By default, in repositories that already have largefiles in them, any new file
+over 10MB will automatically be added as a largefile. To change this
+threshold, set [largefiles].size in your Mercurial config file to the minimum
+size in megabytes to track as a largefile, or use the --lfsize option to the
+add command (also in megabytes):
+
+[largefiles]
+size = 2
+
+$ hg add --lfsize 2
+
+The [largefiles].patterns config option lets you specify space-separated
+filename patterns (in shell glob syntax) that should always be tracked as
+largefiles:
+
+[largefiles]
+patterns = *.jpg *.{png,bmp} library.zip content/audio/*
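The size rule in the usage text above can be sketched as a simple comparison. This is an illustration only. It assumes the threshold is inclusive (the text calls it a "minimum size") and that one megabyte means 1024 * 1024 bytes; neither detail is confirmed by this changeset. The name `exceeds_threshold` is hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def exceeds_threshold(size_bytes: int, size_mb: float = 10) -> bool:
    """Return True if a file of size_bytes would be added as a largefile.

    size_mb mirrors the [largefiles].size setting (or --lfsize), in megabytes.
    The comparison is inclusive, which is an assumption about the real rule.
    """
    return size_bytes >= size_mb * BYTES_PER_MB
```

With `size = 2` as in the config example, a 3 MB file would qualify under this sketch and a 1 MB file would not.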