hgbook

changeset 111:34b8b7a15ea1

More material.
author Bryan O'Sullivan <bos@serpentine.com>
date Fri Nov 10 15:32:33 2006 -0800 (2006-11-10)
parents 75c076c7a374
children 2fcead053b7a
files en/concepts.tex
     1.1 --- a/en/concepts.tex	Fri Nov 10 15:09:49 2006 -0800
     1.2 +++ b/en/concepts.tex	Fri Nov 10 15:32:33 2006 -0800
     1.3 @@ -8,9 +8,9 @@
     1.4  
     1.5  This understanding gives me confidence that Mercurial has been
     1.6  carefully designed to be both \emph{safe} and \emph{efficient}.  And
     1.7 -just as importantly, if I have a good idea what the software is doing
     1.8 -when I perform a revision control task, I'm less likely to be
     1.9 -surprised by its behaviour.
    1.10 +just as importantly, if it's easy for me to retain a good idea of what
    1.11 +the software is doing when I perform a revision control task, I'm less
    1.12 +likely to be surprised by its behaviour.
    1.13  
    1.14  \section{Mercurial's historical record}
    1.15  
    1.16 @@ -179,7 +179,10 @@
    1.17  Along with delta or snapshot information, a revlog entry contains a
    1.18  cryptographic hash of the data that it represents.  This makes it
    1.19  difficult to forge the contents of a revision, and easy to detect
    1.20 -accidental corruption.
    1.21 +accidental corruption.  Mercurial uses the SHA-1 hash function, which
    1.22 +produces a 160-bit hash.  Although all revision data is hashed, the changeset
    1.23 +hashes that you see as an end user are from revisions of the
    1.24 +changelog.  Manifest and file hashes are only used behind the scenes.
    1.25  
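The integrity check described in this hunk can be illustrated with a short Python sketch. It is a simplified illustration, not Mercurial's own code: the helper names `revision_hash` and `is_intact` are hypothetical, and the sketch hashes only the raw bytes it is given, whereas the hash stored in a real revlog entry may cover more than the file text.

```python
import hashlib


def revision_hash(data: bytes) -> str:
    """Return the SHA-1 digest of some revision data as hex text.

    Hypothetical helper: it hashes only the bytes passed in.
    """
    return hashlib.sha1(data).hexdigest()


def is_intact(data: bytes, expected_hex: str) -> bool:
    """Report whether the data still matches a previously recorded hash."""
    return revision_hash(data) == expected_hex


if __name__ == "__main__":
    original = b"some revision data\n"
    recorded = revision_hash(original)

    # A SHA-1 digest is 20 bytes (160 bits), shown as 40 hex characters.
    print(len(hashlib.sha1(original).digest()) * 8, len(recorded))

    # Any change to the data, accidental or deliberate, changes the hash.
    print(is_intact(original, recorded))
    print(is_intact(b"some revision dat4\n", recorded))
```

The 40-character hex form is the representation in which an end user normally sees a changeset hash.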
    1.26  Mercurial checks these hashes when retrieving file revisions and when
    1.27  pulling changes from a repository.  If it encounters an integrity
    1.28 @@ -329,7 +332,34 @@
    1.29  \filename{dirstate}.  The file named \filename{dirstate} is thus
    1.30  guaranteed to be complete, not partially written.
    1.31  
    1.32 -
    1.33 +\subsection{Avoiding seeks}
    1.34 +
    1.35 +Critical to Mercurial's performance is the avoidance of seeks of the
    1.36 +disk head, since any seek is far more expensive than even a
    1.37 +comparatively large read operation.
    1.38 +
    1.39 +This is why, for example, the dirstate is stored in a single file.  If
    1.40 +there were a dirstate file per directory that Mercurial tracked, the
    1.41 +disk would seek once per directory.  Instead, Mercurial reads the
    1.42 +entire single dirstate file in one step.
    1.43 +
    1.44 +Mercurial also uses a ``copy on write'' scheme when cloning a
    1.45 +repository on local storage.  Instead of copying every revlog file
    1.46 +from the old repository into the new repository, it makes a ``hard
    1.47 +link'', which is a shorthand way to say ``these two names point to the
    1.48 +same file''.  When Mercurial is about to write to one of a revlog's
    1.49 +files, it checks to see if the number of names pointing at the file is
    1.50 +greater than one.  If it is, more than one repository is using the
    1.51 +file, so Mercurial makes a new copy of the file that is private to
    1.52 +this repository.
    1.53 +
    1.54 +A few revision control developers have pointed out that this idea of
    1.55 +making a complete private copy of a file is not very efficient in its
    1.56 +use of storage.  While this is true, storage is cheap, and this method
    1.57 +gives the highest performance while deferring most book-keeping to the
    1.58 +operating system.  An alternative scheme would most likely reduce
    1.59 +performance and increase the complexity of the software, and speed and
    1.60 +simplicity matter far more to the ``feel'' of day-to-day use.
    1.61  
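The hard-link scheme added in this hunk can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mercurial's implementation: the function names `clone_file`, `break_link_before_write`, and `append_to_revlog`, and the file names in the demo, are all hypothetical. The link count comes from `os.stat(...).st_nlink`, and the private copy is put in place with `os.replace` so the other repository's name keeps pointing at the old, unmodified file.

```python
import os
import shutil
import tempfile


def clone_file(src: str, dst: str) -> None:
    """'Clone' a file cheaply by giving it a second name (a hard link)."""
    os.link(src, dst)


def break_link_before_write(path: str) -> bool:
    """Give this name a private copy if other names share the file.

    Returns True if a copy was made, False if the file was already private.
    """
    if os.stat(path).st_nlink <= 1:
        return False
    # Copy into a temporary file in the same directory, then rename it
    # over the original name.  The other names keep the old file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    os.close(fd)
    shutil.copy2(path, tmp)
    os.replace(tmp, path)
    return True


def append_to_revlog(path: str, data: bytes) -> None:
    """Append data, first making sure the write cannot affect a clone."""
    break_link_before_write(path)
    with open(path, "ab") as f:
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "old-repo-file.i")  # hypothetical names
        new = os.path.join(d, "new-repo-file.i")
        with open(old, "wb") as f:
            f.write(b"rev0\n")

        clone_file(old, new)             # two names, one file
        append_to_revlog(new, b"rev1\n")  # triggers the private copy

        with open(old, "rb") as f:
            print(f.read())              # the old repository is untouched
        with open(new, "rb") as f:
            print(f.read())
```

The whole-file copy is the storage cost that the closing paragraph of the hunk discusses; the benefit is that the operating system's link count is the only bookkeeping needed.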
    1.62  %%% Local Variables: 
    1.63  %%% mode: latex