hgbook

changeset 992:8b0f1e2984d0

French translation : sync with original ch04-concepts
author Frédéric Bouquet <youshe.jaalon@gmail.com>
date Fri Sep 11 14:30:20 2009 +0200 (2009-09-11)
parents b4ff7b04efdc
children 71dbda516572
files fr/ch04-concepts.xml
     1.1 --- a/fr/ch04-concepts.xml	Thu Sep 10 14:45:17 2009 +0200
     1.2 +++ b/fr/ch04-concepts.xml	Fri Sep 11 14:30:20 2009 +0200
     1.3 @@ -1,710 +1,778 @@
     1.4  <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
     1.5  
     1.6 -<chapter>
     1.7 -<title>Behind the scenes</title>
     1.8 -<para>\label{chap:concepts}</para>
     1.9 -
    1.10 -<para>Unlike many revision control systems, the concepts upon which
    1.11 -Mercurial is built are simple enough that it's easy to understand how
    1.12 -the software really works.  Knowing this certainly isn't necessary,
    1.13 -but I find it useful to have a <quote>mental model</quote> of what's going on.</para>
    1.14 -
    1.15 -<para>This understanding gives me confidence that Mercurial has been
    1.16 -carefully designed to be both <emphasis>safe</emphasis> and <emphasis>efficient</emphasis>.  And
    1.17 -just as importantly, if it's easy for me to retain a good idea of what
    1.18 -the software is doing when I perform a revision control task, I'm less
    1.19 -likely to be surprised by its behaviour.</para>
    1.20 -
    1.21 -<para>In this chapter, we'll initially cover the core concepts behind
    1.22 -Mercurial's design, then continue to discuss some of the interesting
    1.23 -details of its implementation.</para>
    1.24 -
    1.25 -<sect1>
    1.26 -<title>Mercurial's historical record</title>
    1.27 -
    1.28 -<sect2>
    1.29 -<title>Tracking the history of a single file</title>
    1.30 -
    1.31 -<para>When Mercurial tracks modifications to a file, it stores the history
    1.32 -of that file in a metadata object called a <emphasis>filelog</emphasis>.  Each entry
    1.33 -in the filelog contains enough information to reconstruct one revision
    1.34 -of the file that is being tracked.  Filelogs are stored as files in
    1.35 -the <filename role="special" class="directory">.hg/store/data</filename> directory.  A filelog contains two kinds
    1.36 -of information: revision data, and an index to help Mercurial to find
    1.37 -a revision efficiently.</para>
    1.38 -
    1.39 -<para>A file that is large, or has a lot of history, has its filelog stored
    1.40 -in separate data (<quote><literal>.d</literal></quote> suffix) and index (<quote><literal>.i</literal></quote>
    1.41 -suffix) files.  For small files without much history, the revision
    1.42 -data and index are combined in a single <quote><literal>.i</literal></quote> file.  The
    1.43 -correspondence between a file in the working directory and the filelog
    1.44 -that tracks its history in the repository is illustrated in
    1.45 -figure <xref linkend="fig:concepts:filelog"/>.</para>
    1.46 -
    1.47 -<informalfigure>
    1.48 -
    1.49 -<para>  <mediaobject><imageobject><imagedata fileref="filelog"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
    1.50 -  <caption><para>Relationships between files in working directory and
    1.51 -    filelogs in repository</para></caption>
    1.52 -  \label{fig:concepts:filelog}</para>
    1.53 -</informalfigure>
    1.54 -
    1.55 -</sect2>
    1.56 -<sect2>
    1.57 -<title>Managing tracked files</title>
    1.58 -
    1.59 -<para>Mercurial uses a structure called a <emphasis>manifest</emphasis> to collect
    1.60 -together information about the files that it tracks.  Each entry in
    1.61 -the manifest contains information about the files present in a single
    1.62 -changeset.  An entry records which files are present in the changeset,
    1.63 -the revision of each file, and a few other pieces of file metadata.</para>
    1.64 -
    1.65 -</sect2>
    1.66 -<sect2>
    1.67 -<title>Recording changeset information</title>
    1.68 -
    1.69 -<para>The <emphasis>changelog</emphasis> contains information about each changeset.  Each
    1.70 -revision records who committed a change, the changeset comment, other
    1.71 -pieces of changeset-related information, and the revision of the
    1.72 -manifest to use.
    1.73 -</para>
    1.74 -
    1.75 -</sect2>
    1.76 -<sect2>
    1.77 -<title>Relationships between revisions</title>
    1.78 -
    1.79 -<para>Within a changelog, a manifest, or a filelog, each revision stores a
    1.80 -pointer to its immediate parent (or to its two parents, if it's a
    1.81 -merge revision).  As I mentioned above, there are also relationships
    1.82 -between revisions <emphasis>across</emphasis> these structures, and they are
    1.83 -hierarchical in nature.
    1.84 -</para>
    1.85 -
    1.86 -<para>For every changeset in a repository, there is exactly one revision
    1.87 -stored in the changelog.  Each revision of the changelog contains a
    1.88 -pointer to a single revision of the manifest.  A revision of the
    1.89 -manifest stores a pointer to a single revision of each filelog tracked
    1.90 -when that changeset was created.  These relationships are illustrated
    1.91 -in figure <xref linkend="fig:concepts:metadata"/>.
    1.92 -</para>
    1.93 -
    1.94 -<informalfigure>
    1.95 -
    1.96 -<para>  <mediaobject><imageobject><imagedata fileref="metadata"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
    1.97 -  <caption><para>Metadata relationships</para></caption>
    1.98 -  \label{fig:concepts:metadata}
    1.99 -</para>
   1.100 -</informalfigure>
   1.101 -
   1.102 -<para>As the illustration shows, there is <emphasis>not</emphasis> a <quote>one to one</quote>
   1.103 -relationship between revisions in the changelog, manifest, or filelog.
   1.104 -If the manifest hasn't changed between two changesets, the changelog
   1.105 -entries for those changesets will point to the same revision of the
   1.106 -manifest.  If a file that Mercurial tracks hasn't changed between two
   1.107 -changesets, the entry for that file in the two revisions of the
   1.108 -manifest will point to the same revision of its filelog.
   1.109 -</para>
   1.110 -
   1.111 -</sect2>
   1.112 -</sect1>
   1.113 -<sect1>
   1.114 -<title>Safe, efficient storage</title>
   1.115 -
   1.116 -<para>The underpinnings of changelogs, manifests, and filelogs are provided
   1.117 -by a single structure called the <emphasis>revlog</emphasis>.
   1.118 -</para>
   1.119 -
   1.120 -<sect2>
   1.121 -<title>Efficient storage</title>
   1.122 -
   1.123 -<para>The revlog provides efficient storage of revisions using a
   1.124 -<emphasis>delta</emphasis> mechanism.  Instead of storing a complete copy of a file
   1.125 -for each revision, it stores the changes needed to transform an older
   1.126 -revision into the new revision.  For many kinds of file data, these
   1.127 -deltas are typically a fraction of a percent of the size of a full
   1.128 -copy of a file.
   1.129 -</para>
   1.130 -
   1.131 -<para>Some obsolete revision control systems can only work with deltas of
   1.132 -text files.  They must either store binary files as complete snapshots
   1.133 -or encoded into a text representation, both of which are wasteful
   1.134 -approaches.  Mercurial can efficiently handle deltas of files with
   1.135 -arbitrary binary contents; it doesn't need to treat text as special.
   1.136 -</para>
   1.137 -
   1.138 -</sect2>
   1.139 -<sect2>
   1.140 -<title>Safe operation</title>
   1.141 -<para>\label{sec:concepts:txn}
   1.142 -</para>
   1.143 -
   1.144 -<para>Mercurial only ever <emphasis>appends</emphasis> data to the end of a revlog file.
   1.145 -It never modifies a section of a file after it has written it.  This
   1.146 -is both more robust and more efficient than schemes that need to modify or
   1.147 -rewrite data.
   1.148 -</para>
   1.149 -
   1.150 -<para>In addition, Mercurial treats every write as part of a
   1.151 -<emphasis>transaction</emphasis> that can span a number of files.  A transaction is
   1.152 -<emphasis>atomic</emphasis>: either the entire transaction succeeds and its effects
   1.153 -are all visible to readers in one go, or the whole thing is undone.
   1.154 -This guarantee of atomicity means that if you're running two copies of
   1.155 -Mercurial, where one is reading data and one is writing it, the reader
   1.156 -will never see a partially written result that might confuse it.
   1.157 -</para>
   1.158 -
   1.159 -<para>The fact that Mercurial only appends to files makes it easier to
   1.160 -provide this transactional guarantee.  The easier it is to do stuff
   1.161 -like this, the more confident you should be that it's done correctly.
   1.162 -</para>
   1.163 -
   1.164 -</sect2>
   1.165 -<sect2>
   1.166 -<title>Fast retrieval</title>
   1.167 -
   1.168 -<para>Mercurial cleverly avoids a pitfall common to all earlier
   1.169 -revision control systems: the problem of <emphasis>inefficient retrieval</emphasis>.
   1.170 -Most revision control systems store the contents of a revision as an
   1.171 -incremental series of modifications against a <quote>snapshot</quote>.  To
   1.172 -reconstruct a specific revision, you must first read the snapshot, and
   1.173 -then every one of the revisions between the snapshot and your target
   1.174 -revision.  The more history that a file accumulates, the more
   1.175 -revisions you must read, hence the longer it takes to reconstruct a
   1.176 -particular revision.
   1.177 -</para>
   1.178 -
   1.179 -<informalfigure>
   1.180 -
   1.181 -<para>  <mediaobject><imageobject><imagedata fileref="snapshot"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.182 -  <caption><para>Snapshot of a revlog, with incremental deltas</para></caption>
   1.183 -  \label{fig:concepts:snapshot}
   1.184 -</para>
   1.185 -</informalfigure>
   1.186 -
   1.187 -<para>The innovation that Mercurial applies to this problem is simple but
   1.188 -effective.  Once the cumulative amount of delta information stored
   1.189 -since the last snapshot exceeds a fixed threshold, it stores a new
   1.190 -snapshot (compressed, of course), instead of another delta.  This
   1.191 -makes it possible to reconstruct <emphasis>any</emphasis> revision of a file
   1.192 -quickly.  This approach works so well that it has since been copied by
   1.193 -several other revision control systems.
   1.194 -</para>
   1.195 -
   1.196 -<para>Figure <xref linkend="fig:concepts:snapshot"/> illustrates the idea.  In an entry
   1.197 -in a revlog's index file, Mercurial stores the range of entries from
   1.198 -the data file that it must read to reconstruct a particular revision.
   1.199 -</para>
   1.200 -
   1.201 -<sect3>
   1.202 -<title>Aside: the influence of video compression</title>
   1.203 -
   1.204 -<para>If you're familiar with video compression or have ever watched a TV
   1.205 -feed through a digital cable or satellite service, you may know that
   1.206 -most video compression schemes store each frame of video as a delta
   1.207 -against its predecessor frame.  In addition, these schemes use
   1.208 -<quote>lossy</quote> compression techniques to increase the compression ratio, so
   1.209 -visual errors accumulate over the course of a number of inter-frame
   1.210 -deltas.
   1.211 -</para>
   1.212 -
   1.213 -<para>Because it's possible for a video stream to <quote>drop out</quote> occasionally
   1.214 -due to signal glitches, and to limit the accumulation of artefacts
   1.215 -introduced by the lossy compression process, video encoders
   1.216 -periodically insert a complete frame (called a <quote>key frame</quote>) into the
   1.217 -video stream; the next delta is generated against that frame.  This
   1.218 -means that if the video signal gets interrupted, it will resume once
   1.219 -the next key frame is received.  Also, the accumulation of encoding
   1.220 -errors restarts anew with each key frame.
   1.221 -</para>
   1.222 -
   1.223 -</sect3>
   1.224 -</sect2>
   1.225 -<sect2>
   1.226 -<title>Identification and strong integrity</title>
   1.227 -
   1.228 -<para>Along with delta or snapshot information, a revlog entry contains a
   1.229 -cryptographic hash of the data that it represents.  This makes it
   1.230 -difficult to forge the contents of a revision, and easy to detect
   1.231 -accidental corruption.
   1.232 -</para>
   1.233 -
   1.234 -<para>Hashes provide more than a mere check against corruption; they are
   1.235 -used as the identifiers for revisions.  The changeset identification
   1.236 -hashes that you see as an end user are from revisions of the
   1.237 -changelog.  Although filelogs and the manifest also use hashes,
   1.238 -Mercurial only uses these behind the scenes.
   1.239 -</para>
   1.240 -
   1.241 -<para>Mercurial verifies that hashes are correct when it retrieves file
   1.242 -revisions and when it pulls changes from another repository.  If it
   1.243 -encounters an integrity problem, it will complain and stop whatever
   1.244 -it's doing.
   1.245 -</para>
   1.246 -
   1.247 -<para>In addition to the effect it has on retrieval efficiency, Mercurial's
   1.248 -use of periodic snapshots makes it more robust against partial data
   1.249 -corruption.  If a revlog becomes partly corrupted due to a hardware
   1.250 -error or system bug, it's often possible to reconstruct some or most
   1.251 -revisions from the uncorrupted sections of the revlog, both before and
   1.252 -after the corrupted section.  This would not be possible with a
   1.253 -delta-only storage model.
   1.254 -</para>
   1.255 -
   1.256 -<para>\section{Revision history, branching,
   1.257 -  and merging}
   1.258 -</para>
   1.259 -
   1.260 -<para>Every entry in a Mercurial revlog knows the identity of its immediate
   1.261 -ancestor revision, usually referred to as its <emphasis>parent</emphasis>.  In fact,
   1.262 -a revision contains room for not one parent, but two.  Mercurial uses
   1.263 -a special hash, called the <quote>null ID</quote>, to represent the idea <quote>there
   1.264 -is no parent here</quote>.  This hash is simply a string of zeroes.
   1.265 -</para>
   1.266 -
   1.267 -<para>In figure <xref linkend="fig:concepts:revlog"/>, you can see an example of the
   1.268 -conceptual structure of a revlog.  Filelogs, manifests, and changelogs
   1.269 -all have this same structure; they differ only in the kind of data
   1.270 -stored in each delta or snapshot.
   1.271 -</para>
   1.272 -
   1.273 -<para>The first revision in a revlog (at the bottom of the image) has the
   1.274 -null ID in both of its parent slots.  For a <quote>normal</quote> revision, its
   1.275 -first parent slot contains the ID of its parent revision, and its
   1.276 -second contains the null ID, indicating that the revision has only one
   1.277 -real parent.  Any two revisions that have the same parent ID are
   1.278 -branches.  A revision that represents a merge between branches has two
   1.279 -normal revision IDs in its parent slots.
   1.280 -</para>
   1.281 -
   1.282 -<informalfigure>
   1.283 -
   1.284 -<para>  <mediaobject><imageobject><imagedata fileref="revlog"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.285 -  \caption{}
   1.286 -  \label{fig:concepts:revlog}
   1.287 -</para>
   1.288 -</informalfigure>
   1.289 -
   1.290 -</sect2>
   1.291 -</sect1>
   1.292 -<sect1>
   1.293 -<title>The working directory</title>
   1.294 -
   1.295 -<para>In the working directory, Mercurial stores a snapshot of the files
   1.296 -from the repository as of a particular changeset.
   1.297 -</para>
   1.298 -
   1.299 -<para>The working directory <quote>knows</quote> which changeset it contains.  When you
   1.300 -update the working directory to contain a particular changeset,
   1.301 -Mercurial looks up the appropriate revision of the manifest to find
   1.302 -out which files it was tracking at the time that changeset was
   1.303 -committed, and which revision of each file was then current.  It then
   1.304 -recreates a copy of each of those files, with the same contents it had
   1.305 -when the changeset was committed.
   1.306 -</para>
   1.307 -
   1.308 -<para>The <emphasis>dirstate</emphasis> contains Mercurial's knowledge of the working
   1.309 -directory.  This details which changeset the working directory is
   1.310 -updated to, and all of the files that Mercurial is tracking in the
   1.311 -working directory.
   1.312 -</para>
   1.313 -
   1.314 -<para>Just as a revision of a revlog has room for two parents, so that it
   1.315 -can represent either a normal revision (with one parent) or a merge of
   1.316 -two earlier revisions, the dirstate has slots for two parents.  When
   1.317 -you use the <command role="hg-cmd">hg update</command> command, the changeset that you update to
   1.318 -is stored in the <quote>first parent</quote> slot, and the null ID in the second.
   1.319 -When you <command role="hg-cmd">hg merge</command> with another changeset, the first parent
   1.320 -remains unchanged, and the second parent is filled in with the
   1.321 -changeset you're merging with.  The <command role="hg-cmd">hg parents</command> command tells you
   1.322 -what the parents of the dirstate are.
   1.323 -</para>
   1.324 -
   1.325 -<sect2>
   1.326 -<title>What happens when you commit</title>
   1.327 -
   1.328 -<para>The dirstate stores parent information for more than just book-keeping
   1.329 -purposes.  Mercurial uses the parents of the dirstate as <emphasis>the
   1.330 -  parents of a new changeset</emphasis> when you perform a commit.
   1.331 -</para>
   1.332 -
   1.333 -<informalfigure>
   1.334 -
   1.335 -<para>  <mediaobject><imageobject><imagedata fileref="wdir"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.336 -  <caption><para>The working directory can have two parents</para></caption>
   1.337 -  \label{fig:concepts:wdir}
   1.338 -</para>
   1.339 -</informalfigure>
   1.340 -
   1.341 -<para>Figure <xref linkend="fig:concepts:wdir"/> shows the normal state of the working
   1.342 -directory, where it has a single changeset as parent.  That changeset
   1.343 -is the <emphasis>tip</emphasis>, the newest changeset in the repository that has no
   1.344 -children.
   1.345 -</para>
   1.346 -
   1.347 -<informalfigure>
   1.348 -
   1.349 -<para>  <mediaobject><imageobject><imagedata fileref="wdir-after-commit"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.350 -  <caption><para>The working directory gains new parents after a commit</para></caption>
   1.351 -  \label{fig:concepts:wdir-after-commit}
   1.352 -</para>
   1.353 -</informalfigure>
   1.354 -
   1.355 -<para>It's useful to think of the working directory as <quote>the changeset I'm
   1.356 -about to commit</quote>.  Any files that you tell Mercurial that you've
   1.357 -added, removed, renamed, or copied will be reflected in that
   1.358 -changeset, as will modifications to any files that Mercurial is
   1.359 -already tracking; the new changeset will have the parents of the
   1.360 -working directory as its parents.
   1.361 -</para>
   1.362 -
   1.363 -<para>After a commit, Mercurial will update the parents of the working
   1.364 -directory, so that the first parent is the ID of the new changeset,
   1.365 -and the second is the null ID.  This is shown in
   1.366 -figure <xref linkend="fig:concepts:wdir-after-commit"/>.  Mercurial doesn't touch
   1.367 -any of the files in the working directory when you commit; it just
   1.368 -modifies the dirstate to note its new parents.
   1.369 -</para>
   1.370 -
   1.371 -</sect2>
   1.372 -<sect2>
   1.373 -<title>Creating a new head</title>
   1.374 -
   1.375 -<para>It's perfectly normal to update the working directory to a changeset
   1.376 -other than the current tip.  For example, you might want to know what
   1.377 -your project looked like last Tuesday, or you could be looking through
   1.378 -changesets to see which one introduced a bug.  In cases like this, the
   1.379 -natural thing to do is update the working directory to the changeset
   1.380 -you're interested in, and then examine the files in the working
   1.381 -directory directly to see their contents as they were when you
   1.382 -committed that changeset.  The effect of this is shown in
   1.383 -figure <xref linkend="fig:concepts:wdir-pre-branch"/>.
   1.384 -</para>
   1.385 -
   1.386 -<informalfigure>
   1.387 -
   1.388 -<para>  <mediaobject><imageobject><imagedata fileref="wdir-pre-branch"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.389 -  <caption><para>The working directory, updated to an older changeset</para></caption>
   1.390 -  \label{fig:concepts:wdir-pre-branch}
   1.391 -</para>
   1.392 -</informalfigure>
   1.393 -
   1.394 -<para>Having updated the working directory to an older changeset, what
   1.395 -happens if you make some changes, and then commit?  Mercurial behaves
   1.396 -in the same way as I outlined above.  The parents of the working
   1.397 -directory become the parents of the new changeset.  This new changeset
   1.398 -has no children, so it becomes the new tip.  And the repository now
   1.399 -contains two changesets that have no children; we call these
   1.400 -<emphasis>heads</emphasis>.  You can see the structure that this creates in
   1.401 -figure <xref linkend="fig:concepts:wdir-branch"/>.
   1.402 -</para>
   1.403 -
   1.404 -<informalfigure>
   1.405 -
   1.406 -<para>  <mediaobject><imageobject><imagedata fileref="wdir-branch"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.407 -  <caption><para>After a commit made while synced to an older changeset</para></caption>
   1.408 -  \label{fig:concepts:wdir-branch}
   1.409 -</para>
   1.410 -</informalfigure>
   1.411 -
   1.412 -<note>
   1.413 -<para>  If you're new to Mercurial, you should keep in mind a common
   1.414 -  <quote>error</quote>, which is to use the <command role="hg-cmd">hg pull</command> command without any
   1.415 -  options.  By default, the <command role="hg-cmd">hg pull</command> command <emphasis>does not</emphasis>
   1.416 -  update the working directory, so you'll bring new changesets into
   1.417 -  your repository, but the working directory will stay synced at the
   1.418 -  same changeset as before the pull.  If you make some changes and
   1.419 -  commit afterwards, you'll thus create a new head, because your
   1.420 -  working directory isn't synced to whatever the current tip is.
   1.421 -</para>
   1.422 -
   1.423 -<para>  I put the word <quote>error</quote> in quotes because all that you need to do
   1.424 -  to rectify this situation is <command role="hg-cmd">hg merge</command>, then <command role="hg-cmd">hg commit</command>.  In
   1.425 -  other words, this almost never has negative consequences; it just
   1.426 -  surprises people.  I'll discuss other ways to avoid this behaviour,
   1.427 -  and why Mercurial behaves in this initially surprising way, later
   1.428 -  on.
   1.429 -</para>
   1.430 -</note>
   1.431 -
   1.432 -</sect2>
   1.433 -<sect2>
   1.434 -<title>Merging heads</title>
   1.435 -
   1.436 -<para>When you run the <command role="hg-cmd">hg merge</command> command, Mercurial leaves the first
   1.437 -parent of the working directory unchanged, and sets the second parent
   1.438 -to the changeset you're merging with, as shown in
   1.439 -figure <xref linkend="fig:concepts:wdir-merge"/>.
   1.440 -</para>
   1.441 -
   1.442 -<informalfigure>
   1.443 -
   1.444 -<para>  <mediaobject><imageobject><imagedata fileref="wdir-merge"/></imageobject><textobject><phrase>XXX add text</phrase></textobject></mediaobject>
   1.445 -  <caption><para>Merging two heads</para></caption>
   1.446 -  \label{fig:concepts:wdir-merge}
   1.447 -</para>
   1.448 -</informalfigure>
   1.449 -
   1.450 -<para>Mercurial also has to modify the working directory, to merge the files
   1.451 -managed in the two changesets.  Simplified a little, the merging
   1.452 -process goes like this, for every file in the manifests of both
   1.453 -changesets:
   1.454 -</para>
   1.455 -<itemizedlist>
   1.456 -<listitem><para>If neither changeset has modified a file, do nothing with that
   1.457 -  file.
   1.458 -</para>
   1.459 -</listitem>
   1.460 -<listitem><para>If one changeset has modified a file, and the other hasn't,
   1.461 -  create the modified copy of the file in the working directory.
   1.462 -</para>
   1.463 -</listitem>
   1.464 -<listitem><para>If one changeset has removed a file, and the other hasn't (or
   1.465 -  has also deleted it), delete the file from the working directory.
   1.466 -</para>
   1.467 -</listitem>
   1.468 -<listitem><para>If one changeset has removed a file, but the other has modified
   1.469 -  the file, ask the user what to do: keep the modified file, or remove
   1.470 -  it?
   1.471 -</para>
   1.472 -</listitem>
   1.473 -<listitem><para>If both changesets have modified a file, invoke an external
   1.474 -  merge program to choose the new contents for the merged file.  This
   1.475 -  may require input from the user.
   1.476 -</para>
   1.477 -</listitem>
   1.478 -<listitem><para>If one changeset has modified a file, and the other has renamed
   1.479 -  or copied the file, make sure that the changes follow the new name
   1.480 -  of the file.
   1.481 -</para>
   1.482 -</listitem></itemizedlist>
   1.483 -<para>There are more details&emdash;merging has plenty of corner cases&emdash;but
   1.484 -these are the most common choices that are involved in a merge.  As
   1.485 -you can see, most cases are completely automatic, and indeed most
   1.486 -merges finish automatically, without requiring your input to resolve
   1.487 -any conflicts.
   1.488 -</para>
   1.489 -
   1.490 -<para>When you're thinking about what happens when you commit after a merge,
   1.491 -once again the working directory is <quote>the changeset I'm about to
   1.492 -commit</quote>.  After the <command role="hg-cmd">hg merge</command> command completes, the working
   1.493 -directory has two parents; these will become the parents of the new
   1.494 -changeset.
   1.495 -</para>
   1.496 -
   1.497 -<para>Mercurial lets you perform multiple merges, but you must commit the
   1.498 -results of each individual merge as you go.  This is necessary because
   1.499 -Mercurial only tracks two parents for both revisions and the working
   1.500 -directory.  While it would be technically possible to merge multiple
   1.501 -changesets at once, the prospect of user confusion and making a
   1.502 -terrible mess of a merge immediately becomes overwhelming.
   1.503 -</para>
   1.504 -
   1.505 -</sect2>
   1.506 -</sect1>
   1.507 -<sect1>
   1.508 -<title>Other interesting design features</title>
   1.509 -
   1.510 -<para>In the sections above, I've tried to highlight some of the most
   1.511 -important aspects of Mercurial's design, to illustrate that it pays
   1.512 -careful attention to reliability and performance.  However, the
   1.513 -attention to detail doesn't stop there.  There are a number of other
   1.514 -aspects of Mercurial's construction that I personally find
   1.515 -interesting.  I'll detail a few of them here, separate from the <quote>big
   1.516 -ticket</quote> items above, so that if you're interested, you can gain a
   1.517 -better idea of the amount of thinking that goes into a well-designed
   1.518 -system.
   1.519 -</para>
   1.520 -
   1.521 -<sect2>
   1.522 -<title>Clever compression</title>
   1.523 -
   1.524 -<para>When appropriate, Mercurial will store both snapshots and deltas in
   1.525 -compressed form.  It does this by always <emphasis>trying to</emphasis> compress a
   1.526 -snapshot or delta, but only storing the compressed version if it's
   1.527 -smaller than the uncompressed version.
   1.528 -</para>
   1.529 -
   1.530 -<para>This means that Mercurial does <quote>the right thing</quote> when storing a file
   1.531 -whose native form is compressed, such as a <literal>zip</literal> archive or a
   1.532 -JPEG image.  When these types of files are compressed a second time,
   1.533 -the resulting file is usually bigger than the once-compressed form,
   1.534 -and so Mercurial will store the plain <literal>zip</literal> or JPEG.
   1.535 -</para>
   1.536 -
   1.537 -<para>Deltas between revisions of a compressed file are usually larger than
   1.538 -snapshots of the file, and Mercurial again does <quote>the right thing</quote> in
   1.539 -these cases.  It finds that such a delta exceeds the threshold at
   1.540 -which it should store a complete snapshot of the file, so it stores
   1.541 -the snapshot, again saving space compared to a naive delta-only
   1.542 -approach.
   1.543 -</para>
   1.544 -
   1.545 -<sect3>
   1.546 -<title>Network recompression</title>
   1.547 -
   1.548 -<para>When storing revisions on disk, Mercurial uses the <quote>deflate</quote>
   1.549 -compression algorithm (the same one used by the popular <literal>zip</literal>
   1.550 -archive format), which balances good speed with a respectable
   1.551 -compression ratio.  However, when transmitting revision data over a
   1.552 -network connection, Mercurial uncompresses the compressed revision
   1.553 -data.
   1.554 -</para>
   1.555 -
   1.556 -<para>If the connection is over HTTP, Mercurial recompresses the entire
   1.557 -stream of data using a compression algorithm that gives a better
   1.558 -compression ratio (the Burrows-Wheeler algorithm from the widely used
   1.559 -<literal>bzip2</literal> compression package).  This combination of algorithm
   1.560 -and compression of the entire stream (instead of a revision at a time)
   1.561 -substantially reduces the number of bytes to be transferred, yielding
   1.562 -better network performance over almost all kinds of network.
   1.563 -</para>
   1.564 -
   1.565 -<para>(If the connection is over <command>ssh</command>, Mercurial <emphasis>doesn't</emphasis>
   1.566 -recompress the stream, because <command>ssh</command> can already do this
   1.567 -itself.)
   1.568 -</para>
   1.569 -
   1.570 -</sect3>
   1.571 -</sect2>
   1.572 -<sect2>
   1.573 -<title>Read/write ordering and atomicity</title>
   1.574 -
   1.575 -<para>Appending to files isn't the whole story when it comes to guaranteeing
   1.576 -that a reader won't see a partial write.  If you recall
   1.577 -figure <xref linkend="fig:concepts:metadata"/>, revisions in the changelog point to
   1.578 -revisions in the manifest, and revisions in the manifest point to
   1.579 -revisions in filelogs.  This hierarchy is deliberate.
   1.580 -</para>
   1.581 -
   1.582 -<para>A writer starts a transaction by writing filelog and manifest data,
   1.583 -and doesn't write any changelog data until those are finished.  A
   1.584 -reader starts by reading changelog data, then manifest data, followed
   1.585 -by filelog data.
   1.586 -</para>
   1.587 -
   1.588 -<para>Since the writer has always finished writing filelog and manifest data
   1.589 -before it writes to the changelog, a reader will never read a pointer
   1.590 -to a partially written manifest revision from the changelog, and it will
   1.591 -never read a pointer to a partially written filelog revision from the
   1.592 -manifest.
   1.593 -</para>
   1.594 -
   1.595 -</sect2>
   1.596 -<sect2>
   1.597 -<title>Concurrent access</title>
   1.598 -
   1.599 -<para>The read/write ordering and atomicity guarantees mean that Mercurial
   1.600 -never needs to <emphasis>lock</emphasis> a repository when it's reading data, even
   1.601 -if the repository is being written to while the read is occurring.
   1.602 -This has a big effect on scalability; you can have an arbitrary number
   1.603 -of Mercurial processes safely reading data from a repository
   1.604 -all at once, no matter whether it's being written to or not.
   1.605 -</para>
   1.606 -
   1.607 -<para>The lockless nature of reading means that if you're sharing a
   1.608 -repository on a multi-user system, you don't need to grant other local
   1.609 -users permission to <emphasis>write</emphasis> to your repository in order for them
   1.610 -to be able to clone it or pull changes from it; they only need
   1.611 -<emphasis>read</emphasis> permission.  (This is <emphasis>not</emphasis> a common feature among
   1.612 -revision control systems, so don't take it for granted!  Most require
   1.613 -readers to be able to lock a repository to access it safely, and this
   1.614 -requires write permission on at least one directory, which of course
   1.615 -makes for all kinds of nasty and annoying security and administrative
   1.616 -problems.)
   1.617 -</para>
   1.618 -
   1.619 -<para>Mercurial uses locks to ensure that only one process can write to a
   1.620 -repository at a time (the locking mechanism is safe even over
   1.621 -filesystems that are notoriously hostile to locking, such as NFS).  If
   1.622 -a repository is locked, a writer will wait and retry in case the
   1.623 -repository becomes unlocked, but if the repository remains locked for
   1.624 -too long, the process attempting to write will time out.
   1.625 -This means that your daily automated scripts won't get stuck forever
   1.626 -and pile up if a system crashes unnoticed, for example.  (Yes, the
   1.627 -timeout is configurable, from zero to infinity.)
   1.628 -</para>
   1.629 -
   1.630 -<sect3>
   1.631 -<title>Safe dirstate access</title>
   1.632 -
   1.633 -<para>As with revision data, Mercurial doesn't take a lock to read the
   1.634 -dirstate file; it does acquire a lock to write it.  To avoid the
   1.635 -possibility of reading a partially written copy of the dirstate file,
   1.636 -Mercurial writes to a file with a unique name in the same directory as
   1.637 -the dirstate file, then renames the temporary file atomically to
   1.638 -<filename>dirstate</filename>.  The file named <filename>dirstate</filename> is thus
   1.639 -guaranteed to be complete, not partially written.
   1.640 -</para>
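The write-then-rename trick described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Mercurial's own code: the function name write_atomically and the ".tmp-" prefix are made up for the example.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the final rename
    # stays on one filesystem and can be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX: a reader opens either the old file or the new one.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A reader that opens the target at any moment gets either the complete old contents or the complete new contents, which is why no read lock is needed.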
   1.641 -
   1.642 -</sect3>
   1.643 -</sect2>
   1.644 -<sect2>
   1.645 -<title>Avoiding seeks</title>
   1.646 -
   1.647 -<para>Critical to Mercurial's performance is the avoidance of seeks of the
   1.648 -disk head, since any seek is far more expensive than even a
   1.649 -comparatively large read operation.
   1.650 -</para>
   1.651 -
   1.652 -<para>This is why, for example, the dirstate is stored in a single file.  If
   1.653 -there were a dirstate file per directory that Mercurial tracked, the
   1.654 -disk would seek once per directory.  Instead, Mercurial reads the
   1.655 -entire single dirstate file in one step.
   1.656 -</para>
   1.657 -
   1.658 -<para>Mercurial also uses a <quote>copy on write</quote> scheme when cloning a
   1.659 -repository on local storage.  Instead of copying every revlog file
   1.660 -from the old repository into the new repository, it makes a <quote>hard
   1.661 -link</quote>, which is a shorthand way to say <quote>these two names point to the
   1.662 -same file</quote>.  When Mercurial is about to write to one of a revlog's
   1.663 -files, it checks to see if the number of names pointing at the file is
   1.664 -greater than one.  If it is, more than one repository is using the
   1.665 -file, so Mercurial makes a new copy of the file that is private to
   1.666 -this repository.
   1.667 -</para>
   1.668 -
   1.669 -<para>A few revision control developers have pointed out that this idea of
   1.670 -making a complete private copy of a file is not very efficient in its
   1.671 -use of storage.  While this is true, storage is cheap, and this method
   1.672 -gives the highest performance while deferring most book-keeping to the
   1.673 -operating system.  An alternative scheme would most likely reduce
   1.674 -performance and increase the complexity of the software, each of which
   1.675 -is much more important to the <quote>feel</quote> of day-to-day use.
   1.676 -</para>
   1.677 -
   1.678 -</sect2>
   1.679 -<sect2>
   1.680 -<title>Other contents of the dirstate</title>
   1.681 -
   1.682 -<para>Because Mercurial doesn't force you to tell it when you're modifying a
   1.683 -file, it uses the dirstate to store some extra information so it can
   1.684 -determine efficiently whether you have modified a file.  For each file
   1.685 -in the working directory, it stores the time that it last modified the
   1.686 -file itself, and the size of the file at that time.
   1.687 -</para>
   1.688 -
   1.689 -<para>When you explicitly <command role="hg-cmd">hg add</command>, <command role="hg-cmd">hg remove</command>, <command role="hg-cmd">hg rename</command> or
   1.690 -<command role="hg-cmd">hg copy</command> files, Mercurial updates the dirstate so that it knows
   1.691 -what to do with those files when you commit.
   1.692 -</para>
   1.693 -
   1.694 -<para>When Mercurial is checking the states of files in the working
   1.695 -directory, it first checks a file's modification time.  If that has
   1.696 -not changed, the file must not have been modified.  If the file's size
   1.697 -has changed, the file must have been modified.  If the modification
   1.698 -time has changed, but the size has not, only then does Mercurial need
   1.699 -to read the actual contents of the file to see if they've changed.
   1.700 -Storing these few extra pieces of information dramatically reduces the
   1.701 -amount of data that Mercurial needs to read, which yields large
   1.702 -performance improvements compared to other revision control systems.
   1.703 -</para>
   1.704 -
   1.705 -</sect2>
   1.706 -</sect1>
   1.707 +<chapter id="chap:concepts">
   1.708 +  <?dbhtml filename="behind-the-scenes.html"?>
   1.709 +  <title>Behind the scenes</title>
   1.710 +
   1.711 +  <para id="x_2e8">Unlike many revision control systems, the concepts
   1.712 +    upon which Mercurial is built are simple enough that it's easy to
   1.713 +    understand how the software really works.  Knowing these details
   1.714 +    isn't necessary, so it is safe to skip this
   1.715 +    chapter.  However, I think you will get more out of the software
   1.716 +    with a <quote>mental model</quote> of what's going on.</para>
   1.717 +
   1.718 +  <para id="x_2e9">Being able to understand what's going on behind the
   1.719 +    scenes gives me confidence that Mercurial has been carefully
   1.720 +    designed to be both <emphasis>safe</emphasis> and
   1.721 +    <emphasis>efficient</emphasis>.  And just as importantly, if it's
   1.722 +    easy for me to retain a good idea of what the software is doing
   1.723 +    when I perform a revision control task, I'm less likely to be
   1.724 +    surprised by its behavior.</para>
   1.725 +
   1.726 +  <para id="x_2ea">In this chapter, we'll initially cover the core concepts
   1.727 +    behind Mercurial's design, then continue to discuss some of the
   1.728 +    interesting details of its implementation.</para>
   1.729 +
   1.730 +  <sect1>
   1.731 +    <title>Mercurial's historical record</title>
   1.732 +
   1.733 +    <sect2>
   1.734 +      <title>Tracking the history of a single file</title>
   1.735 +
   1.736 +      <para id="x_2eb">When Mercurial tracks modifications to a file, it stores
   1.737 +	the history of that file in a metadata object called a
   1.738 +	<emphasis>filelog</emphasis>.  Each entry in the filelog
   1.739 +	contains enough information to reconstruct one revision of the
   1.740 +	file that is being tracked.  Filelogs are stored as files in
   1.741 +	the <filename role="special"
   1.742 +	  class="directory">.hg/store/data</filename> directory.  A
   1.743 +	filelog contains two kinds of information: revision data, and
   1.744 +	an index to help Mercurial to find a revision
   1.745 +	efficiently.</para>
   1.746 +
   1.747 +      <para id="x_2ec">A file that is large, or has a lot of history, has its
   1.748 +	filelog stored in separate data
   1.749 +	(<quote><literal>.d</literal></quote> suffix) and index
   1.750 +	(<quote><literal>.i</literal></quote> suffix) files.  For
   1.751 +	small files without much history, the revision data and index
   1.752 +	are combined in a single <quote><literal>.i</literal></quote>
   1.753 +	file.  The correspondence between a file in the working
   1.754 +	directory and the filelog that tracks its history in the
   1.755 +	repository is illustrated in <xref
   1.756 +	  linkend="fig:concepts:filelog"/>.</para>
   1.757 +
   1.758 +      <figure id="fig:concepts:filelog">
   1.759 +	<title>Relationships between files in working directory and
   1.760 +	  filelogs in repository</title>
   1.761 +	<mediaobject>
   1.762 +	  <imageobject><imagedata fileref="figs/filelog.png"/></imageobject>
   1.763 +	  <textobject><phrase>XXX add text</phrase></textobject>
   1.764 +	</mediaobject>
   1.765 +      </figure>
   1.766 +
   1.767 +    </sect2>
   1.768 +    <sect2>
   1.769 +      <title>Managing tracked files</title>
   1.770 +
   1.771 +      <para id="x_2ee">Mercurial uses a structure called a
   1.772 +	<emphasis>manifest</emphasis> to collect together information
   1.773 +	about the files that it tracks.  Each entry in the manifest
   1.774 +	contains information about the files present in a single
   1.775 +	changeset.  An entry records which files are present in the
   1.776 +	changeset, the revision of each file, and a few other pieces
   1.777 +	of file metadata.</para>
   1.778 +
   1.779 +    </sect2>
   1.780 +    <sect2>
   1.781 +      <title>Recording changeset information</title>
   1.782 +
   1.783 +      <para id="x_2ef">The <emphasis>changelog</emphasis> contains information
   1.784 +	about each changeset.  Each revision records who committed a
   1.785 +	change, the changeset comment, other pieces of
   1.786 +	changeset-related information, and the revision of the
   1.787 +	manifest to use.</para>
   1.788 +
   1.789 +    </sect2>
   1.790 +    <sect2>
   1.791 +      <title>Relationships between revisions</title>
   1.792 +
   1.793 +      <para id="x_2f0">Within a changelog, a manifest, or a filelog, each
   1.794 +	revision stores a pointer to its immediate parent (or to its
   1.795 +	two parents, if it's a merge revision).  As I mentioned above,
   1.796 +	there are also relationships between revisions
   1.797 +	<emphasis>across</emphasis> these structures, and they are
   1.798 +	hierarchical in nature.</para>
   1.799 +
   1.800 +      <para id="x_2f1">For every changeset in a repository, there is exactly one
   1.801 +	revision stored in the changelog.  Each revision of the
   1.802 +	changelog contains a pointer to a single revision of the
   1.803 +	manifest.  A revision of the manifest stores a pointer to a
   1.804 +	single revision of each filelog tracked when that changeset
   1.805 +	was created.  These relationships are illustrated in
   1.806 +	<xref linkend="fig:concepts:metadata"/>.</para>
   1.807 +
   1.808 +      <figure id="fig:concepts:metadata">
   1.809 +	<title>Metadata relationships</title>
   1.810 +	<mediaobject>
   1.811 +	  <imageobject><imagedata fileref="figs/metadata.png"/></imageobject>
   1.812 +	  <textobject><phrase>XXX add text</phrase></textobject>
   1.813 +	</mediaobject>
   1.814 +      </figure>
   1.815 +
   1.816 +      <para id="x_2f3">As the illustration shows, there is
   1.817 +	<emphasis>not</emphasis> a <quote>one to one</quote>
   1.818 +	relationship between revisions in the changelog, manifest, or
   1.819 +	filelog. If a file that
   1.820 +	Mercurial tracks hasn't changed between two changesets, the
   1.821 +	entry for that file in the two revisions of the manifest will
   1.822 +	point to the same revision of its filelog<footnote>
   1.823 +	  <para id="x_725">It is possible (though unusual) for the manifest to
   1.824 +	    remain the same between two changesets, in which case the
   1.825 +	    changelog entries for those changesets will point to the
   1.826 +	    same revision of the manifest.</para>
   1.827 +	</footnote>.</para>
   1.828 +
   1.829 +    </sect2>
   1.830 +  </sect1>
   1.831 +  <sect1>
   1.832 +    <title>Safe, efficient storage</title>
   1.833 +
   1.834 +    <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are
   1.835 +      provided by a single structure called the
   1.836 +      <emphasis>revlog</emphasis>.</para>
   1.837 +
   1.838 +    <sect2>
   1.839 +      <title>Efficient storage</title>
   1.840 +
   1.841 +      <para id="x_2f5">The revlog provides efficient storage of revisions using a
   1.842 +	<emphasis>delta</emphasis> mechanism.  Instead of storing a
   1.843 +	complete copy of a file for each revision, it stores the
   1.844 +	changes needed to transform an older revision into the new
   1.845 +	revision.  For many kinds of file data, these deltas are
   1.846 +	typically a fraction of a percent of the size of a full copy
   1.847 +	of a file.</para>
   1.848 +
   1.849 +      <para id="x_2f6">Some obsolete revision control systems can only work with
   1.850 +	deltas of text files.  They must either store binary files as
   1.851 +	complete snapshots or encoded into a text representation, both
   1.852 +	of which are wasteful approaches.  Mercurial can efficiently
   1.853 +	handle deltas of files with arbitrary binary contents; it
   1.854 +	doesn't need to treat text as special.</para>
   1.855 +
   1.856 +    </sect2>
   1.857 +    <sect2 id="sec:concepts:txn">
   1.858 +      <title>Safe operation</title>
   1.859 +
   1.860 +      <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to
   1.861 +	the end of a revlog file. It never modifies a section of a
   1.862 +	file after it has written it.  This is both more robust and
   1.863 +	more efficient than schemes that need to modify or rewrite
   1.864 +	data.</para>
   1.865 +
   1.866 +      <para id="x_2f8">In addition, Mercurial treats every write as part of a
   1.867 +	<emphasis>transaction</emphasis> that can span a number of
   1.868 +	files.  A transaction is <emphasis>atomic</emphasis>: either
   1.869 +	the entire transaction succeeds and its effects are all
   1.870 +	visible to readers in one go, or the whole thing is undone.
   1.871 +	This guarantee of atomicity means that if you're running two
   1.872 +	copies of Mercurial, where one is reading data and one is
   1.873 +	writing it, the reader will never see a partially written
   1.874 +	result that might confuse it.</para>
   1.875 +
   1.876 +      <para id="x_2f9">The fact that Mercurial only appends to files makes it
   1.877 +	easier to provide this transactional guarantee.  The easier it
   1.878 +	is to do stuff like this, the more confident you should be
   1.879 +	that it's done correctly.</para>
   1.880 +
   1.881 +    </sect2>
   1.882 +    <sect2>
   1.883 +      <title>Fast retrieval</title>
   1.884 +
   1.885 +      <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
   1.886 +	many earlier revision control systems: the problem of
   1.887 +	<emphasis>inefficient retrieval</emphasis>. Most revision
   1.888 +	control systems store the contents of a revision as an
   1.889 +	incremental series of modifications against a
   1.890 +	<quote>snapshot</quote>.  (Some base the snapshot on the
   1.891 +	oldest revision, others on the newest.)  To reconstruct a
   1.892 +	specific revision, you must first read the snapshot, and then
   1.893 +	every one of the revisions between the snapshot and your
   1.894 +	target revision.  The more history that a file accumulates,
   1.895 +	the more revisions you must read, hence the longer it takes to
   1.896 +	reconstruct a particular revision.</para>
   1.897 +
   1.898 +      <figure id="fig:concepts:snapshot">
   1.899 +	<title>Snapshot of a revlog, with incremental deltas</title>
   1.900 +	<mediaobject>
   1.901 +	  <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject>
   1.902 +	  <textobject><phrase>XXX add text</phrase></textobject>
   1.903 +	</mediaobject>
   1.904 +      </figure>
   1.905 +
   1.906 +      <para id="x_2fc">The innovation that Mercurial applies to this problem is
   1.907 +	simple but effective.  Once the cumulative amount of delta
   1.908 +	information stored since the last snapshot exceeds a fixed
   1.909 +	threshold, it stores a new snapshot (compressed, of course),
   1.910 +	instead of another delta.  This makes it possible to
   1.911 +	reconstruct <emphasis>any</emphasis> revision of a file
   1.912 +	quickly.  This approach works so well that it has since been
   1.913 +	copied by several other revision control systems.</para>
   1.914 +
   1.915 +      <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates
   1.916 +	the idea.  In an entry in a revlog's index file, Mercurial
   1.917 +	stores the range of entries from the data file that it must
   1.918 +	read to reconstruct a particular revision.</para>
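Here is a toy model of the snapshot-plus-deltas idea, to make the retrieval cost concrete. It is a sketch under loud assumptions: the class ToyRevlog, the prefix/suffix delta format, and the 64-byte SNAPSHOT_THRESHOLD are all invented for illustration and are much cruder than what Mercurial really stores.

```python
import zlib

SNAPSHOT_THRESHOLD = 64  # hypothetical: delta bytes allowed since the last snapshot


def make_delta(old: bytes, new: bytes):
    """Describe `new` as (shared prefix length, shared suffix length, middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old: bytes, delta) -> bytes:
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class ToyRevlog:
    def __init__(self):
        self.entries = []      # ("snapshot", compressed text) or ("delta", delta)
        self.base = []         # index: which snapshot each revision builds on
        self._tip_text = b""
        self._delta_bytes = 0  # delta bytes accumulated since the last snapshot

    def add(self, text: bytes) -> int:
        rev = len(self.entries)
        delta = make_delta(self._tip_text, text)
        cost = len(delta[2])
        if rev == 0 or self._delta_bytes + cost > SNAPSHOT_THRESHOLD:
            # Too much delta has piled up: store a fresh compressed snapshot.
            self.entries.append(("snapshot", zlib.compress(text)))
            self.base.append(rev)
            self._delta_bytes = 0
        else:
            self.entries.append(("delta", delta))
            self.base.append(self.base[rev - 1])
            self._delta_bytes += cost
        self._tip_text = text
        return rev

    def read(self, rev: int) -> bytes:
        # Read only the range base[rev]..rev, never the whole history.
        start = self.base[rev]
        text = zlib.decompress(self.entries[start][1])
        for _kind, delta in self.entries[start + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

The base list plays the role of the index: for any revision it says where reading has to start, so the work done by read is bounded by the threshold rather than by the length of the history.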
   1.919 +
   1.920 +      <sect3>
   1.921 +	<title>Aside: the influence of video compression</title>
   1.922 +
   1.923 +	<para id="x_2fe">If you're familiar with video compression or
   1.924 +	  have ever watched a TV feed through a digital cable or
   1.925 +	  satellite service, you may know that most video compression
   1.926 +	  schemes store each frame of video as a delta against its
   1.927 +	  predecessor frame.</para>
   1.928 +
   1.929 +	<para id="x_2ff">Mercurial borrows this idea to make it
   1.930 +	  possible to reconstruct a revision from a snapshot and a
   1.931 +	  small number of deltas.</para>
   1.932 +
   1.933 +      </sect3>
   1.934 +    </sect2>
   1.935 +    <sect2>
   1.936 +      <title>Identification and strong integrity</title>
   1.937 +
   1.938 +      <para id="x_300">Along with delta or snapshot information, a revlog entry
   1.939 +	contains a cryptographic hash of the data that it represents.
   1.940 +	This makes it difficult to forge the contents of a revision,
   1.941 +	and easy to detect accidental corruption.</para>
   1.942 +
   1.943 +      <para id="x_301">Hashes provide more than a mere check against corruption;
   1.944 +	they are used as the identifiers for revisions.  The changeset
   1.945 +	identification hashes that you see as an end user are from
   1.946 +	revisions of the changelog.  Although filelogs and the
   1.947 +	manifest also use hashes, Mercurial only uses these behind the
   1.948 +	scenes.</para>
   1.949 +
   1.950 +      <para id="x_302">Mercurial verifies that hashes are correct when it
   1.951 +	retrieves file revisions and when it pulls changes from
   1.952 +	another repository.  If it encounters an integrity problem, it
   1.953 +	will complain and stop whatever it's doing.</para>
   1.954 +
   1.955 +      <para id="x_303">In addition to the effect it has on retrieval efficiency,
   1.956 +	Mercurial's use of periodic snapshots makes it more robust
   1.957 +	against partial data corruption.  If a revlog becomes partly
   1.958 +	corrupted due to a hardware error or system bug, it's often
   1.959 +	possible to reconstruct some or most revisions from the
   1.960 +	uncorrupted sections of the revlog, both before and after the
   1.961 +	corrupted section.  This would not be possible with a
   1.962 +	delta-only storage model.</para>
   1.963 +    </sect2>
   1.964 +  </sect1>
   1.965 +
   1.966 +  <sect1>
   1.967 +    <title>Revision history, branching, and merging</title>
   1.968 +
   1.969 +    <para id="x_304">Every entry in a Mercurial revlog knows the identity of its
   1.970 +      immediate ancestor revision, usually referred to as its
   1.971 +      <emphasis>parent</emphasis>.  In fact, a revision contains room
   1.972 +      for not one parent, but two.  Mercurial uses a special hash,
   1.973 +      called the <quote>null ID</quote>, to represent the idea
   1.974 +      <quote>there is no parent here</quote>.  This hash is simply a
   1.975 +      string of zeroes.</para>
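To tie the parent slots and the null ID back to the integrity hash described earlier, here is a minimal sketch. The chapter doesn't name the hash function or say exactly what is hashed, so treat the details as my assumptions: SHA-1 over the two parent IDs (sorted) followed by the revision text, with a null ID of twenty zero bytes. The function names node_id and verify are made up.

```python
import hashlib

NULL_ID = b"\0" * 20  # assumed width: one SHA-1 digest's worth of zero bytes


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Identify a revision by hashing its parents together with its contents."""
    low, high = sorted((p1, p2))  # parent order must not change the ID
    digest = hashlib.sha1(low)
    digest.update(high)
    digest.update(text)
    return digest.digest()


def verify(text: bytes, p1: bytes, p2: bytes, expected: bytes) -> bool:
    """Recompute the ID and compare, as an integrity check on retrieval."""
    return node_id(text, p1, p2) == expected
```

Because the parents are part of the hashed data, two revisions with identical contents but different histories get different identifiers, and any change to the stored text makes verify fail.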
   1.976 +
   1.977 +    <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see
   1.978 +      an example of the conceptual structure of a revlog.  Filelogs,
   1.979 +      manifests, and changelogs all have this same structure; they
   1.980 +      differ only in the kind of data stored in each delta or
   1.981 +      snapshot.</para>
   1.982 +
   1.983 +    <para id="x_306">The first revision in a revlog (at the bottom of the image)
   1.984 +      has the null ID in both of its parent slots.  For a
   1.985 +      <quote>normal</quote> revision, its first parent slot contains
   1.986 +      the ID of its parent revision, and its second contains the null
   1.987 +      ID, indicating that the revision has only one real parent.  Any
   1.988 +      two revisions that have the same parent ID are branches.  A
   1.989 +      revision that represents a merge between branches has two normal
   1.990 +      revision IDs in its parent slots.</para>
   1.991 +
   1.992 +    <figure id="fig:concepts:revlog">
   1.993 +      <title>The conceptual structure of a revlog</title>
   1.994 +      <mediaobject>
   1.995 +	<imageobject><imagedata fileref="figs/revlog.png"/></imageobject>
   1.996 +	<textobject><phrase>XXX add text</phrase></textobject>
   1.997 +      </mediaobject>
   1.998 +    </figure>
   1.999 +
  1.1000 +  </sect1>
  1.1001 +  <sect1>
  1.1002 +    <title>The working directory</title>
  1.1003 +
  1.1004 +    <para id="x_307">In the working directory, Mercurial stores a snapshot of the
  1.1005 +      files from the repository as of a particular changeset.</para>
  1.1006 +
  1.1007 +    <para id="x_308">The working directory <quote>knows</quote> which changeset
  1.1008 +      it contains.  When you update the working directory to contain a
  1.1009 +      particular changeset, Mercurial looks up the appropriate
  1.1010 +      revision of the manifest to find out which files it was tracking
  1.1011 +      at the time that changeset was committed, and which revision of
  1.1012 +      each file was then current.  It then recreates a copy of each of
  1.1013 +      those files, with the same contents it had when the changeset
  1.1014 +      was committed.</para>
  1.1015 +
  1.1016 +    <para id="x_309">The <emphasis>dirstate</emphasis> is a special
  1.1017 +      structure that contains Mercurial's knowledge of the working
  1.1018 +      directory.  It is maintained as a file named
  1.1019 +      <filename>.hg/dirstate</filename> inside a repository.  The
  1.1020 +      dirstate details which changeset the working directory is
  1.1021 +      updated to, and all of the files that Mercurial is tracking in
  1.1022 +      the working directory. It also lets Mercurial quickly notice
  1.1023 +      changed files, by recording their checkout times and
  1.1024 +      sizes.</para>
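The quick check that the recorded times and sizes make possible can be sketched like this. The names DirstateEntry and classify are hypothetical, and the order of the tests is my reading of this chapter's description of the dirstate, not a copy of Mercurial's code.

```python
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    mtime: int  # modification time recorded when Mercurial last wrote the file
    size: int   # file size recorded at that time


def classify(entry: DirstateEntry, mtime: int, size: int) -> str:
    """Decide a file's state from its current stat data alone, where possible."""
    if mtime == entry.mtime:
        return "clean"      # untouched since it was recorded
    if size != entry.size:
        return "modified"   # a different size means different contents
    return "unsure"         # same size, newer time: must read the contents
```

Only the "unsure" case costs a read of the file itself; the other two are settled from metadata that the operating system already has to hand.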
  1.1025 +
  1.1026 +    <para id="x_30a">Just as a revision of a revlog has room for two parents, so
  1.1027 +      that it can represent either a normal revision (with one parent)
  1.1028 +      or a merge of two earlier revisions, the dirstate has slots for
  1.1029 +      two parents.  When you use the <command role="hg-cmd">hg
  1.1030 +	update</command> command, the changeset that you update to is
  1.1031 +      stored in the <quote>first parent</quote> slot, and the null ID
  1.1032 +      in the second. When you <command role="hg-cmd">hg
  1.1033 +	merge</command> with another changeset, the first parent
  1.1034 +      remains unchanged, and the second parent is filled in with the
  1.1035 +      changeset you're merging with.  The <command role="hg-cmd">hg
  1.1036 +	parents</command> command tells you what the parents of the
  1.1037 +      dirstate are.</para>
  1.1038 +
  1.1039 +    <sect2>
  1.1040 +      <title>What happens when you commit</title>
  1.1041 +
  1.1042 +      <para id="x_30b">The dirstate stores parent information for more than just
  1.1043 +	book-keeping purposes.  Mercurial uses the parents of the
  1.1044 +	dirstate as <emphasis>the parents of a new
  1.1045 +	  changeset</emphasis> when you perform a commit.</para>
  1.1046 +
  1.1047 +      <figure id="fig:concepts:wdir">
  1.1048 +	<title>The working directory can have two parents</title>
  1.1049 +	<mediaobject>
  1.1050 +	  <imageobject><imagedata fileref="figs/wdir.png"/></imageobject>
  1.1051 +	  <textobject><phrase>XXX add text</phrase></textobject>
  1.1052 +	</mediaobject>
  1.1053 +      </figure>
  1.1054 +
  1.1055 +      <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the
  1.1056 +	normal state of the working directory, where it has a single
  1.1057 +	changeset as parent.  That changeset is the
  1.1058 +	<emphasis>tip</emphasis>, the newest changeset in the
  1.1059 +	repository that has no children.</para>
  1.1060 +
  1.1061 +      <figure id="fig:concepts:wdir-after-commit">
  1.1062 +	<title>The working directory gains new parents after a
  1.1063 +	  commit</title>
  1.1064 +	<mediaobject>
  1.1065 +	  <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject>
  1.1066 +	  <textobject><phrase>XXX add text</phrase></textobject>
  1.1067 +	</mediaobject>
  1.1068 +      </figure>
  1.1069 +
  1.1070 +      <para id="x_30f">It's useful to think of the working directory as
  1.1071 +	<quote>the changeset I'm about to commit</quote>.  Any files
  1.1072 +	that you tell Mercurial that you've added, removed, renamed,
  1.1073 +	or copied will be reflected in that changeset, as will
  1.1074 +	modifications to any files that Mercurial is already tracking;
  1.1075 +	the new changeset will have the parents of the working
  1.1076 +	directory as its parents.</para>
  1.1077 +
  1.1078 +      <para id="x_310">After a commit, Mercurial will update the
  1.1079 +	parents of the working directory, so that the first parent is
  1.1080 +	the ID of the new changeset, and the second is the null ID.
  1.1081 +	This is shown in <xref
  1.1082 +	  linkend="fig:concepts:wdir-after-commit"/>. Mercurial
  1.1083 +	doesn't touch any of the files in the working directory when
  1.1084 +	you commit; it just modifies the dirstate to note its new
  1.1085 +	parents.</para>
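The way the two parent slots move through update, merge, and commit can be modelled in a few lines. This is a sketch only: the class WorkingDir and its method names are invented, and changeset IDs are plain strings here.

```python
NULL_ID = "0" * 40  # stand-in for the null ID, written as hex text


class WorkingDir:
    """Tracks nothing but the two parent slots of the dirstate."""

    def __init__(self):
        self.parents = (NULL_ID, NULL_ID)

    def update(self, changeset: str) -> None:
        # Updating fills the first slot and clears the second.
        self.parents = (changeset, NULL_ID)

    def merge(self, other: str) -> None:
        # Merging keeps the first parent and fills in the second.
        self.parents = (self.parents[0], other)

    def commit(self, new_changeset: str):
        # The new changeset takes the dirstate's parents as its own...
        recorded = self.parents
        # ...and then becomes the sole parent of the working directory.
        self.parents = (new_changeset, NULL_ID)
        return recorded
```

Note that commit changes only the parent slots, which matches the point above that no files in the working directory are touched.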
  1.1086 +
  1.1087 +    </sect2>
  1.1088 +    <sect2>
  1.1089 +      <title>Creating a new head</title>
  1.1090 +
  1.1091 +      <para id="x_311">It's perfectly normal to update the working directory to a
  1.1092 +	changeset other than the current tip.  For example, you might
  1.1093 +	want to know what your project looked like last Tuesday, or
  1.1094 +	you could be looking through changesets to see which one
  1.1095 +	introduced a bug.  In cases like this, the natural thing to do
  1.1096 +	is update the working directory to the changeset you're
  1.1097 +	interested in, and then examine the files in the working
  1.1098 +	directory directly to see their contents as they were when you
  1.1099 +	committed that changeset.  The effect of this is shown in
  1.1100 +	<xref linkend="fig:concepts:wdir-pre-branch"/>.</para>
  1.1101 +
  1.1102 +      <figure id="fig:concepts:wdir-pre-branch">
  1.1103 +	<title>The working directory, updated to an older
  1.1104 +	  changeset</title>
  1.1105 +	<mediaobject>
  1.1106 +	  <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject>
  1.1107 +	  <textobject><phrase>XXX add text</phrase></textobject>
  1.1108 +	</mediaobject>
  1.1109 +      </figure>
  1.1110 +
  1.1111 +      <para id="x_313">Having updated the working directory to an
  1.1112 +	older changeset, what happens if you make some changes, and
  1.1113 +	then commit?  Mercurial behaves in the same way as I outlined
  1.1114 +	above.  The parents of the working directory become the
  1.1115 +	parents of the new changeset.  This new changeset has no
  1.1116 +	children, so it becomes the new tip.  And the repository now
  1.1117 +	contains two changesets that have no children; we call these
  1.1118 +	<emphasis>heads</emphasis>.  You can see the structure that
  1.1119 +	this creates in <xref
  1.1120 +	  linkend="fig:concepts:wdir-branch"/>.</para>
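Since a head is simply a changeset with no children, heads can be found from parent pointers alone. A small sketch, where the function name heads and the dictionary layout are my own choices for the example:

```python
def heads(parents_of: dict) -> set:
    """Return the changesets that no other changeset names as a parent.

    `parents_of` maps each changeset to a (first parent, second parent)
    pair, with None standing in for the null ID.
    """
    has_child = {
        parent
        for pair in parents_of.values()
        for parent in pair
        if parent is not None
    }
    return set(parents_of) - has_child
```

With a history of a, b, c in a line, committing d on top of b leaves two heads, c and d; a later merge of those two brings the count back to one.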
  1.1121 +
  1.1122 +      <figure id="fig:concepts:wdir-branch">
  1.1123 +	<title>After a commit made while synced to an older
  1.1124 +	  changeset</title>
  1.1125 +	<mediaobject>
  1.1126 +	  <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject>
  1.1127 +	  <textobject><phrase>XXX add text</phrase></textobject>
  1.1128 +	</mediaobject>
  1.1129 +      </figure>
  1.1130 +
  1.1131 +      <note>
  1.1132 +	<para id="x_315">If you're new to Mercurial, you should keep
  1.1133 +	  in mind a common <quote>error</quote>, which is to use the
  1.1134 +	  <command role="hg-cmd">hg pull</command> command without any
  1.1135 +	  options.  By default, the <command role="hg-cmd">hg
  1.1136 +	    pull</command> command <emphasis>does not</emphasis>
  1.1137 +	  update the working directory, so you'll bring new changesets
  1.1138 +	  into your repository, but the working directory will stay
  1.1139 +	  synced at the same changeset as before the pull.  If you
  1.1140 +	  make some changes and commit afterwards, you'll thus create
  1.1141 +	  a new head, because your working directory isn't synced to
  1.1142 +	  whatever the current tip is.  To combine the operation of a
  1.1143 +	  pull, followed by an update, run <command>hg pull
  1.1144 +	    -u</command>.</para>
  1.1145 +
  1.1146 +	<para id="x_316">I put the word <quote>error</quote> in quotes
  1.1147 +	  because all that you need to do to rectify the situation
  1.1148 +	  where you created a new head by accident is
  1.1149 +	  <command role="hg-cmd">hg merge</command>, then <command
  1.1150 +	    role="hg-cmd">hg commit</command>.  In other words, this
  1.1151 +	  almost never has negative consequences; it's just something
  1.1152 +	  of a surprise for newcomers.  I'll discuss other ways to
  1.1153 +	  avoid this behavior, and why Mercurial behaves in this
  1.1154 +	  initially surprising way, later on.</para>
  1.1155 +      </note>
  1.1156 +
  1.1157 +    </sect2>
  1.1158 +    <sect2>
  1.1159 +      <title>Merging changes</title>
  1.1160 +
  1.1161 +      <para id="x_317">When you run the <command role="hg-cmd">hg
  1.1162 +	  merge</command> command, Mercurial leaves the first parent
  1.1163 +	of the working directory unchanged, and sets the second parent
  1.1164 +	to the changeset you're merging with, as shown in <xref
  1.1165 +	  linkend="fig:concepts:wdir-merge"/>.</para>
  1.1166 +
  1.1167 +      <figure id="fig:concepts:wdir-merge">
  1.1168 +	<title>Merging two heads</title>
  1.1169 +	<mediaobject>
  1.1170 +	  <imageobject>
  1.1171 +	    <imagedata fileref="figs/wdir-merge.png"/>
  1.1172 +	  </imageobject>
  1.1173 +	  <textobject><phrase>XXX add text</phrase></textobject>
  1.1174 +	</mediaobject>
  1.1175 +      </figure>
  1.1176 +
  1.1177 +      <para id="x_319">Mercurial also has to modify the working directory, to
  1.1178 +	merge the files managed in the two changesets.  Simplified a
  1.1179 +	little, the merging process goes like this, for every file in
  1.1180 +	the manifests of both changesets:</para>
  1.1181 +      <itemizedlist>
  1.1182 +	<listitem><para id="x_31a">If neither changeset has modified a file, do
  1.1183 +	    nothing with that file.</para>
  1.1184 +	</listitem>
  1.1185 +	<listitem><para id="x_31b">If one changeset has modified a file, and the
  1.1186 +	    other hasn't, create the modified copy of the file in the
  1.1187 +	    working directory.</para>
  1.1188 +	</listitem>
  1.1189 +	<listitem><para id="x_31c">If one changeset has removed a file, and the
  1.1190 +	    other hasn't (or has also deleted it), delete the file
  1.1191 +	    from the working directory.</para>
  1.1192 +	</listitem>
  1.1193 +	<listitem><para id="x_31d">If one changeset has removed a file, but the
  1.1194 +	    other has modified the file, ask the user what to do: keep
  1.1195 +	    the modified file, or remove it?</para>
  1.1196 +	</listitem>
  1.1197 +	<listitem><para id="x_31e">If both changesets have modified a file,
  1.1198 +	    invoke an external merge program to choose the new
  1.1199 +	    contents for the merged file.  This may require input from
  1.1200 +	    the user.</para>
  1.1201 +	</listitem>
  1.1202 +	<listitem><para id="x_31f">If one changeset has modified a file, and the
  1.1203 +	    other has renamed or copied the file, make sure that the
  1.1204 +	    changes follow the new name of the file.</para>
  1.1205 +	</listitem></itemizedlist>
  1.1206 +      <para id="x_320">There are more details&emdash;merging has plenty of corner
  1.1207 +	cases&emdash;but these are the most common choices that are
  1.1208 +	involved in a merge.  As you can see, most cases are
  1.1209 +	completely automatic, and indeed most merges finish
  1.1210 +	automatically, without requiring your input to resolve any
  1.1211 +	conflicts.</para>
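      <para>The simplified rules above can be sketched as a small
	decision function.  This is only an illustration, not
	Mercurial's actual merge code: the name
	<literal>merge_action</literal>, its arguments (the file's
	contents in the common ancestor and in each of the two
	changesets, or <literal>None</literal> where the file is
	absent) and the returned strings are all assumptions made for
	the example, and rename and copy tracking is left out.</para>

```python
def merge_action(base, local, other):
    """Pick an action for one file during a merge (illustrative only).

    Each argument is the file's content in the common ancestor (base) and
    in the two changesets being merged, or None if the file is absent.
    """
    local_changed = local != base
    other_changed = other != base

    # Neither changeset touched the file.
    if not local_changed and not other_changed:
        return "do nothing"

    # The file existed in the ancestor and at least one side removed it.
    if base is not None and (local is None or other is None):
        survivor = other if local is None else local
        if survivor is None or survivor == base:
            # Removed on both sides, or removed on one and untouched on the other.
            return "delete"
        # Removed on one side, modified on the other.
        return "ask user"

    # Both sides changed the file: hand it to a merge program.
    if local_changed and other_changed:
        return "run merge tool"

    # Exactly one side changed the file.
    return "take modified copy"
```

      <para>For example, <literal>merge_action("a", None, "b")</literal>
	falls into the <quote>ask the user</quote> case, because one
	side removed the file while the other modified it.</para>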
  1.1212 +
  1.1213 +      <para id="x_321">When you're thinking about what happens when you commit
  1.1214 +	after a merge, once again the working directory is <quote>the
  1.1215 +	  changeset I'm about to commit</quote>.  After the <command
  1.1216 +	  role="hg-cmd">hg merge</command> command completes, the
  1.1217 +	working directory has two parents; these will become the
  1.1218 +	parents of the new changeset.</para>
  1.1219 +
  1.1220 +      <para id="x_322">Mercurial lets you perform multiple merges, but
  1.1221 +	you must commit the results of each individual merge as you
  1.1222 +	go.  This is necessary because Mercurial only tracks two
  1.1223 +	parents for both revisions and the working directory.  While
  1.1224 +	it would be technically feasible to merge multiple changesets
  1.1225 +	at once, Mercurial avoids this for simplicity.  With multi-way
  1.1226 +	merges, the risks of user confusion, nasty conflict
  1.1227 +	resolution, and making a terrible mess of a merge would grow
  1.1228 +	intolerable.</para>
  1.1229 +
  1.1230 +    </sect2>
  1.1231 +
  1.1232 +    <sect2>
  1.1233 +      <title>Merging and renames</title>
  1.1234 +
  1.1235 +      <para id="x_69a">A surprising number of revision control systems pay little
  1.1236 +	or no attention to a file's <emphasis>name</emphasis> over
  1.1237 +	time.  For instance, it used to be common that if a file got
  1.1238 +	renamed on one side of a merge, the changes from the other
  1.1239 +	side would be silently dropped.</para>
  1.1240 +
  1.1241 +      <para id="x_69b">Mercurial records metadata when you tell it to perform a
  1.1242 +	rename or copy. It uses this metadata during a merge to do the
   1.1243 +	right thing.  For instance, if I rename
  1.1244 +	a file, and you edit it without renaming it, when we merge our
  1.1245 +	work the file will be renamed and have your edits
  1.1246 +	applied.</para>
  1.1247 +    </sect2>
  1.1248 +  </sect1>
  1.1249 +
  1.1250 +  <sect1>
  1.1251 +    <title>Other interesting design features</title>
  1.1252 +
  1.1253 +    <para id="x_323">In the sections above, I've tried to highlight some of the
  1.1254 +      most important aspects of Mercurial's design, to illustrate that
  1.1255 +      it pays careful attention to reliability and performance.
  1.1256 +      However, the attention to detail doesn't stop there.  There are
  1.1257 +      a number of other aspects of Mercurial's construction that I
  1.1258 +      personally find interesting.  I'll detail a few of them here,
  1.1259 +      separate from the <quote>big ticket</quote> items above, so that
  1.1260 +      if you're interested, you can gain a better idea of the amount
  1.1261 +      of thinking that goes into a well-designed system.</para>
  1.1262 +
  1.1263 +    <sect2>
  1.1264 +      <title>Clever compression</title>
  1.1265 +
  1.1266 +      <para id="x_324">When appropriate, Mercurial will store both snapshots and
  1.1267 +	deltas in compressed form.  It does this by always
  1.1268 +	<emphasis>trying to</emphasis> compress a snapshot or delta,
  1.1269 +	but only storing the compressed version if it's smaller than
  1.1270 +	the uncompressed version.</para>
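      <para>A minimal sketch of this <quote>try to compress, keep
	  whichever is smaller</quote> idea, using Python's standard
	<literal>zlib</literal> module.  The function name
	<literal>encode_chunk</literal> and the one-letter tags are
	hypothetical and do not describe Mercurial's on-disk
	format.</para>

```python
import zlib


def encode_chunk(data):
    """Return (tag, payload), compressing only when that saves space.

    The tags "z" (compressed) and "u" (uncompressed) are made up for
    this example.
    """
    compressed = zlib.compress(data)
    if len(data) > len(compressed):
        return ("z", compressed)
    # Compression did not help (tiny or already-compressed input),
    # so store the original bytes unchanged.
    return ("u", data)
```

      <para>Highly repetitive input comes back tagged as compressed,
	while a very short input is stored as is, because the
	compressed form would be larger.</para>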
  1.1271 +
  1.1272 +      <para id="x_325">This means that Mercurial does <quote>the right
  1.1273 +	  thing</quote> when storing a file whose native form is
  1.1274 +	compressed, such as a <literal>zip</literal> archive or a JPEG
  1.1275 +	image.  When these types of files are compressed a second
  1.1276 +	time, the resulting file is usually bigger than the
  1.1277 +	once-compressed form, and so Mercurial will store the plain
  1.1278 +	<literal>zip</literal> or JPEG.</para>
  1.1279 +
  1.1280 +      <para id="x_326">Deltas between revisions of a compressed file are usually
  1.1281 +	larger than snapshots of the file, and Mercurial again does
  1.1282 +	<quote>the right thing</quote> in these cases.  It finds that
  1.1283 +	such a delta exceeds the threshold at which it should store a
  1.1284 +	complete snapshot of the file, so it stores the snapshot,
  1.1285 +	again saving space compared to a naive delta-only
  1.1286 +	approach.</para>
  1.1287 +
  1.1288 +      <sect3>
  1.1289 +	<title>Network recompression</title>
  1.1290 +
  1.1291 +	<para id="x_327">When storing revisions on disk, Mercurial uses the
  1.1292 +	  <quote>deflate</quote> compression algorithm (the same one
  1.1293 +	  used by the popular <literal>zip</literal> archive format),
  1.1294 +	  which balances good speed with a respectable compression
  1.1295 +	  ratio.  However, when transmitting revision data over a
  1.1296 +	  network connection, Mercurial uncompresses the compressed
  1.1297 +	  revision data.</para>
  1.1298 +
  1.1299 +	<para id="x_328">If the connection is over HTTP, Mercurial recompresses
  1.1300 +	  the entire stream of data using a compression algorithm that
  1.1301 +	  gives a better compression ratio (the Burrows-Wheeler
  1.1302 +	  algorithm from the widely used <literal>bzip2</literal>
  1.1303 +	  compression package).  This combination of algorithm and
  1.1304 +	  compression of the entire stream (instead of a revision at a
  1.1305 +	  time) substantially reduces the number of bytes to be
  1.1306 +	  transferred, yielding better network performance over most
  1.1307 +	  kinds of network.</para>
  1.1308 +
  1.1309 +	<para id="x_329">If the connection is over
  1.1310 +	  <command>ssh</command>, Mercurial
  1.1311 +	  <emphasis>doesn't</emphasis> recompress the stream, because
  1.1312 +	  <command>ssh</command> can already do this itself.  You can
  1.1313 +	  tell Mercurial to always use <command>ssh</command>'s
  1.1314 +	  compression feature by editing the
  1.1315 +	  <filename>.hgrc</filename> file in your home directory as
  1.1316 +	  follows.</para>
  1.1317 +
  1.1318 +	<programlisting>[ui]
  1.1319 +ssh = ssh -C</programlisting>
  1.1320 +
  1.1321 +      </sect3>
  1.1322 +    </sect2>
  1.1323 +    <sect2>
  1.1324 +      <title>Read/write ordering and atomicity</title>
  1.1325 +
  1.1326 +      <para id="x_32a">Appending to files isn't the whole story when
  1.1327 +	it comes to guaranteeing that a reader won't see a partial
  1.1328 +	write.  If you recall <xref linkend="fig:concepts:metadata"/>,
  1.1329 +	revisions in the changelog point to revisions in the manifest,
  1.1330 +	and revisions in the manifest point to revisions in filelogs.
  1.1331 +	This hierarchy is deliberate.</para>
  1.1332 +
  1.1333 +      <para id="x_32b">A writer starts a transaction by writing filelog and
  1.1334 +	manifest data, and doesn't write any changelog data until
  1.1335 +	those are finished.  A reader starts by reading changelog
  1.1336 +	data, then manifest data, followed by filelog data.</para>
  1.1337 +
  1.1338 +      <para id="x_32c">Since the writer has always finished writing filelog and
  1.1339 +	manifest data before it writes to the changelog, a reader will
  1.1340 +	never read a pointer to a partially written manifest revision
  1.1341 +	from the changelog, and it will never read a pointer to a
  1.1342 +	partially written filelog revision from the manifest.</para>
  1.1343 +
  1.1344 +    </sect2>
  1.1345 +    <sect2>
  1.1346 +      <title>Concurrent access</title>
  1.1347 +
  1.1348 +      <para id="x_32d">The read/write ordering and atomicity guarantees mean that
  1.1349 +	Mercurial never needs to <emphasis>lock</emphasis> a
  1.1350 +	repository when it's reading data, even if the repository is
  1.1351 +	being written to while the read is occurring. This has a big
  1.1352 +	effect on scalability; you can have an arbitrary number of
  1.1353 +	Mercurial processes safely reading data from a repository
  1.1354 +	all at once, no matter whether it's being written to or
  1.1355 +	not.</para>
  1.1356 +
  1.1357 +      <para id="x_32e">The lockless nature of reading means that if you're
  1.1358 +	sharing a repository on a multi-user system, you don't need to
  1.1359 +	grant other local users permission to
  1.1360 +	<emphasis>write</emphasis> to your repository in order for
  1.1361 +	them to be able to clone it or pull changes from it; they only
  1.1362 +	need <emphasis>read</emphasis> permission.  (This is
  1.1363 +	<emphasis>not</emphasis> a common feature among revision
  1.1364 +	control systems, so don't take it for granted!  Most require
  1.1365 +	readers to be able to lock a repository to access it safely,
  1.1366 +	and this requires write permission on at least one directory,
  1.1367 +	which of course makes for all kinds of nasty and annoying
  1.1368 +	security and administrative problems.)</para>
  1.1369 +
  1.1370 +      <para id="x_32f">Mercurial uses locks to ensure that only one process can
  1.1371 +	write to a repository at a time (the locking mechanism is safe
  1.1372 +	even over filesystems that are notoriously hostile to locking,
  1.1373 +	such as NFS).  If a repository is locked, a writer will wait
   1.1374 +	for a while and retry once the repository becomes unlocked, but
   1.1375 +	if the repository remains locked for too long, the process
   1.1376 +	attempting to write will time out.  This means
  1.1377 +	that your daily automated scripts won't get stuck forever and
  1.1378 +	pile up if a system crashes unnoticed, for example.  (Yes, the
  1.1379 +	timeout is configurable, from zero to infinity.)</para>
  1.1380 +
  1.1381 +      <sect3>
  1.1382 +	<title>Safe dirstate access</title>
  1.1383 +
  1.1384 +	<para id="x_330">As with revision data, Mercurial doesn't take a lock to
  1.1385 +	  read the dirstate file; it does acquire a lock to write it.
  1.1386 +	  To avoid the possibility of reading a partially written copy
  1.1387 +	  of the dirstate file, Mercurial writes to a file with a
  1.1388 +	  unique name in the same directory as the dirstate file, then
  1.1389 +	  renames the temporary file atomically to
  1.1390 +	  <filename>dirstate</filename>.  The file named
  1.1391 +	  <filename>dirstate</filename> is thus guaranteed to be
  1.1392 +	  complete, not partially written.</para>
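	<para>The write-then-rename pattern described above can be
	  sketched with the Python standard library.  This is an
	  illustration of the general technique, not Mercurial's own
	  code; the function name <literal>write_atomically</literal>
	  and the temporary file prefix are assumptions.</para>

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace the file at path with data so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace renames over the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

	<para>A reader that opens the destination file at any moment
	  gets either the complete old contents or the complete new
	  contents.</para>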
  1.1393 +
  1.1394 +      </sect3>
  1.1395 +    </sect2>
  1.1396 +    <sect2>
  1.1397 +      <title>Avoiding seeks</title>
  1.1398 +
  1.1399 +      <para id="x_331">Critical to Mercurial's performance is the avoidance of
  1.1400 +	seeks of the disk head, since any seek is far more expensive
  1.1401 +	than even a comparatively large read operation.</para>
  1.1402 +
  1.1403 +      <para id="x_332">This is why, for example, the dirstate is stored in a
  1.1404 +	single file.  If there were a dirstate file per directory that
  1.1405 +	Mercurial tracked, the disk would seek once per directory.
  1.1406 +	Instead, Mercurial reads the entire single dirstate file in
  1.1407 +	one step.</para>
  1.1408 +
  1.1409 +      <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme
  1.1410 +	when cloning a repository on local storage.  Instead of
  1.1411 +	copying every revlog file from the old repository into the new
  1.1412 +	repository, it makes a <quote>hard link</quote>, which is a
  1.1413 +	shorthand way to say <quote>these two names point to the same
  1.1414 +	  file</quote>.  When Mercurial is about to write to one of a
  1.1415 +	revlog's files, it checks to see if the number of names
  1.1416 +	pointing at the file is greater than one.  If it is, more than
  1.1417 +	one repository is using the file, so Mercurial makes a new
  1.1418 +	copy of the file that is private to this repository.</para>
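      <para>The link-count check can be sketched as follows.  This is
	a simplified illustration under assumptions of my own (the
	function name <literal>make_private_copy</literal> and the
	<literal>.private</literal> suffix are hypothetical), not the
	code Mercurial uses.</para>

```python
import os
import shutil


def make_private_copy(path):
    """Give path its own copy of the data if other names share the file.

    Returns True if a private copy was made, False if the file was
    already private.
    """
    if os.stat(path).st_nlink > 1:
        tmp_path = path + ".private"
        # Copy the data to a new file, then rename it over the old
        # name, which detaches this name from the shared file.
        shutil.copy2(path, tmp_path)
        os.replace(tmp_path, path)
        return True
    return False
```

      <para>After the call, writing to the file no longer affects the
	other names that used to share its contents.</para>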
  1.1419 +
  1.1420 +      <para id="x_334">A few revision control developers have pointed out that
  1.1421 +	this idea of making a complete private copy of a file is not
  1.1422 +	very efficient in its use of storage.  While this is true,
  1.1423 +	storage is cheap, and this method gives the highest
  1.1424 +	performance while deferring most book-keeping to the operating
  1.1425 +	system.  An alternative scheme would most likely reduce
  1.1426 +	performance and increase the complexity of the software, but
  1.1427 +	speed and simplicity are key to the <quote>feel</quote> of
  1.1428 +	day-to-day use.</para>
  1.1429 +
  1.1430 +    </sect2>
  1.1431 +    <sect2>
  1.1432 +      <title>Other contents of the dirstate</title>
  1.1433 +
  1.1434 +      <para id="x_335">Because Mercurial doesn't force you to tell it when you're
  1.1435 +	modifying a file, it uses the dirstate to store some extra
  1.1436 +	information so it can determine efficiently whether you have
  1.1437 +	modified a file.  For each file in the working directory, it
   1.1438 +	stores the time at which Mercurial itself last modified the file, and the
  1.1439 +	size of the file at that time.</para>
  1.1440 +
  1.1441 +      <para id="x_336">When you explicitly <command role="hg-cmd">hg
  1.1442 +	  add</command>, <command role="hg-cmd">hg remove</command>,
  1.1443 +	<command role="hg-cmd">hg rename</command> or <command
  1.1444 +	  role="hg-cmd">hg copy</command> files, Mercurial updates the
  1.1445 +	dirstate so that it knows what to do with those files when you
  1.1446 +	commit.</para>
  1.1447 +
  1.1448 +      <para id="x_337">The dirstate helps Mercurial to efficiently
  1.1449 +	  check the status of files in a repository.</para>
  1.1450 +
  1.1451 +      <itemizedlist>
  1.1452 +	<listitem>
  1.1453 +	  <para id="x_726">When Mercurial checks the state of a file in the
  1.1454 +	    working directory, it first checks a file's modification
  1.1455 +	    time against the time in the dirstate that records when
  1.1456 +	    Mercurial last wrote the file. If the last modified time
  1.1457 +	    is the same as the time when Mercurial wrote the file, the
  1.1458 +	    file must not have been modified, so Mercurial does not
  1.1459 +	    need to check any further.</para>
  1.1460 +	</listitem>
  1.1461 +	<listitem>
  1.1462 +	  <para id="x_727">If the file's size has changed, the file must have
  1.1463 +	    been modified.  If the modification time has changed, but
  1.1464 +	    the size has not, only then does Mercurial need to
  1.1465 +	    actually read the contents of the file to see if it has
  1.1466 +	    changed.</para>
  1.1467 +	</listitem>
  1.1468 +      </itemizedlist>
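      <para>The two checks in the list above can be sketched like
	this.  It is a simplified illustration, not Mercurial's
	implementation: the function name
	<literal>is_modified</literal> and its parameters are
	assumptions, and the recorded contents are passed in directly,
	where Mercurial would compare against the revision it has
	stored.</para>

```python
import os


def is_modified(path, recorded_mtime_ns, recorded_size, recorded_content):
    """Decide whether a file changed, reading it only when unavoidable."""
    st = os.stat(path)
    # Same modification time as recorded: treat the file as unchanged.
    if st.st_mtime_ns == recorded_mtime_ns:
        return False
    # A different size means the contents must differ.
    if st.st_size != recorded_size:
        return True
    # Time changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        return f.read() != recorded_content
```

      <para>Only the last branch reads the file, which is why most
	files can be checked with a single <literal>stat</literal>
	call.</para>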
  1.1469 +
  1.1470 +      <para id="x_728">Storing the modification time and size dramatically
  1.1471 +	reduces the number of read operations that Mercurial needs to
  1.1472 +	perform when we run commands like <command>hg status</command>.
  1.1473 +	This results in large performance improvements.</para>
  1.1474 +    </sect2>
  1.1475 +  </sect1>
  1.1476  </chapter>
  1.1477  
  1.1478  <!--
  1.1479  local variables: 
  1.1480  sgml-parent-document: ("00book.xml" "book" "chapter")
  1.1481  end:
  1.1482 --->
  1.1483 \ No newline at end of file
  1.1484 +-->