bos@559: <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
bos@559: 
bos@559: <chapter id="chap:concepts">
bos@572:   <?dbhtml filename="behind-the-scenes.html"?>
bos@559:   <title>Behind the scenes</title>
bos@559: 
bos@620:   <para id="x_2e8">Unlike many revision control systems, the concepts
bos@620:     upon which Mercurial is built are simple enough that it's easy to
bos@620:     understand how the software really works.  Knowing these details
bos@620:     isn't necessary, so it is safe to skip this
bos@620:     chapter.  However, I think you will get more out of the software
bos@620:     with a <quote>mental model</quote> of what's going on.</para>
bos@620: 
bos@620:   <para id="x_2e9">Being able to understand what's going on behind the
bos@620:     scenes gives me confidence that Mercurial has been carefully
bos@620:     designed to be both <emphasis>safe</emphasis> and
bos@559:     <emphasis>efficient</emphasis>.  And just as importantly, if it's
bos@559:     easy for me to retain a good idea of what the software is doing
bos@559:     when I perform a revision control task, I'm less likely to be
bos@672:     surprised by its behavior.</para>
bos@559: 
bos@584:   <para id="x_2ea">In this chapter, we'll initially cover the core concepts
bos@559:     behind Mercurial's design, then continue to discuss some of the
bos@559:     interesting details of its implementation.</para>
bos@559: 
bos@559:   <sect1>
bos@559:     <title>Mercurial's historical record</title>
bos@559: 
bos@559:     <sect2>
bos@559:       <title>Tracking the history of a single file</title>
bos@559: 
bos@584:       <para id="x_2eb">When Mercurial tracks modifications to a file, it stores
bos@559: 	the history of that file in a metadata object called a
bos@559: 	<emphasis>filelog</emphasis>.  Each entry in the filelog
bos@559: 	contains enough information to reconstruct one revision of the
bos@559: 	file that is being tracked.  Filelogs are stored as files in
bos@559: 	the <filename role="special"
bos@559: 	  class="directory">.hg/store/data</filename> directory.  A
bos@559: 	filelog contains two kinds of information: revision data, and
bos@559: 	an index to help Mercurial to find a revision
bos@559: 	efficiently.</para>
bos@559: 
bos@584:       <para id="x_2ec">A file that is large, or has a lot of history, has its
bos@559: 	filelog stored in separate data
bos@559: 	(<quote><literal>.d</literal></quote> suffix) and index
bos@559: 	(<quote><literal>.i</literal></quote> suffix) files.  For
bos@559: 	small files without much history, the revision data and index
bos@559: 	are combined in a single <quote><literal>.i</literal></quote>
bos@559: 	file.  The correspondence between a file in the working
bos@559: 	directory and the filelog that tracks its history in the
bos@592: 	repository is illustrated in <xref
bos@559: 	  linkend="fig:concepts:filelog"/>.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:filelog">
bos@591: 	<title>Relationships between files in working directory and
bos@591: 	  filelogs in repository</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/filelog.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Managing tracked files</title>
bos@559: 
bos@584:       <para id="x_2ee">Mercurial uses a structure called a
bos@559: 	<emphasis>manifest</emphasis> to collect together information
bos@559: 	about the files that it tracks.  Each entry in the manifest
bos@559: 	contains information about the files present in a single
bos@559: 	changeset.  An entry records which files are present in the
bos@559: 	changeset, the revision of each file, and a few other pieces
bos@559: 	of file metadata.</para>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Recording changeset information</title>
bos@559: 
bos@584:       <para id="x_2ef">The <emphasis>changelog</emphasis> contains information
bos@559: 	about each changeset.  Each revision records who committed a
bos@559: 	change, the changeset comment, other pieces of
bos@559: 	changeset-related information, and the revision of the
bos@559: 	manifest to use.</para>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Relationships between revisions</title>
bos@559: 
bos@584:       <para id="x_2f0">Within a changelog, a manifest, or a filelog, each
bos@559: 	revision stores a pointer to its immediate parent (or to its
bos@559: 	two parents, if it's a merge revision).  As I mentioned above,
bos@559: 	there are also relationships between revisions
bos@559: 	<emphasis>across</emphasis> these structures, and they are
bos@559: 	hierarchical in nature.</para>
bos@559: 
bos@584:       <para id="x_2f1">For every changeset in a repository, there is exactly one
bos@559: 	revision stored in the changelog.  Each revision of the
bos@559: 	changelog contains a pointer to a single revision of the
bos@559: 	manifest.  A revision of the manifest stores a pointer to a
bos@559: 	single revision of each filelog tracked when that changeset
bos@592: 	was created.  These relationships are illustrated in
bos@559: 	<xref linkend="fig:concepts:metadata"/>.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:metadata">
bos@591: 	<title>Metadata relationships</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/metadata.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@559: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@584:       <para id="x_2f3">As the illustration shows, there is
bos@559: 	<emphasis>not</emphasis> a <quote>one to one</quote>
bos@559: 	relationship between revisions in the changelog, manifest, or
bos@701: 	filelog. If a file that
bos@559: 	Mercurial tracks hasn't changed between two changesets, the
bos@559: 	entry for that file in the two revisions of the manifest will
bos@701: 	point to the same revision of its filelog<footnote>
bos@702: 	  <para id="x_725">It is possible (though unusual) for the manifest to
bos@701: 	    remain the same between two changesets, in which case the
bos@701: 	    changelog entries for those changesets will point to the
bos@701: 	    same revision of the manifest.</para>
bos@701: 	</footnote>.</para>
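
      <para>The pointer hierarchy described above can be modelled
	in a few lines of Python.  This is only a toy sketch: the
	dictionaries, field names, and sample data are invented for
	illustration and do not reflect Mercurial's on-disk
	format.</para>

      <programlisting>
```python
# Toy model: each filelog is a list of file revisions.
filelogs = {
    "README": ["first draft\n", "second draft\n"],
    "Makefile": ["all:\n"],
}

# Each manifest revision maps a file name to a filelog revision number.
manifests = [
    {"README": 0, "Makefile": 0},
    {"README": 1, "Makefile": 0},   # Makefile unchanged: same pointer
]

# Each changelog revision points to exactly one manifest revision.
changelog = [
    {"user": "alice", "desc": "initial import", "manifest": 0},
    {"user": "bob", "desc": "revise README", "manifest": 1},
]


def file_at(changeset, path):
    """Follow changelog -> manifest -> filelog to fetch a file's contents."""
    manifest = manifests[changelog[changeset]["manifest"]]
    return filelogs[path][manifest[path]]


print(repr(file_at(1, "README")))
print(file_at(0, "Makefile") == file_at(1, "Makefile"))
```
      </programlisting>

      <para>Because <literal>Makefile</literal> did not change in
	the second changeset, both manifest revisions point to the
	same filelog revision, so the two lookups return the same
	contents.</para>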
bos@559: 
bos@559:     </sect2>
bos@559:   </sect1>
bos@559:   <sect1>
bos@559:     <title>Safe, efficient storage</title>
bos@559: 
bos@584:     <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are
bos@559:       provided by a single structure called the
bos@559:       <emphasis>revlog</emphasis>.</para>
bos@559: 
bos@559:     <sect2>
bos@559:       <title>Efficient storage</title>
bos@559: 
bos@584:       <para id="x_2f5">The revlog provides efficient storage of revisions using a
bos@559: 	<emphasis>delta</emphasis> mechanism.  Instead of storing a
bos@559: 	complete copy of a file for each revision, it stores the
bos@559: 	changes needed to transform an older revision into the new
bos@559: 	revision.  For many kinds of file data, these deltas are
bos@559: 	typically a fraction of a percent of the size of a full copy
bos@559: 	of a file.</para>
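
      <para>To make the idea of a delta concrete, here is a minimal
	sketch that uses Python's standard
	<literal>difflib</literal> module.  It is not Mercurial's own
	delta algorithm; the function names and the
	<literal>(start, end, data)</literal> representation are
	assumptions made for this example.</para>

      <programlisting>
```python
import difflib


def make_delta(old, new):
    """Return a list of (start, end, data) edits that turn old into new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"          # unchanged regions are not stored
    ]


def apply_delta(old, delta):
    """Rebuild the new revision from the old one plus the stored edits."""
    pieces = []
    pos = 0
    for start, end, data in delta:
        pieces.append(old[pos:start])   # copy the untouched region
        pieces.append(data)             # then the replacement bytes
        pos = end
    pieces.append(old[pos:])
    return b"".join(pieces)


old = b"line one\nline two\nline three\n"
new = old.replace(b"two", b"2")
delta = make_delta(old, new)
print(apply_delta(old, delta) == new)
```
      </programlisting>

      <para>Only the bytes that differ are kept in the delta, and
	because the sketch works on raw bytes, it treats text and
	binary data alike.</para>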
bos@559: 
bos@584:       <para id="x_2f6">Some obsolete revision control systems can only work with
bos@559: 	deltas of text files.  They must either store binary files as
bos@559: 	complete snapshots or encode them into a text representation, both
bos@559: 	of which are wasteful approaches.  Mercurial can efficiently
bos@559: 	handle deltas of files with arbitrary binary contents; it
bos@559: 	doesn't need to treat text as special.</para>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2 id="sec:concepts:txn">
bos@559:       <title>Safe operation</title>
bos@559: 
bos@584:       <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to
bos@559: 	the end of a revlog file. It never modifies a section of a
bos@559: 	file after it has written it.  This is both more robust and
bos@559: 	more efficient than schemes that need to modify or rewrite
bos@559: 	data.</para>
bos@559: 
bos@584:       <para id="x_2f8">In addition, Mercurial treats every write as part of a
bos@559: 	<emphasis>transaction</emphasis> that can span a number of
bos@559: 	files.  A transaction is <emphasis>atomic</emphasis>: either
bos@559: 	the entire transaction succeeds and its effects are all
bos@559: 	visible to readers in one go, or the whole thing is undone.
bos@559: 	This guarantee of atomicity means that if you're running two
bos@559: 	copies of Mercurial, where one is reading data and one is
bos@559: 	writing it, the reader will never see a partially written
bos@559: 	result that might confuse it.</para>
bos@559: 
bos@584:       <para id="x_2f9">The fact that Mercurial only appends to files makes it
bos@559: 	easier to provide this transactional guarantee.  The easier it
bos@559: 	is to do stuff like this, the more confident you should be
bos@559: 	that it's done correctly.</para>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Fast retrieval</title>
bos@559: 
bos@701:       <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
bos@701: 	many earlier revision control systems: the problem of
bos@701: 	<emphasis>inefficient retrieval</emphasis>. Most revision
bos@701: 	control systems store the contents of a revision as an
bos@701: 	incremental series of modifications against a
bos@701: 	<quote>snapshot</quote>.  (Some base the snapshot on the
bos@701: 	oldest revision, others on the newest.)  To reconstruct a
bos@701: 	specific revision, you must first read the snapshot, and then
bos@701: 	every one of the revisions between the snapshot and your
bos@701: 	target revision.  The more history that a file accumulates,
bos@701: 	the more revisions you must read, hence the longer it takes to
bos@701: 	reconstruct a particular revision.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:snapshot">
bos@591: 	<title>Snapshot of a revlog, with incremental deltas</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@584:       <para id="x_2fc">The innovation that Mercurial applies to this problem is
bos@559: 	simple but effective.  Once the cumulative amount of delta
bos@559: 	information stored since the last snapshot exceeds a fixed
bos@559: 	threshold, it stores a new snapshot (compressed, of course),
bos@559: 	instead of another delta.  This makes it possible to
bos@559: 	reconstruct <emphasis>any</emphasis> revision of a file
bos@559: 	quickly.  This approach works so well that it has since been
bos@559: 	copied by several other revision control systems.</para>
bos@559: 
bos@592:       <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates
bos@559: 	the idea.  In an entry in a revlog's index file, Mercurial
bos@559: 	stores the range of entries from the data file that it must
bos@559: 	read to reconstruct a particular revision.</para>
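
      <para>The snapshot policy can be sketched as follows.  This
	is a toy model, not Mercurial's implementation: the
	<literal>ToyRevlog</literal> class, the simple prefix/suffix
	delta, and the threshold of eight bytes are all assumptions
	chosen to keep the example short.  Each index entry records
	the revision at which its delta chain starts, so a read
	touches only that snapshot and the deltas after it.</para>

      <programlisting>
```python
def make_delta(old, new):
    """Describe new as (start, end, data): replace old[start:end] by data."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix != limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix != limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return (prefix, len(old) - suffix, new[prefix:len(new) - suffix])


def apply_delta(old, delta):
    start, end, data = delta
    return old[:start] + data + old[end:]


class ToyRevlog:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []       # rev -> (base rev, snapshot or delta)
        self.chain_bytes = 0    # delta bytes stored since the last snapshot

    def add(self, text):
        rev = len(self.entries)
        if rev > 0:
            base = self.entries[rev - 1][0]
            delta = make_delta(self.read(rev - 1), text)
            cost = self.chain_bytes + len(delta[2])
            if self.threshold >= cost:
                self.entries.append((base, delta))
                self.chain_bytes = cost
                return rev
        # First revision, or the chain grew too long: store a full snapshot.
        self.entries.append((rev, text))
        self.chain_bytes = 0
        return rev

    def read(self, rev):
        base = self.entries[rev][0]
        text = self.entries[base][1]            # start from the snapshot
        for r in range(base + 1, rev + 1):      # replay only this chain
            text = apply_delta(text, self.entries[r][1])
        return text


log = ToyRevlog(threshold=8)
texts = [b"hello world\n" + b"x" * (3 * i) for i in range(6)]
for t in texts:
    log.add(t)
print([base for base, _ in log.entries])
```
      </programlisting>

      <para>Each revision here adds three bytes, so the fourth
	revision would push the chain past the threshold and is
	stored as a fresh snapshot instead.  Reading the last
	revision then replays two deltas rather than five.</para>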
bos@559: 
bos@559:       <sect3>
bos@559: 	<title>Aside: the influence of video compression</title>
bos@559: 
bos@701: 	<para id="x_2fe">If you're familiar with video compression or
bos@701: 	  have ever watched a TV feed through a digital cable or
bos@701: 	  satellite service, you may know that most video compression
bos@701: 	  schemes store each frame of video as a delta against its
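bos@701: 	  predecessor frame.  So that a decoder can start, or recover
bos@701: 	  from an error, without replaying every earlier frame, they
bos@701: 	  also insert a complete frame (a <quote>key frame</quote>)
bos@701: 	  every so often.</para>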
bos@701: 
bos@701: 	<para id="x_2ff">Mercurial borrows this idea to make it
bos@701: 	  possible to reconstruct a revision from a snapshot and a
bos@701: 	  small number of deltas.</para>
bos@559: 
bos@559:       </sect3>
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Identification and strong integrity</title>
bos@559: 
bos@584:       <para id="x_300">Along with delta or snapshot information, a revlog entry
bos@559: 	contains a cryptographic hash of the data that it represents.
bos@559: 	This makes it difficult to forge the contents of a revision,
bos@559: 	and easy to detect accidental corruption.</para>
bos@559: 
bos@584:       <para id="x_301">Hashes provide more than a mere check against corruption;
bos@559: 	they are used as the identifiers for revisions.  The changeset
bos@559: 	identification hashes that you see as an end user are from
bos@559: 	revisions of the changelog.  Although filelogs and the
bos@559: 	manifest also use hashes, Mercurial only uses these behind the
bos@559: 	scenes.</para>
bos@559: 
bos@584:       <para id="x_302">Mercurial verifies that hashes are correct when it
bos@559: 	retrieves file revisions and when it pulls changes from
bos@559: 	another repository.  If it encounters an integrity problem, it
bos@559: 	will complain and stop whatever it's doing.</para>
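
      <para>A hash-based identifier and integrity check might look
	like the sketch below.  The details are assumptions on my
	part rather than something stated in this chapter: the sketch
	uses SHA-1 and mixes the two parent IDs (in sorted order) into
	the hash along with the revision text, which is my
	understanding of how Mercurial's revlog has traditionally
	computed its IDs.  The function names are invented.</para>

      <programlisting>
```python
import hashlib

NULL_ID = b"\0" * 20    # "no parent here": twenty zero bytes


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """Hash the parents (in sorted order) followed by the revision text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()


def verify(text, p1, p2, expected):
    """Return True if the stored ID matches the data actually read."""
    return node_id(text, p1, p2) == expected


root = node_id(b"hello\n")
child = node_id(b"hello, world\n", p1=root)

print(child.hex())
print(verify(b"hello, world\n", root, NULL_ID, child))
print(verify(b"hellp, world\n", root, NULL_ID, child))
```
      </programlisting>

      <para>A single changed byte produces a different hash, so
	the last check fails and the corruption is noticed.</para>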
bos@559: 
bos@584:       <para id="x_303">In addition to the effect it has on retrieval efficiency,
bos@559: 	Mercurial's use of periodic snapshots makes it more robust
bos@559: 	against partial data corruption.  If a revlog becomes partly
bos@559: 	corrupted due to a hardware error or system bug, it's often
bos@559: 	possible to reconstruct some or most revisions from the
bos@559: 	uncorrupted sections of the revlog, both before and after the
bos@559: 	corrupted section.  This would not be possible with a
bos@559: 	delta-only storage model.</para>
bos@559:     </sect2>
bos@559:   </sect1>
bos@701: 
bos@559:   <sect1>
bos@559:     <title>Revision history, branching, and merging</title>
bos@559: 
bos@584:     <para id="x_304">Every entry in a Mercurial revlog knows the identity of its
bos@559:       immediate ancestor revision, usually referred to as its
bos@559:       <emphasis>parent</emphasis>.  In fact, a revision contains room
bos@559:       for not one parent, but two.  Mercurial uses a special hash,
bos@559:       called the <quote>null ID</quote>, to represent the idea
bos@559:       <quote>there is no parent here</quote>.  This hash is simply a
bos@559:       string of zeroes.</para>
bos@559: 
bos@592:     <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see
bos@559:       an example of the conceptual structure of a revlog.  Filelogs,
bos@559:       manifests, and changelogs all have this same structure; they
bos@559:       differ only in the kind of data stored in each delta or
bos@559:       snapshot.</para>
bos@559: 
bos@584:     <para id="x_306">The first revision in a revlog (at the bottom of the image)
bos@559:       has the null ID in both of its parent slots.  For a
bos@559:       <quote>normal</quote> revision, its first parent slot contains
bos@559:       the ID of its parent revision, and its second contains the null
bos@559:       ID, indicating that the revision has only one real parent.  Any
bos@559:       two revisions that have the same parent ID are branches.  A
bos@559:       revision that represents a merge between branches has two normal
bos@559:       revision IDs in its parent slots.</para>
bos@559: 
bos@591:     <figure id="fig:concepts:revlog">
bos@591:       <title>The conceptual structure of a revlog</title>
bos@591:       <mediaobject>
bos@594: 	<imageobject><imagedata fileref="figs/revlog.png"/></imageobject>
bos@591: 	<textobject><phrase>XXX add text</phrase></textobject>
bos@591:       </mediaobject>
bos@591:     </figure>
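
    <para>The parent-slot rules can be expressed in a short
      sketch.  The revision names and the dictionary layout below
      are invented for illustration; only the use of a string of
      zeroes as the null ID comes from the description above.</para>

    <programlisting>
```python
from collections import Counter

NULL_ID = "0" * 40

# revision ID -> (first parent, second parent)
EXAMPLE = {
    "a": (NULL_ID, NULL_ID),   # first revision: both slots are null
    "b": ("a", NULL_ID),       # normal revision
    "c": ("a", NULL_ID),       # same parent as "b": a branch
    "d": ("b", "c"),           # merge of the two branches
}


def kind(parents):
    p1, p2 = parents
    if p1 == NULL_ID and p2 == NULL_ID:
        return "root"
    if p2 == NULL_ID:
        return "normal"
    return "merge"


def branch_points(revlog):
    """Return the revisions that are a parent of more than one revision."""
    counts = Counter(
        p for parents in revlog.values() for p in parents if p != NULL_ID
    )
    return sorted(p for p, n in counts.items() if n > 1)


print({rev: kind(parents) for rev, parents in EXAMPLE.items()})
print(branch_points(EXAMPLE))
```
    </programlisting>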
bos@559: 
bos@559:   </sect1>
bos@559:   <sect1>
bos@559:     <title>The working directory</title>
bos@559: 
bos@584:     <para id="x_307">In the working directory, Mercurial stores a snapshot of the
bos@559:       files from the repository as of a particular changeset.</para>
bos@559: 
bos@584:     <para id="x_308">The working directory <quote>knows</quote> which changeset
bos@559:       it contains.  When you update the working directory to contain a
bos@559:       particular changeset, Mercurial looks up the appropriate
bos@559:       revision of the manifest to find out which files it was tracking
bos@559:       at the time that changeset was committed, and which revision of
bos@559:       each file was then current.  It then recreates a copy of each of
bos@559:       those files, with the same contents it had when the changeset
bos@559:       was committed.</para>
bos@559: 
bos@701:     <para id="x_309">The <emphasis>dirstate</emphasis> is a special
bos@701:       structure that contains Mercurial's knowledge of the working
bos@701:       directory.  It is maintained as a file named
bos@701:       <filename>.hg/dirstate</filename> inside a repository.  The
bos@701:       dirstate details which changeset the working directory is
bos@701:       updated to, and all of the files that Mercurial is tracking in
bos@701:       the working directory. It also lets Mercurial quickly notice
bos@701:       changed files, by recording their modification times and
bos@701:       sizes.</para>
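
    <para>The quick change check can be sketched like this.  It
      is an illustration only: the function names are invented,
      the recorded state lives in a plain dictionary rather than in
      the <filename>.hg/dirstate</filename> file, and a real
      implementation must still compare contents when the recorded
      values cannot settle the question.</para>

    <programlisting>
```python
import os
import tempfile


def record_state(paths):
    """Remember the size and modification time of each tracked file."""
    state = {}
    for path in paths:
        st = os.stat(path)
        state[path] = (st.st_size, st.st_mtime_ns)
    return state


def possibly_changed(state):
    """Return the files whose size or modification time differs."""
    changed = []
    for path, recorded in sorted(state.items()):
        st = os.stat(path)
        if (st.st_size, st.st_mtime_ns) != recorded:
            changed.append(path)
    return changed


def demo():
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "a.txt")
        b = os.path.join(root, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("original\n")
        state = record_state([a, b])
        with open(b, "a") as f:      # edit one file after recording
            f.write("more\n")
        return [os.path.basename(p) for p in possibly_changed(state)]


print(demo())
```
    </programlisting>

    <para>Only a <literal>stat</literal> call per file is needed,
      so unchanged files are skipped without reading their
      contents.</para>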
bos@559: 
bos@584:     <para id="x_30a">Just as a revision of a revlog has room for two parents, so
bos@559:       that it can represent either a normal revision (with one parent)
bos@559:       or a merge of two earlier revisions, the dirstate has slots for
bos@559:       two parents.  When you use the <command role="hg-cmd">hg
bos@559: 	update</command> command, the changeset that you update to is
bos@559:       stored in the <quote>first parent</quote> slot, and the null ID
bos@559:       in the second. When you <command role="hg-cmd">hg
bos@559: 	merge</command> with another changeset, the first parent
bos@559:       remains unchanged, and the second parent is filled in with the
bos@559:       changeset you're merging with.  The <command role="hg-cmd">hg
bos@559: 	parents</command> command tells you what the parents of the
bos@559:       dirstate are.</para>
bos@559: 
bos@559:     <sect2>
bos@559:       <title>What happens when you commit</title>
bos@559: 
bos@584:       <para id="x_30b">The dirstate stores parent information for more than just
bos@559: 	book-keeping purposes.  Mercurial uses the parents of the
bos@559: 	dirstate as <emphasis>the parents of a new
bos@559: 	  changeset</emphasis> when you perform a commit.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:wdir">
bos@591: 	<title>The working directory can have two parents</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/wdir.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@592:       <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the
bos@559: 	normal state of the working directory, where it has a single
bos@559: 	changeset as parent.  That changeset is the
bos@559: 	<emphasis>tip</emphasis>, the newest changeset in the
bos@559: 	repository that has no children.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:wdir-after-commit">
bos@591: 	<title>The working directory gains new parents after a
bos@591: 	  commit</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@584:       <para id="x_30f">It's useful to think of the working directory as
bos@559: 	<quote>the changeset I'm about to commit</quote>.  Any files
bos@559: 	that you tell Mercurial that you've added, removed, renamed,
bos@559: 	or copied will be reflected in that changeset, as will
bos@559: 	modifications to any files that Mercurial is already tracking;
bos@559: 	the new changeset will have the parents of the working
bos@559: 	directory as its parents.</para>
bos@559: 
bos@592:       <para id="x_310">After a commit, Mercurial will update the
bos@592: 	parents of the working directory, so that the first parent is
bos@592: 	the ID of the new changeset, and the second is the null ID.
bos@592: 	This is shown in <xref
bos@592: 	  linkend="fig:concepts:wdir-after-commit"/>. Mercurial
bos@559: 	doesn't touch any of the files in the working directory when
bos@559: 	you commit; it just modifies the dirstate to note its new
bos@559: 	parents.</para>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Creating a new head</title>
bos@559: 
bos@584:       <para id="x_311">It's perfectly normal to update the working directory to a
bos@559: 	changeset other than the current tip.  For example, you might
bos@559: 	want to know what your project looked like last Tuesday, or
bos@559: 	you could be looking through changesets to see which one
bos@559: 	introduced a bug.  In cases like this, the natural thing to do
bos@559: 	is update the working directory to the changeset you're
bos@559: 	interested in, and then examine the files in the working
bos@559: 	directory directly to see their contents as they were when you
bos@559: 	committed that changeset.  The effect of this is shown in
bos@592: 	<xref linkend="fig:concepts:wdir-pre-branch"/>.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:wdir-pre-branch">
bos@591: 	<title>The working directory, updated to an older
bos@591: 	  changeset</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@592:       <para id="x_313">Having updated the working directory to an
bos@592: 	older changeset, what happens if you make some changes, and
bos@592: 	then commit?  Mercurial behaves in the same way as I outlined
bos@559: 	above.  The parents of the working directory become the
bos@559: 	parents of the new changeset.  This new changeset has no
bos@559: 	children, so it becomes the new tip.  And the repository now
bos@559: 	contains two changesets that have no children; we call these
bos@559: 	<emphasis>heads</emphasis>.  You can see the structure that
bos@592: 	this creates in <xref
bos@559: 	  linkend="fig:concepts:wdir-branch"/>.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:wdir-branch">
bos@591: 	<title>After a commit made while synced to an older
bos@591: 	  changeset</title>
bos@591: 	<mediaobject>
bos@594: 	  <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
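
      <para>The way dirstate parents drive commits, and how a
	second head appears, can be simulated in a few lines.  The
	<literal>ToyRepo</literal> class is hypothetical and tracks
	nothing but parent pointers; it uses small integers as
	changeset IDs and <literal>-1</literal> to stand in for the
	null ID.</para>

      <programlisting>
```python
NULL_REV = -1


class ToyRepo:
    def __init__(self):
        self.parents_of = []                    # rev -> (p1, p2)
        self.dirstate = (NULL_REV, NULL_REV)    # working directory parents

    def commit(self):
        """The new changeset takes the working directory's parents."""
        rev = len(self.parents_of)
        self.parents_of.append(self.dirstate)
        self.dirstate = (rev, NULL_REV)
        return rev

    def update(self, rev):
        self.dirstate = (rev, NULL_REV)

    def heads(self):
        """Changesets that are not a parent of any other changeset."""
        has_child = {p for pair in self.parents_of for p in pair}
        return [r for r in range(len(self.parents_of)) if r not in has_child]


repo = ToyRepo()
for _ in range(3):
    repo.commit()            # revisions 0, 1, 2 in a straight line
print(repo.heads())

repo.update(1)               # go back to an older changeset
new_rev = repo.commit()      # commit there
print(repo.parents_of[new_rev], repo.heads())
```
      </programlisting>

      <para>After the second commit, revisions 2 and 3 both lack
	children, so the repository has two heads.</para>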
bos@559: 
bos@559:       <note>
bos@701: 	<para id="x_315">If you're new to Mercurial, you should keep
bos@701: 	  in mind a common <quote>error</quote>, which is to use the
bos@701: 	  <command role="hg-cmd">hg pull</command> command without any
bos@559: 	  options.  By default, the <command role="hg-cmd">hg
bos@559: 	    pull</command> command <emphasis>does not</emphasis>
bos@559: 	  update the working directory, so you'll bring new changesets
bos@559: 	  into your repository, but the working directory will stay
bos@559: 	  synced at the same changeset as before the pull.  If you
bos@559: 	  make some changes and commit afterwards, you'll thus create
bos@559: 	  a new head, because your working directory isn't synced to
bos@701: 	  whatever the current tip is.  To combine the operation of a
bos@701: 	  pull, followed by an update, run <command>hg pull
bos@701: 	    -u</command>.</para>
bos@701: 
bos@701: 	<para id="x_316">I put the word <quote>error</quote> in quotes
bos@701: 	  because all that you need to do to rectify the situation
bos@701: 	  where you created a new head by accident is
bos@701: 	  <command role="hg-cmd">hg merge</command>, then <command
bos@701: 	    role="hg-cmd">hg commit</command>.  In other words, this
bos@701: 	  almost never has negative consequences; it's just something
bos@701: 	  of a surprise for newcomers.  I'll discuss other ways to
bos@701: 	  avoid this behavior, and why Mercurial behaves in this
bos@701: 	  initially surprising way, later on.</para>
bos@559:       </note>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@620:       <title>Merging changes</title>
bos@559: 
bos@592:       <para id="x_317">When you run the <command role="hg-cmd">hg
bos@592: 	  merge</command> command, Mercurial leaves the first parent
bos@592: 	of the working directory unchanged, and sets the second parent
bos@592: 	to the changeset you're merging with, as shown in <xref
bos@559: 	  linkend="fig:concepts:wdir-merge"/>.</para>
bos@559: 
bos@591:       <figure id="fig:concepts:wdir-merge">
bos@591: 	<title>Merging two heads</title>
bos@591: 	<mediaobject>
bos@591: 	  <imageobject>
bos@594: 	    <imagedata fileref="figs/wdir-merge.png"/>
bos@591: 	  </imageobject>
bos@591: 	  <textobject><phrase>XXX add text</phrase></textobject>
bos@591: 	</mediaobject>
bos@591:       </figure>
bos@559: 
bos@584:       <para id="x_319">Mercurial also has to modify the working directory, to
bos@559: 	merge the files managed in the two changesets.  Simplified a
bos@559: 	little, the merging process goes like this, for every file in
bos@559: 	the manifests of both changesets.</para>
bos@559:       <itemizedlist>
bos@584: 	<listitem><para id="x_31a">If neither changeset has modified a file, do
bos@559: 	    nothing with that file.</para>
bos@559: 	</listitem>
bos@584: 	<listitem><para id="x_31b">If one changeset has modified a file, and the
bos@559: 	    other hasn't, create the modified copy of the file in the
bos@559: 	    working directory.</para>
bos@559: 	</listitem>
bos@584: 	<listitem><para id="x_31c">If one changeset has removed a file, and the
bos@559: 	    other hasn't (or has also deleted it), delete the file
bos@559: 	    from the working directory.</para>
bos@559: 	</listitem>
bos@584: 	<listitem><para id="x_31d">If one changeset has removed a file, but the
bos@559: 	    other has modified the file, ask the user what to do: keep
bos@559: 	    the modified file, or remove it?</para>
bos@559: 	</listitem>
bos@584: 	<listitem><para id="x_31e">If both changesets have modified a file,
bos@559: 	    invoke an external merge program to choose the new
bos@559: 	    contents for the merged file.  This may require input from
bos@559: 	    the user.</para>
bos@559: 	</listitem>
bos@584: 	<listitem><para id="x_31f">If one changeset has modified a file, and the
bos@559: 	    other has renamed or copied the file, make sure that the
bos@559: 	    changes follow the new name of the file.</para>
bos@559: 	</listitem></itemizedlist>
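
      <para>Most of the cases in the list above can be written as a
	small decision function.  This is a simplification built on
	an assumption the list leaves implicit: that
	<quote>modified</quote> is judged against a common ancestor
	of the two changesets.  The function name and the returned
	strings are invented, and the rename and copy case is left
	out.</para>

      <programlisting>
```python
def merge_action(base, local, other):
    """Decide what to do with one file during a merge.

    Each argument is the file's contents in the common ancestor, the
    working directory's parent, and the changeset being merged, or None
    if the file is absent there.
    """
    if local == other:
        return "nothing to do"              # untouched, or same on both sides
    if local == base:                       # only the other side changed it
        return "remove file" if other is None else "take other version"
    if other == base:                       # only the local side changed it
        return "remove file" if local is None else "keep local version"
    if local is None or other is None:      # removed here, modified there
        return "ask user: keep modified file or remove it"
    return "run merge program"              # modified differently on both sides


for case in [("a", "a", "a"), ("a", "a", "b"), ("a", "a", None),
             ("a", None, "b"), ("a", "b", "c")]:
    print(case, "->", merge_action(*case))
```
      </programlisting>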
bos@584:       <para id="x_320">There are more details&emdash;merging has plenty of corner
bos@559: 	cases&emdash;but these are the most common choices that are
bos@559: 	involved in a merge.  As you can see, most cases are
bos@559: 	completely automatic, and indeed most merges finish
bos@559: 	automatically, without requiring your input to resolve any
bos@559: 	conflicts.</para>
bos@559: 
bos@584:       <para id="x_321">When you're thinking about what happens when you commit
bos@559: 	after a merge, once again the working directory is <quote>the
bos@559: 	  changeset I'm about to commit</quote>.  After the <command
bos@559: 	  role="hg-cmd">hg merge</command> command completes, the
bos@559: 	working directory has two parents; these will become the
bos@559: 	parents of the new changeset.</para>
bos@559: 
bos@701:       <para id="x_322">Mercurial lets you perform multiple merges, but
bos@701: 	you must commit the results of each individual merge as you
bos@701: 	go.  This is necessary because Mercurial only tracks two
bos@701: 	parents for both revisions and the working directory.  While
bos@701: 	it would be technically feasible to merge multiple changesets
bos@701: 	at once, Mercurial avoids this for simplicity.  With multi-way
bos@701: 	merges, the risks of user confusion, nasty conflict
bos@701: 	resolution, and making a terrible mess of a merge would grow
bos@701: 	intolerable.</para>
bos@559: 
bos@559:     </sect2>
bos@620: 
bos@620:     <sect2>
bos@620:       <title>Merging and renames</title>
bos@620: 
bos@676:       <para id="x_69a">A surprising number of revision control systems pay little
bos@620: 	or no attention to a file's <emphasis>name</emphasis> over
bos@620: 	time.  For instance, it used to be common that if a file got
bos@620: 	renamed on one side of a merge, the changes from the other
bos@620: 	side would be silently dropped.</para>
bos@620: 
bos@676:       <para id="x_69b">Mercurial records metadata when you tell it to perform a
bos@620: 	rename or copy. It uses this metadata during a merge to do the
bos@620: 	right thing.  For instance, if I rename
bos@620: 	a file, and you edit it without renaming it, when we merge our
bos@620: 	work the file will be renamed and have your edits
bos@620: 	applied.</para>
bos@620:     </sect2>
bos@559:   </sect1>
bos@620: 
bos@559:   <sect1>
bos@559:     <title>Other interesting design features</title>
bos@559: 
bos@584:     <para id="x_323">In the sections above, I've tried to highlight some of the
bos@559:       most important aspects of Mercurial's design, to illustrate that
bos@559:       it pays careful attention to reliability and performance.
bos@559:       However, the attention to detail doesn't stop there.  There are
bos@559:       a number of other aspects of Mercurial's construction that I
bos@559:       personally find interesting.  I'll detail a few of them here,
bos@559:       separate from the <quote>big ticket</quote> items above, so that
bos@559:       if you're interested, you can gain a better idea of the amount
bos@559:       of thinking that goes into a well-designed system.</para>
bos@559: 
bos@559:     <sect2>
bos@559:       <title>Clever compression</title>
bos@559: 
bos@584:       <para id="x_324">When appropriate, Mercurial will store both snapshots and
bos@559: 	deltas in compressed form.  It does this by always
bos@559: 	<emphasis>trying to</emphasis> compress a snapshot or delta,
bos@559: 	but only storing the compressed version if it's smaller than
bos@559: 	the uncompressed version.</para>
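
      <para>The <quote>try, then keep the smaller result</quote>
	rule is easy to sketch with Python's
	<literal>zlib</literal> module.  The one-byte markers
	<literal>z</literal> and <literal>u</literal> are an
	assumption of this example, not a description of Mercurial's
	storage format.</para>

      <programlisting>
```python
import zlib


def pack(data):
    """Compress data, but keep the original if compression doesn't help."""
    compressed = zlib.compress(data)
    if len(data) > len(compressed):
        return b"z" + compressed
    return b"u" + data


def unpack(blob):
    marker, payload = blob[:1], blob[1:]
    if marker == b"z":
        return zlib.decompress(payload)
    return payload


repetitive = b"abc" * 1000
tiny = b"hi"
print(pack(repetitive)[:1], pack(tiny)[:1])
print(unpack(pack(repetitive)) == repetitive, unpack(pack(tiny)) == tiny)
```
      </programlisting>

      <para>Highly repetitive data is stored compressed, while data
	that compression would only enlarge is stored as-is.</para>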
bos@559: 
bos@584:       <para id="x_325">This means that Mercurial does <quote>the right
bos@559: 	  thing</quote> when storing a file whose native form is
bos@559: 	compressed, such as a <literal>zip</literal> archive or a JPEG
bos@559: 	image.  When these types of files are compressed a second
bos@559: 	time, the resulting file is usually bigger than the
bos@559: 	once-compressed form, and so Mercurial will store the plain
bos@559: 	<literal>zip</literal> or JPEG.</para>
bos@559: 
bos@584:       <para id="x_326">Deltas between revisions of a compressed file are usually
bos@559: 	larger than snapshots of the file, and Mercurial again does
bos@559: 	<quote>the right thing</quote> in these cases.  It finds that
bos@559: 	such a delta exceeds the threshold at which it should store a
bos@559: 	complete snapshot of the file, so it stores the snapshot,
bos@559: 	again saving space compared to a naive delta-only
bos@559: 	approach.</para>
bos@559: 
bos@559:       <sect3>
bos@559: 	<title>Network recompression</title>
bos@559: 
bos@584: 	<para id="x_327">When storing revisions on disk, Mercurial uses the
bos@559: 	  <quote>deflate</quote> compression algorithm (the same one
bos@559: 	  used by the popular <literal>zip</literal> archive format),
bos@559: 	  which balances good speed with a respectable compression
bos@559: 	  ratio.  However, when transmitting revision data over a
bos@559: 	  network connection, Mercurial uncompresses the compressed
bos@559: 	  revision data.</para>
bos@559: 
bos@584: 	<para id="x_328">If the connection is over HTTP, Mercurial recompresses
bos@559: 	  the entire stream of data using a compression algorithm that
bos@559: 	  gives a better compression ratio (the Burrows-Wheeler
bos@559: 	  algorithm from the widely used <literal>bzip2</literal>
bos@559: 	  compression package).  This combination of algorithm and
bos@559: 	  compression of the entire stream (instead of a revision at a
bos@559: 	  time) substantially reduces the number of bytes to be
bos@620: 	  transferred, yielding better network performance over most
bos@620: 	  kinds of network.</para>
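
	<para>To get a feel for why compressing the whole stream helps,
	  the following sketch compares the two approaches using
	  Python's standard <literal>zlib</literal> and
	  <literal>bz2</literal> modules.  The fifty revisions are
	  made-up sample data, and the code only measures sizes; it
	  does not reproduce Mercurial's wire protocol.</para>

<programlisting>
```python
import bz2
import zlib

# Fifty hypothetical revisions of one file that differ only in a final line.
base = "".join("line %d of the file\n" % i for i in range(40))
revisions = [(base + "change %d\n" % n).encode("ascii") for n in range(50)]

# On-disk style: each revision is compressed on its own with deflate.
per_revision_total = sum(len(zlib.compress(rev)) for rev in revisions)

# HTTP style: take the uncompressed data and bzip2 the stream as a whole.
stream = b"".join(revisions)
whole_stream_total = len(bz2.compress(stream))

print("per revision:", per_revision_total, "bytes")
print("whole stream:", whole_stream_total, "bytes")
```
</programlisting>

	<para>Because the revisions resemble each other, a compressor
	  that sees them all at once can exploit redundancy that is
	  invisible when each revision is compressed in
	  isolation.</para>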
bos@559: 
bos@701: 	<para id="x_329">If the connection is over
bos@701: 	  <command>ssh</command>, Mercurial
bos@701: 	  <emphasis>doesn't</emphasis> recompress the stream, because
bos@701: 	  <command>ssh</command> can already do this itself.  You can
bos@701: 	  tell Mercurial to always use <command>ssh</command>'s
bos@701: 	  compression feature by editing the
bos@701: 	  <filename>.hgrc</filename> file in your home directory as
bos@701: 	  follows.</para>
bos@701: 
bos@701: 	<programlisting>[ui]
bos@701: ssh = ssh -C</programlisting>
bos@559: 
bos@559:       </sect3>
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Read/write ordering and atomicity</title>
bos@559: 
bos@592:       <para id="x_32a">Appending to files isn't the whole story when
bos@592: 	it comes to guaranteeing that a reader won't see a partial
bos@592: 	write.  If you recall <xref linkend="fig:concepts:metadata"/>,
bos@701: 	revisions in the changelog point to revisions in the manifest,
bos@701: 	and revisions in the manifest point to revisions in filelogs.
bos@592: 	This hierarchy is deliberate.</para>
bos@559: 
bos@584:       <para id="x_32b">A writer starts a transaction by writing filelog and
bos@559: 	manifest data, and doesn't write any changelog data until
bos@559: 	those are finished.  A reader starts by reading changelog
bos@559: 	data, then manifest data, followed by filelog data.</para>
bos@559: 
bos@584:       <para id="x_32c">Since the writer has always finished writing filelog and
bos@559: 	manifest data before it writes to the changelog, a reader will
bos@559: 	never read a pointer to a partially written manifest revision
bos@559: 	from the changelog, and it will never read a pointer to a
bos@559: 	partially written filelog revision from the manifest.</para>
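
      <para>The ordering argument can be illustrated with a toy
	in-memory model.  The <literal>Store</literal> class below is
	purely hypothetical: it uses Python lists as stand-ins for the
	append-only files and a generator to pause the writer after
	each step, so that a reader can be interleaved with it.</para>

<programlisting>
```python
class Store:
    """Toy model: three append-only lists linked by integer pointers."""

    def __init__(self):
        self.filelog = []    # file contents
        self.manifest = []   # each entry points at a filelog revision
        self.changelog = []  # each entry points at a manifest revision

    def commit_steps(self, content):
        """Write filelog, then manifest, then changelog, pausing between."""
        self.filelog.append(content)
        yield "filelog"
        self.manifest.append(len(self.filelog) - 1)
        yield "manifest"
        self.changelog.append(len(self.manifest) - 1)
        yield "changelog"

    def read_tip(self):
        """Read in the opposite order: changelog, manifest, filelog."""
        if not self.changelog:
            return None
        manifest_rev = self.changelog[-1]
        filelog_rev = self.manifest[manifest_rev]
        return self.filelog[filelog_rev]


store = Store()
for _ in store.commit_steps("v1"):
    pass

# Interleave a reader with every step of a second commit.
seen = []
for step in store.commit_steps("v2"):
    seen.append((step, store.read_tip()))
```
</programlisting>

      <para>Until the changelog entry is written, the reader keeps
	seeing the previous revision; it never follows a pointer to
	data that is not yet there.</para>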
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Concurrent access</title>
bos@559: 
bos@584:       <para id="x_32d">The read/write ordering and atomicity guarantees mean that
bos@559: 	Mercurial never needs to <emphasis>lock</emphasis> a
bos@559: 	repository when it's reading data, even if the repository is
bos@559: 	being written to while the read is occurring. This has a big
bos@559: 	effect on scalability; you can have an arbitrary number of
bos@559: 	Mercurial processes safely reading data from a repository
bos@701: 	all at once, whether or not it's being written to.</para>
bos@559: 
bos@584:       <para id="x_32e">The lockless nature of reading means that if you're
bos@559: 	sharing a repository on a multi-user system, you don't need to
bos@559: 	grant other local users permission to
bos@559: 	<emphasis>write</emphasis> to your repository in order for
bos@559: 	them to be able to clone it or pull changes from it; they only
bos@559: 	need <emphasis>read</emphasis> permission.  (This is
bos@559: 	<emphasis>not</emphasis> a common feature among revision
bos@559: 	control systems, so don't take it for granted!  Most require
bos@559: 	readers to be able to lock a repository to access it safely,
bos@559: 	and this requires write permission on at least one directory,
bos@559: 	which of course makes for all kinds of nasty and annoying
bos@559: 	security and administrative problems.)</para>
bos@559: 
bos@584:       <para id="x_32f">Mercurial uses locks to ensure that only one process can
bos@559: 	write to a repository at a time (the locking mechanism is safe
bos@559: 	even over filesystems that are notoriously hostile to locking,
bos@559: 	such as NFS).  If a repository is locked, a writer will wait
bos@559: 	and retry in case the repository becomes unlocked, but if
bos@559: 	the repository remains locked for too long, the process
bos@559: 	attempting to write will time out.  This means
bos@559: 	that your daily automated scripts won't get stuck forever and
bos@559: 	pile up if a system crashes unnoticed, for example.  (Yes, the
bos@559: 	timeout is configurable, from zero to infinity.)</para>
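
      <para>The wait, retry and time out behaviour can be sketched as
	follows.  This is an assumption-laden illustration rather than
	Mercurial's locking code: the <literal>acquire</literal> and
	<literal>release</literal> helpers are hypothetical, and
	exclusive file creation is used only because it is easy to
	demonstrate.  The sketch makes no claim to be safe over
	NFS.</para>

<programlisting>
```python
import os
import tempfile
import time


def acquire(lock_path, timeout=1.0, poll=0.05):
    """Try to create the lock file, retrying until the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # give up instead of waiting forever
            time.sleep(poll)


def release(lock_path):
    """Drop the lock by removing the lock file."""
    os.remove(lock_path)


workdir = tempfile.mkdtemp()
lock_path = os.path.join(workdir, "lock")
first = acquire(lock_path, timeout=0.2)    # the lock is free
second = acquire(lock_path, timeout=0.2)   # still held, so this times out
release(lock_path)
third = acquire(lock_path, timeout=0.2)    # free again
```
</programlisting>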
bos@559: 
bos@559:       <sect3>
bos@559: 	<title>Safe dirstate access</title>
bos@559: 
bos@584: 	<para id="x_330">As with revision data, Mercurial doesn't take a lock to
bos@559: 	  read the dirstate file; it does acquire a lock to write it.
bos@559: 	  To avoid the possibility of reading a partially written copy
bos@559: 	  of the dirstate file, Mercurial writes to a file with a
bos@559: 	  unique name in the same directory as the dirstate file, then
bos@559: 	  renames the temporary file atomically to
bos@559: 	  <filename>dirstate</filename>.  The file named
bos@559: 	  <filename>dirstate</filename> is thus guaranteed to be
bos@559: 	  complete, not partially written.</para>
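
	<para>The write-then-rename technique is general, and a
	  minimal version of it looks like this in Python.  The
	  <literal>write_atomically</literal> helper is a hypothetical
	  name, and the sketch relies on <literal>os.replace</literal>
	  being atomic, which holds on POSIX filesystems when both
	  names are in the same directory.</para>

<programlisting>
```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a uniquely named sibling file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Readers see either the old file or the new one, never a mixture.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "dirstate")
write_atomically(target, b"first")
write_atomically(target, b"second")

with open(target, "rb") as f:
    final = f.read()
leftovers = os.listdir(workdir)
```
</programlisting>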
bos@559: 
bos@559:       </sect3>
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Avoiding seeks</title>
bos@559: 
bos@584:       <para id="x_331">Critical to Mercurial's performance is the avoidance of
bos@559: 	seeks of the disk head, since any seek is far more expensive
bos@559: 	than even a comparatively large read operation.</para>
bos@559: 
bos@584:       <para id="x_332">This is why, for example, the dirstate is stored in a
bos@559: 	single file.  If there were a dirstate file per directory that
bos@559: 	Mercurial tracked, the disk would seek once per directory.
bos@559: 	Instead, Mercurial reads the entire single dirstate file in
bos@559: 	one step.</para>
bos@559: 
bos@584:       <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme
bos@559: 	when cloning a repository on local storage.  Instead of
bos@559: 	copying every revlog file from the old repository into the new
bos@559: 	repository, it makes a <quote>hard link</quote>, which is a
bos@559: 	shorthand way to say <quote>these two names point to the same
bos@559: 	  file</quote>.  When Mercurial is about to write to one of a
bos@559: 	revlog's files, it checks to see if the number of names
bos@559: 	pointing at the file is greater than one.  If it is, more than
bos@559: 	one repository is using the file, so Mercurial makes a new
bos@559: 	copy of the file that is private to this repository.</para>
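
      <para>The link-count check described above can be tried out
	with Python's <literal>os</literal> module.  The file names
	and the <literal>unshare_before_write</literal> helper are
	hypothetical, and the sketch assumes a filesystem that
	supports hard links.</para>

<programlisting>
```python
import os
import shutil
import tempfile


def unshare_before_write(path):
    """If path has more than one name, replace it with a private copy."""
    if os.stat(path).st_nlink > 1:
        private = path + ".private"
        shutil.copyfile(path, private)
        os.replace(private, path)  # path now names the private copy


workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.i")
clone = os.path.join(workdir, "clone.i")
with open(original, "wb") as f:
    f.write(b"rev0")

os.link(original, clone)         # "clone" by adding a second name
links_after_clone = os.stat(original).st_nlink

unshare_before_write(clone)      # about to write, so break the sharing
with open(clone, "ab") as f:
    f.write(b"rev1")

with open(original, "rb") as f:
    original_data = f.read()
with open(clone, "rb") as f:
    clone_data = f.read()
links_after_write = os.stat(original).st_nlink
```
</programlisting>

      <para>The original file is untouched by the write, and once the
	sharing is broken each name refers to its own file.</para>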
bos@559: 
bos@584:       <para id="x_334">A few revision control developers have pointed out that
bos@559: 	this idea of making a complete private copy of a file is not
bos@559: 	very efficient in its use of storage.  While this is true,
bos@559: 	storage is cheap, and this method gives the highest
bos@559: 	performance while deferring most book-keeping to the operating
bos@559: 	system.  An alternative scheme would most likely reduce
bos@701: 	performance and increase the complexity of the software, and
bos@701: 	speed and simplicity are key to the <quote>feel</quote> of
bos@559: 	day-to-day use.</para>
bos@559: 
bos@559:     </sect2>
bos@559:     <sect2>
bos@559:       <title>Other contents of the dirstate</title>
bos@559: 
bos@584:       <para id="x_335">Because Mercurial doesn't force you to tell it when you're
bos@559: 	modifying a file, it uses the dirstate to store some extra
bos@559: 	information so it can determine efficiently whether you have
bos@559: 	modified a file.  For each file in the working directory, it
bos@559: 	stores the time that it last modified the file itself, and the
bos@559: 	size of the file at that time.</para>
bos@559: 
bos@584:       <para id="x_336">When you explicitly <command role="hg-cmd">hg
bos@559: 	  add</command>, <command role="hg-cmd">hg remove</command>,
bos@559: 	<command role="hg-cmd">hg rename</command> or <command
bos@559: 	  role="hg-cmd">hg copy</command> files, Mercurial updates the
bos@559: 	dirstate so that it knows what to do with those files when you
bos@559: 	commit.</para>
bos@559: 
bos@701:       <para id="x_337">The dirstate helps Mercurial to efficiently
bos@701: 	  check the status of files in a repository.</para>
bos@701: 
bos@701:       <itemizedlist>
bos@701: 	<listitem>
bos@702: 	  <para id="x_726">When Mercurial checks the state of a file in the
bos@701: 	    working directory, it first compares the file's modification
bos@701: 	    time against the time in the dirstate that records when
bos@701: 	    Mercurial last wrote the file. If the two times are the
bos@701: 	    same, Mercurial treats the file as unmodified and does not
bos@701: 	    need to check any further.</para>
bos@701: 	</listitem>
bos@701: 	<listitem>
bos@702: 	  <para id="x_727">If the file's size has changed, the file must have
bos@701: 	    been modified.  If the modification time has changed, but
bos@701: 	    the size has not, only then does Mercurial need to
bos@701: 	    actually read the contents of the file to see if it has
bos@701: 	    changed.</para>
bos@701: 	</listitem>
bos@701:       </itemizedlist>
bos@701: 
bos@702:       <para id="x_728">Storing the modification time and size dramatically
bos@701: 	reduces the number of read operations that Mercurial needs to
bos@701: 	perform when you run commands like <command>hg status</command>.
bos@701: 	This results in large performance improvements.</para>
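
      <para>The sequence of checks described above can be sketched
	as a small decision function.  The names are hypothetical, and
	the expensive content comparison is passed in as a callable so
	that the sketch can show when it is actually needed.</para>

<programlisting>
```python
def is_modified(recorded, current, contents_differ):
    """Decide whether a file changed, reading its contents only if needed.

    recorded and current are (mtime, size) pairs; contents_differ is a
    callable that performs the expensive comparison of file contents.
    """
    recorded_mtime, recorded_size = recorded
    current_mtime, current_size = current
    if current_mtime == recorded_mtime:
        return False            # same timestamp: treat as unmodified
    if current_size != recorded_size:
        return True             # size changed: must have been modified
    return contents_differ()    # same size, new timestamp: must look


reads = []


def expensive_compare():
    """Stand-in for reading the file; records that a read happened."""
    reads.append("read")
    return True


same_time = is_modified((100, 5), (100, 5), expensive_compare)
new_size = is_modified((100, 5), (200, 9), expensive_compare)
same_size = is_modified((100, 5), (200, 5), expensive_compare)
```
</programlisting>

      <para>Of the three calls, only the last one needs to invoke the
	content comparison.</para>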
bos@559:     </sect2>
bos@559:   </sect1>
bos@559: </chapter>
bos@559: 
bos@559: <!--
bos@559: local variables: 
bos@559: sgml-parent-document: ("00book.xml" "book" "chapter")
bos@559: end:
bos@559: -->