hgbook

diff en/ch03-concepts.xml @ 701:477d6a3e5023

Many final changes.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon May 04 23:52:38 2009 -0700 (2009-05-04)
parents 29f0f79cf614
children 18131160f7ee
--- a/en/ch03-concepts.xml	Thu Apr 16 23:46:45 2009 -0700
+++ b/en/ch03-concepts.xml	Mon May 04 23:52:38 2009 -0700
@@ -112,12 +112,15 @@
       <para id="x_2f3">As the illustration shows, there is
 	<emphasis>not</emphasis> a <quote>one to one</quote>
 	relationship between revisions in the changelog, manifest, or
-	filelog. If the manifest hasn't changed between two
-	changesets, the changelog entries for those changesets will
-	point to the same revision of the manifest.  If a file that
+	filelog. If a file that
 	Mercurial tracks hasn't changed between two changesets, the
 	entry for that file in the two revisions of the manifest will
-	point to the same revision of its filelog.</para>
+	point to the same revision of its filelog<footnote>
+	  <para>It is possible (though unusual) for the manifest to
+	    remain the same between two changesets, in which case the
+	    changelog entries for those changesets will point to the
+	    same revision of the manifest.</para>
+	</footnote>.</para>
 
     </sect2>
   </sect1>
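To make the footnote's point concrete, here is a minimal Python sketch of how manifest entries can share filelog revisions. Everything in it is hypothetical: `ToyRepo`, `commit`, and the plain lists standing in for the changelog, manifests, and filelogs are illustrative stand-ins rather than Mercurial's revlog API, and real Mercurial identifies revisions by hash rather than by list index.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    # Hypothetical in-memory stand-ins for the revlogs.
    filelogs: dict[str, list[bytes]] = field(default_factory=dict)  # file name -> revisions
    manifests: list[dict[str, int]] = field(default_factory=list)   # file name -> filelog rev
    changelog: list[int] = field(default_factory=list)              # changeset -> manifest rev

    def commit(self, files: dict[str, bytes]) -> int:
        """Record a snapshot of all tracked files and return the new changeset number."""
        prev = self.manifests[self.changelog[-1]] if self.changelog else {}
        manifest: dict[str, int] = {}
        for name, data in files.items():
            log = self.filelogs.setdefault(name, [])
            old = prev.get(name)
            if old is not None and log[old] == data:
                manifest[name] = old  # file unchanged: point at its existing filelog revision
            else:
                log.append(data)  # file new or changed: add a filelog revision
                manifest[name] = len(log) - 1
        if self.changelog and manifest == prev:
            mrev = self.changelog[-1]  # manifest unchanged: reuse its revision (the unusual case)
        else:
            self.manifests.append(manifest)
            mrev = len(self.manifests) - 1
        self.changelog.append(mrev)
        return len(self.changelog) - 1
```

Committing an unchanged `b.txt` twice leaves its filelog with a single revision that both manifests point to, while a commit in which every file is unchanged reuses the previous manifest revision, which is the case the footnote describes.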
@@ -175,16 +178,18 @@
     <sect2>
       <title>Fast retrieval</title>
 
-      <para id="x_2fa">Mercurial cleverly avoids a pitfall common to all earlier
-	revision control systems: the problem of <emphasis>inefficient
-	  retrieval</emphasis>. Most revision control systems store
-	the contents of a revision as an incremental series of
-	modifications against a <quote>snapshot</quote>.  To
-	reconstruct a specific revision, you must first read the
-	snapshot, and then every one of the revisions between the
-	snapshot and your target revision.  The more history that a
-	file accumulates, the more revisions you must read, hence the
-	longer it takes to reconstruct a particular revision.</para>
+      <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
+	all earlier revision control systems: the problem of
+	<emphasis>inefficient retrieval</emphasis>. Most revision
+	control systems store the contents of a revision as an
+	incremental series of modifications against a
+	<quote>snapshot</quote>.  (Some base the snapshot on the
+	oldest revision, others on the newest.)  To reconstruct a
+	specific revision, you must first read the snapshot, and then
+	every one of the revisions between the snapshot and your
+	target revision.  The more history that a file accumulates,
+	the more revisions you must read, hence the longer it takes to
+	reconstruct a particular revision.</para>
 
       <figure id="fig:concepts:snapshot">
 	<title>Snapshot of a revlog, with incremental deltas</title>
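The retrieval cost described in the fast-retrieval paragraph can be sketched in Python. This assumes a simplified revlog in which each entry is either a full text or a delta against the entry before it, and a delta is a sorted list of `(start, end, data)` edits; the layout and the names `apply_delta` and `read_revision` are hypothetical and only loosely resemble Mercurial's binary delta format.

```python
# Hypothetical delta format: sorted, non-overlapping (start, end, data) edits,
# with offsets measured in the previous revision's text.
Delta = list[tuple[int, int, bytes]]


def apply_delta(text: bytes, delta: Delta) -> bytes:
    """Replace text[start:end] with data for each edit."""
    out: list[bytes] = []
    pos = 0
    for start, end, data in delta:
        out.append(text[pos:start])  # unchanged bytes before this edit
        out.append(data)             # replacement bytes
        pos = end
    out.append(text[pos:])           # unchanged tail
    return b"".join(out)


def read_revision(revlog: list, rev: int) -> bytes:
    """Rebuild revision `rev` from the nearest earlier snapshot plus its deltas.

    Each entry is ("full", bytes) or ("delta", Delta) against the entry before it.
    """
    base = rev
    while revlog[base][0] != "full":  # walk back to the snapshot
        base -= 1
    text = revlog[base][1]
    for i in range(base + 1, rev + 1):  # one delta per intervening revision
        text = apply_delta(text, revlog[i][1])
    return text
```

`read_revision` applies one delta for every revision between the nearest snapshot and the target, so its running time grows with the length of that chain.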
@@ -211,25 +216,15 @@
       <sect3>
 	<title>Aside: the influence of video compression</title>
 
-	<para id="x_2fe">If you're familiar with video compression or have ever
-	  watched a TV feed through a digital cable or satellite
-	  service, you may know that most video compression schemes
-	  store each frame of video as a delta against its predecessor
-	  frame.  In addition, these schemes use <quote>lossy</quote>
-	  compression techniques to increase the compression ratio, so
-	  visual errors accumulate over the course of a number of
-	  inter-frame deltas.</para>
-
-	<para id="x_2ff">Because it's possible for a video stream to <quote>drop
-	    out</quote> occasionally due to signal glitches, and to
-	  limit the accumulation of artefacts introduced by the lossy
-	  compression process, video encoders periodically insert a
-	  complete frame (called a <quote>key frame</quote>) into the
-	  video stream; the next delta is generated against that
-	  frame.  This means that if the video signal gets
-	  interrupted, it will resume once the next key frame is
-	  received.  Also, the accumulation of encoding errors
-	  restarts anew with each key frame.</para>
+	<para id="x_2fe">If you're familiar with video compression or
+	  have ever watched a TV feed through a digital cable or
+	  satellite service, you may know that most video compression
+	  schemes store each frame of video as a delta against its
+	  predecessor frame.</para>
+
+	<para id="x_2ff">Mercurial borrows this idea to make it
+	  possible to reconstruct a revision from a snapshot and a
+	  small number of deltas.</para>
 
       </sect3>
     </sect2>
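The key-frame idea can be sketched as a storage policy: keep appending deltas until the bytes needed to rebuild a revision would exceed some multiple of its full size, then store a fresh snapshot. The `Entry` class, `append_revision`, and the factor of 2 are assumptions for illustration; treat this as a model of the idea rather than Mercurial's exact heuristic.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str         # "full" (a snapshot) or "delta"
    size: int         # bytes stored for this entry
    chain_bytes: int  # bytes read to rebuild this revision: snapshot plus deltas


def append_revision(log: list, full_size: int, delta_size: int, factor: int = 2) -> Entry:
    """Store a delta unless rebuilding would need more than factor * full_size bytes."""
    if log:
        chain = log[-1].chain_bytes + delta_size
        if chain <= factor * full_size:
            entry = Entry("delta", delta_size, chain)
            log.append(entry)
            return entry
    entry = Entry("full", full_size, full_size)  # start a new chain, like a key frame
    log.append(entry)
    return entry
```

With a 100-byte text and 30-byte deltas, the sketch stores one snapshot followed by three deltas and then starts a new chain, so rebuilding any revision never requires reading more than 200 bytes.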
@@ -261,9 +256,9 @@
 	uncorrupted sections of the revlog, both before and after the
 	corrupted section.  This would not be possible with a
 	delta-only storage model.</para>
-
     </sect2>
   </sect1>
+
   <sect1>
     <title>Revision history, branching, and merging</title>
 
@@ -314,11 +309,15 @@
      those files, with the same contents it had when the changeset
      was committed.</para>
 
-    <para id="x_309">The <emphasis>dirstate</emphasis> contains Mercurial's
-      knowledge of the working directory.  This details which
-      changeset the working directory is updated to, and all of the
-      files that Mercurial is tracking in the working
-      directory.</para>
+    <para id="x_309">The <emphasis>dirstate</emphasis> is a special
+      structure that contains Mercurial's knowledge of the working
+      directory.  It is maintained as a file named
+      <filename>.hg/dirstate</filename> inside a repository.  The
+      dirstate details which changeset the working directory is
+      updated to, and all of the files that Mercurial is tracking in
+      the working directory. It also lets Mercurial quickly notice
+      changed files, by recording their checkout times and
+      sizes.</para>
 
     <para id="x_30a">Just as a revision of a revlog has room for two parents, so
       that it can represent either a normal revision (with one parent)
@@ -426,9 +425,9 @@
       </figure>
 
       <note>
-	<para id="x_315">  If you're new to Mercurial, you should keep in mind a
-	  common <quote>error</quote>, which is to use the <command
-	    role="hg-cmd">hg pull</command> command without any
+	<para id="x_315">If you're new to Mercurial, you should keep
+	  in mind a common <quote>error</quote>, which is to use the
+	  <command role="hg-cmd">hg pull</command> command without any
 	  options.  By default, the <command role="hg-cmd">hg
 	    pull</command> command <emphasis>does not</emphasis>
 	  update the working directory, so you'll bring new changesets
@@ -436,16 +435,19 @@
 	  synced at the same changeset as before the pull.  If you
 	  make some changes and commit afterwards, you'll thus create
 	  a new head, because your working directory isn't synced to
-	  whatever the current tip is.</para>
-
-	<para id="x_316">  I put the word <quote>error</quote> in
-	  quotes because all that you need to do to rectify this
-	  situation is <command role="hg-cmd">hg merge</command>, then
-	  <command role="hg-cmd">hg commit</command>.  In other words,
-	  this almost never has negative consequences; it's just
-	  something of a surprise for newcomers.  I'll discuss other
-	  ways to avoid this behavior, and why Mercurial behaves in
-	  this initially surprising way, later on.</para>
+	  whatever the current tip is.  To combine the operation of a
+	  pull, followed by an update, run <command>hg pull
+	    -u</command>.</para>
+
+	<para id="x_316">I put the word <quote>error</quote> in quotes
+	  because all that you need to do to rectify the situation
+	  where you created a new head by accident is
+	  <command role="hg-cmd">hg merge</command>, then <command
+	    role="hg-cmd">hg commit</command>.  In other words, this
+	  almost never has negative consequences; it's just something
+	  of a surprise for newcomers.  I'll discuss other ways to
+	  avoid this behavior, and why Mercurial behaves in this
+	  initially surprising way, later on.</para>
       </note>
 
     </sect2>
@@ -511,13 +513,15 @@
 	working directory has two parents; these will become the
 	parents of the new changeset.</para>
 
-      <para id="x_322">Mercurial lets you perform multiple merges, but you must
-	commit the results of each individual merge as you go.  This
-	is necessary because Mercurial only tracks two parents for
-	both revisions and the working directory.  While it would be
-	technically possible to merge multiple changesets at once, the
-	prospect of user confusion and making a terrible mess of a
-	merge immediately becomes overwhelming.</para>
+      <para id="x_322">Mercurial lets you perform multiple merges, but
+	you must commit the results of each individual merge as you
+	go.  This is necessary because Mercurial only tracks two
+	parents for both revisions and the working directory.  While
+	it would be technically feasible to merge multiple changesets
+	at once, Mercurial avoids this for simplicity.  With multi-way
+	merges, the risks of user confusion, nasty conflict
+	resolution, and making a terrible mess of a merge would grow
+	intolerable.</para>
 
     </sect2>
 
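A small Python model shows why combining several heads takes a series of two-parent merges. `Changeset` and `merge_heads` are hypothetical names, and the sketch ignores file contents and conflict resolution entirely; it only tracks parent pointers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Changeset:
    rev: int
    p1: Optional[int]         # first parent (None for the root)
    p2: Optional[int] = None  # second parent, set only on merges


def merge_heads(history: list, heads: list) -> int:
    """Merge heads pairwise, committing each merge before starting the next."""
    current = heads[0]
    for other in heads[1:]:
        merge = Changeset(rev=len(history), p1=current, p2=other)  # at most two parents
        history.append(merge)
        current = merge.rev
    return current
```

Merging three heads this way produces two merge changesets, each with exactly two parents, and the second merge has the first one as a parent.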
@@ -598,10 +602,17 @@
 	  transferred, yielding better network performance over most
 	  kinds of network.</para>
 
-	<para id="x_329">(If the connection is over <command>ssh</command>,
-	  Mercurial <emphasis>doesn't</emphasis> recompress the
-	  stream, because <command>ssh</command> can already do this
-	  itself.)</para>
+	<para id="x_329">If the connection is over
+	  <command>ssh</command>, Mercurial
+	  <emphasis>doesn't</emphasis> recompress the stream, because
+	  <command>ssh</command> can already do this itself.  You can
+	  tell Mercurial to always use <command>ssh</command>'s
+	  compression feature by editing the
+	  <filename>.hgrc</filename> file in your home directory as
+	  follows.</para>
+
+	<programlisting>[ui]
+ssh = ssh -C</programlisting>
 
       </sect3>
     </sect2>
@@ -611,9 +622,8 @@
       <para id="x_32a">Appending to files isn't the whole story when
 	it comes to guaranteeing that a reader won't see a partial
 	write.  If you recall <xref linkend="fig:concepts:metadata"/>,
-	revisions in
-	the changelog point to revisions in the manifest, and
-	revisions in the manifest point to revisions in filelogs.
+	revisions in the changelog point to revisions in the manifest,
+	and revisions in the manifest point to revisions in filelogs.
 	This hierarchy is deliberate.</para>
 
       <para id="x_32b">A writer starts a transaction by writing filelog and
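The write ordering that this hierarchy enables can be sketched with three append-only Python lists. The sketch assumes the order the chapter goes on to describe: a writer appends filelog and manifest data before any changelog data, while a reader starts from the changelog. `ToyStore` and its `stop_after` parameter (which simulates an interrupted writer) are hypothetical.

```python
from typing import List, Optional


class ToyStore:
    """Three append-only logs (hypothetical); entries are only ever added at the end."""

    def __init__(self) -> None:
        self.filelog: List[bytes] = []  # file contents, one file for brevity
        self.manifest: List[int] = []   # manifest rev -> filelog rev
        self.changelog: List[int] = []  # changeset -> manifest rev

    def write(self, data: bytes, stop_after: int = 3) -> None:
        """Append bottom-up; stop_after < 3 simulates an interrupted writer."""
        steps = [
            lambda: self.filelog.append(data),
            lambda: self.manifest.append(len(self.filelog) - 1),
            lambda: self.changelog.append(len(self.manifest) - 1),
        ]
        for step in steps[:stop_after]:
            step()

    def read_tip(self) -> Optional[bytes]:
        """Read top-down: anything the changelog points to was written first."""
        if not self.changelog:
            return None
        manifest_rev = self.changelog[-1]
        return self.filelog[self.manifest[manifest_rev]]
```

Because the changelog entry is appended last, an interrupted write in this toy leaves only unreferenced filelog and manifest entries behind, and a reader following pointers down from the changelog never reaches them.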
@@ -637,7 +647,7 @@
 	being written to while the read is occurring. This has a big
 	effect on scalability; you can have an arbitrary number of
 	Mercurial processes safely reading data from a repository
-	safely all at once, no matter whether it's being written to or
+	all at once, no matter whether it's being written to or
 	not.</para>
 
       <para id="x_32e">The lockless nature of reading means that if you're
@@ -709,8 +719,8 @@
 	storage is cheap, and this method gives the highest
 	performance while deferring most book-keeping to the operating
 	system.  An alternative scheme would most likely reduce
-	performance and increase the complexity of the software, each
-	of which is much more important to the <quote>feel</quote> of
+	performance and increase the complexity of the software, but
+	speed and simplicity are key to the <quote>feel</quote> of
 	day-to-day use.</para>
 
     </sect2>
@@ -731,18 +741,32 @@
 	dirstate so that it knows what to do with those files when you
 	commit.</para>
 
-      <para id="x_337">When Mercurial is checking the states of files in the
-	working directory, it first checks a file's modification time.
-	If that has not changed, the file must not have been modified.
-	If the file's size has changed, the file must have been
-	modified.  If the modification time has changed, but the size
-	has not, only then does Mercurial need to read the actual
-	contents of the file to see if they've changed. Storing these
-	few extra pieces of information dramatically reduces the
-	amount of data that Mercurial needs to read, which yields
-	large performance improvements compared to other revision
-	control systems.</para>
-
+      <para id="x_337">The dirstate helps Mercurial to efficiently
+	  check the status of files in a repository.</para>
+
+      <itemizedlist>
+	<listitem>
+	  <para>When Mercurial checks the state of a file in the
+	    working directory, it first checks a file's modification
+	    time against the time in the dirstate that records when
+	    Mercurial last wrote the file. If the last modified time
+	    is the same as the time when Mercurial wrote the file, the
+	    file must not have been modified, so Mercurial does not
+	    need to check any further.</para>
+	</listitem>
+	<listitem>
+	  <para>If the file's size has changed, the file must have
+	    been modified.  If the modification time has changed, but
+	    the size has not, only then does Mercurial need to
+	    actually read the contents of the file to see if it has
+	    changed.</para>
+	</listitem>
+      </itemizedlist>
+
+      <para>Storing the modification time and size dramatically
+	reduces the number of read operations that Mercurial needs to
+	perform when we run commands like <command>hg status</command>.
+	This results in large performance improvements.</para>
     </sect2>
   </sect1>
 </chapter>
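The two checks in the new itemized list can be sketched as a Python function. `DirstateEntry`, `file_status`, and the `committed` argument (the file's contents in the working directory's parent changeset) are hypothetical; the real `.hg/dirstate` format records more than a size and a modification time, and this sketch ignores a file modified within the same timestamp tick as the recorded time, a case Mercurial has to guard against separately.

```python
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    # Hypothetical record of what was known when Mercurial last wrote the file.
    size: int     # size in bytes at that time
    mtime: float  # modification time at that time


def file_status(path: str, entry: DirstateEntry, committed: bytes) -> str:
    """Return "clean" or "modified", reading the file only when necessary."""
    st = os.stat(path)
    if st.st_mtime == entry.mtime:
        return "clean"  # unchanged timestamp: assume unmodified, no read needed
    if st.st_size != entry.size:
        return "modified"  # size differs: the file must have been modified
    # Timestamp changed but size did not: only now compare the actual contents.
    with open(path, "rb") as f:
        return "clean" if f.read() == committed else "modified"
```

A file whose timestamp changed but whose contents still match the committed text (for example, one saved without edits) reports as clean, but only after its contents are read; the first two checks answer without opening the file at all.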