hgbook
diff en/ch03-concepts.xml @ 701:477d6a3e5023
Many final changes.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Mon May 04 23:52:38 2009 -0700 (2009-05-04) |
parents | 29f0f79cf614 |
children | 18131160f7ee |
--- a/en/ch03-concepts.xml Thu Apr 16 23:46:45 2009 -0700
+++ b/en/ch03-concepts.xml Mon May 04 23:52:38 2009 -0700
@@ -112,12 +112,15 @@
 <para id="x_2f3">As the illustration shows, there is
 <emphasis>not</emphasis> a <quote>one to one</quote>
 relationship between revisions in the changelog, manifest, or
- filelog. If the manifest hasn't changed between two
- changesets, the changelog entries for those changesets will
- point to the same revision of the manifest. If a file that
+ filelog. If a file that
 Mercurial tracks hasn't changed between two changesets, the
 entry for that file in the two revisions of the manifest will
- point to the same revision of its filelog.</para>
+ point to the same revision of its filelog<footnote>
+ <para>It is possible (though unusual) for the manifest to
+ remain the same between two changesets, in which case the
+ changelog entries for those changesets will point to the
+ same revision of the manifest.</para>
+ </footnote>.</para>
 
 </sect2>
 </sect1>
@@ -175,16 +178,18 @@
 <sect2>
 <title>Fast retrieval</title>
 
- <para id="x_2fa">Mercurial cleverly avoids a pitfall common to all earlier
- revision control systems: the problem of <emphasis>inefficient
- retrieval</emphasis>. Most revision control systems store
- the contents of a revision as an incremental series of
- modifications against a <quote>snapshot</quote>. To
- reconstruct a specific revision, you must first read the
- snapshot, and then every one of the revisions between the
- snapshot and your target revision. The more history that a
- file accumulates, the more revisions you must read, hence the
- longer it takes to reconstruct a particular revision.</para>
+ <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
+ all earlier revision control systems: the problem of
+ <emphasis>inefficient retrieval</emphasis>. Most revision
+ control systems store the contents of a revision as an
+ incremental series of modifications against a
+ <quote>snapshot</quote>. (Some base the snapshot on the
+ oldest revision, others on the newest.) To reconstruct a
+ specific revision, you must first read the snapshot, and then
+ every one of the revisions between the snapshot and your
+ target revision. The more history that a file accumulates,
+ the more revisions you must read, hence the longer it takes to
+ reconstruct a particular revision.</para>
 
 <figure id="fig:concepts:snapshot">
 <title>Snapshot of a revlog, with incremental deltas</title>
@@ -211,25 +216,15 @@
 <sect3>
 <title>Aside: the influence of video compression</title>
 
- <para id="x_2fe">If you're familiar with video compression or have ever
- watched a TV feed through a digital cable or satellite
- service, you may know that most video compression schemes
- store each frame of video as a delta against its predecessor
- frame. In addition, these schemes use <quote>lossy</quote>
- compression techniques to increase the compression ratio, so
- visual errors accumulate over the course of a number of
- inter-frame deltas.</para>
-
- <para id="x_2ff">Because it's possible for a video stream to <quote>drop
- out</quote> occasionally due to signal glitches, and to
- limit the accumulation of artefacts introduced by the lossy
- compression process, video encoders periodically insert a
- complete frame (called a <quote>key frame</quote>) into the
- video stream; the next delta is generated against that
- frame. This means that if the video signal gets
- interrupted, it will resume once the next key frame is
- received. Also, the accumulation of encoding errors
- restarts anew with each key frame.</para>
+ <para id="x_2fe">If you're familiar with video compression or
+ have ever watched a TV feed through a digital cable or
+ satellite service, you may know that most video compression
+ schemes store each frame of video as a delta against its
+ predecessor frame.</para>
+
+ <para id="x_2ff">Mercurial borrows this idea to make it
+ possible to reconstruct a revision from a snapshot and a
+ small number of deltas.</para>
 
 </sect3>
 </sect2>
@@ -261,9 +256,9 @@
 uncorrupted sections of the revlog, both before and after the
 corrupted section. This would not be possible with a
 delta-only storage model.</para>
-
 </sect2>
 </sect1>
+
 <sect1>
 <title>Revision history, branching, and merging</title>
 
@@ -314,11 +309,15 @@
 those files, with the same contents it had when the changeset
 was committed.</para>
 
- <para id="x_309">The <emphasis>dirstate</emphasis> contains Mercurial's
- knowledge of the working directory. This details which
- changeset the working directory is updated to, and all of the
- files that Mercurial is tracking in the working
- directory.</para>
+ <para id="x_309">The <emphasis>dirstate</emphasis> is a special
+ structure that contains Mercurial's knowledge of the working
+ directory. It is maintained as a file named
+ <filename>.hg/dirstate</filename> inside a repository. The
+ dirstate details which changeset the working directory is
+ updated to, and all of the files that Mercurial is tracking in
+ the working directory. It also lets Mercurial quickly notice
+ changed files, by recording their checkout times and
+ sizes.</para>
 
 <para id="x_30a">Just as a revision of a revlog has room for two parents, so
 that it can represent either a normal revision (with one parent)
@@ -426,9 +425,9 @@
 </figure>
 
 <note>
- <para id="x_315"> If you're new to Mercurial, you should keep in mind a
- common <quote>error</quote>, which is to use the <command
- role="hg-cmd">hg pull</command> command without any
+ <para id="x_315">If you're new to Mercurial, you should keep
+ in mind a common <quote>error</quote>, which is to use the
+ <command role="hg-cmd">hg pull</command> command without any
 options. By default, the <command role="hg-cmd">hg
 pull</command> command <emphasis>does not</emphasis>
 update the working directory, so you'll bring new changesets
@@ -436,16 +435,19 @@
 synced at the same changeset as before the pull. If you
 make some changes and commit afterwards, you'll thus create
 a new head, because your working directory isn't synced to
- whatever the current tip is.</para>
-
- <para id="x_316"> I put the word <quote>error</quote> in
- quotes because all that you need to do to rectify this
- situation is <command role="hg-cmd">hg merge</command>, then
- <command role="hg-cmd">hg commit</command>. In other words,
- this almost never has negative consequences; it's just
- something of a surprise for newcomers. I'll discuss other
- ways to avoid this behavior, and why Mercurial behaves in
- this initially surprising way, later on.</para>
+ whatever the current tip is. To combine the operation of a
+ pull, followed by an update, run <command>hg pull
+ -u</command>.</para>
+
+ <para id="x_316">I put the word <quote>error</quote> in quotes
+ because all that you need to do to rectify the situation
+ where you created a new head by accident is
+ <command role="hg-cmd">hg merge</command>, then <command
+ role="hg-cmd">hg commit</command>. In other words, this
+ almost never has negative consequences; it's just something
+ of a surprise for newcomers. I'll discuss other ways to
+ avoid this behavior, and why Mercurial behaves in this
+ initially surprising way, later on.</para>
 </note>
 
 </sect2>
@@ -511,13 +513,15 @@
 working directory has two parents; these will become the
 parents of the new changeset.</para>
 
- <para id="x_322">Mercurial lets you perform multiple merges, but you must
- commit the results of each individual merge as you go. This
- is necessary because Mercurial only tracks two parents for
- both revisions and the working directory. While it would be
- technically possible to merge multiple changesets at once, the
- prospect of user confusion and making a terrible mess of a
- merge immediately becomes overwhelming.</para>
+ <para id="x_322">Mercurial lets you perform multiple merges, but
+ you must commit the results of each individual merge as you
+ go. This is necessary because Mercurial only tracks two
+ parents for both revisions and the working directory. While
+ it would be technically feasible to merge multiple changesets
+ at once, Mercurial avoids this for simplicity. With multi-way
+ merges, the risks of user confusion, nasty conflict
+ resolution, and making a terrible mess of a merge would grow
+ intolerable.</para>
 
 </sect2>
 
@@ -598,10 +602,17 @@
 transferred, yielding better network performance over most
 kinds of network.</para>
 
- <para id="x_329">(If the connection is over <command>ssh</command>,
- Mercurial <emphasis>doesn't</emphasis> recompress the
- stream, because <command>ssh</command> can already do this
- itself.)</para>
+ <para id="x_329">If the connection is over
+ <command>ssh</command>, Mercurial
+ <emphasis>doesn't</emphasis> recompress the stream, because
+ <command>ssh</command> can already do this itself. You can
+ tell Mercurial to always use <command>ssh</command>'s
+ compression feature by editing the
+ <filename>.hgrc</filename> file in your home directory as
+ follows.</para>
+
+ <programlisting>[ui]
+ssh = ssh -C</programlisting>
 
 </sect3>
 </sect2>
@@ -611,9 +622,8 @@
 <para id="x_32a">Appending to files isn't the whole story when
 it comes to guaranteeing that a reader won't see a partial
 write. If you recall <xref linkend="fig:concepts:metadata"/>,
- revisions in
- the changelog point to revisions in the manifest, and
- revisions in the manifest point to revisions in filelogs.
+ revisions in the changelog point to revisions in the manifest,
+ and revisions in the manifest point to revisions in filelogs.
 This hierarchy is deliberate.</para>
 
 <para id="x_32b">A writer starts a transaction by writing filelog and
@@ -637,7 +647,7 @@
 being written to while the read is occurring. This has a big
 effect on scalability; you can have an arbitrary number of
 Mercurial processes safely reading data from a repository
- safely all at once, no matter whether it's being written to or
+ all at once, no matter whether it's being written to or
 not.</para>
 
 <para id="x_32e">The lockless nature of reading means that if you're
@@ -709,8 +719,8 @@
 storage is cheap, and this method gives the highest
 performance while deferring most book-keeping to the operating
 system. An alternative scheme would most likely reduce
- performance and increase the complexity of the software, each
- of which is much more important to the <quote>feel</quote> of
+ performance and increase the complexity of the software, but
+ speed and simplicity are key to the <quote>feel</quote> of
 day-to-day use.</para>
 
 </sect2>
@@ -731,18 +741,32 @@
 dirstate so that it knows what to do with those files when you
 commit.</para>
 
- <para id="x_337">When Mercurial is checking the states of files in the
- working directory, it first checks a file's modification time.
- If that has not changed, the file must not have been modified.
- If the file's size has changed, the file must have been
- modified. If the modification time has changed, but the size
- has not, only then does Mercurial need to read the actual
- contents of the file to see if they've changed. Storing these
- few extra pieces of information dramatically reduces the
- amount of data that Mercurial needs to read, which yields
- large performance improvements compared to other revision
- control systems.</para>
-
+ <para id="x_337">The dirstate helps Mercurial to efficiently
+ check the status of files in a repository.</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>When Mercurial checks the state of a file in the
+ working directory, it first checks a file's modification
+ time against the time in the dirstate that records when
+ Mercurial last wrote the file. If the last modified time
+ is the same as the time when Mercurial wrote the file, the
+ file must not have been modified, so Mercurial does not
+ need to check any further.</para>
+ </listitem>
+ <listitem>
+ <para>If the file's size has changed, the file must have
+ been modified. If the modification time has changed, but
+ the size has not, only then does Mercurial need to
+ actually read the contents of the file to see if it has
+ changed.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Storing the modification time and size dramatically
+ reduces the number of read operations that Mercurial needs to
+ perform when we run commands like <command>hg status</command>.
+ This results in large performance improvements.</para>
 </sect2>
 </sect1>
 </chapter>
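
Reviewer's note on the final hunk (paragraph x_337): the status check it describes can be sketched in a few lines of Python. This is only an illustration of the order of checks as the new text states them, not Mercurial's actual code. The names `DirstateEntry` and `is_modified` are hypothetical, and the `content` field is a stand-in for the committed file contents, which Mercurial would really obtain from the filelog.

```python
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    """Hypothetical record of what was known when the file was last written."""
    mtime: int      # modification time recorded at checkout, in whole seconds
    size: int       # file size recorded at checkout, in bytes
    content: bytes  # stand-in for the committed contents (assumption)


def is_modified(path: str, entry: DirstateEntry) -> bool:
    """Follow the order of checks described in the paragraph."""
    st = os.stat(path)
    # 1. Same modification time as recorded: treat the file as unmodified
    #    and stop, without reading it.
    if int(st.st_mtime) == entry.mtime:
        return False
    # 2. The time differs and so does the size: the file must have changed.
    if st.st_size != entry.size:
        return True
    # 3. The time differs but the size does not: only now read the contents.
    with open(path, "rb") as f:
        return f.read() != entry.content
```

Only the third case touches file contents, which is the saving the paragraph attributes to storing the modification time and size.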