hgbook
diff en/ch04-concepts.xml @ 1058:e8c480caa169
1.7 first para translated
author | zhaopingsun |
---|---|
date | Tue Nov 10 20:29:31 2009 -0500 (2009-11-10) |
parents | 18131160f7ee |
children |
1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/en/ch04-concepts.xml Tue Nov 10 20:29:31 2009 -0500 1.3 @@ -0,0 +1,778 @@ 1.4 +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> 1.5 + 1.6 +<chapter id="chap:concepts"> 1.7 + <?dbhtml filename="behind-the-scenes.html"?> 1.8 + <title>Behind the scenes</title> 1.9 + 1.10 + <para id="x_2e8">Unlike many revision control systems, the concepts 1.11 + upon which Mercurial is built are simple enough that it's easy to 1.12 + understand how the software really works. Knowing these details 1.13 + isn't strictly necessary, so it is safe to skip this 1.14 + chapter. However, I think you will get more out of the software 1.15 + with a <quote>mental model</quote> of what's going on.</para> 1.16 + 1.17 + <para id="x_2e9">Being able to understand what's going on behind the 1.18 + scenes gives me confidence that Mercurial has been carefully 1.19 + designed to be both <emphasis>safe</emphasis> and 1.20 + <emphasis>efficient</emphasis>. And just as importantly, if it's 1.21 + easy for me to retain a good idea of what the software is doing 1.22 + when I perform a revision control task, I'm less likely to be 1.23 + surprised by its behavior.</para> 1.24 + 1.25 + <para id="x_2ea">In this chapter, we'll first cover the core concepts 1.26 + behind Mercurial's design, then go on to discuss some of the 1.27 + interesting details of its implementation.</para> 1.28 + 1.29 + <sect1> 1.30 + <title>Mercurial's historical record</title> 1.31 + 1.32 + <sect2> 1.33 + <title>Tracking the history of a single file</title> 1.34 + 1.35 + <para id="x_2eb">When Mercurial tracks modifications to a file, it stores 1.36 + the history of that file in a metadata object called a 1.37 + <emphasis>filelog</emphasis>. Each entry in the filelog 1.38 + contains enough information to reconstruct one revision of the 1.39 + file that is being tracked. 
Filelogs are stored as files in 1.40 + the <filename role="special" 1.41 + class="directory">.hg/store/data</filename> directory. A 1.42 + filelog contains two kinds of information: revision data, and 1.43 + an index to help Mercurial to find a revision 1.44 + efficiently.</para> 1.45 + 1.46 + <para id="x_2ec">A file that is large, or has a lot of history, has its 1.47 + filelog stored in separate data 1.48 + (<quote><literal>.d</literal></quote> suffix) and index 1.49 + (<quote><literal>.i</literal></quote> suffix) files. For 1.50 + small files without much history, the revision data and index 1.51 + are combined in a single <quote><literal>.i</literal></quote> 1.52 + file. The correspondence between a file in the working 1.53 + directory and the filelog that tracks its history in the 1.54 + repository is illustrated in <xref 1.55 + linkend="fig:concepts:filelog"/>.</para> 1.56 + 1.57 + <figure id="fig:concepts:filelog"> 1.58 + <title>Relationships between files in working directory and 1.59 + filelogs in repository</title> 1.60 + <mediaobject> 1.61 + <imageobject><imagedata fileref="figs/filelog.png"/></imageobject> 1.62 + <textobject><phrase>XXX add text</phrase></textobject> 1.63 + </mediaobject> 1.64 + </figure> 1.65 + 1.66 + </sect2> 1.67 + <sect2> 1.68 + <title>Managing tracked files</title> 1.69 + 1.70 + <para id="x_2ee">Mercurial uses a structure called a 1.71 + <emphasis>manifest</emphasis> to collect together information 1.72 + about the files that it tracks. Each entry in the manifest 1.73 + contains information about the files present in a single 1.74 + changeset. An entry records which files are present in the 1.75 + changeset, the revision of each file, and a few other pieces 1.76 + of file metadata.</para> 1.77 + 1.78 + </sect2> 1.79 + <sect2> 1.80 + <title>Recording changeset information</title> 1.81 + 1.82 + <para id="x_2ef">The <emphasis>changelog</emphasis> contains information 1.83 + about each changeset. 
Each revision records who committed a 1.84 + change, the changeset comment, other pieces of 1.85 + changeset-related information, and the revision of the 1.86 + manifest to use.</para> 1.87 + 1.88 + </sect2> 1.89 + <sect2> 1.90 + <title>Relationships between revisions</title> 1.91 + 1.92 + <para id="x_2f0">Within a changelog, a manifest, or a filelog, each 1.93 + revision stores a pointer to its immediate parent (or to its 1.94 + two parents, if it's a merge revision). As I mentioned above, 1.95 + there are also relationships between revisions 1.96 + <emphasis>across</emphasis> these structures, and they are 1.97 + hierarchical in nature.</para> 1.98 + 1.99 + <para id="x_2f1">For every changeset in a repository, there is exactly one 1.100 + revision stored in the changelog. Each revision of the 1.101 + changelog contains a pointer to a single revision of the 1.102 + manifest. A revision of the manifest stores a pointer to a 1.103 + single revision of each filelog tracked when that changeset 1.104 + was created. These relationships are illustrated in 1.105 + <xref linkend="fig:concepts:metadata"/>.</para> 1.106 + 1.107 + <figure id="fig:concepts:metadata"> 1.108 + <title>Metadata relationships</title> 1.109 + <mediaobject> 1.110 + <imageobject><imagedata fileref="figs/metadata.png"/></imageobject> 1.111 + <textobject><phrase>XXX add text</phrase></textobject> 1.112 + </mediaobject> 1.113 + </figure> 1.114 + 1.115 + <para id="x_2f3">As the illustration shows, there is 1.116 + <emphasis>not</emphasis> a <quote>one to one</quote> 1.117 + relationship between revisions in the changelog, manifest, or 1.118 + filelog. 
If a file that 1.119 + Mercurial tracks hasn't changed between two changesets, the 1.120 + entry for that file in the two revisions of the manifest will 1.121 + point to the same revision of its filelog<footnote> 1.122 + <para id="x_725">It is possible (though unusual) for the manifest to 1.123 + remain the same between two changesets, in which case the 1.124 + changelog entries for those changesets will point to the 1.125 + same revision of the manifest.</para> 1.126 + </footnote>.</para> 1.127 + 1.128 + </sect2> 1.129 + </sect1> 1.130 + <sect1> 1.131 + <title>Safe, efficient storage</title> 1.132 + 1.133 + <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are 1.134 + provided by a single structure called the 1.135 + <emphasis>revlog</emphasis>.</para> 1.136 + 1.137 + <sect2> 1.138 + <title>Efficient storage</title> 1.139 + 1.140 + <para id="x_2f5">The revlog provides efficient storage of revisions using a 1.141 + <emphasis>delta</emphasis> mechanism. Instead of storing a 1.142 + complete copy of a file for each revision, it stores the 1.143 + changes needed to transform an older revision into the new 1.144 + revision. For many kinds of file data, these deltas are 1.145 + typically a fraction of a percent of the size of a full copy 1.146 + of a file.</para> 1.147 + 1.148 + <para id="x_2f6">Some obsolete revision control systems can only work with 1.149 + deltas of text files. They must either store binary files as 1.150 + complete snapshots or encoded into a text representation, both 1.151 + of which are wasteful approaches. Mercurial can efficiently 1.152 + handle deltas of files with arbitrary binary contents; it 1.153 + doesn't need to treat text as special.</para> 1.154 + 1.155 + </sect2> 1.156 + <sect2 id="sec:concepts:txn"> 1.157 + <title>Safe operation</title> 1.158 + 1.159 + <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to 1.160 + the end of a revlog file. 
It never modifies a section of a 1.161 + file after it has written it. This is both more robust and 1.162 + more efficient than schemes that need to modify or rewrite 1.163 + data.</para> 1.164 + 1.165 + <para id="x_2f8">In addition, Mercurial treats every write as part of a 1.166 + <emphasis>transaction</emphasis> that can span a number of 1.167 + files. A transaction is <emphasis>atomic</emphasis>: either 1.168 + the entire transaction succeeds and its effects are all 1.169 + visible to readers in one go, or the whole thing is undone. 1.170 + This guarantee of atomicity means that if you're running two 1.171 + copies of Mercurial, where one is reading data and one is 1.172 + writing it, the reader will never see a partially written 1.173 + result that might confuse it.</para> 1.174 + 1.175 + <para id="x_2f9">The fact that Mercurial only appends to files makes it 1.176 + easier to provide this transactional guarantee. The easier it 1.177 + is to provide a guarantee like this, the more confident you should be 1.178 + that it's done correctly.</para> 1.179 + 1.180 + </sect2> 1.181 + <sect2> 1.182 + <title>Fast retrieval</title> 1.183 + 1.184 + <para id="x_2fa">Mercurial cleverly avoids a pitfall common to 1.185 + many earlier revision control systems: the problem of 1.186 + <emphasis>inefficient retrieval</emphasis>. Most revision 1.187 + control systems store the contents of a revision as an 1.188 + incremental series of modifications against a 1.189 + <quote>snapshot</quote>. (Some base the snapshot on the 1.190 + oldest revision, others on the newest.) To reconstruct a 1.191 + specific revision, you must first read the snapshot, and then 1.192 + every one of the revisions between the snapshot and your 1.193 + target revision. 
The more history that a file accumulates, 1.194 + the more revisions you must read, hence the longer it takes to 1.195 + reconstruct a particular revision.</para> 1.196 + 1.197 + <figure id="fig:concepts:snapshot"> 1.198 + <title>Snapshot of a revlog, with incremental deltas</title> 1.199 + <mediaobject> 1.200 + <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> 1.201 + <textobject><phrase>XXX add text</phrase></textobject> 1.202 + </mediaobject> 1.203 + </figure> 1.204 + 1.205 + <para id="x_2fc">The innovation that Mercurial applies to this problem is 1.206 + simple but effective. Once the cumulative amount of delta 1.207 + information stored since the last snapshot exceeds a fixed 1.208 + threshold, it stores a new snapshot (compressed, of course), 1.209 + instead of another delta. This makes it possible to 1.210 + reconstruct <emphasis>any</emphasis> revision of a file 1.211 + quickly. This approach works so well that it has since been 1.212 + copied by several other revision control systems.</para> 1.213 + 1.214 + <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates 1.215 + the idea. 
In an entry in a revlog's index file, Mercurial 1.216 + stores the range of entries from the data file that it must 1.217 + read to reconstruct a particular revision.</para> 1.218 + 1.219 + <sect3> 1.220 + <title>Aside: the influence of video compression</title> 1.221 + 1.222 + <para id="x_2fe">If you're familiar with video compression or 1.223 + have ever watched a TV feed through a digital cable or 1.224 + satellite service, you may know that most video compression 1.225 + schemes store each frame of video as a delta against its 1.226 + predecessor frame. Encoders also periodically insert a complete frame (a <quote>key frame</quote>), so that a decoder can resynchronize without replaying every earlier delta.</para> 1.227 + 1.228 + <para id="x_2ff">Mercurial borrows this idea to make it 1.229 + possible to reconstruct a revision from a snapshot and a 1.230 + small number of deltas.</para> 1.231 + 1.232 + </sect3> 1.233 + </sect2> 1.234 + <sect2> 1.235 + <title>Identification and strong integrity</title> 1.236 + 1.237 + <para id="x_300">Along with delta or snapshot information, a revlog entry 1.238 + contains a cryptographic hash of the data that it represents. 1.239 + This makes it difficult to forge the contents of a revision, 1.240 + and easy to detect accidental corruption.</para> 1.241 + 1.242 + <para id="x_301">Hashes provide more than a mere check against corruption; 1.243 + they are used as the identifiers for revisions. The changeset 1.244 + identification hashes that you see as an end user are from 1.245 + revisions of the changelog. Although filelogs and the 1.246 + manifest also use hashes, Mercurial only uses these behind the 1.247 + scenes.</para> 1.248 + 1.249 + <para id="x_302">Mercurial verifies that hashes are correct when it 1.250 + retrieves file revisions and when it pulls changes from 1.251 + another repository. 
If it encounters an integrity problem, it 1.252 + will complain and stop whatever it's doing.</para> 1.253 + 1.254 + <para id="x_303">In addition to the effect it has on retrieval efficiency, 1.255 + Mercurial's use of periodic snapshots makes it more robust 1.256 + against partial data corruption. If a revlog becomes partly 1.257 + corrupted due to a hardware error or system bug, it's often 1.258 + possible to reconstruct some or most revisions from the 1.259 + uncorrupted sections of the revlog, both before and after the 1.260 + corrupted section. This would not be possible with a 1.261 + delta-only storage model.</para> 1.262 + </sect2> 1.263 + </sect1> 1.264 + 1.265 + <sect1> 1.266 + <title>Revision history, branching, and merging</title> 1.267 + 1.268 + <para id="x_304">Every entry in a Mercurial revlog knows the identity of its 1.269 + immediate ancestor revision, usually referred to as its 1.270 + <emphasis>parent</emphasis>. In fact, a revision contains room 1.271 + for not one parent, but two. Mercurial uses a special hash, 1.272 + called the <quote>null ID</quote>, to represent the idea 1.273 + <quote>there is no parent here</quote>. This hash is simply a 1.274 + string of zeroes.</para> 1.275 + 1.276 + <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see 1.277 + an example of the conceptual structure of a revlog. Filelogs, 1.278 + manifests, and changelogs all have this same structure; they 1.279 + differ only in the kind of data stored in each delta or 1.280 + snapshot.</para> 1.281 + 1.282 + <para id="x_306">The first revision in a revlog (at the bottom of the image) 1.283 + has the null ID in both of its parent slots. For a 1.284 + <quote>normal</quote> revision, its first parent slot contains 1.285 + the ID of its parent revision, and its second contains the null 1.286 + ID, indicating that the revision has only one real parent. Any 1.287 + two revisions that have the same parent ID are branches. 
A 1.288 + revision that represents a merge between branches has two normal 1.289 + revision IDs in its parent slots.</para> 1.290 + 1.291 + <figure id="fig:concepts:revlog"> 1.292 + <title>The conceptual structure of a revlog</title> 1.293 + <mediaobject> 1.294 + <imageobject><imagedata fileref="figs/revlog.png"/></imageobject> 1.295 + <textobject><phrase>XXX add text</phrase></textobject> 1.296 + </mediaobject> 1.297 + </figure> 1.298 + 1.299 + </sect1> 1.300 + <sect1> 1.301 + <title>The working directory</title> 1.302 + 1.303 + <para id="x_307">In the working directory, Mercurial stores a snapshot of the 1.304 + files from the repository as of a particular changeset.</para> 1.305 + 1.306 + <para id="x_308">The working directory <quote>knows</quote> which changeset 1.307 + it contains. When you update the working directory to contain a 1.308 + particular changeset, Mercurial looks up the appropriate 1.309 + revision of the manifest to find out which files it was tracking 1.310 + at the time that changeset was committed, and which revision of 1.311 + each file was then current. It then recreates a copy of each of 1.312 + those files, with the same contents it had when the changeset 1.313 + was committed.</para> 1.314 + 1.315 + <para id="x_309">The <emphasis>dirstate</emphasis> is a special 1.316 + structure that contains Mercurial's knowledge of the working 1.317 + directory. It is maintained as a file named 1.318 + <filename>.hg/dirstate</filename> inside a repository. The 1.319 + dirstate details which changeset the working directory is 1.320 + updated to, and all of the files that Mercurial is tracking in 1.321 + the working directory. 
It also lets Mercurial quickly notice 1.322 + changed files, by recording their checkout times and 1.323 + sizes.</para> 1.324 + 1.325 + <para id="x_30a">Just as a revision of a revlog has room for two parents, so 1.326 + that it can represent either a normal revision (with one parent) 1.327 + or a merge of two earlier revisions, the dirstate has slots for 1.328 + two parents. When you use the <command role="hg-cmd">hg 1.329 + update</command> command, the changeset that you update to is 1.330 + stored in the <quote>first parent</quote> slot, and the null ID 1.331 + in the second. When you <command role="hg-cmd">hg 1.332 + merge</command> with another changeset, the first parent 1.333 + remains unchanged, and the second parent is filled in with the 1.334 + changeset you're merging with. The <command role="hg-cmd">hg 1.335 + parents</command> command tells you what the parents of the 1.336 + dirstate are.</para> 1.337 + 1.338 + <sect2> 1.339 + <title>What happens when you commit</title> 1.340 + 1.341 + <para id="x_30b">The dirstate stores parent information for more than just 1.342 + book-keeping purposes. Mercurial uses the parents of the 1.343 + dirstate as <emphasis>the parents of a new 1.344 + changeset</emphasis> when you perform a commit.</para> 1.345 + 1.346 + <figure id="fig:concepts:wdir"> 1.347 + <title>The working directory can have two parents</title> 1.348 + <mediaobject> 1.349 + <imageobject><imagedata fileref="figs/wdir.png"/></imageobject> 1.350 + <textobject><phrase>XXX add text</phrase></textobject> 1.351 + </mediaobject> 1.352 + </figure> 1.353 + 1.354 + <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the 1.355 + normal state of the working directory, where it has a single 1.356 + changeset as parent. 
That changeset is the 1.357 + <emphasis>tip</emphasis>, the newest changeset in the 1.358 + repository that has no children.</para> 1.359 + 1.360 + <figure id="fig:concepts:wdir-after-commit"> 1.361 + <title>The working directory gains new parents after a 1.362 + commit</title> 1.363 + <mediaobject> 1.364 + <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject> 1.365 + <textobject><phrase>XXX add text</phrase></textobject> 1.366 + </mediaobject> 1.367 + </figure> 1.368 + 1.369 + <para id="x_30f">It's useful to think of the working directory as 1.370 + <quote>the changeset I'm about to commit</quote>. Any files 1.371 + that you tell Mercurial that you've added, removed, renamed, 1.372 + or copied will be reflected in that changeset, as will 1.373 + modifications to any files that Mercurial is already tracking; 1.374 + the new changeset will have the parents of the working 1.375 + directory as its parents.</para> 1.376 + 1.377 + <para id="x_310">After a commit, Mercurial will update the 1.378 + parents of the working directory, so that the first parent is 1.379 + the ID of the new changeset, and the second is the null ID. 1.380 + This is shown in <xref 1.381 + linkend="fig:concepts:wdir-after-commit"/>. Mercurial 1.382 + doesn't touch any of the files in the working directory when 1.383 + you commit; it just modifies the dirstate to note its new 1.384 + parents.</para> 1.385 + 1.386 + </sect2> 1.387 + <sect2> 1.388 + <title>Creating a new head</title> 1.389 + 1.390 + <para id="x_311">It's perfectly normal to update the working directory to a 1.391 + changeset other than the current tip. For example, you might 1.392 + want to know what your project looked like last Tuesday, or 1.393 + you could be looking through changesets to see which one 1.394 + introduced a bug. 
In cases like this, the natural thing to do 1.395 + is update the working directory to the changeset you're 1.396 + interested in, and then examine the files in the working 1.397 + directory directly to see their contents as they were when you 1.398 + committed that changeset. The effect of this is shown in 1.399 + <xref linkend="fig:concepts:wdir-pre-branch"/>.</para> 1.400 + 1.401 + <figure id="fig:concepts:wdir-pre-branch"> 1.402 + <title>The working directory, updated to an older 1.403 + changeset</title> 1.404 + <mediaobject> 1.405 + <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject> 1.406 + <textobject><phrase>XXX add text</phrase></textobject> 1.407 + </mediaobject> 1.408 + </figure> 1.409 + 1.410 + <para id="x_313">Having updated the working directory to an 1.411 + older changeset, what happens if you make some changes, and 1.412 + then commit? Mercurial behaves in the same way as I outlined 1.413 + above. The parents of the working directory become the 1.414 + parents of the new changeset. This new changeset has no 1.415 + children, so it becomes the new tip. And the repository now 1.416 + contains two changesets that have no children; we call these 1.417 + <emphasis>heads</emphasis>. You can see the structure that 1.418 + this creates in <xref 1.419 + linkend="fig:concepts:wdir-branch"/>.</para> 1.420 + 1.421 + <figure id="fig:concepts:wdir-branch"> 1.422 + <title>After a commit made while synced to an older 1.423 + changeset</title> 1.424 + <mediaobject> 1.425 + <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject> 1.426 + <textobject><phrase>XXX add text</phrase></textobject> 1.427 + </mediaobject> 1.428 + </figure> 1.429 + 1.430 + <note> 1.431 + <para id="x_315">If you're new to Mercurial, you should keep 1.432 + in mind a common <quote>error</quote>, which is to use the 1.433 + <command role="hg-cmd">hg pull</command> command without any 1.434 + options. 
By default, the <command role="hg-cmd">hg 1.435 + pull</command> command <emphasis>does not</emphasis> 1.436 + update the working directory, so you'll bring new changesets 1.437 + into your repository, but the working directory will stay 1.438 + synced at the same changeset as before the pull. If you 1.439 + make some changes and commit afterwards, you'll thus create 1.440 + a new head, because your working directory isn't synced to 1.441 + whatever the current tip is. To combine the operation of a 1.442 + pull, followed by an update, run <command>hg pull 1.443 + -u</command>.</para> 1.444 + 1.445 + <para id="x_316">I put the word <quote>error</quote> in quotes 1.446 + because all that you need to do to rectify the situation 1.447 + where you created a new head by accident is 1.448 + <command role="hg-cmd">hg merge</command>, then <command 1.449 + role="hg-cmd">hg commit</command>. In other words, this 1.450 + almost never has negative consequences; it's just something 1.451 + of a surprise for newcomers. 
I'll discuss other ways to 1.452 + avoid this behavior, and why Mercurial behaves in this 1.453 + initially surprising way, later on.</para> 1.454 + </note> 1.455 + 1.456 + </sect2> 1.457 + <sect2> 1.458 + <title>Merging changes</title> 1.459 + 1.460 + <para id="x_317">When you run the <command role="hg-cmd">hg 1.461 + merge</command> command, Mercurial leaves the first parent 1.462 + of the working directory unchanged, and sets the second parent 1.463 + to the changeset you're merging with, as shown in <xref 1.464 + linkend="fig:concepts:wdir-merge"/>.</para> 1.465 + 1.466 + <figure id="fig:concepts:wdir-merge"> 1.467 + <title>Merging two heads</title> 1.468 + <mediaobject> 1.469 + <imageobject> 1.470 + <imagedata fileref="figs/wdir-merge.png"/> 1.471 + </imageobject> 1.472 + <textobject><phrase>XXX add text</phrase></textobject> 1.473 + </mediaobject> 1.474 + </figure> 1.475 + 1.476 + <para id="x_319">Mercurial also has to modify the working directory, to 1.477 + merge the files managed in the two changesets. 
Simplified a 1.478 + little, the merging process goes like this, for every file in 1.479 + the manifests of both changesets.</para> 1.480 + <itemizedlist> 1.481 + <listitem><para id="x_31a">If neither changeset has modified a file, do 1.482 + nothing with that file.</para> 1.483 + </listitem> 1.484 + <listitem><para id="x_31b">If one changeset has modified a file, and the 1.485 + other hasn't, create the modified copy of the file in the 1.486 + working directory.</para> 1.487 + </listitem> 1.488 + <listitem><para id="x_31c">If one changeset has removed a file, and the 1.489 + other hasn't (or has also deleted it), delete the file 1.490 + from the working directory.</para> 1.491 + </listitem> 1.492 + <listitem><para id="x_31d">If one changeset has removed a file, but the 1.493 + other has modified the file, ask the user what to do: keep 1.494 + the modified file, or remove it?</para> 1.495 + </listitem> 1.496 + <listitem><para id="x_31e">If both changesets have modified a file, 1.497 + invoke an external merge program to choose the new 1.498 + contents for the merged file. This may require input from 1.499 + the user.</para> 1.500 + </listitem> 1.501 + <listitem><para id="x_31f">If one changeset has modified a file, and the 1.502 + other has renamed or copied the file, make sure that the 1.503 + changes follow the new name of the file.</para> 1.504 + </listitem></itemizedlist> 1.505 + <para id="x_320">There are more details&emdash;merging has plenty of corner 1.506 + cases&emdash;but these are the most common choices that are 1.507 + involved in a merge. As you can see, most cases are 1.508 + completely automatic, and indeed most merges finish 1.509 + automatically, without requiring your input to resolve any 1.510 + conflicts.</para> 1.511 + 1.512 + <para id="x_321">When you're thinking about what happens when you commit 1.513 + after a merge, once again the working directory is <quote>the 1.514 + changeset I'm about to commit</quote>. 
After the <command 1.515 + role="hg-cmd">hg merge</command> command completes, the 1.516 + working directory has two parents; these will become the 1.517 + parents of the new changeset.</para> 1.518 + 1.519 + <para id="x_322">Mercurial lets you perform multiple merges, but 1.520 + you must commit the results of each individual merge as you 1.521 + go. This is necessary because Mercurial tracks at most two 1.522 + parents, both for revisions and for the working directory. While 1.523 + it would be technically feasible to merge multiple changesets 1.524 + at once, Mercurial avoids this for simplicity. With multi-way 1.525 + merges, the risks of user confusion, nasty conflict 1.526 + resolution, and making a terrible mess of a merge would grow 1.527 + intolerable.</para> 1.528 + 1.529 + </sect2> 1.530 + 1.531 + <sect2> 1.532 + <title>Merging and renames</title> 1.533 + 1.534 + <para id="x_69a">A surprising number of revision control systems pay little 1.535 + or no attention to a file's <emphasis>name</emphasis> over 1.536 + time. For instance, it used to be common that if a file got 1.537 + renamed on one side of a merge, the changes from the other 1.538 + side would be silently dropped.</para> 1.539 + 1.540 + <para id="x_69b">Mercurial records metadata when you tell it to perform a 1.541 + rename or copy. It uses this metadata to do the 1.542 + right thing when you merge. For instance, if I rename 1.543 + a file, and you edit it without renaming it, when we merge our 1.544 + work the file will be renamed and have your edits 1.545 + applied.</para> 1.546 + </sect2> 1.547 + </sect1> 1.548 + 1.549 + <sect1> 1.550 + <title>Other interesting design features</title> 1.551 + 1.552 + <para id="x_323">In the sections above, I've tried to highlight some of the 1.553 + most important aspects of Mercurial's design, to illustrate that 1.554 + it pays careful attention to reliability and performance. 1.555 + However, the attention to detail doesn't stop there. 
There are 1.556 + a number of other aspects of Mercurial's construction that I 1.557 + personally find interesting. I'll detail a few of them here, 1.558 + separate from the <quote>big ticket</quote> items above, so that 1.559 + if you're interested, you can gain a better idea of the amount 1.560 + of thinking that goes into a well-designed system.</para> 1.561 + 1.562 + <sect2> 1.563 + <title>Clever compression</title> 1.564 + 1.565 + <para id="x_324">When appropriate, Mercurial will store both snapshots and 1.566 + deltas in compressed form. It does this by always 1.567 + <emphasis>trying to</emphasis> compress a snapshot or delta, 1.568 + but only storing the compressed version if it's smaller than 1.569 + the uncompressed version.</para> 1.570 + 1.571 + <para id="x_325">This means that Mercurial does <quote>the right 1.572 + thing</quote> when storing a file whose native form is 1.573 + compressed, such as a <literal>zip</literal> archive or a JPEG 1.574 + image. When these types of files are compressed a second 1.575 + time, the resulting file is usually bigger than the 1.576 + once-compressed form, and so Mercurial will store the plain 1.577 + <literal>zip</literal> or JPEG.</para> 1.578 + 1.579 + <para id="x_326">Deltas between revisions of a compressed file are usually 1.580 + larger than snapshots of the file, and Mercurial again does 1.581 + <quote>the right thing</quote> in these cases. 
It finds that 1.582 + such a delta exceeds the threshold at which it should store a 1.583 + complete snapshot of the file, so it stores the snapshot, 1.584 + again saving space compared to a naive delta-only 1.585 + approach.</para> 1.586 + 1.587 + <sect3> 1.588 + <title>Network recompression</title> 1.589 + 1.590 + <para id="x_327">When storing revisions on disk, Mercurial uses the 1.591 + <quote>deflate</quote> compression algorithm (the same one 1.592 + used by the popular <literal>zip</literal> archive format), 1.593 + which balances good speed with a respectable compression 1.594 + ratio. However, when transmitting revision data over a 1.595 + network connection, Mercurial uncompresses the compressed 1.596 + revision data.</para> 1.597 + 1.598 + <para id="x_328">If the connection is over HTTP, Mercurial recompresses 1.599 + the entire stream of data using a compression algorithm that 1.600 + gives a better compression ratio (the Burrows-Wheeler 1.601 + algorithm from the widely used <literal>bzip2</literal> 1.602 + compression package). This combination of algorithm and 1.603 + compression of the entire stream (instead of a revision at a 1.604 + time) substantially reduces the number of bytes to be 1.605 + transferred, yielding better network performance over most 1.606 + kinds of network.</para> 1.607 + 1.608 + <para id="x_329">If the connection is over 1.609 + <command>ssh</command>, Mercurial 1.610 + <emphasis>doesn't</emphasis> recompress the stream, because 1.611 + <command>ssh</command> can already do this itself. 
You can 1.612 + tell Mercurial to always use <command>ssh</command>'s 1.613 + compression feature by editing the 1.614 + <filename>.hgrc</filename> file in your home directory as 1.615 + follows.</para> 1.616 + 1.617 + <programlisting>[ui] 1.618 +ssh = ssh -C</programlisting> 1.619 + 1.620 + </sect3> 1.621 + </sect2> 1.622 + <sect2> 1.623 + <title>Read/write ordering and atomicity</title> 1.624 + 1.625 + <para id="x_32a">Appending to files isn't the whole story when 1.626 + it comes to guaranteeing that a reader won't see a partial 1.627 + write. If you recall <xref linkend="fig:concepts:metadata"/>, 1.628 + revisions in the changelog point to revisions in the manifest, 1.629 + and revisions in the manifest point to revisions in filelogs. 1.630 + This hierarchy is deliberate.</para> 1.631 + 1.632 + <para id="x_32b">A writer starts a transaction by writing filelog and 1.633 + manifest data, and doesn't write any changelog data until 1.634 + those are finished. A reader starts by reading changelog 1.635 + data, then manifest data, followed by filelog data.</para> 1.636 + 1.637 + <para id="x_32c">Since the writer has always finished writing filelog and 1.638 + manifest data before it writes to the changelog, a reader will 1.639 + never read a pointer to a partially written manifest revision 1.640 + from the changelog, and it will never read a pointer to a 1.641 + partially written filelog revision from the manifest.</para> 1.642 + 1.643 + </sect2> 1.644 + <sect2> 1.645 + <title>Concurrent access</title> 1.646 + 1.647 + <para id="x_32d">The read/write ordering and atomicity guarantees mean that 1.648 + Mercurial never needs to <emphasis>lock</emphasis> a 1.649 + repository when it's reading data, even if the repository is 1.650 + being written to while the read is occurring. 
This has a big 1.651 + effect on scalability; you can have an arbitrary number of 1.652 + Mercurial processes safely reading data from a repository 1.653 + all at once, no matter whether it's being written to or 1.654 + not.</para> 1.655 + 1.656 + <para id="x_32e">The lockless nature of reading means that if you're 1.657 + sharing a repository on a multi-user system, you don't need to 1.658 + grant other local users permission to 1.659 + <emphasis>write</emphasis> to your repository in order for 1.660 + them to be able to clone it or pull changes from it; they only 1.661 + need <emphasis>read</emphasis> permission. (This is 1.662 + <emphasis>not</emphasis> a common feature among revision 1.663 + control systems, so don't take it for granted! Most require 1.664 + readers to be able to lock a repository to access it safely, 1.665 + and this requires write permission on at least one directory, 1.666 + which of course makes for all kinds of nasty and annoying 1.667 + security and administrative problems.)</para> 1.668 + 1.669 + <para id="x_32f">Mercurial uses locks to ensure that only one process can 1.670 + write to a repository at a time (the locking mechanism is safe 1.671 + even over filesystems that are notoriously hostile to locking, 1.672 + such as NFS). If a repository is locked, a writer will wait 1.673 + and retry until the repository becomes unlocked, but 1.674 + if the repository remains locked for too long, the process 1.675 + attempting to write will time out. This means 1.676 + that your daily automated scripts won't get stuck forever and 1.677 + pile up if a system crashes unnoticed, for example. (Yes, the 1.678 + timeout is configurable, from zero to infinity.)</para> 1.679 + 1.680 + <sect3> 1.681 + <title>Safe dirstate access</title> 1.682 + 1.683 + <para id="x_330">As with revision data, Mercurial doesn't take a lock to 1.684 + read the dirstate file; it does acquire a lock to write it. 
1.685 + To avoid the possibility of reading a partially written copy 1.686 + of the dirstate file, Mercurial writes to a file with a 1.687 + unique name in the same directory as the dirstate file, then 1.688 + renames the temporary file atomically to 1.689 + <filename>dirstate</filename>. The file named 1.690 + <filename>dirstate</filename> is thus guaranteed to be 1.691 + complete, not partially written.</para> 1.692 + 1.693 + </sect3> 1.694 + </sect2> 1.695 + <sect2> 1.696 + <title>Avoiding seeks</title> 1.697 + 1.698 + <para id="x_331">Critical to Mercurial's performance is the avoidance of 1.699 + seeks of the disk head, since any seek is far more expensive 1.700 + than even a comparatively large read operation.</para> 1.701 + 1.702 + <para id="x_332">This is why, for example, the dirstate is stored in a 1.703 + single file. If there were a dirstate file per directory that 1.704 + Mercurial tracked, the disk would seek once per directory. 1.705 + Instead, Mercurial reads the entire single dirstate file in 1.706 + one step.</para> 1.707 + 1.708 + <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme 1.709 + when cloning a repository on local storage. Instead of 1.710 + copying every revlog file from the old repository into the new 1.711 + repository, it makes a <quote>hard link</quote>, which is a 1.712 + shorthand way to say <quote>these two names point to the same 1.713 + file</quote>. When Mercurial is about to write to one of a 1.714 + revlog's files, it checks to see if the number of names 1.715 + pointing at the file is greater than one. If it is, more than 1.716 + one repository is using the file, so Mercurial makes a new 1.717 + copy of the file that is private to this repository.</para> 1.718 + 1.719 + <para id="x_334">A few revision control developers have pointed out that 1.720 + this idea of making a complete private copy of a file is not 1.721 + very efficient in its use of storage. 
While this is true, 1.722 + storage is cheap, and this method gives the highest 1.723 + performance while deferring most book-keeping to the operating 1.724 + system. An alternative scheme would most likely reduce 1.725 + performance and increase the complexity of the software, and 1.726 + speed and simplicity are key to the <quote>feel</quote> of 1.727 + day-to-day use.</para> 1.728 + 1.729 + </sect2> 1.730 + <sect2> 1.731 + <title>Other contents of the dirstate</title> 1.732 + 1.733 + <para id="x_335">Because Mercurial doesn't force you to tell it when you're 1.734 + modifying a file, it uses the dirstate to store some extra 1.735 + information so it can determine efficiently whether you have 1.736 + modified a file. For each file in the working directory, it 1.737 + stores the time at which Mercurial itself last modified the file, and the 1.738 + size of the file at that time.</para> 1.739 + 1.740 + <para id="x_336">When you explicitly <command role="hg-cmd">hg 1.741 + add</command>, <command role="hg-cmd">hg remove</command>, 1.742 + <command role="hg-cmd">hg rename</command> or <command 1.743 + role="hg-cmd">hg copy</command> files, Mercurial updates the 1.744 + dirstate so that it knows what to do with those files when you 1.745 + commit.</para> 1.746 + 1.747 + <para id="x_337">The dirstate helps Mercurial to efficiently 1.748 + check the status of files in a repository.</para> 1.749 + 1.750 + <itemizedlist> 1.751 + <listitem> 1.752 + <para id="x_726">When Mercurial checks the state of a file in the 1.753 + working directory, it first checks a file's modification 1.754 + time against the time in the dirstate that records when 1.755 + Mercurial last wrote the file. 
If the last modified time 1.756 + is the same as the time when Mercurial wrote the file, the 1.757 + file must not have been modified, so Mercurial does not 1.758 + need to check any further.</para> 1.759 + </listitem> 1.760 + <listitem> 1.761 + <para id="x_727">If the file's size has changed, the file must have 1.762 + been modified. If the modification time has changed, but 1.763 + the size has not, only then does Mercurial need to 1.764 + actually read the contents of the file to see if it has 1.765 + changed.</para> 1.766 + </listitem> 1.767 + </itemizedlist> 1.768 + 1.769 + <para id="x_728">Storing the modification time and size dramatically 1.770 + reduces the number of read operations that Mercurial needs to 1.771 + perform when we run commands like <command>hg status</command>. 1.772 + This results in large performance improvements.</para> 1.773 + </sect2> 1.774 + </sect1> 1.775 +</chapter> 1.776 + 1.777 +<!-- 1.778 +local variables: 1.779 +sgml-parent-document: ("00book.xml" "book" "chapter") 1.780 +end: 1.781 +-->