hgbook
annotate en/ch04-concepts.xml @ 964:6b680d569bb4
deleting a bunch of files not longer necessary to build the documentation.
Adding missing newly files needed to build the documentation
author | Romain PELISSE <belaran@gmail.com> |
---|---|
date | Sun Aug 16 04:58:01 2009 +0200 (2009-08-16) |
parents | 18131160f7ee |
children |
rev | line source |
---|---|
bos@559 | 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> |
bos@559 | 2 |
bos@559 | 3 <chapter id="chap:concepts"> |
bos@572 | 4 <?dbhtml filename="behind-the-scenes.html"?> |
bos@559 | 5 <title>Behind the scenes</title> |
bos@559 | 6 |
bos@620 | 7 <para id="x_2e8">Unlike many revision control systems, the concepts |
bos@620 | 8 upon which Mercurial is built are simple enough that it's easy to |
bos@620 | 9 understand how the software really works. Knowing these details |
bos@620 | 10 isn't necessary, so it is certainly safe to skip this |
bos@620 | 11 chapter. However, I think you will get more out of the software |
bos@620 | 12 with a <quote>mental model</quote> of what's going on.</para> |
bos@620 | 13 |
bos@620 | 14 <para id="x_2e9">Being able to understand what's going on behind the |
bos@620 | 15 scenes gives me confidence that Mercurial has been carefully |
bos@620 | 16 designed to be both <emphasis>safe</emphasis> and |
bos@559 | 17 <emphasis>efficient</emphasis>. And just as importantly, if it's |
bos@559 | 18 easy for me to retain a good idea of what the software is doing |
bos@559 | 19 when I perform a revision control task, I'm less likely to be |
bos@672 | 20 surprised by its behavior.</para> |
bos@559 | 21 |
bos@584 | 22 <para id="x_2ea">In this chapter, we'll initially cover the core concepts |
bos@559 | 23 behind Mercurial's design, then continue to discuss some of the |
bos@559 | 24 interesting details of its implementation.</para> |
bos@559 | 25 |
bos@559 | 26 <sect1> |
bos@559 | 27 <title>Mercurial's historical record</title> |
bos@559 | 28 |
bos@559 | 29 <sect2> |
bos@559 | 30 <title>Tracking the history of a single file</title> |
bos@559 | 31 |
bos@584 | 32 <para id="x_2eb">When Mercurial tracks modifications to a file, it stores |
bos@559 | 33 the history of that file in a metadata object called a |
bos@559 | 34 <emphasis>filelog</emphasis>. Each entry in the filelog |
bos@559 | 35 contains enough information to reconstruct one revision of the |
bos@559 | 36 file that is being tracked. Filelogs are stored as files in |
bos@559 | 37 the <filename role="special" |
bos@559 | 38 class="directory">.hg/store/data</filename> directory. A |
bos@559 | 39 filelog contains two kinds of information: revision data, and |
bos@559 | 40 an index to help Mercurial to find a revision |
bos@559 | 41 efficiently.</para> |
bos@559 | 42 |
bos@584 | 43 <para id="x_2ec">A file that is large, or has a lot of history, has its |
bos@559 | 44 filelog stored in separate data |
bos@559 | 45 (<quote><literal>.d</literal></quote> suffix) and index |
bos@559 | 46 (<quote><literal>.i</literal></quote> suffix) files. For |
bos@559 | 47 small files without much history, the revision data and index |
bos@559 | 48 are combined in a single <quote><literal>.i</literal></quote> |
bos@559 | 49 file. The correspondence between a file in the working |
bos@559 | 50 directory and the filelog that tracks its history in the |
bos@592 | 51 repository is illustrated in <xref |
bos@559 | 52 linkend="fig:concepts:filelog"/>.</para> |
bos@559 | 53 |
bos@591 | 54 <figure id="fig:concepts:filelog"> |
bos@591 | 55 <title>Relationships between files in working directory and |
bos@591 | 56 filelogs in repository</title> |
bos@591 | 57 <mediaobject> |
bos@594 | 58 <imageobject><imagedata fileref="figs/filelog.png"/></imageobject> |
bos@591 | 59 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 60 </mediaobject> |
bos@591 | 61 </figure> |
bos@559 | 62 |
bos@559 | 63 </sect2> |
bos@559 | 64 <sect2> |
bos@559 | 65 <title>Managing tracked files</title> |
bos@559 | 66 |
bos@584 | 67 <para id="x_2ee">Mercurial uses a structure called a |
bos@559 | 68 <emphasis>manifest</emphasis> to collect together information |
bos@559 | 69 about the files that it tracks. Each entry in the manifest |
bos@559 | 70 contains information about the files present in a single |
bos@559 | 71 changeset. An entry records which files are present in the |
bos@559 | 72 changeset, the revision of each file, and a few other pieces |
bos@559 | 73 of file metadata.</para> |
bos@559 | 74 |
bos@559 | 75 </sect2> |
bos@559 | 76 <sect2> |
bos@559 | 77 <title>Recording changeset information</title> |
bos@559 | 78 |
bos@584 | 79 <para id="x_2ef">The <emphasis>changelog</emphasis> contains information |
bos@559 | 80 about each changeset. Each revision records who committed a |
bos@559 | 81 change, the changeset comment, other pieces of |
bos@559 | 82 changeset-related information, and the revision of the |
bos@559 | 83 manifest to use.</para> |
bos@559 | 84 |
bos@559 | 85 </sect2> |
bos@559 | 86 <sect2> |
bos@559 | 87 <title>Relationships between revisions</title> |
bos@559 | 88 |
bos@584 | 89 <para id="x_2f0">Within a changelog, a manifest, or a filelog, each |
bos@559 | 90 revision stores a pointer to its immediate parent (or to its |
bos@559 | 91 two parents, if it's a merge revision). As I mentioned above, |
bos@559 | 92 there are also relationships between revisions |
bos@559 | 93 <emphasis>across</emphasis> these structures, and they are |
bos@559 | 94 hierarchical in nature.</para> |
bos@559 | 95 |
bos@584 | 96 <para id="x_2f1">For every changeset in a repository, there is exactly one |
bos@559 | 97 revision stored in the changelog. Each revision of the |
bos@559 | 98 changelog contains a pointer to a single revision of the |
bos@559 | 99 manifest. A revision of the manifest stores a pointer to a |
bos@559 | 100 single revision of each filelog tracked when that changeset |
bos@592 | 101 was created. These relationships are illustrated in |
bos@559 | 102 <xref linkend="fig:concepts:metadata"/>.</para> |
bos@559 | 103 |
bos@591 | 104 <figure id="fig:concepts:metadata"> |
bos@591 | 105 <title>Metadata relationships</title> |
bos@591 | 106 <mediaobject> |
bos@594 | 107 <imageobject><imagedata fileref="figs/metadata.png"/></imageobject> |
bos@591 | 108 <textobject><phrase>XXX add text</phrase></textobject> |
bos@559 | 109 </mediaobject> |
bos@591 | 110 </figure> |
bos@559 | 111 |
bos@584 | 112 <para id="x_2f3">As the illustration shows, there is |
bos@559 | 113 <emphasis>not</emphasis> a <quote>one to one</quote> |
bos@559 | 114 relationship between revisions in the changelog, manifest, or |
bos@701 | 115 filelog. If a file that |
bos@559 | 116 Mercurial tracks hasn't changed between two changesets, the |
bos@559 | 117 entry for that file in the two revisions of the manifest will |
bos@701 | 118 point to the same revision of its filelog<footnote> |
bos@702 | 119 <para id="x_725">It is possible (though unusual) for the manifest to |
bos@701 | 120 remain the same between two changesets, in which case the |
bos@701 | 121 changelog entries for those changesets will point to the |
bos@701 | 122 same revision of the manifest.</para> |
bos@701 | 123 </footnote>.</para> |
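To make the pointer structure concrete, here is a minimal Python sketch of the three levels. The `Repo` class and its fields are hypothetical names invented for illustration; they only model "changelog entry points at a manifest revision, manifest entry points at filelog revisions" and say nothing about Mercurial's real on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model: each list index is a revision number within that log."""
    filelogs: dict = field(default_factory=dict)   # path -> list of file contents
    manifest: list = field(default_factory=list)   # each entry: {path: filelog revision}
    changelog: list = field(default_factory=list)  # each entry: (message, manifest revision)

    def commit(self, message, changed):
        """Record a changeset in which only the files in `changed` differ."""
        # Start from the previous manifest entry, so untouched files keep
        # pointing at the same filelog revision as before.
        entry = dict(self.manifest[-1]) if self.manifest else {}
        for path, content in changed.items():
            log = self.filelogs.setdefault(path, [])
            log.append(content)
            entry[path] = len(log) - 1
        # Only add a manifest revision when the set of pointers changed.
        if not self.manifest or entry != self.manifest[-1]:
            self.manifest.append(entry)
        self.changelog.append((message, len(self.manifest) - 1))


repo = Repo()
repo.commit("add both files", {"a.txt": "one", "b.txt": "two"})
repo.commit("edit a.txt only", {"a.txt": "one, revised"})
```

After the second commit, both manifest revisions point at revision 0 of the filelog for `b.txt`, while `a.txt` has moved on to revision 1. That is why the relationship between the three logs is not one to one.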
bos@559 | 124 |
bos@559 | 125 </sect2> |
bos@559 | 126 </sect1> |
bos@559 | 127 <sect1> |
bos@559 | 128 <title>Safe, efficient storage</title> |
bos@559 | 129 |
bos@584 | 130 <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are |
bos@559 | 131 provided by a single structure called the |
bos@559 | 132 <emphasis>revlog</emphasis>.</para> |
bos@559 | 133 |
bos@559 | 134 <sect2> |
bos@559 | 135 <title>Efficient storage</title> |
bos@559 | 136 |
bos@584 | 137 <para id="x_2f5">The revlog provides efficient storage of revisions using a |
bos@559 | 138 <emphasis>delta</emphasis> mechanism. Instead of storing a |
bos@559 | 139 complete copy of a file for each revision, it stores the |
bos@559 | 140 changes needed to transform an older revision into the new |
bos@559 | 141 revision. For many kinds of file data, these deltas are |
bos@559 | 142 typically a fraction of a percent of the size of a full copy |
bos@559 | 143 of a file.</para> |
bos@559 | 144 |
bos@584 | 145 <para id="x_2f6">Some obsolete revision control systems can only work with |
bos@559 | 146 deltas of text files. They must either store binary files as |
bos@559 | 147 complete snapshots or encoded into a text representation, both |
bos@559 | 148 of which are wasteful approaches. Mercurial can efficiently |
bos@559 | 149 handle deltas of files with arbitrary binary contents; it |
bos@559 | 150 doesn't need to treat text as special.</para> |
bos@559 | 151 |
bos@559 | 152 </sect2> |
bos@559 | 153 <sect2 id="sec:concepts:txn"> |
bos@559 | 154 <title>Safe operation</title> |
bos@559 | 155 |
bos@584 | 156 <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to |
bos@559 | 157 the end of a revlog file. It never modifies a section of a |
bos@559 | 158 file after it has written it. This is both more robust and |
bos@559 | 159 more efficient than schemes that need to modify or rewrite |
bos@559 | 160 data.</para> |
bos@559 | 161 |
bos@584 | 162 <para id="x_2f8">In addition, Mercurial treats every write as part of a |
bos@559 | 163 <emphasis>transaction</emphasis> that can span a number of |
bos@559 | 164 files. A transaction is <emphasis>atomic</emphasis>: either |
bos@559 | 165 the entire transaction succeeds and its effects are all |
bos@559 | 166 visible to readers in one go, or the whole thing is undone. |
bos@559 | 167 This guarantee of atomicity means that if you're running two |
bos@559 | 168 copies of Mercurial, where one is reading data and one is |
bos@559 | 169 writing it, the reader will never see a partially written |
bos@559 | 170 result that might confuse it.</para> |
bos@559 | 171 |
bos@584 | 172 <para id="x_2f9">The fact that Mercurial only appends to files makes it |
bos@559 | 173 easier to provide this transactional guarantee. The easier it |
bos@559 | 174 is to do stuff like this, the more confident you should be |
bos@559 | 175 that it's done correctly.</para> |
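The following Python sketch suggests why append-only files make undo simple. It is an assumption-laden toy, not Mercurial's code: the `Transaction` class is hypothetical, and it covers only the "undo the whole thing" half of atomicity, not how readers are shielded from data that is still being written.

```python
import os
import tempfile


class Transaction:
    """Toy append-only transaction: remember original sizes, truncate on failure."""

    def __init__(self):
        self.original_sizes = {}  # path -> size before the first append

    def append(self, path, data):
        if path not in self.original_sizes:
            size = os.path.getsize(path) if os.path.exists(path) else 0
            self.original_sizes[path] = size
        with open(path, "ab") as f:
            f.write(data)

    def rollback(self):
        # Undo is just truncation, because earlier bytes were never modified.
        for path, size in self.original_sizes.items():
            with open(path, "r+b") as f:
                f.truncate(size)
        self.original_sizes.clear()


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "example.i")
with open(log_path, "wb") as f:
    f.write(b"committed")

txn = Transaction()
txn.append(log_path, b" + partial write")
txn.rollback()
with open(log_path, "rb") as f:
    restored = f.read()  # the file is back to its pre-transaction contents
```

Because nothing before the recorded size is ever touched, a single number per file is enough to describe how to undo the transaction.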
bos@559 | 176 |
bos@559 | 177 </sect2> |
bos@559 | 178 <sect2> |
bos@559 | 179 <title>Fast retrieval</title> |
bos@559 | 180 |
bos@701 | 181 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to |
bos@701 | 182 many earlier revision control systems: the problem of |
bos@701 | 183 <emphasis>inefficient retrieval</emphasis>. Most revision |
bos@701 | 184 control systems store the contents of a revision as an |
bos@701 | 185 incremental series of modifications against a |
bos@701 | 186 <quote>snapshot</quote>. (Some base the snapshot on the |
bos@701 | 187 oldest revision, others on the newest.) To reconstruct a |
bos@701 | 188 specific revision, you must first read the snapshot, and then |
bos@701 | 189 every one of the revisions between the snapshot and your |
bos@701 | 190 target revision. The more history that a file accumulates, |
bos@701 | 191 the more revisions you must read, hence the longer it takes to |
bos@701 | 192 reconstruct a particular revision.</para> |
bos@559 | 193 |
bos@591 | 194 <figure id="fig:concepts:snapshot"> |
bos@591 | 195 <title>Snapshot of a revlog, with incremental deltas</title> |
bos@591 | 196 <mediaobject> |
bos@594 | 197 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> |
bos@591 | 198 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 199 </mediaobject> |
bos@591 | 200 </figure> |
bos@559 | 201 |
bos@584 | 202 <para id="x_2fc">The innovation that Mercurial applies to this problem is |
bos@559 | 203 simple but effective. Once the cumulative amount of delta |
bos@559 | 204 information stored since the last snapshot exceeds a fixed |
bos@559 | 205 threshold, it stores a new snapshot (compressed, of course), |
bos@559 | 206 instead of another delta. This makes it possible to |
bos@559 | 207 reconstruct <emphasis>any</emphasis> revision of a file |
bos@559 | 208 quickly. This approach works so well that it has since been |
bos@559 | 209 copied by several other revision control systems.</para> |
bos@559 | 210 |
bos@592 | 211 <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates |
bos@559 | 212 the idea. In an entry in a revlog's index file, Mercurial |
bos@559 | 213 stores the range of entries from the data file that it must |
bos@559 | 214 read to reconstruct a particular revision.</para> |
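The threshold rule can be sketched in a few lines of Python. Everything here is illustrative: `ToyRevlog`, the `difflib`-based delta, and the 8-byte per-edit overhead are my assumptions, chosen only to show "store a delta until the accumulated deltas grow too large, then store a snapshot".

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Return a list of (start, end, replacement) edits turning old into new."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    out, pos = [], 0
    for start, end, replacement in delta:
        out.append(old[pos:start])
        out.append(replacement)
        pos = end
    out.append(old[pos:])
    return b"".join(out)


class ToyRevlog:
    def __init__(self, threshold):
        self.threshold = threshold  # max delta bytes allowed since last snapshot
        self.entries = []           # ("snapshot", bytes) or ("delta", edits)
        self.base = []              # per revision: index of the snapshot to start from
        self.delta_bytes = 0
        self.tip_text = b""

    def add(self, text):
        delta = make_delta(self.tip_text, text) if self.entries else None
        # Assumed cost model: replacement bytes plus 8 bytes of overhead per edit.
        cost = sum(len(r) + 8 for _, _, r in delta) if delta is not None else 0
        if delta is None or self.delta_bytes + cost > self.threshold:
            self.entries.append(("snapshot", text))
            self.base.append(len(self.entries) - 1)
            self.delta_bytes = 0
        else:
            self.entries.append(("delta", delta))
            self.base.append(self.base[-1])
            self.delta_bytes += cost
        self.tip_text = text

    def read(self, rev):
        # Read the nearest snapshot, then only the deltas between it and rev.
        start = self.base[rev]
        text = self.entries[start][1]
        for _kind, delta in self.entries[start + 1 : rev + 1]:
            text = apply_delta(text, delta)
        return text


log = ToyRevlog(threshold=40)
texts = [b"".join(b"line %d\n" % k for k in range(i + 1)) for i in range(6)]
for text in texts:
    log.add(text)
```

The `base` list plays the role of the index described above: for any revision it says where reading must start, so the work needed by `read` is bounded by the threshold rather than by the length of the file's history.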
bos@559 | 215 |
bos@559 | 216 <sect3> |
bos@559 | 217 <title>Aside: the influence of video compression</title> |
bos@559 | 218 |
bos@701 | 219 <para id="x_2fe">If you're familiar with video compression or |
bos@701 | 220 have ever watched a TV feed through a digital cable or |
bos@701 | 221 satellite service, you may know that most video compression |
bos@701 | 222 schemes store each frame of video as a delta against its |
bos@701 | 223 predecessor frame.</para> |
bos@701 | 224 |
bos@701 | 225 <para id="x_2ff">Such schemes also store a complete <quote>key frame</quote> at intervals. Mercurial borrows this idea to make it |
bos@701 | 226 possible to reconstruct a revision from a snapshot and a |
bos@701 | 227 small number of deltas.</para> |
bos@559 | 228 |
bos@559 | 229 </sect3> |
bos@559 | 230 </sect2> |
bos@559 | 231 <sect2> |
bos@559 | 232 <title>Identification and strong integrity</title> |
bos@559 | 233 |
bos@584 | 234 <para id="x_300">Along with delta or snapshot information, a revlog entry |
bos@559 | 235 contains a cryptographic hash of the data that it represents. |
bos@559 | 236 This makes it difficult to forge the contents of a revision, |
bos@559 | 237 and easy to detect accidental corruption.</para> |
bos@559 | 238 |
bos@584 | 239 <para id="x_301">Hashes provide more than a mere check against corruption; |
bos@559 | 240 they are used as the identifiers for revisions. The changeset |
bos@559 | 241 identification hashes that you see as an end user are from |
bos@559 | 242 revisions of the changelog. Although filelogs and the |
bos@559 | 243 manifest also use hashes, Mercurial only uses these behind the |
bos@559 | 244 scenes.</para> |
bos@559 | 245 |
bos@584 | 246 <para id="x_302">Mercurial verifies that hashes are correct when it |
bos@559 | 247 retrieves file revisions and when it pulls changes from |
bos@559 | 248 another repository. If it encounters an integrity problem, it |
bos@559 | 249 will complain and stop whatever it's doing.</para> |
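As a rough illustration of hash-based identification, the Python sketch below derives an ID from a revision's text and its parent IDs, then re-checks it on retrieval. The text above says only that a cryptographic hash is used; the choice of SHA-1 over the sorted parent IDs followed by the text is my understanding of the classic revlog scheme and should be read as an assumption, as should the helper names.

```python
import hashlib

NULL_ID = b"\0" * 20  # assumed 20-byte all-zero "no parent" marker


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """Hash the sorted parent IDs followed by the revision text."""
    first, second = sorted((p1, p2))
    h = hashlib.sha1(first)
    h.update(second)
    h.update(text)
    return h.digest()


def verify(text, p1, p2, expected):
    """Return True if the stored ID still matches the stored data."""
    return node_id(text, p1, p2) == expected


root = node_id(b"hello\n")
child = node_id(b"hello, world\n", p1=root)
```

Because the parents feed into the hash, an ID identifies not just some content but that content at a particular point in history. Flipping a single byte of the text makes `verify` fail.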
bos@559 | 250 |
bos@584 | 251 <para id="x_303">In addition to the effect it has on retrieval efficiency, |
bos@559 | 252 Mercurial's use of periodic snapshots makes it more robust |
bos@559 | 253 against partial data corruption. If a revlog becomes partly |
bos@559 | 254 corrupted due to a hardware error or system bug, it's often |
bos@559 | 255 possible to reconstruct some or most revisions from the |
bos@559 | 256 uncorrupted sections of the revlog, both before and after the |
bos@559 | 257 corrupted section. This would not be possible with a |
bos@559 | 258 delta-only storage model.</para> |
bos@559 | 259 </sect2> |
bos@559 | 260 </sect1> |
bos@701 | 261 |
bos@559 | 262 <sect1> |
bos@559 | 263 <title>Revision history, branching, and merging</title> |
bos@559 | 264 |
bos@584 | 265 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its |
bos@559 | 266 immediate ancestor revision, usually referred to as its |
bos@559 | 267 <emphasis>parent</emphasis>. In fact, a revision contains room |
bos@559 | 268 for not one parent, but two. Mercurial uses a special hash, |
bos@559 | 269 called the <quote>null ID</quote>, to represent the idea |
bos@559 | 270 <quote>there is no parent here</quote>. This hash is simply a |
bos@559 | 271 string of zeroes.</para> |
bos@559 | 272 |
bos@592 | 273 <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see |
bos@559 | 274 an example of the conceptual structure of a revlog. Filelogs, |
bos@559 | 275 manifests, and changelogs all have this same structure; they |
bos@559 | 276 differ only in the kind of data stored in each delta or |
bos@559 | 277 snapshot.</para> |
bos@559 | 278 |
bos@584 | 279 <para id="x_306">The first revision in a revlog (at the bottom of the image) |
bos@559 | 280 has the null ID in both of its parent slots. For a |
bos@559 | 281 <quote>normal</quote> revision, its first parent slot contains |
bos@559 | 282 the ID of its parent revision, and its second contains the null |
bos@559 | 283 ID, indicating that the revision has only one real parent. Any |
bos@559 | 284 two revisions that have the same parent ID are branches. A |
bos@559 | 285 revision that represents a merge between branches has two normal |
bos@559 | 286 revision IDs in its parent slots.</para> |
bos@559 | 287 |
bos@591 | 288 <figure id="fig:concepts:revlog"> |
bos@591 | 289 <title>The conceptual structure of a revlog</title> |
bos@591 | 290 <mediaobject> |
bos@594 | 291 <imageobject><imagedata fileref="figs/revlog.png"/></imageobject> |
bos@591 | 292 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 293 </mediaobject> |
bos@591 | 294 </figure> |
bos@559 | 295 |
bos@559 | 296 </sect1> |
bos@559 | 297 <sect1> |
bos@559 | 298 <title>The working directory</title> |
bos@559 | 299 |
bos@584 | 300 <para id="x_307">In the working directory, Mercurial stores a snapshot of the |
bos@559 | 301 files from the repository as of a particular changeset.</para> |
bos@559 | 302 |
bos@584 | 303 <para id="x_308">The working directory <quote>knows</quote> which changeset |
bos@559 | 304 it contains. When you update the working directory to contain a |
bos@559 | 305 particular changeset, Mercurial looks up the appropriate |
bos@559 | 306 revision of the manifest to find out which files it was tracking |
bos@559 | 307 at the time that changeset was committed, and which revision of |
bos@559 | 308 each file was then current. It then recreates a copy of each of |
bos@559 | 309 those files, with the same contents it had when the changeset |
bos@559 | 310 was committed.</para> |
bos@559 | 311 |
bos@701 | 312 <para id="x_309">The <emphasis>dirstate</emphasis> is a special |
bos@701 | 313 structure that contains Mercurial's knowledge of the working |
bos@701 | 314 directory. It is maintained as a file named |
bos@701 | 315 <filename>.hg/dirstate</filename> inside a repository. The |
bos@701 | 316 dirstate details which changeset the working directory is |
bos@701 | 317 updated to, and all of the files that Mercurial is tracking in |
bos@701 | 318 the working directory. It also lets Mercurial quickly notice |
bos@701 | 319 changed files, by recording their checkout times and |
bos@701 | 320 sizes.</para> |
bos@559 | 321 |
bos@584 | 322 <para id="x_30a">Just as a revision of a revlog has room for two parents, so |
bos@559 | 323 that it can represent either a normal revision (with one parent) |
bos@559 | 324 or a merge of two earlier revisions, the dirstate has slots for |
bos@559 | 325 two parents. When you use the <command role="hg-cmd">hg |
bos@559 | 326 update</command> command, the changeset that you update to is |
bos@559 | 327 stored in the <quote>first parent</quote> slot, and the null ID |
bos@559 | 328 in the second. When you <command role="hg-cmd">hg |
bos@559 | 329 merge</command> with another changeset, the first parent |
bos@559 | 330 remains unchanged, and the second parent is filled in with the |
bos@559 | 331 changeset you're merging with. The <command role="hg-cmd">hg |
bos@559 | 332 parents</command> command tells you what the parents of the |
bos@559 | 333 dirstate are.</para> |
bos@559 | 334 |
bos@559 | 335 <sect2> |
bos@559 | 336 <title>What happens when you commit</title> |
bos@559 | 337 |
bos@584 | 338 <para id="x_30b">The dirstate stores parent information for more than just |
bos@559 | 339 book-keeping purposes. Mercurial uses the parents of the |
bos@559 | 340 dirstate as <emphasis>the parents of a new |
bos@559 | 341 changeset</emphasis> when you perform a commit.</para> |
bos@559 | 342 |
bos@591 | 343 <figure id="fig:concepts:wdir"> |
bos@591 | 344 <title>The working directory can have two parents</title> |
bos@591 | 345 <mediaobject> |
bos@594 | 346 <imageobject><imagedata fileref="figs/wdir.png"/></imageobject> |
bos@591 | 347 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 348 </mediaobject> |
bos@591 | 349 </figure> |
bos@559 | 350 |
bos@592 | 351 <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the |
bos@559 | 352 normal state of the working directory, where it has a single |
bos@559 | 353 changeset as parent. That changeset is the |
bos@559 | 354 <emphasis>tip</emphasis>, the newest changeset in the |
bos@559 | 355 repository that has no children.</para> |
bos@559 | 356 |
bos@591 | 357 <figure id="fig:concepts:wdir-after-commit"> |
bos@591 | 358 <title>The working directory gains new parents after a |
bos@591 | 359 commit</title> |
bos@591 | 360 <mediaobject> |
bos@594 | 361 <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject> |
bos@591 | 362 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 363 </mediaobject> |
bos@591 | 364 </figure> |
bos@559 | 365 |
bos@584 | 366 <para id="x_30f">It's useful to think of the working directory as |
bos@559 | 367 <quote>the changeset I'm about to commit</quote>. Any files |
bos@559 | 368 that you tell Mercurial that you've added, removed, renamed, |
bos@559 | 369 or copied will be reflected in that changeset, as will |
bos@559 | 370 modifications to any files that Mercurial is already tracking; |
bos@559 | 371 the new changeset will have the parents of the working |
bos@559 | 372 directory as its parents.</para> |
bos@559 | 373 |
bos@592 | 374 <para id="x_310">After a commit, Mercurial will update the |
bos@592 | 375 parents of the working directory, so that the first parent is |
bos@592 | 376 the ID of the new changeset, and the second is the null ID. |
bos@592 | 377 This is shown in <xref |
bos@592 | 378 linkend="fig:concepts:wdir-after-commit"/>. Mercurial |
bos@559 | 379 doesn't touch any of the files in the working directory when |
bos@559 | 380 you commit; it just modifies the dirstate to note its new |
bos@559 | 381 parents.</para> |
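The way the two parent slots change can be modelled as a tiny state machine. This Python sketch uses a hypothetical `ToyDirstate` class and short made-up changeset names; it tracks only the parents, not the per-file information the real dirstate also holds.

```python
NULL_ID = "0" * 40  # stand-in for the all-zero null ID


class ToyDirstate:
    def __init__(self):
        self.parents = (NULL_ID, NULL_ID)

    def update(self, changeset):
        # "hg update": first parent is the target, second is cleared.
        self.parents = (changeset, NULL_ID)

    def merge(self, other):
        # "hg merge": first parent unchanged, second is the other changeset.
        self.parents = (self.parents[0], other)

    def commit(self, new_changeset):
        """Return the parents recorded in the new changeset, then move to it."""
        recorded = self.parents
        self.parents = (new_changeset, NULL_ID)
        return recorded


ds = ToyDirstate()
ds.update("aaa")
ds.merge("bbb")
merge_parents = ds.commit("ccc")
```

Here the changeset `ccc` records `aaa` and `bbb` as its parents, and afterwards the working directory's only parent is `ccc`.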
bos@559 | 382 |
bos@559 | 383 </sect2> |
bos@559 | 384 <sect2> |
bos@559 | 385 <title>Creating a new head</title> |
bos@559 | 386 |
bos@584 | 387 <para id="x_311">It's perfectly normal to update the working directory to a |
bos@559 | 388 changeset other than the current tip. For example, you might |
bos@559 | 389 want to know what your project looked like last Tuesday, or |
bos@559 | 390 you could be looking through changesets to see which one |
bos@559 | 391 introduced a bug. In cases like this, the natural thing to do |
bos@559 | 392 is update the working directory to the changeset you're |
bos@559 | 393 interested in, and then examine the files in the working |
bos@559 | 394 directory directly to see their contents as they were when you |
bos@559 | 395 committed that changeset. The effect of this is shown in |
bos@592 | 396 <xref linkend="fig:concepts:wdir-pre-branch"/>.</para> |
bos@559 | 397 |
bos@591 | 398 <figure id="fig:concepts:wdir-pre-branch"> |
bos@591 | 399 <title>The working directory, updated to an older |
bos@591 | 400 changeset</title> |
bos@591 | 401 <mediaobject> |
bos@594 | 402 <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject> |
bos@591 | 403 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 404 </mediaobject> |
bos@591 | 405 </figure> |
bos@559 | 406 |
bos@592 | 407 <para id="x_313">Having updated the working directory to an |
bos@592 | 408 older changeset, what happens if you make some changes, and |
bos@592 | 409 then commit? Mercurial behaves in the same way as I outlined |
bos@559 | 410 above. The parents of the working directory become the |
bos@559 | 411 parents of the new changeset. This new changeset has no |
bos@559 | 412 children, so it becomes the new tip. And the repository now |
bos@559 | 413 contains two changesets that have no children; we call these |
bos@559 | 414 <emphasis>heads</emphasis>. You can see the structure that |
bos@592 | 415 this creates in <xref |
bos@559 | 416 linkend="fig:concepts:wdir-branch"/>.</para> |
bos@559 | 417 |
bos@591 | 418 <figure id="fig:concepts:wdir-branch"> |
bos@591 | 419 <title>After a commit made while synced to an older |
bos@591 | 420 changeset</title> |
bos@591 | 421 <mediaobject> |
bos@594 | 422 <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject> |
bos@591 | 423 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 424 </mediaobject> |
bos@591 | 425 </figure> |
bos@559 | 426 |
bos@559 | 427 <note> |
bos@701 | 428 <para id="x_315">If you're new to Mercurial, you should keep |
bos@701 | 429 in mind a common <quote>error</quote>, which is to use the |
bos@701 | 430 <command role="hg-cmd">hg pull</command> command without any |
bos@559 | 431 options. By default, the <command role="hg-cmd">hg |
bos@559 | 432 pull</command> command <emphasis>does not</emphasis> |
bos@559 | 433 update the working directory, so you'll bring new changesets |
bos@559 | 434 into your repository, but the working directory will stay |
bos@559 | 435 synced at the same changeset as before the pull. If you |
bos@559 | 436 make some changes and commit afterwards, you'll thus create |
bos@559 | 437 a new head, because your working directory isn't synced to |
bos@701 | 438 whatever the current tip is. To combine the operation of a |
bos@701 | 439 pull, followed by an update, run <command>hg pull |
bos@701 | 440 -u</command>.</para> |
bos@701 | 441 |
bos@701 | 442 <para id="x_316">I put the word <quote>error</quote> in quotes |
bos@701 | 443 because all that you need to do to rectify the situation |
bos@701 | 444 where you created a new head by accident is |
bos@701 | 445 <command role="hg-cmd">hg merge</command>, then <command |
bos@701 | 446 role="hg-cmd">hg commit</command>. In other words, this |
bos@701 | 447 almost never has negative consequences; it's just something |
bos@701 | 448 of a surprise for newcomers. I'll discuss other ways to |
bos@701 | 449 avoid this behavior, and why Mercurial behaves in this |
bos@701 | 450 initially surprising way, later on.</para> |
bos@559 | 451 </note> |
bos@559 | 452 |
bos@559 | 453 </sect2> |
bos@559 | 454 <sect2> |
bos@620 | 455 <title>Merging changes</title> |
bos@559 | 456 |
bos@592 | 457 <para id="x_317">When you run the <command role="hg-cmd">hg |
bos@592 | 458 merge</command> command, Mercurial leaves the first parent |
bos@592 | 459 of the working directory unchanged, and sets the second parent |
bos@592 | 460 to the changeset you're merging with, as shown in <xref |
bos@559 | 461 linkend="fig:concepts:wdir-merge"/>.</para> |
bos@559 | 462 |
bos@591 | 463 <figure id="fig:concepts:wdir-merge"> |
bos@591 | 464 <title>Merging two heads</title> |
bos@591 | 465 <mediaobject> |
bos@591 | 466 <imageobject> |
bos@594 | 467 <imagedata fileref="figs/wdir-merge.png"/> |
bos@591 | 468 </imageobject> |
bos@591 | 469 <textobject><phrase>XXX add text</phrase></textobject> |
bos@591 | 470 </mediaobject> |
bos@591 | 471 </figure> |
bos@559 | 472 |
bos@584 | 473 <para id="x_319">Mercurial also has to modify the working directory, to |
bos@559 | 474 merge the files managed in the two changesets. Simplified a |
bos@559 | 475 little, the merging process goes like this, for every file in |
bos@559 | 476 the manifests of both changesets:</para> |
bos@559 | 477 <itemizedlist> |
bos@584 | 478 <listitem><para id="x_31a">If neither changeset has modified a file, do |
bos@559 | 479 nothing with that file.</para> |
bos@559 | 480 </listitem> |
bos@584 | 481 <listitem><para id="x_31b">If one changeset has modified a file, and the |
bos@559 | 482 other hasn't, create the modified copy of the file in the |
bos@559 | 483 working directory.</para> |
bos@559 | 484 </listitem> |
bos@584 | 485 <listitem><para id="x_31c">If one changeset has removed a file, and the |
bos@559 | 486 other hasn't (or has also deleted it), delete the file |
bos@559 | 487 from the working directory.</para> |
bos@559 | 488 </listitem> |
bos@584 | 489 <listitem><para id="x_31d">If one changeset has removed a file, but the |
bos@559 | 490 other has modified the file, ask the user what to do: keep |
bos@559 | 491 the modified file, or remove it?</para> |
bos@559 | 492 </listitem> |
bos@584 | 493 <listitem><para id="x_31e">If both changesets have modified a file, |
bos@559 | 494 invoke an external merge program to choose the new |
bos@559 | 495 contents for the merged file. This may require input from |
bos@559 | 496 the user.</para> |
bos@559 | 497 </listitem> |
bos@584 | 498 <listitem><para id="x_31f">If one changeset has modified a file, and the |
bos@559 | 499 other has renamed or copied the file, make sure that the |
bos@559 | 500 changes follow the new name of the file.</para> |
bos@559 | 501 </listitem></itemizedlist> |
bos@584 | 502 <para id="x_320">There are more details&emdash;merging has plenty of corner |
bos@559 | 503 cases&emdash;but these are the most common choices that are |
bos@559 | 504 involved in a merge. As you can see, most cases are |
bos@559 | 505 completely automatic, and indeed most merges finish |
bos@559 | 506 automatically, without requiring your input to resolve any |
bos@559 | 507 conflicts.</para> |
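The simplified per-file rules listed above can be written as a small decision function. This is a sketch of those rules only, with state names and return strings that I made up; it is not how Mercurial's merge code is organised, and any combination the list does not mention is reported as not covered.

```python
def merge_action(local_state, other_state):
    """Pick an action for one file, given what each changeset did to it.

    Each state is one of "unchanged", "modified", "removed", or "renamed".
    """
    states = {local_state, other_state}
    if states == {"unchanged"}:
        return "do nothing"
    if states == {"modified", "unchanged"}:
        return "take the modified copy"
    if "removed" in states and states.issubset({"removed", "unchanged"}):
        return "delete from working directory"
    if states == {"removed", "modified"}:
        return "ask user: keep or remove"
    if states == {"modified"}:
        return "run merge program"
    if states == {"modified", "renamed"}:
        return "apply changes under the new name"
    return "not covered by this sketch"


actions = {
    pair: merge_action(*pair)
    for pair in [
        ("unchanged", "unchanged"),
        ("modified", "unchanged"),
        ("removed", "removed"),
        ("removed", "modified"),
        ("modified", "modified"),
        ("renamed", "modified"),
    ]
}
```

Only two of the six outcomes can involve the user, which matches the observation that most merges finish without any input.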
bos@559 | 508 |
bos@584 | 509 <para id="x_321">When you're thinking about what happens when you commit |
bos@559 | 510 after a merge, once again the working directory is <quote>the |
bos@559 | 511 changeset I'm about to commit</quote>. After the <command |
bos@559 | 512 role="hg-cmd">hg merge</command> command completes, the |
bos@559 | 513 working directory has two parents; these will become the |
bos@559 | 514 parents of the new changeset.</para> |
bos@559 | 515 |
bos@701 | 516 <para id="x_322">Mercurial lets you perform multiple merges, but |
bos@701 | 517 you must commit the results of each individual merge as you |
bos@701 | 518 go. This is necessary because Mercurial only tracks two |
bos@701 | 519 parents for both revisions and the working directory. While |
bos@701 | 520 it would be technically feasible to merge multiple changesets |
bos@701 | 521 at once, Mercurial avoids this for simplicity. With multi-way |
bos@701 | 522 merges, the risks of user confusion, nasty conflict |
bos@701 | 523 resolution, and making a terrible mess of a merge would grow |
bos@701 | 524 intolerable.</para> |
bos@559 | 525 |
bos@559 | 526 </sect2> |
bos@620 | 527 |
bos@620 | 528 <sect2> |
bos@620 | 529 <title>Merging and renames</title> |
bos@620 | 530 |
bos@676 | 531 <para id="x_69a">A surprising number of revision control systems pay little |
bos@620 | 532 or no attention to a file's <emphasis>name</emphasis> over |
bos@620 | 533 time. For instance, it used to be common that if a file got |
bos@620 | 534 renamed on one side of a merge, the changes from the other |
bos@620 | 535 side would be silently dropped.</para> |
bos@620 | 536 |
bos@676 | 537 <para id="x_69b">Mercurial records metadata when you tell it to perform a |
bos@620 | 538 rename or copy. It uses this metadata during a merge to do the |
bos@620 | 539 right thing. For instance, if I rename |
bos@620 | 540 a file, and you edit it without renaming it, when we merge our |
bos@620 | 541 work the file will be renamed and have your edits |
bos@620 | 542 applied.</para> |
bos@620 | 543 </sect2> |
bos@559 | 544 </sect1> |
bos@620 | 545 |
bos@559 | 546 <sect1> |
bos@559 | 547 <title>Other interesting design features</title> |
bos@559 | 548 |
bos@584 | 549 <para id="x_323">In the sections above, I've tried to highlight some of the |
bos@559 | 550 most important aspects of Mercurial's design, to illustrate that |
bos@559 | 551 it pays careful attention to reliability and performance. |
bos@559 | 552 However, the attention to detail doesn't stop there. There are |
bos@559 | 553 a number of other aspects of Mercurial's construction that I |
bos@559 | 554 personally find interesting. I'll detail a few of them here, |
bos@559 | 555 separate from the <quote>big ticket</quote> items above, so that |
bos@559 | 556 if you're interested, you can gain a better idea of the amount |
bos@559 | 557 of thinking that goes into a well-designed system.</para> |
bos@559 | 558 |
bos@559 | 559 <sect2> |
bos@559 | 560 <title>Clever compression</title> |
bos@559 | 561 |
bos@584 | 562 <para id="x_324">When appropriate, Mercurial will store both snapshots and |
bos@559 | 563 deltas in compressed form. It does this by always |
bos@559 | 564 <emphasis>trying to</emphasis> compress a snapshot or delta, |
bos@559 | 565 but only storing the compressed version if it's smaller than |
bos@559 | 566 the uncompressed version.</para> |
bos@559 | 567 |
bos@584 | 568 <para id="x_325">This means that Mercurial does <quote>the right |
bos@559 | 569 thing</quote> when storing a file whose native form is |
bos@559 | 570 compressed, such as a <literal>zip</literal> archive or a JPEG |
bos@559 | 571 image. When these types of files are compressed a second |
bos@559 | 572 time, the resulting file is usually bigger than the |
bos@559 | 573 once-compressed form, and so Mercurial will store the plain |
bos@559 | 574 <literal>zip</literal> or JPEG.</para> |
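<para>As a rough illustration of the rule described above, the following Python sketch compresses a chunk and keeps the compressed form only if it is smaller. The function names and the "deflate"/"plain" labels are made up for this example and are not Mercurial's on-disk format; random bytes stand in for an already-compressed file.</para>

```python
import os
import zlib
from typing import Tuple


def store_chunk(data: bytes) -> Tuple[str, bytes]:
    """Return (label, payload), keeping whichever form is smaller."""
    compressed = zlib.compress(data)
    # Keep the compressed form only when it actually saves space.
    if len(data) > len(compressed):
        return "deflate", compressed
    return "plain", data


def load_chunk(label: str, payload: bytes) -> bytes:
    """Invert store_chunk (labels are hypothetical, not a real format)."""
    return zlib.decompress(payload) if label == "deflate" else payload


# Ordinary text compresses well.
text = b"the same line of source code\n" * 200
# Random bytes behave like a zip archive or JPEG: deflate cannot shrink them.
already_compressed = os.urandom(4096)

print(store_chunk(text)[0])
print(store_chunk(already_compressed)[0])
```

<para>The caller never has to know in advance which kind of file it is storing; the size comparison makes the decision.</para>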
bos@559 | 575 |
bos@584 | 576 <para id="x_326">Deltas between revisions of a compressed file are usually |
bos@559 | 577 larger than snapshots of the file, and Mercurial again does |
bos@559 | 578 <quote>the right thing</quote> in these cases. It finds that |
bos@559 | 579 such a delta exceeds the threshold at which it should store a |
bos@559 | 580 complete snapshot of the file, so it stores the snapshot, |
bos@559 | 581 again saving space compared to a naive delta-only |
bos@559 | 582 approach.</para> |
bos@559 | 583 |
bos@559 | 584 <sect3> |
bos@559 | 585 <title>Network recompression</title> |
bos@559 | 586 |
bos@584 | 587 <para id="x_327">When storing revisions on disk, Mercurial uses the |
bos@559 | 588 <quote>deflate</quote> compression algorithm (the same one |
bos@559 | 589 used by the popular <literal>zip</literal> archive format), |
bos@559 | 590 which balances good speed with a respectable compression |
bos@559 | 591 ratio. However, when transmitting revision data over a |
bos@559 | 592 network connection, Mercurial uncompresses the compressed |
bos@559 | 593 revision data.</para> |
bos@559 | 594 |
bos@584 | 595 <para id="x_328">If the connection is over HTTP, Mercurial recompresses |
bos@559 | 596 the entire stream of data using a compression algorithm that |
bos@559 | 597 gives a better compression ratio (the Burrows-Wheeler |
bos@559 | 598 algorithm from the widely used <literal>bzip2</literal> |
bos@559 | 599 compression package). This combination of algorithm and |
bos@559 | 600 compression of the entire stream (instead of a revision at a |
bos@559 | 601 time) substantially reduces the number of bytes to be |
bos@620 | 602 transferred, yielding better performance over most kinds |
bos@620 | 603 of network.</para> |
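<para>The effect of compressing a whole stream rather than one revision at a time can be seen with a small Python experiment. This is only a sketch: the revisions are invented sample data, and the stream here is a plain concatenation rather than Mercurial's actual wire format.</para>

```python
import bz2
import zlib

# Thirty invented revisions of one file that differ only in their last line.
base = b"".join(b"line %d of the file\n" % i for i in range(40))
revisions = [base + b"change number %d\n" % r for r in range(30)]

# On disk: every revision is deflated on its own.
stored = [zlib.compress(rev) for rev in revisions]
per_revision_size = sum(len(chunk) for chunk in stored)

# On the wire: uncompress each revision, then compress the stream as a whole.
stream = b"".join(zlib.decompress(chunk) for chunk in stored)
wire = bz2.compress(stream)
stream_size = len(wire)

print("sum of per-revision deflate sizes:", per_revision_size)
print("bzip2 over the whole stream:", stream_size)
```

<para>Because the compressor sees all revisions together, it can exploit the repetition between them, which per-revision compression cannot.</para>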
bos@559 | 604 |
bos@701 | 605 <para id="x_329">If the connection is over |
bos@701 | 606 <command>ssh</command>, Mercurial |
bos@701 | 607 <emphasis>doesn't</emphasis> recompress the stream, because |
bos@701 | 608 <command>ssh</command> can already do this itself. You can |
bos@701 | 609 tell Mercurial to always use <command>ssh</command>'s |
bos@701 | 610 compression feature by editing the |
bos@701 | 611 <filename>.hgrc</filename> file in your home directory as |
bos@701 | 612 follows.</para> |
bos@701 | 613 |
bos@701 | 614 <programlisting>[ui] |
bos@701 | 615 ssh = ssh -C</programlisting> |
bos@559 | 616 |
bos@559 | 617 </sect3> |
bos@559 | 618 </sect2> |
bos@559 | 619 <sect2> |
bos@559 | 620 <title>Read/write ordering and atomicity</title> |
bos@559 | 621 |
bos@592 | 622 <para id="x_32a">Appending to files isn't the whole story when |
bos@592 | 623 it comes to guaranteeing that a reader won't see a partial |
bos@592 | 624 write. If you recall <xref linkend="fig:concepts:metadata"/>, |
bos@701 | 625 revisions in the changelog point to revisions in the manifest, |
bos@701 | 626 and revisions in the manifest point to revisions in filelogs. |
bos@592 | 627 This hierarchy is deliberate.</para> |
bos@559 | 628 |
bos@584 | 629 <para id="x_32b">A writer starts a transaction by writing filelog and |
bos@559 | 630 manifest data, and doesn't write any changelog data until |
bos@559 | 631 those are finished. A reader starts by reading changelog |
bos@559 | 632 data, then manifest data, followed by filelog data.</para> |
bos@559 | 633 |
bos@584 | 634 <para id="x_32c">Since the writer has always finished writing filelog and |
bos@559 | 635 manifest data before it writes to the changelog, a reader will |
bos@559 | 636 never read a pointer to a partially written manifest revision |
bos@559 | 637 from the changelog, and it will never read a pointer to a |
bos@559 | 638 partially written filelog revision from the manifest.</para> |
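<para>The ordering argument can be modelled with three append-only lists. This is a toy model under simplifying assumptions (one file per revision, in-memory lists instead of revlog files), and the class and method names are hypothetical.</para>

```python
class Repo:
    """Three append-only logs; each entry points at an index in the next."""

    def __init__(self):
        self.filelog = []    # file contents
        self.manifest = []   # each entry: an index into filelog
        self.changelog = []  # each entry: an index into manifest

    def commit_steps(self, content):
        """Writer: yield after each step so a reader can run in between."""
        self.filelog.append(content)
        yield "filelog written"
        self.manifest.append(len(self.filelog) - 1)
        yield "manifest written"
        # Only now does the new revision become visible to readers.
        self.changelog.append(len(self.manifest) - 1)
        yield "changelog written"

    def read_tip(self):
        """Reader: start at the changelog and follow the pointers down."""
        if not self.changelog:
            return None
        manifest_rev = self.changelog[-1]
        file_rev = self.manifest[manifest_rev]
        return self.filelog[file_rev]


repo = Repo()
for _ in repo.commit_steps("v1"):
    pass
# Interleave a reader with every step of a second commit.
print([repo.read_tip() for _ in repo.commit_steps("v2")])
```

<para>A reader that runs between any two writer steps still finds a complete older revision, because the changelog entry that would lead it to the new data does not exist yet.</para>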
bos@559 | 639 |
bos@559 | 640 </sect2> |
bos@559 | 641 <sect2> |
bos@559 | 642 <title>Concurrent access</title> |
bos@559 | 643 |
bos@584 | 644 <para id="x_32d">The read/write ordering and atomicity guarantees mean that |
bos@559 | 645 Mercurial never needs to <emphasis>lock</emphasis> a |
bos@559 | 646 repository when it's reading data, even if the repository is |
bos@559 | 647 being written to while the read is occurring. This has a big |
bos@559 | 648 effect on scalability; you can have an arbitrary number of |
bos@559 | 649 Mercurial processes safely reading data from a repository |
bos@701 | 650 all at once, no matter whether it's being written to or |
bos@559 | 651 not.</para> |
bos@559 | 652 |
bos@584 | 653 <para id="x_32e">The lockless nature of reading means that if you're |
bos@559 | 654 sharing a repository on a multi-user system, you don't need to |
bos@559 | 655 grant other local users permission to |
bos@559 | 656 <emphasis>write</emphasis> to your repository in order for |
bos@559 | 657 them to be able to clone it or pull changes from it; they only |
bos@559 | 658 need <emphasis>read</emphasis> permission. (This is |
bos@559 | 659 <emphasis>not</emphasis> a common feature among revision |
bos@559 | 660 control systems, so don't take it for granted! Most require |
bos@559 | 661 readers to be able to lock a repository to access it safely, |
bos@559 | 662 and this requires write permission on at least one directory, |
bos@559 | 663 which of course makes for all kinds of nasty and annoying |
bos@559 | 664 security and administrative problems.)</para> |
bos@559 | 665 |
bos@584 | 666 <para id="x_32f">Mercurial uses locks to ensure that only one process can |
bos@559 | 667 write to a repository at a time (the locking mechanism is safe |
bos@559 | 668 even over filesystems that are notoriously hostile to locking, |
bos@559 | 669 such as NFS). If a repository is locked, a writer will wait |
bos@559 | 670 and retry for a while in case the repository becomes unlocked, |
bos@559 | 671 but if the repository remains locked for too long, the process |
bos@559 | 672 attempting to write will time out. This means |
bos@559 | 673 that your daily automated scripts won't get stuck forever and |
bos@559 | 674 pile up if a system crashes unnoticed, for example. (Yes, the |
bos@559 | 675 timeout is configurable, from zero to infinity.)</para> |
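<para>A writer lock with a timeout might look roughly like the following Python sketch. It uses exclusive file creation, which is an assumption made for the example and not a description of Mercurial's own locking code; in particular it makes no claim to be safe over NFS. All names are hypothetical.</para>

```python
import os
import time


class LockTimeout(Exception):
    """Raised when the lock stays held for longer than the timeout."""


def acquire_lock(path, timeout=5.0, poll=0.1):
    """Create the lock file exclusively, retrying until the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeout(path)
            time.sleep(poll)  # wait a little, then try again
        else:
            # Record the owner so a stuck lock can be investigated.
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return path


def release_lock(path):
    """Drop the lock by removing the lock file."""
    os.unlink(path)
```

<para>With a timeout of zero the function tries exactly once; a very large timeout approximates waiting forever.</para>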
bos@559 | 676 |
bos@559 | 677 <sect3> |
bos@559 | 678 <title>Safe dirstate access</title> |
bos@559 | 679 |
bos@584 | 680 <para id="x_330">As with revision data, Mercurial doesn't take a lock to |
bos@559 | 681 read the dirstate file; it does acquire a lock to write it. |
bos@559 | 682 To avoid the possibility of reading a partially written copy |
bos@559 | 683 of the dirstate file, Mercurial writes to a file with a |
bos@559 | 684 unique name in the same directory as the dirstate file, then |
bos@559 | 685 renames the temporary file atomically to |
bos@559 | 686 <filename>dirstate</filename>. The file named |
bos@559 | 687 <filename>dirstate</filename> is thus guaranteed to be |
bos@559 | 688 complete, not partially written.</para> |
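<para>The write-then-rename technique described above can be sketched in Python as follows. The function name and the temporary file prefix are invented for the example; the rename is atomic on POSIX filesystems.</para>

```python
import os
import tempfile


def write_atomically(path, data: bytes):
    """Replace path with data so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # A uniquely named temporary file in the same directory, so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dirstate-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Atomic rename: the name refers to the old or the new file,
        # never to a half-written one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

<para>A reader that opens the file at any moment gets either the complete old contents or the complete new contents.</para>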
bos@559 | 689 |
bos@559 | 690 </sect3> |
bos@559 | 691 </sect2> |
bos@559 | 692 <sect2> |
bos@559 | 693 <title>Avoiding seeks</title> |
bos@559 | 694 |
bos@584 | 695 <para id="x_331">Critical to Mercurial's performance is the avoidance of |
bos@559 | 696 seeks of the disk head, since any seek is far more expensive |
bos@559 | 697 than even a comparatively large read operation.</para> |
bos@559 | 698 |
bos@584 | 699 <para id="x_332">This is why, for example, the dirstate is stored in a |
bos@559 | 700 single file. If there were a dirstate file per directory that |
bos@559 | 701 Mercurial tracked, the disk would seek once per directory. |
bos@559 | 702 Instead, Mercurial reads the entire single dirstate file in |
bos@559 | 703 one step.</para> |
bos@559 | 704 |
bos@584 | 705 <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme |
bos@559 | 706 when cloning a repository on local storage. Instead of |
bos@559 | 707 copying every revlog file from the old repository into the new |
bos@559 | 708 repository, it makes a <quote>hard link</quote>, which is a |
bos@559 | 709 shorthand way to say <quote>these two names point to the same |
bos@559 | 710 file</quote>. When Mercurial is about to write to one of a |
bos@559 | 711 revlog's files, it checks to see if the number of names |
bos@559 | 712 pointing at the file is greater than one. If it is, more than |
bos@559 | 713 one repository is using the file, so Mercurial makes a new |
bos@559 | 714 copy of the file that is private to this repository.</para> |
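<para>The link-count check can be sketched in Python like this. It is a simplified model, assuming a filesystem that supports hard links; the function names are hypothetical and the real code handles many more cases.</para>

```python
import os
import shutil
import tempfile


def clone_file(src, dst):
    """Clone by hard link: two names, one file on disk."""
    os.link(src, dst)


def append_private(path, data: bytes):
    """Append to path, first breaking the link if another name shares it."""
    if os.stat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        shutil.copyfile(path, tmp_path)  # private copy of the contents
        os.replace(tmp_path, path)       # this name now points at the copy
    with open(path, "ab") as f:
        f.write(data)
```

<para>After the first write, the two repositories no longer share the file, so the other clone's copy is left untouched.</para>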
bos@559 | 715 |
bos@584 | 716 <para id="x_334">A few revision control developers have pointed out that |
bos@559 | 717 this idea of making a complete private copy of a file is not |
bos@559 | 718 very efficient in its use of storage. While this is true, |
bos@559 | 719 storage is cheap, and this method gives the highest |
bos@559 | 720 performance while deferring most book-keeping to the operating |
bos@559 | 721 system. An alternative scheme would most likely reduce |
bos@701 | 722 performance and increase the complexity of the software, and |
bos@701 | 723 speed and simplicity are key to the <quote>feel</quote> of |
bos@559 | 724 day-to-day use.</para> |
bos@559 | 725 |
bos@559 | 726 </sect2> |
bos@559 | 727 <sect2> |
bos@559 | 728 <title>Other contents of the dirstate</title> |
bos@559 | 729 |
bos@584 | 730 <para id="x_335">Because Mercurial doesn't force you to tell it when you're |
bos@559 | 731 modifying a file, it uses the dirstate to store some extra |
bos@559 | 732 information so it can determine efficiently whether you have |
bos@559 | 733 modified a file. For each file in the working directory, it |
bos@559 | 734 stores the time that it last modified the file itself, and the |
bos@559 | 735 size of the file at that time.</para> |
bos@559 | 736 |
bos@584 | 737 <para id="x_336">When you explicitly <command role="hg-cmd">hg |
bos@559 | 738 add</command>, <command role="hg-cmd">hg remove</command>, |
bos@559 | 739 <command role="hg-cmd">hg rename</command> or <command |
bos@559 | 740 role="hg-cmd">hg copy</command> files, Mercurial updates the |
bos@559 | 741 dirstate so that it knows what to do with those files when you |
bos@559 | 742 commit.</para> |
bos@559 | 743 |
bos@701 | 744 <para id="x_337">The dirstate helps Mercurial to efficiently |
bos@701 | 745 check the status of files in a repository.</para> |
bos@701 | 746 |
bos@701 | 747 <itemizedlist> |
bos@701 | 748 <listitem> |
bos@702 | 749 <para id="x_726">When Mercurial checks the state of a file in the |
bos@701 | 750 working directory, it first checks a file's modification |
bos@701 | 751 time against the time in the dirstate that records when |
bos@701 | 752 Mercurial last wrote the file. If the last modified time |
bos@701 | 753 is the same as the time when Mercurial wrote the file, |
bos@701 | 754 Mercurial assumes the file has not been modified and does not |
bos@701 | 755 need to check any further.</para> |
bos@701 | 756 </listitem> |
bos@701 | 757 <listitem> |
bos@702 | 758 <para id="x_727">If the file's size has changed, the file must have |
bos@701 | 759 been modified. If the modification time has changed, but |
bos@701 | 760 the size has not, only then does Mercurial need to |
bos@701 | 761 actually read the contents of the file to see if it has |
bos@701 | 762 changed.</para> |
bos@701 | 763 </listitem> |
bos@701 | 764 </itemizedlist> |
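<para>The two checks in the list above can be sketched as a single Python function. The dictionary layout and the function names are assumptions made for illustration, and the committed contents are passed in directly rather than read from a filelog.</para>

```python
import os


def record_entry(path):
    """Remember a file's mtime and size, as a dirstate entry would."""
    st = os.stat(path)
    return {"mtime": st.st_mtime_ns, "size": st.st_size}


def is_modified(path, entry, committed: bytes) -> bool:
    """Decide whether path differs from its committed contents."""
    st = os.stat(path)
    if st.st_mtime_ns == entry["mtime"]:
        return False  # same timestamp: assume unmodified, stop here
    if st.st_size != entry["size"]:
        return True   # size differs: certainly modified
    # Timestamp changed but size did not: only now read the file.
    with open(path, "rb") as f:
        return f.read() != committed
```

<para>In the common case where nothing has been touched, the function returns after a single stat call and never opens the file.</para>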
bos@701 | 765 |
bos@702 | 766 <para id="x_728">Storing the modification time and size dramatically |
bos@701 | 767 reduces the number of read operations that Mercurial needs to |
bos@701 | 768 perform when we run commands like <command>hg status</command>. |
bos@701 | 769 This results in large performance improvements.</para> |
bos@559 | 770 </sect2> |
bos@559 | 771 </sect1> |
bos@559 | 772 </chapter> |
bos@559 | 773 |
bos@559 | 774 <!-- |
bos@559 | 775 local variables: |
bos@559 | 776 sgml-parent-document: ("00book.xml" "book" "chapter") |
bos@559 | 777 end: |
bos@559 | 778 --> |