hgbook
annotate en/intro.tex @ 371:0801d625fabe
translated up to section 1.8
updated also the status of the project
author | Javier Rojas <jerojasro@devnull.li> |
---|---|
date | Sun Oct 26 17:39:41 2008 -0500 (2008-10-26) |
parents | f3bef43b8ca1 635d7c0fcac3 |
children | d2e041bef460 |
rev | line source |
---|---|
bos@16 | 1 \chapter{Introduction} |
bos@16 | 2 \label{chap:intro} |
bos@16 | 3 |
bos@217 | 4 \section{About revision control} |
bos@155 | 5 |
bos@219 | 6 Revision control is the process of managing multiple versions of a |
bos@219 | 7 piece of information. In its simplest form, this is something that |
bos@219 | 8 many people do by hand: every time you modify a file, save it under a |
bos@219 | 9 new name that contains a number, each one higher than the number of |
bos@219 | 10 the preceding version. |
bos@217 | 11 |
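The manual scheme described above can be sketched in a few lines of Python. This is only an illustration of the by-hand process, not of how any revision control tool stores history; the function name \texttt{save\_numbered\_copy} and the naming pattern (\texttt{notes.1.txt}, \texttt{notes.2.txt}, and so on) are assumptions made for the example.

```python
import re
import shutil
from pathlib import Path


def save_numbered_copy(path: Path) -> Path:
    """Copy `path` to a sibling named <stem>.<n><suffix>.

    n is one higher than the largest number already used by an
    earlier copy of the same file in the same directory.
    """
    # Matches e.g. "notes.3.txt" when path is "notes.txt".
    pattern = re.compile(
        re.escape(path.stem) + r"\.(\d+)" + re.escape(path.suffix)
    )
    numbers = [
        int(match.group(1))
        for sibling in path.parent.iterdir()
        if (match := pattern.fullmatch(sibling.name))
    ]
    next_number = max(numbers, default=0) + 1
    target = path.with_name(f"{path.stem}.{next_number}{path.suffix}")
    shutil.copy2(path, target)  # copy2 also preserves the modification time
    return target
```

Calling the function on \texttt{notes.txt} twice produces \texttt{notes.1.txt} and then \texttt{notes.2.txt}. Nothing here records who made a change or why, which is part of what makes the manual approach error-prone.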
bos@217 | 12 Manually managing multiple versions of even a single file is an |
bos@217 | 13 error-prone task, though, so software tools to help automate this |
bos@217 | 14 process have long been available. The earliest automated revision |
bos@217 | 15 control tools were intended to help a single user to manage revisions |
bos@219 | 16 of a single file. Over the past few decades, the scope of revision |
bos@219 | 17 control tools has expanded greatly; they now manage multiple files, |
bos@219 | 18 and help multiple people to work together. The best modern revision |
bos@219 | 19 control tools have no problem coping with thousands of people working |
bos@219 | 20 together on projects that consist of hundreds of thousands of files. |
bos@217 | 21 |
bos@217 | 22 \subsection{Why use revision control?} |
bos@217 | 23 |
bos@217 | 24 There are a number of reasons why you or your team might want to use |
bos@217 | 25 an automated revision control tool for a project. |
bos@217 | 26 \begin{itemize} |
bos@219 | 27 \item It will track the history and evolution of your project, so you |
bos@219 | 28 don't have to. For every change, you'll have a log of \emph{who} |
bos@219 | 29 made it; \emph{why} they made it; \emph{when} they made it; and |
bos@219 | 30 \emph{what} the change was. |
bos@219 | 31 \item When you're working with other people, revision control software |
bos@219 | 32 makes it easier for you to collaborate. For example, when people |
bos@219 | 33 more or less simultaneously make potentially incompatible changes, |
bos@219 | 34 the software will help you to identify and resolve those conflicts. |
bos@217 | 35 \item It can help you to recover from mistakes. If you make a change |
bos@217 | 36 that later turns out to be in error, you can revert to an earlier |
bos@217 | 37 version of one or more files. In fact, a \emph{really} good |
bos@217 | 38 revision control tool will even help you to efficiently figure out |
bos@217 | 39 exactly when a problem was introduced (see |
bos@217 | 40 section~\ref{sec:undo:bisect} for details). |
bos@218 | 41 \item It will help you to work simultaneously on, and manage the drift |
bos@218 | 42 between, multiple versions of your project. |
bos@217 | 43 \end{itemize} |
bos@218 | 44 Most of these reasons are equally valid---at least in theory---whether |
bos@218 | 45 you're working on a project by yourself, or with a hundred other |
bos@218 | 46 people. |
bos@218 | 47 |
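The idea of efficiently finding when a problem was introduced, mentioned in the list above, is essentially a binary search over history. The following is a minimal sketch, assuming a linear list of revisions in which the first is known to be good, the last is known to be bad, and the problem never disappears once introduced. The names \texttt{first\_bad} and \texttt{is\_bad} are hypothetical, and this is not Mercurial's implementation.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and every
    revision after the first bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1  # invariant: good is good, bad is bad
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(revisions[middle]):
            bad = middle   # the problem was introduced at or before middle
        else:
            good = middle  # the problem was introduced after middle
    return revisions[bad]
```

Each test halves the range still under suspicion, so a history of 100 revisions needs at most 7 tests rather than up to 98.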
bos@218 | 48 A key question about the practicality of revision control at these two |
bos@218 | 49 different scales (``lone hacker'' and ``huge team'') is how its |
bos@218 | 50 \emph{benefits} compare to its \emph{costs}. A revision control tool |
bos@218 | 51 that's difficult to understand or use is going to impose a high cost. |
bos@218 | 52 |
bos@219 | 53 A five-hundred-person project is likely to collapse under its own |
bos@219 | 54 weight almost immediately without a revision control tool and process. |
bos@219 | 55 In this case, the cost of using revision control might hardly seem |
bos@219 | 56 worth considering, since \emph{without} it, failure is almost |
bos@219 | 57 guaranteed. |
bos@218 | 58 |
bos@218 | 59 On the other hand, a one-person ``quick hack'' might seem like a poor |
bos@218 | 60 place to use a revision control tool, because surely the cost of using |
bos@218 | 61 one must be close to the overall cost of the project. Right? |
bos@218 | 62 |
bos@218 | 63 Mercurial uniquely supports \emph{both} of these scales of |
bos@218 | 64 development. You can learn the basics in just a few minutes, and due |
bos@218 | 65 to its low overhead, you can apply revision control to the smallest of |
bos@218 | 66 projects with ease. Its simplicity means you won't have a lot of |
bos@218 | 67 abstruse concepts or command sequences competing for mental space with |
bos@218 | 68 whatever you're \emph{really} trying to do. At the same time, |
bos@218 | 69 Mercurial's high performance and peer-to-peer nature let you scale |
bos@218 | 70 painlessly to handle large projects. |
bos@217 | 71 |
bos@219 | 72 No revision control tool can rescue a poorly run project, but a good |
bos@219 | 73 choice of tools can make a huge difference to the fluidity with which |
bos@219 | 74 you can work on a project. |
bos@219 | 75 |
bos@217 | 76 \subsection{The many names of revision control} |
bos@217 | 77 |
bos@217 | 78 Revision control is a diverse field, so much so that it doesn't |
bos@217 | 79 actually have a single name or acronym. Here are a few of the more |
bos@217 | 80 common names and acronyms you'll encounter: |
bos@217 | 81 \begin{itemize} |
bos@217 | 82 \item Revision control (RCS) |
bos@219 | 83 \item Software configuration management (SCM), or configuration management |
bos@218 | 84 \item Source code management |
bos@219 | 85 \item Source code control, or source control |
bos@217 | 86 \item Version control (VCS) |
bos@217 | 87 \end{itemize} |
bos@217 | 88 Some people claim that these terms actually have different meanings, |
bos@217 | 89 but in practice they overlap so much that there's no agreed or even |
bos@217 | 90 useful way to tease them apart. |
bos@155 | 91 |
bos@219 | 92 \section{A short history of revision control} |
bos@155 | 93 |
bos@218 | 94 The best known of the old-time revision control tools is SCCS (Source |
bos@218 | 95 Code Control System), which Marc Rochkind wrote at Bell Labs, in the |
bos@218 | 96 early 1970s. SCCS operated on individual files, and required every |
bos@218 | 97 person working on a project to have access to a shared workspace on a |
bos@218 | 98 single system. Only one person could modify a file at any time; |
bos@218 | 99 arbitration for access to files was via locks. It was common for |
bos@218 | 100 people to lock files, and later forget to unlock them, preventing |
bos@218 | 101 anyone else from modifying those files without the help of an |
bos@218 | 102 administrator. |
bos@218 | 103 |
bos@218 | 104 Walter Tichy developed a free alternative to SCCS in the early 1980s; |
bos@218 | 105 he called his program RCS (Revision Control System). Like SCCS, RCS |
bos@218 | 106 required developers to work in a single shared workspace, and to lock |
bos@218 | 107 files to prevent multiple people from modifying them simultaneously. |
bos@218 | 108 |
bos@218 | 109 Later in the 1980s, Dick Grune used RCS as a building block for a set |
bos@218 | 110 of shell scripts he initially called cmt, but then renamed to CVS |
bos@218 | 111 (Concurrent Versions System). The big innovation of CVS was that it |
bos@218 | 112 let developers work simultaneously and somewhat independently in their |
bos@218 | 113 own personal workspaces. The personal workspaces prevented developers |
bos@218 | 114 from stepping on each other's toes all the time, as was common with |
bos@218 | 115 SCCS and RCS. Each developer had a copy of every project file, and |
bos@218 | 116 could modify their copies independently. They had to merge their |
bos@218 | 117 edits prior to committing changes to the central repository. |
bos@218 | 118 |
bos@218 | 119 Brian Berliner took Grune's original scripts and rewrote them in~C, |
bos@218 | 120 releasing in 1989 the code that has since developed into the modern |
bos@218 | 121 version of CVS. CVS subsequently acquired the ability to operate over |
bos@218 | 122 a network connection, giving it a client/server architecture. CVS's |
bos@218 | 123 architecture is centralised; only the server has a copy of the history |
bos@218 | 124 of the project. Client workspaces just contain copies of recent |
bos@218 | 125 versions of the project's files, and a little metadata to tell them |
bos@218 | 126 where the server is. CVS has been enormously successful; it is |
bos@218 | 127 probably the world's most widely used revision control system. |
bos@218 | 128 |
bos@218 | 129 In the early 1990s, Sun Microsystems developed an early distributed |
bos@218 | 130 revision control system, called TeamWare. A TeamWare workspace |
bos@218 | 131 contains a complete copy of the project's history. TeamWare has no |
bos@218 | 132 notion of a central repository. (CVS relied upon RCS for its history |
bos@218 | 133 storage; TeamWare used SCCS.) |
bos@218 | 134 |
bos@218 | 135 As the 1990s progressed, awareness grew of a number of problems with |
bos@218 | 136 CVS. It records simultaneous changes to multiple files individually, |
bos@218 | 137 instead of grouping them together as a single logically atomic |
bos@218 | 138 operation. It does not manage its file hierarchy well; it is easy to |
bos@218 | 139 make a mess of a repository by renaming files and directories. Worse, |
bos@218 | 140 its source code is difficult to read and maintain, which made the |
bos@218 | 141 ``pain level'' of fixing these architectural problems prohibitive. |
bos@218 | 142 |
bos@218 | 143 In 2001, Jim Blandy and Karl Fogel, two developers who had worked on |
bos@218 | 144 CVS, started a project to replace it with a tool that would have a |
bos@218 | 145 better architecture and cleaner code. The result, Subversion, does |
bos@218 | 146 not stray from CVS's centralised client/server model, but it adds |
bos@218 | 147 multi-file atomic commits, better namespace management, and a number |
bos@218 | 148 of other features that make it a generally better tool than CVS. |
bos@218 | 149 Since its initial release, it has rapidly grown in popularity. |
bos@218 | 150 |
bos@218 | 151 More or less simultaneously, Graydon Hoare began working on an |
bos@218 | 152 ambitious distributed revision control system that he named Monotone. |
bos@218 | 153 While Monotone addresses many of CVS's design flaws and has a |
bos@218 | 154 peer-to-peer architecture, it goes beyond earlier (and subsequent) |
bos@218 | 155 revision control tools in a number of innovative ways. It uses |
bos@218 | 156 cryptographic hashes as identifiers, and has an integral notion of |
bos@218 | 157 ``trust'' for code from different sources. |
bos@218 | 158 |
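To illustrate what using a cryptographic hash as an identifier means, here is a small sketch in Python. It assumes SHA-1 and a made-up layout (parent identifier, a zero byte, then the content); the function \texttt{revision\_id} is hypothetical and does not reproduce the actual format used by Monotone or any other tool.

```python
import hashlib


def revision_id(content: bytes, parent_id: str = "") -> str:
    """Derive an identifier from a revision's content and its parent's id.

    The same inputs always give the same identifier, and any change to
    the content or to the parent gives a different one.
    """
    digest = hashlib.sha1()
    digest.update(parent_id.encode("ascii"))
    digest.update(b"\0")  # separator between parent id and content
    digest.update(content)
    return digest.hexdigest()
```

Because the identifier depends only on the data, two people who independently hold the same revision will compute the same name for it without consulting a central server.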
bos@218 | 159 Mercurial began life in 2005. While a few aspects of its design are |
bos@218 | 160 influenced by Monotone, Mercurial focuses on ease of use, high |
bos@218 | 161 performance, and scalability to very large projects. |
bos@155 | 162 |
bos@219 | 163 \section{Trends in revision control} |
bos@219 | 164 |
bos@219 | 165 There has been an unmistakable trend in the development and use of |
bos@219 | 166 revision control tools over the past four decades, as people have |
bos@219 | 167 become familiar with the capabilities of their tools and constrained |
bos@219 | 168 by their limitations. |
bos@219 | 169 |
bos@219 | 170 The first generation began by managing single files on individual |
bos@219 | 171 computers. Although these tools represented a huge advance over |
bos@219 | 172 ad-hoc manual revision control, their locking model and reliance on a |
bos@219 | 173 single computer limited them to small, tightly-knit teams. |
bos@219 | 174 |
bos@219 | 175 The second generation loosened these constraints by moving to |
bos@219 | 176 network-centered architectures, and managing entire projects at a |
bos@219 | 177 time. As projects grew larger, they ran into new problems. With |
bos@219 | 178 clients needing to talk to servers very frequently, server scaling |
bos@219 | 179 became an issue for large projects. An unreliable network connection |
bos@219 | 180 could prevent remote users from being able to talk to the server at |
bos@219 | 181 all. As open source projects started making read-only access |
bos@219 | 182 available anonymously to anyone, people without commit privileges |
bos@219 | 183 found that they could not use the tools to interact with a project in |
bos@219 | 184 a natural way, as they could not record their changes. |
bos@219 | 185 |
bos@219 | 186 The current generation of revision control tools is peer-to-peer in |
bos@219 | 187 nature. All of these systems have dropped the dependency on a single |
bos@219 | 188 central server, and allow people to distribute their revision control |
bos@219 | 189 data to where it's actually needed. Collaboration over the Internet |
bos@219 | 190 has moved from being constrained by technology to being a matter of choice and |
bos@219 | 191 consensus. Modern tools can operate offline indefinitely and |
bos@219 | 192 autonomously, with a network connection only needed when syncing |
bos@219 | 193 changes with another repository. |
bos@219 | 194 |
bos@219 | 195 \section{A few of the advantages of distributed revision control} |
bos@219 | 196 |
bos@219 | 197 Even though distributed revision control tools have for several years |
bos@219 | 198 been as robust and usable as their previous-generation counterparts, |
bos@219 | 199 people using older tools have not yet necessarily woken up to their |
bos@219 | 200 advantages. There are a number of ways in which distributed tools |
bos@219 | 201 shine relative to centralised ones. |
bos@219 | 202 |
bos@219 | 203 For an individual developer, distributed tools are almost always much |
bos@219 | 204 faster than centralised tools. This is for a simple reason: a |
bos@219 | 205 centralised tool needs to talk over the network for many common |
bos@219 | 206 operations, because most metadata is stored in a single copy on the |
bos@219 | 207 central server. A distributed tool stores all of its metadata |
bos@219 | 208 locally. All else being equal, talking over the network adds overhead |
bos@219 | 209 to a centralised tool. Don't underestimate the value of a snappy, |
bos@219 | 210 responsive tool: you're going to spend a lot of time interacting with |
bos@219 | 211 your revision control software. |
bos@219 | 212 |
bos@219 | 213 Distributed tools are indifferent to the vagaries of your server |
bos@219 | 214 infrastructure, again because they replicate metadata to so many |
bos@219 | 215 locations. If you use a centralised system and your server catches |
bos@219 | 216 fire, you'd better hope that your backup media are reliable, and that |
bos@219 | 217 your last backup was recent and actually worked. With a distributed |
bos@219 | 218 tool, you have many backups available on every contributor's computer. |
bos@219 | 219 |
bos@219 | 220 The reliability of your network will affect distributed tools far less |
bos@219 | 221 than it will centralised tools. You can't even use a centralised tool |
bos@219 | 222 without a network connection, except for a few highly constrained |
bos@219 | 223 commands. With a distributed tool, if your network connection goes |
bos@219 | 224 down while you're working, you may not even notice. The only thing |
bos@219 | 225 you won't be able to do is talk to repositories on other computers, |
bos@219 | 226 something that is relatively rare compared with local operations. If |
bos@219 | 227 you have a far-flung team of collaborators, this may be significant. |
bos@219 | 228 |
bos@220 | 229 \subsection{Advantages for open source projects} |
bos@220 | 230 |
bos@219 | 231 If you take a shine to an open source project and decide that you |
bos@219 | 232 would like to start hacking on it, and that project uses a distributed |
bos@219 | 233 revision control tool, you are at once a peer with the people who |
bos@219 | 234 consider themselves the ``core'' of that project. If they publish |
bos@219 | 235 their repositories, you can immediately copy their project history, |
bos@219 | 236 start making changes, and record your work, using the same tools in |
bos@219 | 237 the same ways as insiders. By contrast, with a centralised tool, you |
bos@219 | 238 must use the software in a ``read only'' mode unless someone grants |
bos@219 | 239 you permission to commit changes to their central server. Until then, |
bos@219 | 240 you won't be able to record changes, and your local modifications will |
bos@219 | 241 be at risk of corruption any time you try to update your client's view |
bos@219 | 242 of the repository. |
bos@155 | 243 |
bos@220 | 244 \subsubsection{The forking non-problem} |
bos@220 | 245 |
bos@220 | 246 It has been suggested that distributed revision control tools pose |
bos@220 | 247 some sort of risk to open source projects because they make it easy to |
bos@220 | 248 ``fork'' the development of a project. A fork happens when there are |
bos@220 | 249 differences in opinion or attitude between groups of developers that |
bos@220 | 250 cause them to decide that they can't work together any longer. Each |
bos@220 | 251 side takes a more or less complete copy of the project's source code, |
bos@220 | 252 and goes off in its own direction. |
bos@220 | 253 |
bos@220 | 254 Sometimes the camps in a fork decide to reconcile their differences. |
bos@220 | 255 With a centralised revision control system, the \emph{technical} |
bos@220 | 256 process of reconciliation is painful, and has to be performed largely |
bos@220 | 257 by hand. You have to decide whose revision history is going to |
bos@220 | 258 ``win'', and graft the other team's changes into the tree somehow. |
bos@220 | 259 This usually loses some or all of one side's revision history. |
bos@220 | 260 |
bos@220 | 261 What distributed tools do with respect to forking is to make forking |
bos@220 | 262 the \emph{only} way to develop a project. Every single change that |
bos@220 | 263 you make is potentially a fork point. The great strength of this |
bos@220 | 264 approach is that a distributed revision control tool has to be really |
bos@220 | 265 good at \emph{merging} forks, because forks are absolutely |
bos@220 | 266 fundamental: they happen all the time. |
bos@220 | 267 |
bos@220 | 268 If every piece of work that everybody does, all the time, is framed in |
bos@220 | 269 terms of forking and merging, then what the open source world refers |
bos@220 | 270 to as a ``fork'' becomes \emph{purely} a social issue. If anything, |
bos@220 | 271 distributed tools \emph{lower} the likelihood of a fork: |
bos@220 | 272 \begin{itemize} |
bos@220 | 273 \item They eliminate the social distinction that centralised tools |
bos@220 | 274 impose: that between insiders (people with commit access) and |
bos@220 | 275 outsiders (people without). |
bos@220 | 276 \item They make it easier to reconcile after a social fork, because |
bos@220 | 277 all that's involved from the perspective of the revision control |
bos@220 | 278 software is just another merge. |
bos@220 | 279 \end{itemize} |
bos@220 | 280 |
bos@220 | 281 Some people resist distributed tools because they want to retain tight |
bos@220 | 282 control over their projects, and they believe that centralised tools |
bos@220 | 283 give them this control. However, if you're of this belief, and you |
bos@220 | 284 publish your CVS or Subversion repositories publicly, there are |
bos@220 | 285 plenty of tools available that can pull out your entire project's |
bos@220 | 286 history (albeit slowly) and recreate it somewhere that you don't |
bos@220 | 287 control. So while your control in this case is illusory, you are |
tktan@263 | 288 forgoing the ability to fluidly collaborate with whatever people feel |
bos@220 | 289 compelled to mirror and fork your history. |
bos@220 | 290 |
bos@220 | 291 \subsection{Advantages for commercial projects} |
bos@220 | 292 |
bos@220 | 293 Many commercial projects are undertaken by teams that are scattered |
bos@220 | 294 across the globe. Contributors who are far from a central server will |
bos@220 | 295 see slower command execution and perhaps less reliability. Commercial |
bos@220 | 296 revision control systems attempt to ameliorate these problems with |
bos@220 | 297 remote-site replication add-ons that are typically expensive to buy |
bos@220 | 298 and cantankerous to administer. A distributed system doesn't suffer |
bos@220 | 299 from these problems in the first place. Better yet, you can easily |
bos@220 | 300 set up multiple authoritative servers, say one per site, so that |
bos@220 | 301 there's no redundant communication between repositories over expensive |
bos@220 | 302 long-haul network links. |
bos@220 | 303 |
bos@220 | 304 Centralised revision control systems tend to have relatively low |
bos@220 | 305 scalability. It's not unusual for an expensive centralised system to |
bos@220 | 306 fall over under the combined load of just a few dozen concurrent |
bos@220 | 307 users. Once again, the typical response tends to be an expensive and |
bos@220 | 308 clunky replication facility. Since the load on a central server---if |
bos@280 | 309 you have one at all---is many times lower with a distributed |
bos@220 | 310 tool (because all of the data is replicated everywhere), a single |
bos@220 | 311 cheap server can handle the needs of a much larger team, and |
bos@220 | 312 replication to balance load becomes a simple matter of scripting. |
bos@220 | 313 |
bos@220 | 314 If you have an employee in the field, troubleshooting a problem at a |
bos@220 | 315 customer's site, they'll benefit from distributed revision control. |
bos@220 | 316 The tool will let them generate custom builds, try different fixes in |
bos@220 | 317 isolation from each other, and search efficiently through history for |
bos@220 | 318 the sources of bugs and regressions in the customer's environment, all |
bos@220 | 319 without needing to connect to your company's network. |
bos@219 | 320 |
bos@155 | 321 \section{Why choose Mercurial?} |
bos@155 | 322 |
bos@221 | 323 Mercurial has a unique set of properties that make it a particularly |
bos@221 | 324 good choice as a revision control system. |
bos@221 | 325 \begin{itemize} |
bos@221 | 326 \item It is easy to learn and use. |
bos@221 | 327 \item It is lightweight. |
bos@221 | 328 \item It scales excellently. |
bos@221 | 329 \item It is easy to customise. |
bos@221 | 330 \end{itemize} |
bos@221 | 331 |
bos@221 | 332 If you are at all familiar with revision control systems, you should |
bos@221 | 333 be able to get up and running with Mercurial in less than five |
bos@221 | 334 minutes. Even if not, it will take no more than a few minutes |
bos@221 | 335 longer. Mercurial's command and feature sets are generally uniform |
bos@221 | 336 and consistent, so you can keep track of a few general rules instead |
bos@221 | 337 of a host of exceptions. |
bos@221 | 338 |
bos@221 | 339 On a small project, you can start working with Mercurial in moments. |
bos@221 | 340 Creating new changes and branches; transferring changes around |
bos@221 | 341 (whether locally or over a network); and history and status operations |
bos@221 | 342 are all fast. Mercurial attempts to stay nimble and largely out of |
bos@221 | 343 your way by combining low cognitive overhead with blazingly fast |
bos@221 | 344 operations. |
bos@221 | 345 |
bos@221 | 346 The usefulness of Mercurial is not limited to small projects: it is |
bos@221 | 347 used by projects with hundreds to thousands of contributors, each |
bos@221 | 348 containing tens of thousands of files and hundreds of megabytes of |
bos@221 | 349 source code. |
bos@221 | 350 |
bos@221 | 351 If the core functionality of Mercurial is not enough for you, it's |
bos@221 | 352 easy to build on. Mercurial is well suited to scripting tasks, and |
bos@221 | 353 its clean internals and implementation in Python make it easy to add |
bos@221 | 354 features in the form of extensions. There are a number of popular and |
bos@221 | 355 useful extensions already available, ranging from helping to identify |
bos@221 | 356 bugs to improving performance. |
bos@221 | 357 |
bos@221 | 358 \section{Mercurial compared with other tools} |
bos@221 | 359 |
bos@221 | 360 Before you read on, please understand that this section necessarily |
bos@221 | 361 reflects my own experiences, interests, and (dare I say it) biases. I |
bos@221 | 362 have used every one of the revision control tools listed below, in |
bos@221 | 363 most cases for several years at a time. |
bos@221 | 364 |
bos@280 | 365 |
bos@221 | 366 \subsection{Subversion} |
bos@221 | 367 |
bos@221 | 368 Subversion is a popular revision control tool, developed to replace |
bos@221 | 369 CVS. It has a centralised client/server architecture. |
bos@221 | 370 |
bos@221 | 371 Subversion and Mercurial have similarly named commands for performing |
bos@280 | 372 the same operations, so if you're familiar with one, it is easy to |
bos@280 | 373 learn to use the other. Both tools are portable to all popular |
bos@221 | 374 operating systems. |
bos@221 | 375 |
bos@315 | 376 Prior to version 1.5, Subversion had no useful support for merges. |
bos@315 | 377 At the time of writing, its merge tracking capability is new, and known to be |
bos@315 | 378 \href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated |
bos@315 | 379 and buggy}. |
bos@256 | 380 |
bos@221 | 381 Mercurial has a substantial performance advantage over Subversion on |
bos@221 | 382 every revision control operation I have benchmarked. I have measured |
bos@221 | 383 its advantage as ranging from a factor of two to a factor of six when |
bos@221 | 384 compared with Subversion~1.4.3's \emph{ra\_local} file store, which is |
simon@313 | 385 the fastest access method available. In more realistic deployments |
bos@221 | 386 involving a network-based store, Subversion will be at a substantially |
bos@256 | 387 larger disadvantage. Because many Subversion commands must talk to |
bos@256 | 388 the server and Subversion does not have useful replication facilities, |
bos@280 | 389 server capacity and network bandwidth become bottlenecks for modestly |
bos@280 | 390 large projects. |
bos@280 | 391 |
bos@280 | 392 Additionally, Subversion incurs substantial storage overhead to avoid |
bos@280 | 393 network transactions for a few common operations, such as finding |
bos@280 | 394 modified files (\texttt{status}) and displaying modifications against |
bos@280 | 395 the current revision (\texttt{diff}). As a result, a Subversion |
bos@280 | 396 working copy is often the same size as, or larger than, a Mercurial |
bos@280 | 397 repository and working directory, even though the Mercurial repository |
bos@280 | 398 contains a complete history of the project. |
bos@280 | 399 |
bos@280 | 400 Subversion is widely supported by third party tools. Mercurial |
bos@280 | 401 currently lags considerably in this area. This gap is closing, |
bos@280 | 402 however, and indeed some of Mercurial's GUI tools now outshine their |
bos@280 | 403 Subversion equivalents. Like Mercurial, Subversion has an excellent |
bos@280 | 404 user manual. |
bos@280 | 405 |
bos@280 | 406 Because Subversion doesn't store revision history on the client, it is |
bos@280 | 407 well suited to managing projects that deal with lots of large, opaque |
bos@280 | 408 binary files. If you check in fifty revisions to an incompressible |
bos@280 | 409 10MB file, Subversion's client-side space usage stays constant. The |
bos@280 | 410 space used by any distributed SCM will grow rapidly in proportion to |
bos@280 | 411 the number of revisions, because the differences between each revision |
bos@280 | 412 are large. |
bos@280 | 413 |
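The fifty-revision example above can be worked through as rough arithmetic. The sketch below rests on two stated assumptions: the centralised client keeps a fixed number of copies of the file (two is assumed here, the working file plus one pristine copy), and every revision of an incompressible file costs a distributed repository about as much as the whole file. The function names and the \texttt{copies\_kept} and \texttt{delta\_ratio} parameters are hypothetical.

```python
def centralised_client_mb(file_mb: float, revisions: int, copies_kept: int = 2) -> float:
    """Client-side space for a centralised tool: independent of history length."""
    return file_mb * copies_kept


def distributed_clone_mb(file_mb: float, revisions: int, delta_ratio: float = 1.0) -> float:
    """Space for a distributed clone: working file plus one stored delta per revision.

    delta_ratio is the size of each stored revision relative to the file;
    1.0 models incompressible data where every delta is as big as the file.
    """
    return file_mb + revisions * file_mb * delta_ratio
```

Under these assumptions, fifty revisions of a 10MB file cost the centralised client about 20MB however long the history grows, while a distributed clone needs about 510MB.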
bos@280 | 414 In addition, it's often difficult or, more usually, impossible to |
bos@280 | 415 merge different versions of a binary file. Subversion's ability to |
bos@280 | 416 let a user lock a file, so that they temporarily have the exclusive |
bos@280 | 417 right to commit changes to it, can be a significant advantage to a |
bos@280 | 418 project where binary files are widely used. |
bos@280 | 419 |
bos@280 | 420 Mercurial can import revision history from a Subversion repository. |
bos@280 | 421 It can also export revision history to a Subversion repository. This |
bos@280 | 422 makes it easy to ``test the waters'' and use Mercurial and Subversion |
bos@280 | 423 in parallel before deciding to switch. History conversion is |
bos@280 | 424 incremental, so you can perform an initial conversion, then small |
bos@280 | 425 additional conversions afterwards to bring in new changes. |
bos@280 | 426 |
bos@221 | 427 |
bos@221 | 428 \subsection{Git} |
bos@221 | 429 |
bos@221 | 430 Git is a distributed revision control tool that was developed for |
bos@221 | 431 managing the Linux kernel source tree. Like Mercurial, its early |
bos@221 | 432 design was somewhat influenced by Monotone. |
bos@221 | 433 |
bos@280 | 434 Git has a very large command set, with version~1.5.0 providing~139 |
bos@280 | 435 individual commands. It has something of a reputation for being |
bos@280 | 436 difficult to learn. Compared to Git, Mercurial has a strong focus on |
bos@280 | 437 simplicity. |
bos@280 | 438 |
bos@280 | 439 In terms of performance, Git is extremely fast. In several cases, it |
bos@280 | 440 is faster than Mercurial, at least on Linux, while Mercurial performs |
bos@280 | 441 better on other operations. However, on Windows, the performance and |
bos@280 | 442 general level of support that Git provides are, at the time of writing, |
bos@280 | 443 far behind that of Mercurial. |
bos@221 | 444 |
bos@221 | 445 While a Mercurial repository needs no maintenance, a Git repository |
bos@221 | 446 requires frequent manual ``repacks'' of its metadata. Without these, |
bos@221 | 447 performance degrades, while space usage grows rapidly. A server that |
bos@221 | 448 contains many Git repositories that are not rigorously and frequently |
bos@221 | 449 repacked will become heavily disk-bound during backups, and there have |
bos@221 | 450 been instances of daily backups taking far longer than~24 hours as a |
bos@221 | 451 result. A freshly packed Git repository is slightly smaller than a |
bos@221 | 452 Mercurial repository, but an unpacked repository is several orders of |
bos@221 | 453 magnitude larger. |
bos@221 | 454 |
bos@221 | 455 The core of Git is written in C. Many Git commands are implemented as |
bos@221 | 456 shell or Perl scripts, and the quality of these scripts varies widely. |
bos@280 | 457 I have encountered several instances where scripts charged along |
bos@221 | 458 blindly in the presence of errors that should have been fatal. |
bos@221 | 459 |
bos@280 | 460 Mercurial can import revision history from a Git repository. |
bos@280 | 461 |
bos@280 | 462 |
bos@221 | 463 \subsection{CVS} |
bos@221 | 464 |
bos@221 | 465 CVS is probably the most widely used revision control tool in the |
bos@280 | 466 world. Due to its age and internal untidiness, it has been only |
bos@280 | 467 lightly maintained for many years. |
bos@221 | 468 |
bos@221 | 469 It has a centralised client/server architecture. It does not group |
bos@221 | 470 related file changes into atomic commits, making it easy for people to |
bos@256 | 471 ``break the build'': one person can successfully commit part of a |
bos@256 | 472 change and then be blocked by the need for a merge, causing other |
bos@256 | 473 people to see only a portion of the work they intended to do. This |
bos@256 | 474 also affects how you work with project history. If you want to see |
bos@256 | 475 all of the modifications someone made as part of a task, you will need |
bos@256 | 476 to manually inspect the descriptions and timestamps of the changes |
bos@256 | 477 made to each file involved (if you even know what those files were). |
bos@256 | 478 |
bos@256 | 479 CVS has a muddled notion of tags and branches that I will not even |
bos@256 | 480 attempt to describe. It does not support renaming of files or |
bos@256 | 481 directories well, making it easy to corrupt a repository. It has |
bos@256 | 482 almost no internal consistency checking capabilities, so it is usually |
bos@256 | 483 not even possible to tell whether or how a repository is corrupt. I |
bos@256 | 484 would not recommend CVS for any project, existing or new. |
bos@221 | 485 |
bos@221 | 486 Mercurial can import CVS revision history. However, there are a few |
bos@221 | 487 caveats that apply; these are true of every other revision control |
bos@221 | 488 tool's CVS importer, too. Due to CVS's lack of atomic changes and |
bos@221 | 489 unversioned filesystem hierarchy, it is not possible to reconstruct |
bos@221 | 490 CVS history completely accurately; some guesswork is involved, and |
bos@221 | 491 renames will usually not show up. Because a lot of advanced CVS |
bos@221 | 492 administration has to be done by hand and is hence error-prone, it's |
bos@221 | 493 common for CVS importers to run into multiple problems with corrupted |
bos@221 | 494 repositories (completely bogus revision timestamps and files that have |
bos@221 | 495 remained locked for over a decade are just two of the less interesting |
bos@221 | 496 problems I can recall from personal experience). |
bos@221 | 497 |
bos@280 | 498 Mercurial can import revision history from a CVS repository. |
bos@280 | 499 |
bos@280 | 500 |
bos@221 | 501 \subsection{Commercial tools} |
bos@221 | 502 |
bos@221 | 503 Perforce has a centralised client/server architecture, with no |
bos@221 | 504 client-side caching of any data. Unlike modern revision control |
bos@221 | 505 tools, Perforce requires that a user run a command to inform the |
bos@221 | 506 server about every file they intend to edit. |
bos@221 | 507 |
bos@221 | 508 The performance of Perforce is quite good for small teams, but it |
bos@221 | 509 falls off rapidly as the number of users grows beyond a few dozen. |
bos@221 | 510 Modestly large Perforce installations require the deployment of |
bos@221 | 511 proxies to cope with the load their users generate. |
bos@16 | 512 |
bos@280 | 513 |
bos@280 | 514 \subsection{Choosing a revision control tool} |
bos@280 | 515 |
bos@280 | 516 With the exception of CVS, all of the tools listed above have unique |
bos@280 | 517 strengths that suit them to particular styles of work. There is no |
bos@280 | 518 single revision control tool that is best in all situations. |
bos@280 | 519 |
bos@280 | 520 As an example, Subversion is a good choice for working with frequently |
bos@280 | 521 edited binary files, due to its centralised nature and support for |
bos@318 | 522 file locking. |
bos@280 | 523 |
bos@280 | 524 I personally find Mercurial's properties of simplicity, performance, |
bos@280 | 525 and good merge support to be a compelling combination that has served |
bos@280 | 526 me well for several years. |
bos@280 | 527 |
bos@280 | 528 |
bos@280 | 529 \section{Switching from another tool to Mercurial} |
bos@280 | 530 |
bos@280 | 531 Mercurial is bundled with an extension named \hgext{convert}, which |
bos@280 | 532 can incrementally import revision history from several other revision |
bos@280 | 533 control tools. By ``incremental'', I mean that you can convert all of |
bos@280 | 534 a project's history to date in one go, then rerun the conversion later |
bos@280 | 535 to obtain new changes that happened after the initial conversion. |
bos@280 | 536 |
bos@280 | 537 The revision control tools supported by \hgext{convert} are as |
bos@280 | 538 follows: |
bos@280 | 539 \begin{itemize} |
bos@280 | 540 \item Subversion |
bos@280 | 541 \item CVS |
bos@280 | 542 \item Git |
bos@280 | 543 \item Darcs |
bos@280 | 544 \end{itemize} |
bos@280 | 545 |
bos@280 | 546 In addition, \hgext{convert} can export changes from Mercurial to |
bos@280 | 547 Subversion. This makes it possible to try Subversion and Mercurial in |
bos@280 | 548 parallel before committing to a switchover, without risking the loss |
bos@280 | 549 of any work. |
bos@280 | 550 |
bos@280 | 551 The \hgxcmd{convert}{convert} command is easy to use. Simply point it |
bos@280 | 552 at the path or URL of the source repository, optionally give it the |
bos@280 | 553 name of the destination repository, and it will start working. After |
bos@280 | 554 the initial conversion, just run the same command again to import new |
bos@280 | 555 changes. |
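
As a rough sketch of what this looks like in practice, assume a
Subversion repository at the made-up URL
\texttt{http://svn.example.com/project/trunk} and a destination named
\texttt{project-hg}; both are placeholders chosen for illustration.
The sketch also assumes that the extension is enabled through the
\texttt{[extensions]} section of your \texttt{.hgrc}; the exact
spelling of that entry may differ between Mercurial versions.

\begin{verbatim}
# ~/.hgrc: enable the bundled convert extension (assumed setup)
[extensions]
convert =
\end{verbatim}

With that in place, the initial conversion and a later incremental
update would use the same command line:

\begin{verbatim}
# Initial conversion (URL and destination name are hypothetical)
hg convert http://svn.example.com/project/trunk project-hg

# Later: rerun the identical command to import new changes
hg convert http://svn.example.com/project/trunk project-hg
\end{verbatim}

If you leave out the destination, \hgext{convert} chooses a name
itself by appending \texttt{-hg} to the last component of the source
path.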
bos@280 | 556 |
bos@280 | 557 |
bos@16 | 558 %%% Local Variables: |
bos@16 | 559 %%% mode: latex |
bos@16 | 560 %%% TeX-master: "00book" |
bos@16 | 561 %%% End: |