% hgbook: fr/intro.tex @ 923:0d08ac613527
% Changeset description: Beginning translation work on 'intro.text'
% Author: Romain PELISSE <romain.pelisse@atosorigin.com>
% Date: Fri Feb 06 15:31:26 2009 +0100 (2009-02-06)
% Parents: 547d3aa25ef0; children: 6a2ccedd1e4c

\chapter{Introduction}
\label{chap:intro}

\section{About revision control}

Revision control is a process for managing different versions of the
same piece of information. In its simplest form, it is something that
everybody does by hand: when you modify a file, you save it under a
new name that contains a number, each one higher than that of the
previous version.

This kind of manual version management is error-prone, however, so
software tools have long existed to address the problem. The earliest
revision control tools were intended to help a single user automate
the management of versions of a single file. Over the past few
decades, this scope has expanded greatly: the tools now manage
multiple files, and help large numbers of people to work together.
The most modern tools have no difficulty coping with several thousand
people working together on projects that consist of several hundred
thousand files.

\subsection{Why use revision control?}

There are a number of reasons why you or your team might want to use
a tool that automates revision control for your project.
\begin{itemize}
\item The tool will track the evolution of your project, so you don't
  have to. For every change, you'll have a log of \emph{who} made it,
  \emph{why} they made it, \emph{when} they made it, and
  \emph{what} they changed.
\item When you're working with other people, revision control software
  makes it easier to collaborate. For example, when several people
  make incompatible changes more or less simultaneously, the software
  will help you to identify and resolve the conflicts.
\item It can help you to recover from mistakes. If you make a change
  that later turns out to be in error, you can revert to an earlier
  version of one or more files. In fact, a \emph{really} good
  revision control tool will even help you to efficiently figure out
  exactly when a problem was introduced (see
  section~\ref{sec:undo:bisect} for details).
\item It will help you to work simultaneously on, and manage the drift
  between, multiple versions of your project.
\end{itemize}

Most of these reasons are equally valid---at least in theory---whether
you're working on a project by yourself, or with a hundred other
people.

A key question about the practicality of revision control at these two
different scales (``lone hacker'' and ``huge team'') is how its
\emph{benefits} compare to its \emph{costs}. A revision control tool
that's difficult to understand or use is going to impose a high cost.

A five-hundred-person project is likely to collapse under its own
weight almost immediately without a revision control tool and process.
In this case, the cost of using revision control might hardly seem
worth considering, since \emph{without} it, failure is almost
guaranteed.

On the other hand, a one-person ``quick hack'' might seem like a poor
place to use a revision control tool, because surely the cost of using
one must be close to the overall cost of the project. Right?

Mercurial uniquely supports \emph{both} of these scales of
development. You can learn the basics in just a few minutes, and due
to its low overhead, you can apply revision control to the smallest of
projects with ease. Its simplicity means you won't have a lot of
abstruse concepts or command sequences competing for mental space with
whatever you're \emph{really} trying to do. At the same time,
Mercurial's high performance and peer-to-peer nature let you scale
painlessly to handle large projects.

No revision control tool can rescue a poorly run project, but a good
choice of tools can make a huge difference to the fluidity with which
you can work on a project.

\subsection{The many names of revision control}

Revision control is a diverse field, so much so that it doesn't
actually have a single name or acronym. Here are a few of the more
common names and acronyms you'll encounter:
\begin{itemize}
\item Revision control (RCS)
\item Software configuration management (SCM), or configuration management
\item Source code management
\item Source code control, or source control
\item Version control (VCS)
\end{itemize}
Some people claim that these terms actually have different meanings,
but in practice they overlap so much that there's no agreed or even
useful way to tease them apart.

\section{A short history of revision control}

The best known of the old-time revision control tools is SCCS (Source
Code Control System), which Marc Rochkind wrote at Bell Labs, in the
early 1970s. SCCS operated on individual files, and required every
person working on a project to have access to a shared workspace on a
single system. Only one person could modify a file at any time;
arbitration for access to files was via locks. It was common for
people to lock files, and later forget to unlock them, preventing
anyone else from modifying those files without the help of an
administrator.

Walter Tichy developed a free alternative to SCCS in the early 1980s;
he called his program RCS (Revision Control System). Like SCCS, RCS
required developers to work in a single shared workspace, and to lock
files to prevent multiple people from modifying them simultaneously.

Later in the 1980s, Dick Grune used RCS as a building block for a set
of shell scripts he initially called cmt, but then renamed to CVS
(Concurrent Versions System). The big innovation of CVS was that it
let developers work simultaneously and somewhat independently in their
own personal workspaces. The personal workspaces prevented developers
from stepping on each other's toes all the time, as was common with
SCCS and RCS. Each developer had a copy of every project file, and
could modify their copies independently. They had to merge their
edits prior to committing changes to the central repository.

Brian Berliner took Grune's original scripts and rewrote them in~C,
releasing in 1989 the code that has since developed into the modern
version of CVS. CVS subsequently acquired the ability to operate over
a network connection, giving it a client/server architecture. CVS's
architecture is centralised; only the server has a copy of the history
of the project. Client workspaces just contain copies of recent
versions of the project's files, and a little metadata to tell them
where the server is. CVS has been enormously successful; it is
probably the world's most widely used revision control system.

In the early 1990s, Sun Microsystems developed an early distributed
revision control system, called TeamWare. A TeamWare workspace
contains a complete copy of the project's history. TeamWare has no
notion of a central repository. (CVS relied upon RCS for its history
storage; TeamWare used SCCS.)

As the 1990s progressed, awareness grew of a number of problems with
CVS. It records simultaneous changes to multiple files individually,
instead of grouping them together as a single logically atomic
operation. It does not manage its file hierarchy well; it is easy to
make a mess of a repository by renaming files and directories. Worse,
its source code is difficult to read and maintain, which made the
``pain level'' of fixing these architectural problems prohibitive.

In 2001, Jim Blandy and Karl Fogel, two developers who had worked on
CVS, started a project to replace it with a tool that would have a
better architecture and cleaner code. The result, Subversion, does
not stray from CVS's centralised client/server model, but it adds
multi-file atomic commits, better namespace management, and a number
of other features that make it a generally better tool than CVS.
Since its initial release, it has rapidly grown in popularity.

More or less simultaneously, Graydon Hoare began working on an
ambitious distributed revision control system that he named Monotone.
While Monotone addresses many of CVS's design flaws and has a
peer-to-peer architecture, it goes beyond earlier (and subsequent)
revision control tools in a number of innovative ways. It uses
cryptographic hashes as identifiers, and has an integral notion of
``trust'' for code from different sources.

Mercurial began life in 2005. While a few aspects of its design are
influenced by Monotone, Mercurial focuses on ease of use, high
performance, and scalability to very large projects.

\section{Trends in revision control}

There has been an unmistakable trend in the development and use of
revision control tools over the past four decades, as people have
become familiar with the capabilities of their tools and constrained
by their limitations.

The first generation began by managing single files on individual
computers. Although these tools represented a huge advance over
ad-hoc manual revision control, their locking model and reliance on a
single computer limited them to small, tightly-knit teams.

The second generation loosened these constraints by moving to
network-centered architectures, and managing entire projects at a
time. As projects grew larger, they ran into new problems. With
clients needing to talk to servers very frequently, server scaling
became an issue for large projects. An unreliable network connection
could prevent remote users from being able to talk to the server at
all. As open source projects started making read-only access
available anonymously to anyone, people without commit privileges
found that they could not use the tools to interact with a project in
a natural way, as they could not record their changes.

The current generation of revision control tools is peer-to-peer in
nature. All of these systems have dropped the dependency on a single
central server, and allow people to distribute their revision control
data to where it's actually needed. Collaboration over the Internet
has moved from being constrained by technology to being a matter of
choice and consensus. Modern tools can operate offline indefinitely
and autonomously, with a network connection only needed when syncing
changes with another repository.

\section{A few of the advantages of distributed revision control}

Even though distributed revision control tools have for several years
been as robust and usable as their previous-generation counterparts,
people using older tools have not yet necessarily woken up to their
advantages. There are a number of ways in which distributed tools
shine relative to centralised ones.

For an individual developer, distributed tools are almost always much
faster than centralised tools. This is for a simple reason: a
centralised tool needs to talk over the network for many common
operations, because most metadata is stored in a single copy on the
central server. A distributed tool stores all of its metadata
locally. All else being equal, talking over the network adds overhead
to a centralised tool. Don't underestimate the value of a snappy,
responsive tool: you're going to spend a lot of time interacting with
your revision control software.

Distributed tools are indifferent to the vagaries of your server
infrastructure, again because they replicate metadata to so many
locations. If you use a centralised system and your server catches
fire, you'd better hope that your backup media are reliable, and that
your last backup was recent and actually worked. With a distributed
tool, you have many backups available on every contributor's computer.

The reliability of your network will affect distributed tools far less
than it will centralised tools. You can't even use a centralised tool
without a network connection, except for a few highly constrained
commands. With a distributed tool, if your network connection goes
down while you're working, you may not even notice. The only thing
you won't be able to do is talk to repositories on other computers,
something that is relatively rare compared with local operations. If
you have a far-flung team of collaborators, this may be significant.

\subsection{Advantages for open source projects}

If you take a shine to an open source project and decide that you
would like to start hacking on it, and that project uses a distributed
revision control tool, you are at once a peer with the people who
consider themselves the ``core'' of that project. If they publish
their repositories, you can immediately copy their project history,
start making changes, and record your work, using the same tools in
the same ways as insiders. By contrast, with a centralised tool, you
must use the software in a ``read only'' mode unless someone grants
you permission to commit changes to their central server. Until then,
you won't be able to record changes, and your local modifications will
be at risk of corruption any time you try to update your client's view
of the repository.
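
As a rough sketch of that workflow, assuming Mercurial is installed and
that the project publishes its repository at the placeholder URL below
(the URL, the directory name, and the commit message are all
hypothetical), an outside contributor might do something like this:
\begin{verbatim}
# Copy the complete project history to your own machine.
hg clone http://hg.example.org/project project
cd project
# ... edit some files ...
# Record the change locally; no commit access to any server is needed.
hg commit -m "Fix a typo in the README"
\end{verbatim}
The commit is recorded only in your local clone until you choose to
share it.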

\subsubsection{The forking non-problem}

It has been suggested that distributed revision control tools pose
some sort of risk to open source projects because they make it easy to
``fork'' the development of a project. A fork happens when there are
differences in opinion or attitude between groups of developers that
cause them to decide that they can't work together any longer. Each
side takes a more or less complete copy of the project's source code,
and goes off in its own direction.

Sometimes the camps in a fork decide to reconcile their differences.
With a centralised revision control system, the \emph{technical}
process of reconciliation is painful, and has to be performed largely
by hand. You have to decide whose revision history is going to
``win'', and graft the other team's changes into the tree somehow.
This usually loses some or all of one side's revision history.

What distributed tools do with respect to forking is they make forking
the \emph{only} way to develop a project. Every single change that
you make is potentially a fork point. The great strength of this
approach is that a distributed revision control tool has to be really
good at \emph{merging} forks, because forks are absolutely
fundamental: they happen all the time.

If every piece of work that everybody does, all the time, is framed in
terms of forking and merging, then what the open source world refers
to as a ``fork'' becomes \emph{purely} a social issue. If anything,
distributed tools \emph{lower} the likelihood of a fork:
\begin{itemize}
\item They eliminate the social distinction that centralised tools
  impose: that between insiders (people with commit access) and
  outsiders (people without).
\item They make it easier to reconcile after a social fork, because
  all that's involved from the perspective of the revision control
  software is just another merge.
\end{itemize}
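
From the tool's point of view, reconciling two lines of development
might look roughly like the following sketch. The repository URL and
commit message are placeholders, and the commands assume that you are
working inside a clone of one side of the fork:
\begin{verbatim}
# Fetch the other side's changesets into the local repository.
hg pull http://hg.example.org/other-fork
# Combine the two lines of development in the working directory.
hg merge
# Record the result of the merge as a new changeset.
hg commit -m "Merge with other-fork"
\end{verbatim}
Both sides' history is preserved in the result, although any
conflicting edits still have to be resolved by a person.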

Some people resist distributed tools because they want to retain tight
control over their projects, and they believe that centralised tools
give them this control. However, if you're of this belief, and you
publish your CVS or Subversion repositories publicly, there are
plenty of tools available that can pull out your entire project's
history (albeit slowly) and recreate it somewhere that you don't
control. So while your control in this case is illusory, you are
forgoing the ability to fluidly collaborate with whatever people feel
compelled to mirror and fork your history.

\subsection{Advantages for commercial projects}

Many commercial projects are undertaken by teams that are scattered
across the globe. Contributors who are far from a central server will
see slower command execution and perhaps less reliability. Commercial
revision control systems attempt to ameliorate these problems with
remote-site replication add-ons that are typically expensive to buy
and cantankerous to administer. A distributed system doesn't suffer
from these problems in the first place. Better yet, you can easily
set up multiple authoritative servers, say one per site, so that
there's no redundant communication between repositories over expensive
long-haul network links.

Centralised revision control systems tend to have relatively low
scalability. It's not unusual for an expensive centralised system to
fall over under the combined load of just a few dozen concurrent
users. Once again, the typical response tends to be an expensive and
clunky replication facility. Since the load on a central server---if
you have one at all---is many times lower with a distributed
tool (because all of the data is replicated everywhere), a single
cheap server can handle the needs of a much larger team, and
replication to balance load becomes a simple matter of scripting.

If you have an employee in the field, troubleshooting a problem at a
customer's site, they'll benefit from distributed revision control.
The tool will let them generate custom builds, try different fixes in
isolation from each other, and search efficiently through history for
the sources of bugs and regressions in the customer's environment, all
without needing to connect to your company's network.

\section{Why choose Mercurial?}

Mercurial has a unique set of properties that make it a particularly
good choice as a revision control system.
\begin{itemize}
\item It is easy to learn and use.
\item It is lightweight.
\item It scales excellently.
\item It is easy to customise.
\end{itemize}

If you are at all familiar with revision control systems, you should
be able to get up and running with Mercurial in less than five
minutes. Even if not, it will take no more than a few minutes
longer. Mercurial's command and feature sets are generally uniform
and consistent, so you can keep track of a few general rules instead
of a host of exceptions.

On a small project, you can start working with Mercurial in moments.
Creating new changes and branches; transferring changes around
(whether locally or over a network); and history and status operations
are all fast. Mercurial attempts to stay nimble and largely out of
your way by combining low cognitive overhead with blazingly fast
operations.

The usefulness of Mercurial is not limited to small projects: it is
used by projects with hundreds to thousands of contributors, each
containing tens of thousands of files and hundreds of megabytes of
source code.

If the core functionality of Mercurial is not enough for you, it's
easy to build on. Mercurial is well suited to scripting tasks, and
its clean internals and implementation in Python make it easy to add
features in the form of extensions. There are a number of popular and
useful extensions already available, ranging from helping to identify
bugs to improving performance.

\section{Mercurial compared with other tools}

Before you read on, please understand that this section necessarily
reflects my own experiences, interests, and (dare I say it) biases. I
have used every one of the revision control tools listed below, in
most cases for several years at a time.

\subsection{Subversion}

Subversion is a popular revision control tool, developed to replace
CVS. It has a centralised client/server architecture.

Subversion and Mercurial have similarly named commands for performing
the same operations, so if you're familiar with one, it is easy to
learn to use the other. Both tools are portable to all popular
operating systems.

Prior to version 1.5, Subversion had no useful support for merges.
At the time of writing, its merge tracking capability is new, and known to be
\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
  and buggy}.

Mercurial has a substantial performance advantage over Subversion on
every revision control operation I have benchmarked. I have measured
its advantage as ranging from a factor of two to a factor of six when
compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
the fastest access method available. In more realistic deployments
involving a network-based store, Subversion will be at a substantially
larger disadvantage. Because many Subversion commands must talk to
the server and Subversion does not have useful replication facilities,
server capacity and network bandwidth become bottlenecks for modestly
large projects.

Additionally, Subversion incurs substantial storage overhead to avoid
network transactions for a few common operations, such as finding
modified files (\texttt{status}) and displaying modifications against
the current revision (\texttt{diff}). As a result, a Subversion
working copy is often the same size as, or larger than, a Mercurial
repository and working directory, even though the Mercurial repository
contains a complete history of the project.

Subversion is widely supported by third party tools. Mercurial
currently lags considerably in this area. This gap is closing,
however, and indeed some of Mercurial's GUI tools now outshine their
Subversion equivalents. Like Mercurial, Subversion has an excellent
user manual.

Because Subversion doesn't store revision history on the client, it is
well suited to managing projects that deal with lots of large, opaque
binary files. If you check in fifty revisions to an incompressible
10MB file, Subversion's client-side space usage stays constant. The
space used by any distributed SCM will grow rapidly in proportion to
the number of revisions, because the differences between each revision
are large.
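
To put rough numbers on this, suppose that each revision of a file of
size $s$ compresses no better than the full file, so that $n$
revisions cost about $n \times s$ of repository space. This is an
idealised assumption; real figures depend on the file format and on
the tool's storage scheme. For the example above:
\[
  n \times s \approx 50 \times 10\,\mathrm{MB} = 500\,\mathrm{MB}
\]
in every clone of a distributed repository, whereas the space that a
Subversion client needs for that file does not grow with $n$.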

In addition, it's often difficult or, more usually, impossible to
merge different versions of a binary file. Subversion's ability to
let a user lock a file, so that they temporarily have the exclusive
right to commit changes to it, can be a significant advantage to a
project where binary files are widely used.

Mercurial can import revision history from a Subversion repository.
It can also export revision history to a Subversion repository. This
makes it easy to ``test the waters'' and use Mercurial and Subversion
in parallel before deciding to switch. History conversion is
incremental, so you can perform an initial conversion, then small
additional conversions afterwards to bring in new changes.

\subsection{Git}

Git is a distributed revision control tool that was developed for
managing the Linux kernel source tree. Like Mercurial, its early
design was somewhat influenced by Monotone.

Git has a very large command set, with version~1.5.0 providing~139
individual commands. It has something of a reputation for being
difficult to learn. Compared to Git, Mercurial has a strong focus on
simplicity.

In terms of performance, Git is extremely fast. In several cases, it
is faster than Mercurial, at least on Linux, while Mercurial performs
better on other operations. However, on Windows, the performance and
general level of support that Git provides is, at the time of writing,
far behind that of Mercurial.

While a Mercurial repository needs no maintenance, a Git repository
requires frequent manual ``repacks'' of its metadata. Without these,
performance degrades, while space usage grows rapidly. A server that
contains many Git repositories that are not rigorously and frequently
repacked will become heavily disk-bound during backups, and there have
been instances of daily backups taking far longer than~24 hours as a
result. A freshly packed Git repository is slightly smaller than a
Mercurial repository, but an unpacked repository is several orders of
magnitude larger.
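
For reference, a manual repack of the kind described here might be run
as in the following sketch. Which options are appropriate depends on
the Git version and on the repository, so treat this as an
illustration rather than a recommendation:
\begin{verbatim}
# Show how many loose (unpacked) objects the repository holds.
git count-objects -v
# Pack loose objects into a single pack and remove redundant packs.
git repack -a -d
# Or let Git run its standard housekeeping, which includes repacking.
git gc
\end{verbatim}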

The core of Git is written in C. Many Git commands are implemented as
shell or Perl scripts, and the quality of these scripts varies widely.
I have encountered several instances where scripts charged along
blindly in the presence of errors that should have been fatal.

Mercurial can import revision history from a Git repository.

\subsection{CVS}

CVS is probably the most widely used revision control tool in the
world. Due to its age and internal untidiness, it has been only
lightly maintained for many years.

It has a centralised client/server architecture. It does not group
related file changes into atomic commits, making it easy for people to
``break the build'': one person can successfully commit part of a
change and then be blocked by the need for a merge, causing other
people to see only a portion of the work they intended to do. This
also affects how you work with project history. If you want to see
all of the modifications someone made as part of a task, you will need
to manually inspect the descriptions and timestamps of the changes
made to each file involved (if you even know what those files were).

CVS has a muddled notion of tags and branches that I will not attempt
to even describe. It does not support renaming of files or
directories well, making it easy to corrupt a repository. It has
almost no internal consistency checking capabilities, so it is usually
not even possible to tell whether or how a repository is corrupt. I
would not recommend CVS for any project, existing or new.

Mercurial can import CVS revision history. However, there are a few
caveats that apply; these are true of every other revision control
tool's CVS importer, too. Due to CVS's lack of atomic changes and
unversioned filesystem hierarchy, it is not possible to reconstruct
CVS history completely accurately; some guesswork is involved, and
renames will usually not show up. Because a lot of advanced CVS
administration has to be done by hand and is hence error-prone, it's
common for CVS importers to run into multiple problems with corrupted
repositories (completely bogus revision timestamps and files that have
remained locked for over a decade are just two of the less interesting
problems I can recall from personal experience).

\subsection{Commercial tools}

Perforce has a centralised client/server architecture, with no
client-side caching of any data. Unlike modern revision control
tools, Perforce requires that a user run a command to inform the
server about every file they intend to edit.

The performance of Perforce is quite good for small teams, but it
falls off rapidly as the number of users grows beyond a few dozen.
Modestly large Perforce installations require the deployment of
proxies to cope with the load their users generate.

\subsection{Choosing a revision control tool}

With the exception of CVS, all of the tools listed above have unique
strengths that suit them to particular styles of work. There is no
single revision control tool that is best in all situations.

As an example, Subversion is a good choice for working with frequently
edited binary files, due to its centralised nature and support for
file locking.

I personally find Mercurial's properties of simplicity, performance,
and good merge support to be a compelling combination that has served
me well for several years.

\section{Switching from another tool to Mercurial}

Mercurial is bundled with an extension named \hgext{convert}, which
can incrementally import revision history from several other revision
control tools. By ``incremental'', I mean that you can convert all of
a project's history to date in one go, then rerun the conversion later
to obtain new changes that happened after the initial conversion.

The revision control tools supported by \hgext{convert} are as
follows:
\begin{itemize}
\item Subversion
\item CVS
\item Git
\item Darcs
\end{itemize}

In addition, \hgext{convert} can export changes from Mercurial to
Subversion. This makes it possible to try Subversion and Mercurial in
parallel before committing to a switchover, without risking the loss
of any work.

The \hgxcmd{convert}{convert} command is easy to use. Simply point it
at the path or URL of the source repository, optionally give it the
name of the destination repository, and it will start working. After
the initial conversion, just run the same command again to import new
changes.
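
As a minimal sketch, assuming a Subversion repository at the
placeholder URL below and a hypothetical destination name
\texttt{project-hg}, you would first enable the extension in your
\texttt{.hgrc} file:
\begin{verbatim}
[extensions]
convert =
\end{verbatim}
and then run the conversion:
\begin{verbatim}
# Initial conversion of all history to date.
hg convert http://svn.example.org/project project-hg
# Later, rerun the same command to bring in new changes.
hg convert http://svn.example.org/project project-hg
\end{verbatim}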

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: