hgbook: 81454425eee9 en/mq.tex

view en/mq.tex @ 16:81454425eee9

Progress on a few fronts, mainly indexing and MQ chapter content

author	Bryan O'Sullivan <bos@serpentine.com>
date	Mon Jul 03 22:43:52 2006 -0700 (2006-07-03)
parents	b8ac9f312a47
children	2668e15c76e9

1 \chapter{Managing change with Mercurial Queues}

2 \label{chap:mq}

4 \section{The patch management problem}

5 \label{sec:mq:patch-mgmt}

7 Here is a common scenario: you need to install a software package from

8 source, but you find a bug that you must fix in the source before you

9 can start using the package. You make your changes, forget about the

10 package for a while, and a few months later you need to upgrade to a

11 newer version of the package. If the newer version of the package

12 still has the bug, you must extract your fix from the older source

13 tree and apply it against the newer version. This is a tedious task,

14 and it's easy to make mistakes.

16 This is a simple case of the ``patch management'' problem. You have

17 an ``upstream'' source tree that you can't change; you need to make

18 some local changes on top of the upstream tree; and you'd like to be

19 able to keep those changes separate, so that you can apply them to

20 newer versions of the upstream source.

22 The patch management problem arises in many situations. Probably the

23 most visible is that a user of an open source software project will

24 contribute a bug fix or new feature to the project's maintainers in the

25 form of a patch.

27 Distributors of operating systems that include open source software

28 often need to make changes to the packages they distribute so that

29 they will build properly in their environments.

31 When you have few changes to maintain, it is easy to manage a single

32 patch using the standard \texttt{diff} and \texttt{patch} programs

33 (see section~\ref{sec:mq:patch} for a discussion of these tools).

34 Once the number of changes grows, it starts to make sense to maintain

35 patches as discrete ``chunks of work,'' so that for example a single

36 patch will contain only one bug fix (the patch might modify several

37 files, but it's doing ``only one thing''), and you may have a number

38 of such patches for different bugs you need fixed and local changes

39 you require. In this situation, if you submit a bug fix patch to the

40 upstream maintainers of a package and they include your fix in a

41 subsequent release, you can simply drop that single patch when you're

42 updating to the newer release.

44 Maintaining a single patch against an upstream tree is a little

45 tedious and error-prone, but not difficult. However, the complexity

46 of the problem grows rapidly as the number of patches you have to

47 maintain increases. With more than a tiny number of patches in hand,

48 understanding which ones you have applied and maintaining them moves

49 from messy to overwhelming.

51 Fortunately, Mercurial includes a powerful extension, Mercurial Queues

52 (or simply ``MQ''), that massively simplifies the patch management

53 problem.

55 \section{The prehistory of Mercurial Queues}

56 \label{sec:mq:history}

58 During the late 1990s, several Linux kernel developers started to

59 maintain ``patch series'' that modified the behaviour of the Linux

60 kernel. Some of these series were focused on stability, some on

61 feature coverage, and others were more speculative.

63 The sizes of these patch series grew rapidly. In 2002, Andrew Morton

64 published some shell scripts he had been using to automate the task of

65 managing his patch queues. Andrew was successfully using these

66 scripts to manage hundreds (sometimes thousands) of patches on top of

67 the Linux kernel.

69 \subsection{A patchwork quilt}

70 \label{sec:mq:quilt}

73 In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the

74 approach of Andrew's scripts and published a tool called ``patchwork

75 quilt''~\cite{web:quilt}, or simply ``quilt''

76 (see~\cite{gruenbacher:2005} for a paper describing it). Because

77 quilt substantially automated patch management, it rapidly gained a

78 large following among open source software developers.

80 Quilt manages a \emph{stack of patches} on top of a directory tree.

81 To begin, you tell quilt to manage a directory tree; it stores away

82 the names and contents of all files in the tree. To fix a bug, you

83 create a new patch (using a single command), edit the files you need

84 to fix, then ``refresh'' the patch.

86 The refresh step causes quilt to scan the directory tree; it updates

87 the patch with all of the changes you have made. You can create

88 another patch on top of the first, which will track the changes

89 required to modify the tree from ``tree with one patch applied'' to

90 ``tree with two patches applied''.

92 You can \emph{change} which patches are applied to the tree. If you

93 ``pop'' a patch, the changes made by that patch will vanish from the

94 directory tree. Quilt remembers which patches you have popped,

95 though, so you can ``push'' a popped patch again, and the directory

96 tree will be restored to contain the modifications in the patch. Most

97 importantly, you can run the ``refresh'' command at any time, and the

98 topmost applied patch will be updated. This means that you can, at

99 any time, change both which patches are applied and what

100 modifications those patches make.

101

102 Quilt knows nothing about revision control tools, so it works equally

103 well on top of an unpacked tarball or a Subversion repository.

104

105 \subsection{From patchwork quilt to Mercurial Queues}

106 \label{sec:mq:quilt-mq}

107

108 In mid-2005, Chris Mason took the features of quilt and wrote an

109 extension that he called Mercurial Queues, which added quilt-like

110 behaviour to Mercurial.

111

112 The key difference between quilt and MQ is that quilt knows nothing

113 about revision control systems, while MQ is \emph{integrated} into

114 Mercurial. Each patch that you push is represented as a Mercurial

115 changeset. Pop a patch, and the changeset goes away.

116

117 This integration makes understanding patches and debugging their

118 effects \emph{enormously} easier. Since every applied patch has an

119 associated changeset, you can use \hgcmdargs{log}{\emph{filename}} to

120 see which changesets and patches affected a file. You can use the

121 \hgext{bisect} extension to binary-search through all changesets and

122 applied patches to see where a bug got introduced or fixed. You can

123 use the \hgcmd{annotate} command to see which changeset or patch

124 modified a particular line of a source file. And so on.

125

126 Because quilt does not care about revision control tools, it is still

127 a tremendously useful piece of software to know about for situations

128 where you cannot use Mercurial and MQ.

129 \section{Getting started with Mercurial Queues}

130 \label{sec:mq:start}

131

132 Because MQ is implemented as an extension, you must explicitly enable it

133 before you can use it. (You don't need to download anything; MQ ships

134 with the standard Mercurial distribution.) To enable MQ, edit your

135 \tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}.

136

137 \begin{figure}[ht]

138 \begin{codesample4}

139 [extensions]

140 hgext.mq =

141 \end{codesample4}

142 \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension}

143 \label{ex:mq:config}

144 \end{figure}

145

146 Once the extension is enabled, it will make a number of new commands

147 available. To verify that the extension is working, you can use

148 \hgcmd{help} to see if the \hgcmd{qinit} command is now available; see

149 the example in figure~\ref{ex:mq:enabled}.

150

151 \begin{figure}[ht]

152 \interaction{mq.qinit-help.help}

153 \caption{How to verify that MQ is enabled}

154 \label{ex:mq:enabled}

155 \end{figure}

156

157 You can use MQ with \emph{any} Mercurial repository, and its commands

158 only operate within that repository. To get started, simply prepare

159 the repository using the \hgcmd{qinit} command (see

160 figure~\ref{ex:mq:qinit}). This command creates an empty directory

161 called \sdirname{.hg/patches}, where MQ will keep its metadata. As

162 with many Mercurial commands, the \hgcmd{qinit} command prints nothing

163 if it succeeds.

164

165 \begin{figure}[ht]

166 \interaction{mq.tutorial.qinit}

167 \caption{Preparing a repository for use with MQ}

168 \label{ex:mq:qinit}

169 \end{figure}

170

171 \begin{figure}[ht]

172 \interaction{mq.tutorial.qnew}

173 \caption{Creating a new patch}

174 \label{ex:mq:qnew}

175 \end{figure}

176

177 \subsection{Creating a new patch}

178

179 To begin work on a new patch, use the \hgcmd{qnew} command. This

180 command takes one argument, the name of the patch to create. MQ will

181 use this as the name of an actual file in the \sdirname{.hg/patches}

182 directory, as you can see in figure~\ref{ex:mq:qnew}.

183

184 Also newly present in the \sdirname{.hg/patches} directory are two

185 other files, \sfilename{series} and \sfilename{status}. The

186 \sfilename{series} file lists all of the patches that MQ knows about

187 for this repository, with one patch per line. MQ uses the

188 \sfilename{status} file for internal book-keeping; it tracks all of the

189 patches that MQ has \emph{applied} in this repository.

190

191 \begin{note}

192 You may sometimes want to edit the \sfilename{series} file by hand;

193 for example, to change the sequence in which some patches are

194 applied. However, manually editing the \sfilename{status} file is

195 almost always a bad idea, as it's easy to corrupt MQ's idea of what

196 is happening.

197 \end{note}

198

199 Once you have created your new patch, you can edit files in the

200 working directory as you usually would. All of the normal Mercurial

201 commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as

202 they did before.

203 \subsection{Refreshing a patch}

204

205 When you reach a point where you want to save your work, use the

206 \hgcmd{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch

207 you are working on. This command folds the changes you have made in

208 the working directory into your patch, and updates its corresponding

209 changeset to contain those changes.

210

211 \begin{figure}[ht]

212 \interaction{mq.tutorial.qrefresh}

213 \caption{Refreshing a patch}

214 \label{ex:mq:qrefresh}

215 \end{figure}

216

217 You can run \hgcmd{qrefresh} as often as you like, so it's a good way

218 to ``checkpoint'' your work. Refresh your patch at an opportune

219 time; try an experiment; and if the experiment doesn't work out,

220 \hgcmd{revert} your modifications back to the last time you refreshed.

221

222 \begin{figure}[ht]

223 \interaction{mq.tutorial.qrefresh2}

224 \caption{Refresh a patch many times to accumulate changes}

225 \label{ex:mq:qrefresh2}

226 \end{figure}

227

228 \subsection{Stacking and tracking patches}

229

230 Once you have finished working on a patch, or need to work on another,

231 you can use the \hgcmd{qnew} command again to create a new patch.

232 Mercurial will apply this patch on top of your existing patch. See

233 figure~\ref{ex:mq:qnew2} for an example. Notice that the patch

234 contains the changes in our prior patch as part of its context (you

235 can see this more clearly in the output of \hgcmd{annotate}).

236

237 \begin{figure}[ht]

238 \interaction{mq.tutorial.qnew2}

239 \caption{Stacking a second patch on top of the first}

240 \label{ex:mq:qnew2}

241 \end{figure}

242

243 So far, with the exception of \hgcmd{qnew} and \hgcmd{qrefresh}, we've

244 been careful to only use regular Mercurial commands. However, there

245 are more ``natural'' commands you can use when thinking about patches

246 with MQ, as illustrated in figure~\ref{ex:mq:qseries}:

247

248 \begin{itemize}

249 \item The \hgcmd{qseries} command lists every patch that MQ knows

250 about in this repository, from oldest to newest (most recently

251 \emph{created}).

252 \item The \hgcmd{qapplied} command lists every patch that MQ has

253 \emph{applied} in this repository, again from oldest to newest (most

254 recently applied).

255 \end{itemize}

256

257 \begin{figure}[ht]

258 \interaction{mq.tutorial.qseries}

259 \caption{Understanding the patch stack with \hgcmd{qseries} and

260 \hgcmd{qapplied}}

261 \label{ex:mq:qseries}

262 \end{figure}

263

264 \subsection{Manipulating the patch stack}

265

266 The previous discussion implied that there must be a difference

267 between ``known'' and ``applied'' patches, and there is. MQ can

268 manage a patch without it being applied in the repository.

269

270 An \emph{applied} patch has a corresponding changeset in the

271 repository, and the effects of the patch and changeset are visible in

272 the working directory. You can undo the application of a patch using

273 the \hgcmd{qpop} command. MQ still \emph{knows about}, or manages, a

274 popped patch, but the patch no longer has a corresponding changeset in

275 the repository, and the working directory does not contain the changes

276 made by the patch. Figure~\ref{fig:mq:stack} illustrates the

277 difference between applied and tracked patches.

278

279 \begin{figure}[ht]

280 \centering

281 \grafix{mq-stack}

282 \caption{Applied and unapplied patches in the MQ patch stack}

283 \label{fig:mq:stack}

284 \end{figure}

285

286 You can reapply an unapplied, or popped, patch using the \hgcmd{qpush}

287 command. This creates a new changeset to correspond to the patch, and

288 the patch's changes once again become present in the working

289 directory. See figure~\ref{ex:mq:qpop} for examples of \hgcmd{qpop}

290 and \hgcmd{qpush} in action. Notice that once we have popped one or

291 two patches, the output of \hgcmd{qseries} remains the same, while

292 that of \hgcmd{qapplied} has changed.

293

294 \begin{figure}[ht]

295 \interaction{mq.tutorial.qpop}

296 \caption{Modifying the stack of applied patches}

297 \label{ex:mq:qpop}

298 \end{figure}

299

300 MQ does not limit you to pushing or popping one patch. You can have

301 no patches, all of them, or any number in between applied at some

302 point in time.
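
The distinction between ``known'' and ``applied'' patches can be
sketched as a toy model in Python. This is only an illustration of the
bookkeeping described above, not MQ's actual implementation; the
\texttt{PatchQueue} class and its method names are hypothetical, chosen
to mirror the commands they imitate.

```python
class PatchQueue:
    """Toy model of an MQ patch stack (illustrative only, not MQ's code)."""

    def __init__(self):
        self.series = []   # every patch the queue knows about, oldest first
        self.applied = 0   # how many patches, counted from the bottom, are applied

    def qnew(self, name):
        # In this model a new patch lands directly above the topmost applied patch.
        self.series.insert(self.applied, name)
        self.applied += 1

    def qpop(self):
        if self.applied == 0:
            raise RuntimeError("no patches applied")
        self.applied -= 1

    def qpush(self):
        if self.applied == len(self.series):
            raise RuntimeError("all known patches are already applied")
        self.applied += 1

    def qseries(self):
        return list(self.series)

    def qapplied(self):
        return self.series[:self.applied]


queue = PatchQueue()
queue.qnew("first.patch")
queue.qnew("second.patch")
queue.qpop()
print(queue.qseries())   # both patches are still known
print(queue.qapplied())  # only the bottom patch is applied
```

Popping changes only the count of applied patches; the list of known
patches is untouched, which is why the two listings diverge.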

303

304 \subsection{Working on several patches at once}

305

306 The \hgcmd{qrefresh} command always refreshes the \emph{topmost}

307 applied patch. This means that you can suspend work on one patch (by

308 refreshing it), pop or push to make a different patch the top, and

309 work on \emph{that} patch for a while.

310

311 Here's an example that illustrates how you can use this ability.

312 Let's say you're developing a new feature as two patches. The first

313 is a change to the core of your software, and the second--layered on

314 top of the first--changes the user interface to use the code you just

315 added to the core. If you notice a bug in the core while you're

316 working on the UI patch, it's easy to fix the core. Simply

317 \hgcmd{qrefresh} the UI patch to save your in-progress changes, and

318 \hgcmd{qpop} down to the core patch. Fix the core bug,

319 \hgcmd{qrefresh} the core patch, and \hgcmd{qpush} back to the UI

320 patch to continue where you left off.

321

322 \section{Mercurial Queues and GNU patch}

323 \label{sec:mq:patch}

324

325 MQ uses the GNU \command{patch} command to apply patches. Because MQ

326 doesn't hide its patch-oriented nature, it is helpful to understand

327 the data that MQ and \command{patch} work with, and a few aspects of

328 how \command{patch} operates.

329

330 The \command{diff} command generates a list of modifications by

331 comparing two files. The \command{patch} command applies a list of

332 modifications to a file. The kinds of files that \command{diff} and

333 \command{patch} work with are referred to as both ``diffs'' and

334 ``patches;'' there is no difference between a diff and a patch.

335

336 A patch file can start with arbitrary text; MQ uses this text as the

337 commit message when creating changesets. It treats the first line

338 that starts with the string ``\texttt{diff~-}'' as the separator

339 between header and content.
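
As a minimal sketch of this header rule, the following Python function
splits the text of a patch at the first line that begins with
\texttt{diff~-}. The function name \texttt{split\_patch} and the sample
patch are made up for illustration; MQ's own parsing may differ in its
details.

```python
def split_patch(text):
    """Split patch text into (message, diff) at the first line starting with 'diff -'."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith("diff -"):
            message = "".join(lines[:index]).strip()
            return message, "".join(lines[index:])
    # No separator found: treat the whole text as the message.
    return text.strip(), ""


# A hypothetical patch file: free-form text, then the diff itself.
example = """Fix an off-by-one error in the parser

diff -r 000000000000 parser.c
--- a/parser.c
+++ b/parser.c
@@ -1,1 +1,1 @@
-int n = len - 1;
+int n = len;
"""

message, diff = split_patch(example)
print(message)
```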

340

341 MQ works with \emph{unified} diffs (\command{patch} can accept several

342 other diff formats, but MQ doesn't). A unified diff contains two

343 kinds of header. The \emph{file header} describes the file being

344 modified; it contains the name of the file to modify. When

345 \command{patch} sees a new file header, it looks for a file with that

346 name to start modifying.

347

348 After the file header comes a series of \emph{hunks}. Each hunk

349 starts with a header; this identifies the range of line numbers within

350 the file that the hunk should modify. Following the header, a hunk

351 starts and ends with a few (usually three) lines of text from the

352 unmodified file; these are called the \emph{context} for the hunk.

353 Each unmodified line begins with a space character. Within the hunk,

354 a line that begins with ``\texttt{-}'' means ``remove this line,''

355 while a line that begins with ``\texttt{+}'' means ``insert this

356 line.'' For example, a line that is modified is represented by one

357 deletion and one insertion.
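
To see these pieces in a real unified diff, you can generate one with
the \texttt{difflib} module from Python's standard library. The file
names and contents below are invented for the example; the output shows
a file header, one hunk header, three lines of context on each side,
and a modified line expressed as a deletion followed by an insertion.

```python
import difflib

# Two versions of a small, made-up file; only the third line differs.
old = ["alpha\n", "beta\n", "gamma\n", "delta\n", "epsilon\n", "zeta\n", "eta\n"]
new = ["alpha\n", "beta\n", "gamma\n", "DELTA\n", "epsilon\n", "zeta\n", "eta\n"]

diff = list(difflib.unified_diff(old, new,
                                 fromfile="a/greek.txt",
                                 tofile="b/greek.txt"))

# diff[0] and diff[1] form the file header, diff[2] is the hunk header,
# and the remaining lines are context (" "), deletions ("-") and insertions ("+").
print("".join(diff), end="")
```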

358

359 The \command{diff} command runs hunks together when there's not enough

360 context between modifications to justify keeping them as separate hunks.

361

362 When \command{patch} applies a hunk, it tries a handful of

363 successively less accurate strategies to make the hunk apply.

364 This falling-back technique often makes it possible to take a patch

365 that was generated against an old version of a file, and apply it

366 against a newer version of that file.

367

368 First, \command{patch} tries an exact match, where the line numbers,

369 the context, and the text to be modified must apply exactly. If it

370 cannot make an exact match, it tries to find an exact match for the

371 context, without honouring the line numbering information. If this

372 succeeds, it prints a line of output saying that the hunk was applied,

373 but at some \emph{offset} from the original line number.

374

375 If a context-only match fails, \command{patch} removes the first and

376 last lines of the context, and tries a \emph{reduced} context-only

377 match. If the hunk with reduced context succeeds, it prints a message

378 saying that it applied the hunk with a \emph{fuzz factor} (the number

379 after the fuzz factor indicates how many lines of context

380 \command{patch} had to trim before the patch applied).
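
The fallback sequence described above can be modelled in a few lines of
Python. This is a deliberately simplified sketch, not a description of
how GNU \command{patch} is implemented: the function names, the
list-of-lines representation of a hunk, and the \texttt{context} and
\texttt{max\_fuzz} parameters are all assumptions made for the example.

```python
def find_nearest(lines, block, expected):
    """Return the index closest to `expected` where `block` occurs, or None."""
    width = len(block)
    candidates = [i for i in range(len(lines) - width + 1)
                  if lines[i:i + width] == block]
    return min(candidates, key=lambda i: abs(i - expected), default=None)


def apply_hunk(lines, start, before, after, context=3, max_fuzz=2):
    """Try to apply one hunk; return (patched_lines, offset, fuzz) or None.

    `before` is the text the hunk expects to find (context plus removed lines),
    `after` is what should replace it, and `start` is the expected 0-based index.
    """
    for fuzz in range(min(max_fuzz, context) + 1):
        # Trim `fuzz` lines of context from each end of the hunk.
        old = before[fuzz:len(before) - fuzz]
        new = after[fuzz:len(after) - fuzz]
        if not old:
            break  # nothing left to anchor the match
        position = find_nearest(lines, old, start + fuzz)
        if position is not None:
            patched = lines[:position] + new + lines[position + len(old):]
            return patched, position - (start + fuzz), fuzz
    return None  # the hunk is rejected


before = ["a", "b", "c", "d", "e", "f", "g"]
after = ["a", "b", "c", "D", "e", "f", "g"]

# Two lines were added at the top of the file: the hunk applies at an offset.
shifted = apply_hunk(["x", "y"] + before, 0, before, after)
# The first context line changed: the hunk applies only with reduced context.
fuzzy = apply_hunk(["A", "b", "c", "d", "e", "f", "g"], 0, before, after)
print(shifted[1:], fuzzy[1:])
```

An exact match is simply the case where both the reported offset and
the fuzz are zero.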

381

382 When none of these techniques works, \command{patch} prints a

383 message saying that the hunk in question was rejected. It saves

384 rejected hunks to a file with the same name, and an added

385 \sfilename{.rej} extension. It also saves an unmodified copy of the

386 file with a \sfilename{.orig} extension; the copy of the file without

387 any extensions will contain any changes made by hunks that \emph{did}

388 apply cleanly. If you have a patch that modifies \filename{foo} with

389 six hunks, and one of them fails to apply, you will have: an

390 unmodified \filename{foo.orig}, a \filename{foo.rej} containing one

391 hunk, and \filename{foo}, containing the changes made by the five

392 successful hunks.

393

394 \subsection{Beware the fuzz}

395

396 While applying a hunk at an offset, or with a fuzz factor, will often

397 be completely successful, these inexact techniques naturally leave

398 open the possibility of corrupting the patched file. The most common

399 cases typically involve applying a patch twice, or at an incorrect

400 location in the file. If \command{patch} or \hgcmd{qpush} ever

401 mentions an offset or fuzz factor, you should make sure that the

402 modified files are correct afterwards.

403

404 It's often a good idea to refresh a patch that has applied with an

405 offset or fuzz factor; refreshing the patch generates new context

406 information that will make it apply cleanly. I say ``often,'' not

407 ``always,'' because sometimes refreshing a patch will make it fail to

408 apply against a different revision of the underlying files. In some

409 cases, such as when you're maintaining a patch that must sit on top of

410 multiple versions of a source tree, it's acceptable to have a patch

411 apply with some fuzz, provided you've verified the results of the

412 patching process in such cases.

413

414 \subsection{Handling rejection}

415

416 If \hgcmd{qpush} fails to apply a patch, it will print an error

417 message and exit. If it has left \sfilename{.rej} files behind, it is

418 usually best to fix up the rejected hunks before you push more patches

419 or do any further work.

420

421 If your patch \emph{used to} apply cleanly, and no longer does because

422 you've changed the underlying code that your patches are based on,

423 Mercurial Queues can help; see section~\ref{sec:mq:merge} for details.

424

425 Unfortunately, there aren't any great techniques for dealing with

426 rejected hunks. Most often, you'll need to view the \sfilename{.rej}

427 file and edit the target file, applying the rejected hunks by hand.

428

429 If you're feeling adventurous, Neil Brown, a Linux kernel hacker,

430 wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more

431 vigorous than \command{patch} in its attempts to make a patch apply.

432

433 Another Linux kernel hacker, Chris Mason (the author of Mercurial

434 Queues), wrote a similar tool called \command{rej}~\cite{web:rej},

435 which takes a simple approach to automating the application of hunks

436 rejected by \command{patch}. \command{rej} can help with four common

437 reasons that a hunk may be rejected:

438

439 \begin{itemize}

440 \item The context in the middle of a hunk has changed.

441 \item A hunk is missing some context at the beginning or end.

442 \item A large hunk might apply better--either entirely or in part--if

443 it was broken up into smaller hunks.

444 \item A hunk removes lines with slightly different content than those

445 currently present in the file.

446 \end{itemize}

447

448 If you use \command{wiggle} or \command{rej}, you should be doubly

449 careful to check your results when you're done.

450

451 \section{Updating your patches when the underlying code changes}

452 \label{sec:mq:merge}

453

454 XXX.

455

456 \section{Managing patches in a repository}

457

458 Because MQ's \sdirname{.hg/patches} directory resides outside a

459 Mercurial repository's working directory, the ``underlying'' Mercurial

460 repository knows nothing about the management or presence of patches.

461

462 This presents the interesting possibility of managing the contents of

463 the patch directory as a Mercurial repository in its own right. This

464 can be a useful way to work. For example, you can work on a patch for

465 a while, \hgcmd{qrefresh} it, then \hgcmd{commit} the current state of

466 the patch. This lets you ``roll back'' to that version of the patch

467 later on.

468

469 In addition, you can then share different versions of the same patch

470 stack among multiple underlying repositories. I use this when I am

471 developing a Linux kernel feature. I have a pristine copy of my

472 kernel sources for each of several CPU architectures, and a cloned

473 repository under each that contains the patches I am working on. When

474 I want to test a change on a different architecture, I push my current

475 patches to the patch repository associated with that kernel tree, pop

476 and push all of my patches, and build and test that kernel.

477

478 Managing patches in a repository makes it possible for multiple

479 developers to work on the same patch series without colliding with

480 each other, all on top of an underlying source base that they may or

481 may not control.

482

483 \subsection{MQ support for managing a patch repository}

484

485 MQ helps you to work with the \sdirname{.hg/patches} directory as a

486 repository; when you prepare a repository for working with patches

487 using \hgcmd{qinit}, you can pass the \hgopt{qinit}{-c} option to

488 create the \sdirname{.hg/patches} directory as a Mercurial repository.

489

490 \begin{note}

491 If you forget to use the \hgopt{qinit}{-c} option, you can simply go

492 into the \sdirname{.hg/patches} directory at any time and run

493 \hgcmd{init}. Don't forget to add an entry for the

494 \filename{status} file to the \filename{.hgignore} file, though

495 (\hgopt{qinit}{-c} does this for you automatically); you

496 \emph{really} don't want to manage the \filename{status} file.

497 \end{note}

498

499 As a convenience, if MQ notices that the \sdirname{.hg/patches}

500 directory is a repository, it will automatically \hgcmd{add} every

501 patch that you create and import.

502

503 Finally, MQ provides a shortcut command, \hgcmd{qcommit}, that runs

504 \hgcmd{commit} in the \sdirname{.hg/patches} directory. This saves

505 some cumbersome typing.

506

507 \subsection{A few things to watch out for}

508

509 MQ's support for working with a repository full of patches is limited

510 in a few small respects.

511

512 MQ cannot automatically detect changes that you make to the patch

513 directory. If you \hgcmd{pull}, manually edit, or \hgcmd{update}

514 changes to patches or the \sfilename{series} file, you will have to

515 \hgcmdargs{qpop}{-a} and then \hgcmdargs{qpush}{-a} in the underlying

516 repository to see those changes show up there. If you forget to do

517 this, you can confuse MQ's idea of which patches are applied.

518

519 \section{Commands for working with patches}

520

521 Once you've been working with patches for a while, you'll find

522 yourself hungry for tools that will help you to understand and

523 manipulate the patches you're dealing with.

524

525 The \command{diffstat} command~\cite{web:diffstat} generates a

526 histogram of the modifications made to each file in a patch. It

527 provides a good way to ``get a sense of'' a patch--which files it

528 affects, and how much change it introduces to each file and as a

529 whole. (I find that it's a good idea to use \command{diffstat}'s

530 \texttt{-p} option as a matter of course, as otherwise it will try to

531 do clever things with prefixes of file names that inevitably confuse

532 at least me.)
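
To make the idea concrete, here is a rough Python sketch that counts
the inserted and deleted lines for each file in a unified diff and
prints a small histogram. It only imitates the kind of summary that
\command{diffstat} produces; the function name, the output layout, and
the sample patch are assumptions made for the example, and it does no
prefix stripping at all.

```python
import re

# Matches a unified diff hunk header; an omitted line count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diffstat(patch_text):
    """Return {filename: [inserted, deleted]} for a unified diff."""
    stats = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                stats[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                stats[current][1] += 1
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline" markers
                old_left -= 1                # a context line counts on both sides
                new_left -= 1
            continue
        match = HUNK.match(line)
        if match and current is not None:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith("+++ "):
            current = line[4:].split("\t")[0]
            stats.setdefault(current, [0, 0])
    return stats


example = """--- a/hello.c
+++ b/hello.c
@@ -1,3 +1,4 @@
 #include <stdio.h>
-int main()
+int main(void)
+/* entry point */
 {
"""

for name, (inserted, deleted) in diffstat(example).items():
    print(f"{name} | {inserted + deleted} {'+' * inserted}{'-' * deleted}")
```

Tracking the line counts from each hunk header keeps the sketch from
mistaking a removed line that happens to begin with dashes for a file
header.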

533

534 The \package{patchutils} package~\cite{web:patchutils} is invaluable.

535 It provides a set of small utilities that follow the ``Unix

536 philosophy;'' each does one useful thing with a patch. The

537 \package{patchutils} command I use most is \command{filterdiff}, which

538 extracts subsets from a patch file. For example, given a patch that

539 modifies hundreds of files across dozens of directories, a single

540 invocation of \command{filterdiff} can generate a smaller patch that

541 only touches files whose names match a particular glob pattern.
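
The kind of selection that \command{filterdiff} performs can be
approximated in Python as follows. This is a sketch of the idea only,
not of \command{filterdiff} itself: it assumes that each file's section
of the patch begins with a line starting with \texttt{diff}, as in
patches generated by Mercurial, and the function name and sample patch
are hypothetical.

```python
import fnmatch


def filter_patch(patch_text, pattern):
    """Keep only the per-file sections whose target name matches a glob pattern."""
    sections = []
    for line in patch_text.splitlines(keepends=True):
        # Assumption: a line starting with "diff " opens a new per-file section.
        if line.startswith("diff ") or not sections:
            sections.append([])
        sections[-1].append(line)

    kept = []
    for section in sections:
        # The first "+++ " line in a section names the file being modified.
        target = next((entry[4:].split("\t")[0].strip()
                       for entry in section if entry.startswith("+++ ")), None)
        if target is not None and fnmatch.fnmatchcase(target, pattern):
            kept.extend(section)
    return "".join(kept)


example = (
    "diff -r 000000000000 src/main.c\n"
    "--- a/src/main.c\n"
    "+++ b/src/main.c\n"
    "@@ -1,1 +1,1 @@\n"
    "-old\n"
    "+new\n"
    "diff -r 000000000000 docs/README\n"
    "--- a/docs/README\n"
    "+++ b/docs/README\n"
    "@@ -1,1 +1,1 @@\n"
    "-draft\n"
    "+final\n"
)

result = filter_patch(example, "*/src/*.c")
print(result, end="")
```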

542

543 %%% Local Variables:

544 %%% mode: latex

545 %%% TeX-master: "00book"

546 %%% End: