view en/mq.tex @ 15:b8ac9f312a47

Document wiggle and rej
author Bryan O'Sullivan <bos@serpentine.com>
date Mon Jul 03 17:58:29 2006 -0700 (2006-07-03)
1 \chapter{Managing change with Mercurial Queues}
2 \label{chap:mq}
4 \section{The patch management problem}
5 \label{sec:mq:patch-mgmt}
7 Here is a common scenario: you need to install a software package from
8 source, but you find a bug that you must fix in the source before you
9 can start using the package. You make your changes, forget about the
10 package for a while, and a few months later you need to upgrade to a
11 newer version of the package. If the newer version of the package
12 still has the bug, you must extract your fix from the older source
13 tree and apply it against the newer version. This is a tedious task,
14 and it's easy to make mistakes.
16 This is a simple case of the ``patch management'' problem. You have
17 an ``upstream'' source tree that you can't change; you need to make
18 some local changes on top of the upstream tree; and you'd like to be
19 able to keep those changes separate, so that you can apply them to
20 newer versions of the upstream source.
22 The patch management problem arises in many situations. Probably the
23 most visible is that a user of an open source software project will
24 contribute a bug fix or new feature to the project's maintainers in the
25 form of a patch.
27 Distributors of operating systems that include open source software
28 often need to make changes to the packages they distribute so that
29 they will build properly in their environments.
31 When you have few changes to maintain, it is easy to manage a single
32 patch using the standard \texttt{diff} and \texttt{patch} programs
33 (see section~\ref{sec:mq:patch} for a discussion of these tools).
34 Once the number of changes grows, it starts to make sense to maintain
35 patches as discrete ``chunks of work,'' so that for example a single
36 patch will contain only one bug fix (the patch might modify several
37 files, but it's doing ``only one thing''), and you may have a number
38 of such patches for different bugs you need fixed and local changes
39 you require. In this situation, if you submit a bug fix patch to the
40 upstream maintainers of a package and they include your fix in a
41 subsequent release, you can simply drop that single patch when you're
42 updating to the newer release.
44 Maintaining a single patch against an upstream tree is a little
45 tedious and error-prone, but not difficult. However, the complexity
46 of the problem grows rapidly as the number of patches you have to
47 maintain increases. With more than a tiny number of patches in hand,
48 understanding which ones you have applied and maintaining them moves
49 from messy to overwhelming.
51 Fortunately, Mercurial includes a powerful extension, Mercurial Queues
52 (or simply ``MQ''), that massively simplifies the patch management
53 problem.
55 \section{The prehistory of Mercurial Queues}
56 \label{sec:mq:history}
58 During the late 1990s, several Linux kernel developers started to
59 maintain ``patch series'' that modified the behaviour of the Linux
60 kernel. Some of these series were focused on stability, some on
61 feature coverage, and others were more speculative.
63 The sizes of these patch series grew rapidly. In 2002, Andrew Morton
64 published some shell scripts he had been using to automate the task of
65 managing his patch queues. Andrew was successfully using these
66 scripts to manage hundreds (sometimes thousands) of patches on top of
67 the Linux kernel.
69 \subsection{A patchwork quilt}
70 \label{sec:mq:quilt}
73 In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the
74 approach of Andrew's scripts and published a tool called ``patchwork
75 quilt''~\cite{web:quilt}, or simply ``quilt''
76 (see~\cite{gruenbacher:2005} for a paper describing it). Because
77 quilt substantially automated patch management, it rapidly gained a
78 large following among open source software developers.
80 Quilt manages a \emph{stack of patches} on top of a directory tree.
81 To begin, you tell quilt to manage a directory tree; it stores away
82 the names and contents of all files in the tree. To fix a bug, you
83 create a new patch (using a single command), edit the files you need
84 to fix, then ``refresh'' the patch.
86 The refresh step causes quilt to scan the directory tree; it updates
87 the patch with all of the changes you have made. You can create
88 another patch on top of the first, which will track the changes
89 required to modify the tree from ``tree with one patch applied'' to
90 ``tree with two patches applied''.
92 You can \emph{change} which patches are applied to the tree. If you
93 ``pop'' a patch, the changes made by that patch will vanish from the
94 directory tree. Quilt remembers which patches you have popped,
95 though, so you can ``push'' a popped patch again, and the directory
96 tree will be restored to contain the modifications in the patch. Most
97 importantly, you can run the ``refresh'' command at any time, and the
98 topmost applied patch will be updated. This means that you can, at
99 any time, change both which patches are applied and what
100 modifications those patches make.
102 Quilt knows nothing about revision control tools, so it works equally
103 well on top of an unpacked tarball or a Subversion repository.
105 \subsection{From patchwork quilt to Mercurial Queues}
106 \label{sec:mq:quilt-mq}
108 In mid-2005, Chris Mason took the features of quilt and wrote an
109 extension that he called Mercurial Queues, which added quilt-like
110 behaviour to Mercurial.
112 The key difference between quilt and MQ is that quilt knows nothing
113 about revision control systems, while MQ is \emph{integrated} into
114 Mercurial. Each patch that you push is represented as a Mercurial
115 changeset. Pop a patch, and the changeset goes away.
117 This integration makes understanding patches and debugging their
118 effects \emph{enormously} easier. Since every applied patch has an
119 associated changeset, you can use \hgcmdargs{log}{\emph{filename}} to
120 see which changesets and patches affected a file. You can use the
121 \hgext{bisect} extension to binary-search through all changesets and
122 applied patches to see where a bug got introduced or fixed. You can
123 use the \hgcmd{annotate} command to see which changeset or patch
124 modified a particular line of a source file. And so on.
126 Because quilt does not care about revision control tools, it is still
127 a tremendously useful piece of software to know about for situations
128 where you cannot use Mercurial and MQ.
129 \section{Getting started with Mercurial Queues}
130 \label{sec:mq:start}
132 Because MQ is implemented as an extension, you must explicitly enable
133 it before you can use it. (You don't need to download anything; MQ ships
134 with the standard Mercurial distribution.) To enable MQ, edit your
135 \tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}.
137 \begin{figure}[ht]
138 \begin{codesample4}
139 [extensions]
140 hgext.mq =
141 \end{codesample4}
142 \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension}
143 \label{ex:mq:config}
144 \end{figure}
146 Once the extension is enabled, it will make a number of new commands
147 available. To verify that the extension is working, you can use
148 \hgcmd{help} to see if the \hgcmd{qinit} command is now available; see
149 the example in figure~\ref{ex:mq:enabled}.
151 \begin{figure}[ht]
152 \interaction{mq.qinit-help.help}
153 \caption{How to verify that MQ is enabled}
154 \label{ex:mq:enabled}
155 \end{figure}
157 You can use MQ with \emph{any} Mercurial repository, and its commands
158 only operate within that repository. To get started, simply prepare
159 the repository using the \hgcmd{qinit} command (see
160 figure~\ref{ex:mq:qinit}). This command creates an empty directory
161 called \filename{.hg/patches}, where MQ will keep its metadata. As
162 with many Mercurial commands, the \hgcmd{qinit} command prints nothing
163 if it succeeds.
165 \begin{figure}[ht]
166 \interaction{mq.tutorial.qinit}
167 \caption{Preparing a repository for use with MQ}
168 \label{ex:mq:qinit}
169 \end{figure}
171 \begin{figure}[ht]
172 \interaction{mq.tutorial.qnew}
173 \caption{Creating a new patch}
174 \label{ex:mq:qnew}
175 \end{figure}
177 \subsection{Creating a new patch}
179 To begin work on a new patch, use the \hgcmd{qnew} command. This
180 command takes one argument, the name of the patch to create. MQ will
181 use this as the name of an actual file in the \filename{.hg/patches}
182 directory, as you can see in figure~\ref{ex:mq:qnew}.
184 Also newly present in the \filename{.hg/patches} directory are two
185 other files, \filename{series} and \filename{status}. The
186 \filename{series} file lists all of the patches that MQ knows about
187 for this repository, with one patch per line. MQ uses the
188 \filename{status} file for internal book-keeping; it tracks all of the
189 patches that MQ has \emph{applied} in this repository.
191 \begin{note}
192 You may sometimes want to edit the \filename{series} file by hand;
193 for example, to change the sequence in which some patches are
194 applied. However, manually editing the \filename{status} file is
195 almost always a bad idea, as it's easy to corrupt MQ's idea of what
196 is happening.
197 \end{note}
199 Once you have created your new patch, you can edit files in the
200 working directory as you usually would. All of the normal Mercurial
201 commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as
202 they did before.
203 \subsection{Refreshing a patch}
205 When you reach a point where you want to save your work, use the
206 \hgcmd{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update the patch
207 you are working on. This command folds the changes you have made in
208 the working directory into your patch, and updates its corresponding
209 changeset to contain those changes.
211 \begin{figure}[ht]
212 \interaction{mq.tutorial.qrefresh}
213 \caption{Refreshing a patch}
214 \label{ex:mq:qrefresh}
215 \end{figure}
217 You can run \hgcmd{qrefresh} as often as you like, so it's a good way
218 to ``checkpoint'' your work. Refresh your patch at an opportune
219 time; try an experiment; and if the experiment doesn't work out,
220 \hgcmd{revert} your modifications back to the last time you refreshed.
222 \begin{figure}[ht]
223 \interaction{mq.tutorial.qrefresh2}
224 \caption{Refresh a patch many times to accumulate changes}
225 \label{ex:mq:qrefresh2}
226 \end{figure}
228 \subsection{Stacking and tracking patches}
230 Once you have finished working on a patch, or need to work on another,
231 you can use the \hgcmd{qnew} command again to create a new patch.
232 Mercurial will apply this patch on top of your existing patch. See
233 figure~\ref{ex:mq:qnew2} for an example. Notice that the patch
234 contains the changes in our prior patch as part of its context (you
235 can see this more clearly in the output of \hgcmd{annotate}).
237 \begin{figure}[ht]
238 \interaction{mq.tutorial.qnew2}
239 \caption{Stacking a second patch on top of the first}
240 \label{ex:mq:qnew2}
241 \end{figure}
243 So far, with the exception of \hgcmd{qnew} and \hgcmd{qrefresh}, we've
244 been careful to only use regular Mercurial commands. However, there
245 are more ``natural'' commands you can use when thinking about patches
246 with MQ, as illustrated in figure~\ref{ex:mq:qseries}:
248 \begin{itemize}
249 \item The \hgcmd{qseries} command lists every patch that MQ knows
250 about in this repository, from oldest to newest (most recently
251 \emph{created}).
252 \item The \hgcmd{qapplied} command lists every patch that MQ has
253 \emph{applied} in this repository, again from oldest to newest (most
254 recently applied).
255 \end{itemize}
257 \begin{figure}[ht]
258 \interaction{mq.tutorial.qseries}
259 \caption{Understanding the patch stack with \hgcmd{qseries} and
260 \hgcmd{qapplied}}
261 \label{ex:mq:qseries}
262 \end{figure}
264 \subsection{Manipulating the patch stack}
266 The previous discussion implied that there must be a difference
267 between ``known'' and ``applied'' patches, and there is. MQ can
268 manage a patch without it being applied in the repository.
270 An \emph{applied} patch has a corresponding changeset in the
271 repository, and the effects of the patch and changeset are visible in
272 the working directory. You can undo the application of a patch using
273 the \hgcmd{qpop} command. MQ still \emph{knows about}, or manages, a
274 popped patch, but the patch no longer has a corresponding changeset in
275 the repository, and the working directory does not contain the changes
276 made by the patch. Figure~\ref{fig:mq:stack} illustrates the
277 difference between applied and tracked patches.
279 \begin{figure}[ht]
280 \centering
281 \grafix{mq-stack}
282 \caption{Applied and unapplied patches in the MQ patch stack}
283 \label{fig:mq:stack}
284 \end{figure}
286 You can reapply an unapplied, or popped, patch using the \hgcmd{qpush}
287 command. This creates a new changeset to correspond to the patch, and
288 the patch's changes once again become present in the working
289 directory. See figure~\ref{ex:mq:qpop} for examples of \hgcmd{qpop}
290 and \hgcmd{qpush} in action. Notice that once we have popped one or
291 two patches, the output of \hgcmd{qseries} remains the same, while
292 that of \hgcmd{qapplied} has changed.
294 \begin{figure}[ht]
295 \interaction{mq.tutorial.qpop}
296 \caption{Modifying the stack of applied patches}
297 \label{ex:mq:qpop}
298 \end{figure}
300 MQ does not limit you to pushing or popping one patch. You can have
301 no patches, all of them, or any number in between applied at some
302 point in time.
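
The split between ``known'' and ``applied'' patches can be sketched as a
small Python model. This is only an illustration under simplifying
assumptions: the \texttt{PatchStack} class, its method names and the patch
names are made up for this example and are not part of MQ, and the model
ignores the changesets that MQ creates and removes as patches are pushed
and popped.

```python
class PatchStack:
    """Toy model of an MQ-style patch stack (hypothetical, not MQ's code)."""

    def __init__(self, series):
        # Every patch the queue knows about, in the order they apply.
        self.series = list(series)
        # Number of patches, counted from the start of the series,
        # that are currently applied.
        self.applied_count = 0

    def applied(self):
        """Applied patches, oldest first (the analogue of qapplied)."""
        return self.series[:self.applied_count]

    def unapplied(self):
        """Known patches that are not currently applied."""
        return self.series[self.applied_count:]

    def top(self):
        """The topmost applied patch, or None if nothing is applied."""
        if self.applied_count == 0:
            return None
        return self.series[self.applied_count - 1]

    def push(self):
        """Apply the next unapplied patch and return its name."""
        if self.applied_count == len(self.series):
            raise IndexError("all patches are already applied")
        self.applied_count += 1
        return self.top()

    def pop(self):
        """Unapply the topmost applied patch and return its name."""
        if self.applied_count == 0:
            raise IndexError("no patches are applied")
        popped = self.top()
        self.applied_count -= 1
        return popped


stack = PatchStack(["first.patch", "second.patch"])
stack.push()
stack.push()
stack.pop()
print("series: ", stack.series)      # popping never changes the series
print("applied:", stack.applied())   # only the applied list shrinks
```

Popping changes only the applied list; the series stays the same, which is
the behaviour described above for \hgcmd{qseries} and \hgcmd{qapplied}.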
304 \subsection{Working on several patches at once}
306 The \hgcmd{qrefresh} command always refreshes the \emph{topmost}
307 applied patch. This means that you can suspend work on one patch (by
308 refreshing it), pop or push to make a different patch the top, and
309 work on \emph{that} patch for a while.
311 Here's an example that illustrates how you can use this ability.
312 Let's say you're developing a new feature as two patches. The first
313 is a change to the core of your software, and the second--layered on
314 top of the first--changes the user interface to use the code you just
315 added to the core. If you notice a bug in the core while you're
316 working on the UI patch, it's easy to fix the core. Simply
317 \hgcmd{qrefresh} the UI patch to save your in-progress changes, and
318 \hgcmd{qpop} down to the core patch. Fix the core bug,
319 \hgcmd{qrefresh} the core patch, and \hgcmd{qpush} back to the UI
320 patch to continue where you left off.
322 \section{Mercurial Queues and GNU patch}
323 \label{sec:mq:patch}
325 MQ uses the GNU \command{patch} command to apply patches. It will
326 help you to understand the data that MQ and \command{patch} work with,
327 and a few aspects of how \command{patch} operates.
329 The \command{diff} command generates a list of modifications by
330 comparing two files. The \command{patch} command applies a list of
331 modifications to a file. The kinds of files that \command{diff} and
332 \command{patch} work with are referred to as both ``diffs'' and
333 ``patches;'' there is no difference between a diff and a patch.
335 A patch file can start with arbitrary text; MQ uses this text as the
336 commit message when creating changesets. It treats the first line
337 that starts with the string ``\texttt{diff~-}'' as the separator
338 between header and content.
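
As a rough sketch of the header rule just described, the following Python
function splits the text of a patch at the first line that starts with
``\texttt{diff~-}''. The function name \texttt{split\_patch} and the sample
patch are invented for illustration; this is not MQ's own parsing code, and
it assumes that the header is simply everything before the separator.

```python
def split_patch(text):
    """Split patch text into (message, diff_body).

    The first line starting with "diff -" is treated as the separator
    between the free-form header and the diff content.
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("diff -"):
            message = "".join(lines[:i]).strip()
            body = "".join(lines[i:])
            return message, body
    # No separator found: treat the whole text as the message.
    return text.strip(), ""


# A hand-written sample patch, used only to exercise the function.
sample = (
    "Fix a typo in the greeting\n"
    "\n"
    "diff -r 000000000000 hello.txt\n"
    "--- a/hello.txt\n"
    "+++ b/hello.txt\n"
    "@@ -1 +1 @@\n"
    "-helo\n"
    "+hello\n"
)

message, body = split_patch(sample)
print("message:", message)
print("first line of diff:", body.splitlines()[0])
```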
340 MQ works with \emph{unified} diffs (\command{patch} can accept several
341 other diff formats, but MQ doesn't). A unified diff contains two
342 kinds of header. The \emph{file header} describes the file being
343 modified; it contains the name of the file to modify. When
344 \command{patch} sees a new file header, it looks for a file with that
345 name to start modifying.
347 After the file header comes a series of \emph{hunks}. Each hunk
348 starts with a header; this identifies the range of line numbers within
349 the file that the hunk should modify. Following the header, a hunk
350 starts and ends with a few (usually three) lines of text from the
351 unmodified file; these are called the \emph{context} for the hunk.
352 Each unmodified line begins with a space character. Within the hunk,
353 a line that begins with ``\texttt{-}'' means ``remove this line,''
354 while a line that begins with ``\texttt{+}'' means ``insert this
355 line.'' For example, a line that is modified is represented by one
356 deletion and one insertion.
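
To see these pieces in a concrete diff, the short Python sketch below uses
the standard \texttt{difflib} module to produce a unified diff of two small
in-memory files, then sorts the lines of the single hunk by their first
character. The file names and contents are invented for the example, and
\texttt{difflib} stands in for the \command{diff} command here.

```python
import difflib

# Two versions of a small file; only the third line differs.
old = ["one\n", "two\n", "three\n", "four\n", "five\n"]
new = ["one\n", "two\n", "THREE\n", "four\n", "five\n"]

diff = list(difflib.unified_diff(
    old, new, fromfile="a/numbers.txt", tofile="b/numbers.txt"))

# Show the whole diff: file header, hunk header, then the hunk body.
for line in diff:
    print(line, end="")

# The first two lines are the file header and the third is the hunk
# header, so the hunk body starts at index 3.
hunk_body = diff[3:]
context = [l for l in hunk_body if l.startswith(" ")]
removed = [l for l in hunk_body if l.startswith("-")]
added = [l for l in hunk_body if l.startswith("+")]
print(len(context), "context,", len(removed), "removed,", len(added), "added")
```

The modified line shows up as one ``\texttt{-}'' line followed by one
``\texttt{+}'' line, surrounded by unchanged context lines.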
358 The \command{diff} command runs hunks together when there's not enough
359 context between modifications to justify keeping them as separate hunks.
361 When \command{patch} applies a hunk, it tries a handful of
362 successively less accurate strategies to try to make the hunk apply.
363 This falling-back technique often makes it possible to take a patch
364 that was generated against an old version of a file, and apply it
365 against a newer version of that file.
367 First, \command{patch} tries an exact match, where the line numbers,
368 the context, and the text to be modified must apply exactly. If it
369 cannot make an exact match, it tries to find an exact match for the
370 context, without honouring the line numbering information. If this
371 succeeds, it prints a line of output saying that the hunk was applied,
372 but at some \emph{offset} from the original line number.
374 If a context-only match fails, \command{patch} removes the first and
375 last lines of the context, and tries a \emph{reduced} context-only
376 match. If the hunk with reduced context succeeds, it prints a message
377 saying that it applied the hunk with a \emph{fuzz factor} (the number
378 after the fuzz factor indicates how many lines of context
379 \command{patch} had to trim before the patch applied).
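
The fallback sequence above can be approximated in a few lines of Python.
This is a deliberately simplified model, not the algorithm that GNU
\command{patch} implements: the function name \texttt{find\_hunk}, its
parameters and the sample data are all assumptions made for illustration,
and a hunk is reduced to the list of lines it expects to find in the old
file.

```python
def find_hunk(file_lines, old_lines, expected, context=3, max_fuzz=2):
    """Locate a hunk's old text in a file.

    Returns (offset, fuzz), or None if the hunk would be rejected.
    offset is how far the match lies from the expected position;
    fuzz is how many context lines were trimmed from each end.
    """
    for fuzz in range(min(max_fuzz, context) + 1):
        # Drop `fuzz` lines of context from both ends of the hunk.
        wanted = old_lines[fuzz:len(old_lines) - fuzz]
        if not wanted:
            break
        # After trimming, the text should start `fuzz` lines later.
        target = expected + fuzz
        candidates = range(len(file_lines) - len(wanted) + 1)
        # Try the expected position first, then ever larger offsets.
        for pos in sorted(candidates, key=lambda p: abs(p - target)):
            if file_lines[pos:pos + len(wanted)] == wanted:
                return pos - target, fuzz
    return None


# Three lines of context, one line to replace, three more of context.
hunk = ["c1", "c2", "c3", "old", "c4", "c5", "c6"]

exact = find_hunk(list(hunk), hunk, expected=0)
shifted = find_hunk(["new1", "new2"] + hunk, hunk, expected=0)
fuzzy = find_hunk(["changed"] + hunk[1:], hunk, expected=0)
rejected = find_hunk(["x", "y"], hunk, expected=0)

print("exact match:       ", exact)
print("applied at offset: ", shifted)
print("applied with fuzz: ", fuzzy)
print("rejected:          ", rejected)
```

An exact match is reported as offset 0 with fuzz 0. A non-zero offset or
fuzz corresponds to the cases where \command{patch} prints a message about
how it applied the hunk.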
381 When none of these techniques works, \command{patch} prints a
382 message saying that the hunk in question was rejected. It saves
383 rejected hunks to a file with the same name, and an added
384 \filename{.rej} extension. It also saves an unmodified copy of the
385 file with a \filename{.orig} extension; the copy of the file without
386 any extensions will contain any changes made by hunks that \emph{did}
387 apply cleanly. If you have a patch that modifies \filename{foo} with
388 six hunks, and one of them fails to apply, you will have: an
389 unmodified \filename{foo.orig}, a \filename{foo.rej} containing one
390 hunk, and \filename{foo}, containing the changes made by the five
391 successful hunks.
393 \subsection{Beware the fuzz}
395 While applying a hunk at an offset, or with a fuzz factor, will often
396 be completely successful, these inexact techniques naturally leave
397 open the possibility of corrupting the patched file. The most common
398 cases typically involve applying a patch twice, or at an incorrect
399 location in the file. If \command{patch} or \hgcmd{qpush} ever
400 mentions an offset or fuzz factor, you should make sure that the
401 modified files are correct afterwards.
403 It's often a good idea to refresh a patch that has applied with an
404 offset or fuzz factor; refreshing the patch generates new context
405 information that will make it apply cleanly. I say ``often,'' not
406 ``always,'' because sometimes refreshing a patch will make it fail to
407 apply against a different revision of the underlying files. In some
408 cases, such as when you're maintaining a patch that must sit on top of
409 multiple versions of a source tree, it's acceptable to have a patch
410 apply with some fuzz, provided you've verified the results of the
411 patching process in such cases.
413 \subsection{Handling rejection}
415 If \hgcmd{qpush} fails to apply a patch, it will print an error
416 message and exit. If it has left \filename{.rej} files behind, it is
417 usually best to fix up the rejected hunks before you push more patches
418 or do any further work.
420 If your patch \emph{used to} apply cleanly, and no longer does because
421 you've changed the underlying code that your patches are based on,
422 Mercurial Queues can help; see section~\ref{sec:mq:merge} for details.
424 Unfortunately, there aren't any great techniques for dealing with
425 rejected hunks. Most often, you'll need to view the \filename{.rej}
426 file and edit the target file, applying the rejected hunks by hand.
428 If you're feeling adventurous, Neil Brown, an Australian Linux kernel
429 hacker, has written a tool called \command{wiggle}~\cite{web:wiggle},
430 which is more vigorous than \command{patch} in its attempts to make a
431 patch apply.
433 Another Linux kernel hacker, Chris Mason (the author of Mercurial
434 Queues), wrote a similar tool called \command{rej}~\cite{web:rej},
435 which takes a simple approach to automating the application of hunks
436 rejected by \command{patch}. \command{rej} can help with four common
437 reasons that a hunk may be rejected:
439 \begin{itemize}
440 \item The context in the middle of a hunk has changed.
441 \item A hunk is missing some context at the beginning or end.
442 \item A large hunk might apply better--either entirely or in part--if
443 it was broken up into smaller hunks.
444 \item A hunk removes lines with slightly different content than those
445 currently present in the file.
446 \end{itemize}
448 If you use \command{wiggle} or \command{rej}, you should be doubly
449 careful to check your results when you're done.
451 \section{Updating your patches when the underlying code changes}
452 \label{sec:mq:merge}
454 XXX.
456 %%% Local Variables:
457 %%% mode: latex
458 %%% TeX-master: "00book"
459 %%% End: