hgbook
diff fr/concepts.tex @ 959:b3cb66d935cf
Finished tour-merge.tex
author | Romain PELISSE <belaran@gmail.com> |
---|---|
date | Mon Feb 23 23:53:18 2009 +0100 (2009-02-23) |
parents | 97e929385442 |
children |
1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/fr/concepts.tex Mon Feb 23 23:53:18 2009 +0100 1.3 @@ -0,0 +1,577 @@ 1.4 +\chapter{Behind the scenes} 1.5 +\label{chap:concepts} 1.6 + 1.7 +Unlike many revision control systems, the concepts upon which 1.8 +Mercurial is built are simple enough that it's easy to understand how 1.9 +the software really works. Knowing this certainly isn't necessary, 1.10 +but I find it useful to have a ``mental model'' of what's going on. 1.11 + 1.12 +This understanding gives me confidence that Mercurial has been 1.13 +carefully designed to be both \emph{safe} and \emph{efficient}. And 1.14 +just as importantly, if it's easy for me to retain a good idea of what 1.15 +the software is doing when I perform a revision control task, I'm less 1.16 +likely to be surprised by its behaviour. 1.17 + 1.18 +In this chapter, we'll initially cover the core concepts behind 1.19 +Mercurial's design, then continue to discuss some of the interesting 1.20 +details of its implementation. 1.21 + 1.22 +\section{Mercurial's historical record} 1.23 + 1.24 +\subsection{Tracking the history of a single file} 1.25 + 1.26 +When Mercurial tracks modifications to a file, it stores the history 1.27 +of that file in a metadata object called a \emph{filelog}. Each entry 1.28 +in the filelog contains enough information to reconstruct one revision 1.29 +of the file that is being tracked. Filelogs are stored as files in 1.30 +the \sdirname{.hg/store/data} directory. A filelog contains two kinds 1.31 +of information: revision data, and an index to help Mercurial to find 1.32 +a revision efficiently. 1.33 + 1.34 +A file that is large, or has a lot of history, has its filelog stored 1.35 +in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' 1.36 +suffix) files. For small files without much history, the revision 1.37 +data and index are combined in a single ``\texttt{.i}'' file. 
The 1.38 +correspondence between a file in the working directory and the filelog 1.39 +that tracks its history in the repository is illustrated in 1.40 +figure~\ref{fig:concepts:filelog}. 1.41 + 1.42 +\begin{figure}[ht] 1.43 + \centering 1.44 + \grafix{filelog} 1.45 + \caption{Relationships between files in working directory and 1.46 + filelogs in repository} 1.47 + \label{fig:concepts:filelog} 1.48 +\end{figure} 1.49 + 1.50 +\subsection{Managing tracked files} 1.51 + 1.52 +Mercurial uses a structure called a \emph{manifest} to collect 1.53 +together information about the files that it tracks. Each entry in 1.54 +the manifest contains information about the files present in a single 1.55 +changeset. An entry records which files are present in the changeset, 1.56 +the revision of each file, and a few other pieces of file metadata. 1.57 + 1.58 +\subsection{Recording changeset information} 1.59 + 1.60 +The \emph{changelog} contains information about each changeset. Each 1.61 +revision records who committed a change, the changeset comment, other 1.62 +pieces of changeset-related information, and the revision of the 1.63 +manifest to use. 1.64 + 1.65 +\subsection{Relationships between revisions} 1.66 + 1.67 +Within a changelog, a manifest, or a filelog, each revision stores a 1.68 +pointer to its immediate parent (or to its two parents, if it's a 1.69 +merge revision). As I mentioned above, there are also relationships 1.70 +between revisions \emph{across} these structures, and they are 1.71 +hierarchical in nature. 1.72 + 1.73 +For every changeset in a repository, there is exactly one revision 1.74 +stored in the changelog. Each revision of the changelog contains a 1.75 +pointer to a single revision of the manifest. A revision of the 1.76 +manifest stores a pointer to a single revision of each filelog tracked 1.77 +when that changeset was created. These relationships are illustrated 1.78 +in figure~\ref{fig:concepts:metadata}. 
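As a rough illustration of these pointers, here is a minimal Python sketch. It assumes a toy in-memory model in which each structure is a plain list or dictionary; the names \texttt{filelogs}, \texttt{manifest}, \texttt{changelog} and \texttt{files\_at} are hypothetical and are not Mercurial's own API or on-disk format.

```python
# Hypothetical, simplified model of the three structures described above.
# Real Mercurial stores each of these as a revlog on disk.

# Each filelog maps a file revision number to that revision's contents.
filelogs = {
    "README": ["hello\n", "hello, world\n"],
    "main.c": ["int main(void) { return 0; }\n"],
}

# Each manifest revision maps a file name to one revision of its filelog.
manifest = [
    {"README": 0, "main.c": 0},
    {"README": 1, "main.c": 0},  # main.c unchanged: same filelog revision
]

# Each changelog revision points to exactly one manifest revision.
changelog = [
    {"user": "alice", "desc": "initial import", "manifest": 0},
    {"user": "bob", "desc": "expand README", "manifest": 1},
]


def files_at(changeset):
    """Follow changelog -> manifest -> filelog to rebuild a changeset's files."""
    entry = manifest[changelog[changeset]["manifest"]]
    return {name: filelogs[name][rev] for name, rev in entry.items()}
```

Calling \texttt{files\_at(1)} walks from the changelog through the manifest to the filelogs, which is the same top-down order in which the pointers are stored.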
1.79 + 1.80 +\begin{figure}[ht] 1.81 + \centering 1.82 + \grafix{metadata} 1.83 + \caption{Metadata relationships} 1.84 + \label{fig:concepts:metadata} 1.85 +\end{figure} 1.86 + 1.87 +As the illustration shows, there is \emph{not} a ``one to one'' 1.88 +relationship between revisions in the changelog, manifest, or filelog. 1.89 +If the manifest hasn't changed between two changesets, the changelog 1.90 +entries for those changesets will point to the same revision of the 1.91 +manifest. If a file that Mercurial tracks hasn't changed between two 1.92 +changesets, the entry for that file in the two revisions of the 1.93 +manifest will point to the same revision of its filelog. 1.94 + 1.95 +\section{Safe, efficient storage} 1.96 + 1.97 +The underpinnings of changelogs, manifests, and filelogs are provided 1.98 +by a single structure called the \emph{revlog}. 1.99 + 1.100 +\subsection{Efficient storage} 1.101 + 1.102 +The revlog provides efficient storage of revisions using a 1.103 +\emph{delta} mechanism. Instead of storing a complete copy of a file 1.104 +for each revision, it stores the changes needed to transform an older 1.105 +revision into the new revision. For many kinds of file data, these 1.106 +deltas are typically a fraction of a percent of the size of a full 1.107 +copy of a file. 1.108 + 1.109 +Some obsolete revision control systems can only work with deltas of 1.110 +text files. They must either store binary files as complete snapshots 1.111 +or encoded into a text representation, both of which are wasteful 1.112 +approaches. Mercurial can efficiently handle deltas of files with 1.113 +arbitrary binary contents; it doesn't need to treat text as special. 1.114 + 1.115 +\subsection{Safe operation} 1.116 +\label{sec:concepts:txn} 1.117 + 1.118 +Mercurial only ever \emph{appends} data to the end of a revlog file. 1.119 +It never modifies a section of a file after it has written it. 
This 1.120 +is both more robust and efficient than schemes that need to modify or 1.121 +rewrite data. 1.122 + 1.123 +In addition, Mercurial treats every write as part of a 1.124 +\emph{transaction} that can span a number of files. A transaction is 1.125 +\emph{atomic}: either the entire transaction succeeds and its effects 1.126 +are all visible to readers in one go, or the whole thing is undone. 1.127 +This guarantee of atomicity means that if you're running two copies of 1.128 +Mercurial, where one is reading data and one is writing it, the reader 1.129 +will never see a partially written result that might confuse it. 1.130 + 1.131 +The fact that Mercurial only appends to files makes it easier to 1.132 +provide this transactional guarantee. The easier it is to do stuff 1.133 +like this, the more confident you should be that it's done correctly. 1.134 + 1.135 +\subsection{Fast retrieval} 1.136 + 1.137 +Mercurial cleverly avoids a pitfall common to all earlier 1.138 +revision control systems: the problem of \emph{inefficient retrieval}. 1.139 +Most revision control systems store the contents of a revision as an 1.140 +incremental series of modifications against a ``snapshot''. To 1.141 +reconstruct a specific revision, you must first read the snapshot, and 1.142 +then every one of the revisions between the snapshot and your target 1.143 +revision. The more history that a file accumulates, the more 1.144 +revisions you must read, hence the longer it takes to reconstruct a 1.145 +particular revision. 1.146 + 1.147 +\begin{figure}[ht] 1.148 + \centering 1.149 + \grafix{snapshot} 1.150 + \caption{Snapshot of a revlog, with incremental deltas} 1.151 + \label{fig:concepts:snapshot} 1.152 +\end{figure} 1.153 + 1.154 +The innovation that Mercurial applies to this problem is simple but 1.155 +effective. 
Once the cumulative amount of delta information stored 1.156 +since the last snapshot exceeds a fixed threshold, it stores a new 1.157 +snapshot (compressed, of course), instead of another delta. This 1.158 +makes it possible to reconstruct \emph{any} revision of a file 1.159 +quickly. This approach works so well that it has since been copied by 1.160 +several other revision control systems. 1.161 + 1.162 +Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry 1.163 +in a revlog's index file, Mercurial stores the range of entries from 1.164 +the data file that it must read to reconstruct a particular revision. 1.165 + 1.166 +\subsubsection{Aside: the influence of video compression} 1.167 + 1.168 +If you're familiar with video compression or have ever watched a TV 1.169 +feed through a digital cable or satellite service, you may know that 1.170 +most video compression schemes store each frame of video as a delta 1.171 +against its predecessor frame. In addition, these schemes use 1.172 +``lossy'' compression techniques to increase the compression ratio, so 1.173 +visual errors accumulate over the course of a number of inter-frame 1.174 +deltas. 1.175 + 1.176 +Because it's possible for a video stream to ``drop out'' occasionally 1.177 +due to signal glitches, and to limit the accumulation of artefacts 1.178 +introduced by the lossy compression process, video encoders 1.179 +periodically insert a complete frame (called a ``key frame'') into the 1.180 +video stream; the next delta is generated against that frame. This 1.181 +means that if the video signal gets interrupted, it will resume once 1.182 +the next key frame is received. Also, the accumulation of encoding 1.183 +errors restarts anew with each key frame. 1.184 + 1.185 +\subsection{Identification and strong integrity} 1.186 + 1.187 +Along with delta or snapshot information, a revlog entry contains a 1.188 +cryptographic hash of the data that it represents. 
This makes it 1.189 +difficult to forge the contents of a revision, and easy to detect 1.190 +accidental corruption. 1.191 + 1.192 +Hashes provide more than a mere check against corruption; they are 1.193 +used as the identifiers for revisions. The changeset identification 1.194 +hashes that you see as an end user are from revisions of the 1.195 +changelog. Although filelogs and the manifest also use hashes, 1.196 +Mercurial only uses these behind the scenes. 1.197 + 1.198 +Mercurial verifies that hashes are correct when it retrieves file 1.199 +revisions and when it pulls changes from another repository. If it 1.200 +encounters an integrity problem, it will complain and stop whatever 1.201 +it's doing. 1.202 + 1.203 +In addition to the effect it has on retrieval efficiency, Mercurial's 1.204 +use of periodic snapshots makes it more robust against partial data 1.205 +corruption. If a revlog becomes partly corrupted due to a hardware 1.206 +error or system bug, it's often possible to reconstruct some or most 1.207 +revisions from the uncorrupted sections of the revlog, both before and 1.208 +after the corrupted section. This would not be possible with a 1.209 +delta-only storage model. 1.210 + 1.211 +\section{Revision history, branching, 1.212 + and merging} 1.213 + 1.214 +Every entry in a Mercurial revlog knows the identity of its immediate 1.215 +ancestor revision, usually referred to as its \emph{parent}. In fact, 1.216 +a revision contains room for not one parent, but two. Mercurial uses 1.217 +a special hash, called the ``null ID'', to represent the idea ``there 1.218 +is no parent here''. This hash is simply a string of zeroes. 1.219 + 1.220 +In figure~\ref{fig:concepts:revlog}, you can see an example of the 1.221 +conceptual structure of a revlog. Filelogs, manifests, and changelogs 1.222 +all have this same structure; they differ only in the kind of data 1.223 +stored in each delta or snapshot. 
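To make the idea of hash-based identity concrete, here is a small Python sketch. It assumes that a revision's ID is a SHA-1 hash computed over its two parent IDs (sorted) followed by the revision data, and that the null ID is twenty zero bytes; the text above only says that a cryptographic hash is used, so treat these details, and the helper names \texttt{node\_id} and \texttt{verify}, as assumptions made for illustration.

```python
import hashlib

# Assumed form of the "null ID": twenty zero bytes, meaning "no parent here".
NULL_ID = b"\0" * 20


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """Hash the sorted parent IDs followed by the revision data."""
    first, second = sorted([p1, p2])
    return hashlib.sha1(first + second + text).digest()


def verify(text, p1, p2, expected):
    """Recompute the hash; a mismatch signals corruption or tampering."""
    return node_id(text, p1, p2) == expected


root = node_id(b"first revision\n")              # both parent slots are null
child = node_id(b"second revision\n", p1=root)   # one real parent
```

Because the parents are part of the hashed input, two revisions with identical contents but different ancestry get different identifiers under this scheme.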
1.224 + 1.225 +The first revision in a revlog (at the bottom of the image) has the 1.226 +null ID in both of its parent slots. For a ``normal'' revision, its 1.227 +first parent slot contains the ID of its parent revision, and its 1.228 +second contains the null ID, indicating that the revision has only one 1.229 +real parent. Any two revisions that have the same parent ID are 1.230 +branches. A revision that represents a merge between branches has two 1.231 +normal revision IDs in its parent slots. 1.232 + 1.233 +\begin{figure}[ht] 1.234 + \centering 1.235 + \grafix{revlog} 1.236 + \caption{The conceptual structure of a revlog} 1.237 + \label{fig:concepts:revlog} 1.238 +\end{figure} 1.239 + 1.240 +\section{The working directory} 1.241 + 1.242 +In the working directory, Mercurial stores a snapshot of the files 1.243 +from the repository as of a particular changeset. 1.244 + 1.245 +The working directory ``knows'' which changeset it contains. When you 1.246 +update the working directory to contain a particular changeset, 1.247 +Mercurial looks up the appropriate revision of the manifest to find 1.248 +out which files it was tracking at the time that changeset was 1.249 +committed, and which revision of each file was then current. It then 1.250 +recreates a copy of each of those files, with the same contents it had 1.251 +when the changeset was committed. 1.252 + 1.253 +The \emph{dirstate} contains Mercurial's knowledge of the working 1.254 +directory. This details which changeset the working directory is 1.255 +updated to, and all of the files that Mercurial is tracking in the 1.256 +working directory. 1.257 + 1.258 +Just as a revision of a revlog has room for two parents, so that it 1.259 +can represent either a normal revision (with one parent) or a merge of 1.260 +two earlier revisions, the dirstate has slots for two parents. When 1.261 +you use the \hgcmd{update} command, the changeset that you update to 1.262 +is stored in the ``first parent'' slot, and the null ID in the second.
1.263 +When you \hgcmd{merge} with another changeset, the first parent 1.264 +remains unchanged, and the second parent is filled in with the 1.265 +changeset you're merging with. The \hgcmd{parents} command tells you 1.266 +what the parents of the dirstate are. 1.267 + 1.268 +\subsection{What happens when you commit} 1.269 + 1.270 +The dirstate stores parent information for more than just book-keeping 1.271 +purposes. Mercurial uses the parents of the dirstate as \emph{the 1.272 + parents of a new changeset} when you perform a commit. 1.273 + 1.274 +\begin{figure}[ht] 1.275 + \centering 1.276 + \grafix{wdir} 1.277 + \caption{The working directory can have two parents} 1.278 + \label{fig:concepts:wdir} 1.279 +\end{figure} 1.280 + 1.281 +Figure~\ref{fig:concepts:wdir} shows the normal state of the working 1.282 +directory, where it has a single changeset as parent. That changeset 1.283 +is the \emph{tip}, the newest changeset in the repository that has no 1.284 +children. 1.285 + 1.286 +\begin{figure}[ht] 1.287 + \centering 1.288 + \grafix{wdir-after-commit} 1.289 + \caption{The working directory gains new parents after a commit} 1.290 + \label{fig:concepts:wdir-after-commit} 1.291 +\end{figure} 1.292 + 1.293 +It's useful to think of the working directory as ``the changeset I'm 1.294 +about to commit''. Any files that you tell Mercurial that you've 1.295 +added, removed, renamed, or copied will be reflected in that 1.296 +changeset, as will modifications to any files that Mercurial is 1.297 +already tracking; the new changeset will have the parents of the 1.298 +working directory as its parents. 1.299 + 1.300 +After a commit, Mercurial will update the parents of the working 1.301 +directory, so that the first parent is the ID of the new changeset, 1.302 +and the second is the null ID. This is shown in 1.303 +figure~\ref{fig:concepts:wdir-after-commit}. 
Mercurial doesn't touch 1.304 +any of the files in the working directory when you commit; it just 1.305 +modifies the dirstate to note its new parents. 1.306 + 1.307 +\subsection{Creating a new head} 1.308 + 1.309 +It's perfectly normal to update the working directory to a changeset 1.310 +other than the current tip. For example, you might want to know what 1.311 +your project looked like last Tuesday, or you could be looking through 1.312 +changesets to see which one introduced a bug. In cases like this, the 1.313 +natural thing to do is update the working directory to the changeset 1.314 +you're interested in, and then examine the files in the working 1.315 +directory directly to see their contents as they were when you 1.316 +committed that changeset. The effect of this is shown in 1.317 +figure~\ref{fig:concepts:wdir-pre-branch}. 1.318 + 1.319 +\begin{figure}[ht] 1.320 + \centering 1.321 + \grafix{wdir-pre-branch} 1.322 + \caption{The working directory, updated to an older changeset} 1.323 + \label{fig:concepts:wdir-pre-branch} 1.324 +\end{figure} 1.325 + 1.326 +Having updated the working directory to an older changeset, what 1.327 +happens if you make some changes, and then commit? Mercurial behaves 1.328 +in the same way as I outlined above. The parents of the working 1.329 +directory become the parents of the new changeset. This new changeset 1.330 +has no children, so it becomes the new tip. And the repository now 1.331 +contains two changesets that have no children; we call these 1.332 +\emph{heads}. You can see the structure that this creates in 1.333 +figure~\ref{fig:concepts:wdir-branch}. 
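The definition of a head can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: revisions are small integers, \texttt{NULL} stands in for the null ID, and the \texttt{parents} table and \texttt{heads} function are hypothetical names rather than anything Mercurial exposes.

```python
NULL = -1  # stand-in for the null ID

# revision number -> (first parent, second parent)
parents = {
    0: (NULL, NULL),
    1: (0, NULL),
    2: (1, NULL),  # the old tip
    3: (1, NULL),  # committed after updating back to revision 1
}


def heads(parents):
    """Return every revision that no other revision names as a parent."""
    has_child = {p for pair in parents.values() for p in pair if p != NULL}
    return sorted(rev for rev in parents if rev not in has_child)
```

Here revisions 2 and 3 share revision 1 as their parent and neither has a child, so both are heads.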
1.334 + 1.335 +\begin{figure}[ht] 1.336 + \centering 1.337 + \grafix{wdir-branch} 1.338 + \caption{After a commit made while synced to an older changeset} 1.339 + \label{fig:concepts:wdir-branch} 1.340 +\end{figure} 1.341 + 1.342 +\begin{note} 1.343 + If you're new to Mercurial, you should keep in mind a common 1.344 + ``error'', which is to use the \hgcmd{pull} command without any 1.345 + options. By default, the \hgcmd{pull} command \emph{does not} 1.346 + update the working directory, so you'll bring new changesets into 1.347 + your repository, but the working directory will stay synced at the 1.348 + same changeset as before the pull. If you make some changes and 1.349 + commit afterwards, you'll thus create a new head, because your 1.350 + working directory isn't synced to whatever the current tip is. 1.351 + 1.352 + I put the word ``error'' in quotes because all that you need to do 1.353 + to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In 1.354 + other words, this almost never has negative consequences; it just 1.355 + surprises people. I'll discuss other ways to avoid this behaviour, 1.356 + and why Mercurial behaves in this initially surprising way, later 1.357 + on. 1.358 +\end{note} 1.359 + 1.360 +\subsection{Merging heads} 1.361 + 1.362 +When you run the \hgcmd{merge} command, Mercurial leaves the first 1.363 +parent of the working directory unchanged, and sets the second parent 1.364 +to the changeset you're merging with, as shown in 1.365 +figure~\ref{fig:concepts:wdir-merge}. 1.366 + 1.367 +\begin{figure}[ht] 1.368 + \centering 1.369 + \grafix{wdir-merge} 1.370 + \caption{Merging two heads} 1.371 + \label{fig:concepts:wdir-merge} 1.372 +\end{figure} 1.373 + 1.374 +Mercurial also has to modify the working directory, to merge the files 1.375 +managed in the two changesets. Simplified a little, the merging 1.376 +process goes like this, for every file in the manifests of both 1.377 +changesets. 
1.378 +\begin{itemize} 1.379 +\item If neither changeset has modified a file, do nothing with that 1.380 + file. 1.381 +\item If one changeset has modified a file, and the other hasn't, 1.382 + create the modified copy of the file in the working directory. 1.383 +\item If one changeset has removed a file, and the other hasn't (or 1.384 + has also deleted it), delete the file from the working directory. 1.385 +\item If one changeset has removed a file, but the other has modified 1.386 + the file, ask the user what to do: keep the modified file, or remove 1.387 + it? 1.388 +\item If both changesets have modified a file, invoke an external 1.389 + merge program to choose the new contents for the merged file. This 1.390 + may require input from the user. 1.391 +\item If one changeset has modified a file, and the other has renamed 1.392 + or copied the file, make sure that the changes follow the new name 1.393 + of the file. 1.394 +\end{itemize} 1.395 +There are more details---merging has plenty of corner cases---but 1.396 +these are the most common choices that are involved in a merge. As 1.397 +you can see, most cases are completely automatic, and indeed most 1.398 +merges finish automatically, without requiring your input to resolve 1.399 +any conflicts. 1.400 + 1.401 +When you're thinking about what happens when you commit after a merge, 1.402 +once again the working directory is ``the changeset I'm about to 1.403 +commit''. After the \hgcmd{merge} command completes, the working 1.404 +directory has two parents; these will become the parents of the new 1.405 +changeset. 1.406 + 1.407 +Mercurial lets you perform multiple merges, but you must commit the 1.408 +results of each individual merge as you go. This is necessary because 1.409 +Mercurial only tracks two parents for both revisions and the working 1.410 +directory. 
While it would be technically possible to merge multiple 1.411 +changesets at once, the prospect of user confusion and making a 1.412 +terrible mess of a merge immediately becomes overwhelming. 1.413 + 1.414 +\section{Other interesting design features} 1.415 + 1.416 +In the sections above, I've tried to highlight some of the most 1.417 +important aspects of Mercurial's design, to illustrate that it pays 1.418 +careful attention to reliability and performance. However, the 1.419 +attention to detail doesn't stop there. There are a number of other 1.420 +aspects of Mercurial's construction that I personally find 1.421 +interesting. I'll detail a few of them here, separate from the ``big 1.422 +ticket'' items above, so that if you're interested, you can gain a 1.423 +better idea of the amount of thinking that goes into a well-designed 1.424 +system. 1.425 + 1.426 +\subsection{Clever compression} 1.427 + 1.428 +When appropriate, Mercurial will store both snapshots and deltas in 1.429 +compressed form. It does this by always \emph{trying to} compress a 1.430 +snapshot or delta, but only storing the compressed version if it's 1.431 +smaller than the uncompressed version. 1.432 + 1.433 +This means that Mercurial does ``the right thing'' when storing a file 1.434 +whose native form is compressed, such as a \texttt{zip} archive or a 1.435 +JPEG image. When these types of files are compressed a second time, 1.436 +the resulting file is usually bigger than the once-compressed form, 1.437 +and so Mercurial will store the plain \texttt{zip} or JPEG. 1.438 + 1.439 +Deltas between revisions of a compressed file are usually larger than 1.440 +snapshots of the file, and Mercurial again does ``the right thing'' in 1.441 +these cases. It finds that such a delta exceeds the threshold at 1.442 +which it should store a complete snapshot of the file, so it stores 1.443 +the snapshot, again saving space compared to a naive delta-only 1.444 +approach. 
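The ``try to compress, keep whichever is smaller'' rule can be sketched as follows. This is a minimal Python illustration using the standard \texttt{zlib} module; the one-letter flags and the \texttt{store} and \texttt{load} helpers are hypothetical and do not describe Mercurial's actual on-disk encoding.

```python
import os
import zlib


def store(chunk):
    """Return (flag, payload), compressed only when that is actually smaller."""
    packed = zlib.compress(chunk)
    if len(packed) < len(chunk):
        return ("z", packed)   # compression helped: keep the compressed form
    return ("u", chunk)        # compression did not help: keep the original


def load(flag, payload):
    """Undo store(), decompressing only if the payload was compressed."""
    return zlib.decompress(payload) if flag == "z" else payload


text = b"the quick brown fox\n" * 200   # repetitive, compresses well
noise = os.urandom(4096)                # stands in for already-compressed data
```

Repetitive text such as \texttt{text} is stored compressed, while data that is already dense, such as \texttt{noise}, is stored as-is because compressing it would only add overhead.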
1.445 + 1.446 +\subsubsection{Network recompression} 1.447 + 1.448 +When storing revisions on disk, Mercurial uses the ``deflate'' 1.449 +compression algorithm (the same one used by the popular \texttt{zip} 1.450 +archive format), which balances good speed with a respectable 1.451 +compression ratio. However, when transmitting revision data over a 1.452 +network connection, Mercurial uncompresses the compressed revision 1.453 +data. 1.454 + 1.455 +If the connection is over HTTP, Mercurial recompresses the entire 1.456 +stream of data using a compression algorithm that gives a better 1.457 +compression ratio (the Burrows-Wheeler algorithm from the widely used 1.458 +\texttt{bzip2} compression package). This combination of algorithm 1.459 +and compression of the entire stream (instead of a revision at a time) 1.460 +substantially reduces the number of bytes to be transferred, yielding 1.461 +better network performance over almost all kinds of network. 1.462 + 1.463 +(If the connection is over \command{ssh}, Mercurial \emph{doesn't} 1.464 +recompress the stream, because \command{ssh} can already do this 1.465 +itself.) 1.466 + 1.467 +\subsection{Read/write ordering and atomicity} 1.468 + 1.469 +Appending to files isn't the whole story when it comes to guaranteeing 1.470 +that a reader won't see a partial write. If you recall 1.471 +figure~\ref{fig:concepts:metadata}, revisions in the changelog point to 1.472 +revisions in the manifest, and revisions in the manifest point to 1.473 +revisions in filelogs. This hierarchy is deliberate. 1.474 + 1.475 +A writer starts a transaction by writing filelog and manifest data, 1.476 +and doesn't write any changelog data until those are finished. A 1.477 +reader starts by reading changelog data, then manifest data, followed 1.478 +by filelog data. 
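The ordering just described can be sketched with a toy Python model. It assumes three in-memory lists standing in for the revlogs and a generator that pauses after each append so that a reader can be interleaved; all of the names are hypothetical.

```python
filelog, manifest, changelog = [], [], []


def commit_steps(text):
    """Append in writer order, pausing after each step."""
    filelog.append(text)
    yield "filelog written"
    manifest.append(len(filelog) - 1)     # points at the filelog revision
    yield "manifest written"
    changelog.append(len(manifest) - 1)   # points at the manifest revision
    yield "changelog written"


def read_tip():
    """Readers start from the changelog and only follow its pointers."""
    if not changelog:
        return None
    return filelog[manifest[changelog[-1]]]
```

A reader that calls \texttt{read\_tip} while a commit is only partly written still sees the previous changeset, because nothing in the changelog points at the new data yet.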
1.479 + 1.480 +Since the writer has always finished writing filelog and manifest data 1.481 +before it writes to the changelog, a reader will never read a pointer 1.482 +to a partially written manifest revision from the changelog, and it will 1.483 +never read a pointer to a partially written filelog revision from the 1.484 +manifest. 1.485 + 1.486 +\subsection{Concurrent access} 1.487 + 1.488 +The read/write ordering and atomicity guarantees mean that Mercurial 1.489 +never needs to \emph{lock} a repository when it's reading data, even 1.490 +if the repository is being written to while the read is occurring. 1.491 +This has a big effect on scalability; you can have an arbitrary number 1.492 +of Mercurial processes safely reading data from a repository 1.493 +all at once, no matter whether it's being written to or not. 1.494 + 1.495 +The lockless nature of reading means that if you're sharing a 1.496 +repository on a multi-user system, you don't need to grant other local 1.497 +users permission to \emph{write} to your repository in order for them 1.498 +to be able to clone it or pull changes from it; they only need 1.499 +\emph{read} permission. (This is \emph{not} a common feature among 1.500 +revision control systems, so don't take it for granted! Most require 1.501 +readers to be able to lock a repository to access it safely, and this 1.502 +requires write permission on at least one directory, which of course 1.503 +makes for all kinds of nasty and annoying security and administrative 1.504 +problems.) 1.505 + 1.506 +Mercurial uses locks to ensure that only one process can write to a 1.507 +repository at a time (the locking mechanism is safe even over 1.508 +filesystems that are notoriously hostile to locking, such as NFS). If 1.509 +a repository is locked, a writer will wait a while and retry in case the 1.510 +repository becomes unlocked, but if the repository remains locked for 1.511 +too long, the process attempting to write will time out.
1.512 +This means that your daily automated scripts won't get stuck forever 1.513 +and pile up if a system crashes unnoticed, for example. (Yes, the 1.514 +timeout is configurable, from zero to infinity.) 1.515 + 1.516 +\subsubsection{Safe dirstate access} 1.517 + 1.518 +As with revision data, Mercurial doesn't take a lock to read the 1.519 +dirstate file; it does acquire a lock to write it. To avoid the 1.520 +possibility of reading a partially written copy of the dirstate file, 1.521 +Mercurial writes to a file with a unique name in the same directory as 1.522 +the dirstate file, then renames the temporary file atomically to 1.523 +\filename{dirstate}. The file named \filename{dirstate} is thus 1.524 +guaranteed to be complete, not partially written. 1.525 + 1.526 +\subsection{Avoiding seeks} 1.527 + 1.528 +Critical to Mercurial's performance is the avoidance of seeks of the 1.529 +disk head, since any seek is far more expensive than even a 1.530 +comparatively large read operation. 1.531 + 1.532 +This is why, for example, the dirstate is stored in a single file. If 1.533 +there were a dirstate file per directory that Mercurial tracked, the 1.534 +disk would seek once per directory. Instead, Mercurial reads the 1.535 +entire single dirstate file in one step. 1.536 + 1.537 +Mercurial also uses a ``copy on write'' scheme when cloning a 1.538 +repository on local storage. Instead of copying every revlog file 1.539 +from the old repository into the new repository, it makes a ``hard 1.540 +link'', which is a shorthand way to say ``these two names point to the 1.541 +same file''. When Mercurial is about to write to one of a revlog's 1.542 +files, it checks to see if the number of names pointing at the file is 1.543 +greater than one. If it is, more than one repository is using the 1.544 +file, so Mercurial makes a new copy of the file that is private to 1.545 +this repository. 
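The link-count check behind this ``copy on write'' scheme can be sketched in Python. This is an illustration only, assuming a filesystem that supports hard links; the helper names and the \texttt{.tmp} suffix are hypothetical, and the real implementation may differ in its details.

```python
import os
import shutil
import tempfile


def clone_file(src, dst):
    """Clone by hard link: dst becomes a second name for the same file."""
    os.link(src, dst)


def prepare_for_write(path):
    """Give this repository a private copy if the file is still shared."""
    if os.stat(path).st_nlink > 1:
        private = path + ".tmp"          # hypothetical temporary name
        shutil.copyfile(path, private)   # full copy of the shared data
        os.replace(private, path)        # atomically swap the copy into place


def append(path, data):
    """Break any sharing first, then append as a revlog writer would."""
    prepare_for_write(path)
    with open(path, "ab") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as d:
    old, new = os.path.join(d, "old.i"), os.path.join(d, "new.i")
    with open(old, "wb") as f:
        f.write(b"rev0")
    clone_file(old, new)
    shared_before = os.stat(new).st_nlink   # two names, one file
    append(new, b"rev1")
    shared_after = os.stat(new).st_nlink    # private copy, one name
    with open(old, "rb") as f:
        old_data = f.read()
    with open(new, "rb") as f:
        new_data = f.read()
```

The same write-then-rename step also mirrors how the dirstate file is replaced: readers see either the old file or the complete new one, never a partial write.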
1.546 + 1.547 +A few revision control developers have pointed out that this idea of 1.548 +making a complete private copy of a file is not very efficient in its 1.549 +use of storage. While this is true, storage is cheap, and this method 1.550 +gives the highest performance while deferring most book-keeping to the 1.551 +operating system. An alternative scheme would most likely reduce 1.552 +performance and increase the complexity of the software, each of which 1.553 +is much more important to the ``feel'' of day-to-day use. 1.554 + 1.555 +\subsection{Other contents of the dirstate} 1.556 + 1.557 +Because Mercurial doesn't force you to tell it when you're modifying a 1.558 +file, it uses the dirstate to store some extra information so it can 1.559 +determine efficiently whether you have modified a file. For each file 1.560 +in the working directory, it stores the time that it last modified the 1.561 +file itself, and the size of the file at that time. 1.562 + 1.563 +When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or 1.564 +\hgcmd{copy} files, Mercurial updates the dirstate so that it knows 1.565 +what to do with those files when you commit. 1.566 + 1.567 +When Mercurial is checking the states of files in the working 1.568 +directory, it first checks a file's modification time. If that has 1.569 +not changed, the file must not have been modified. If the file's size 1.570 +has changed, the file must have been modified. If the modification 1.571 +time has changed, but the size has not, only then does Mercurial need 1.572 +to read the actual contents of the file to see if they've changed. 1.573 +Storing these few extra pieces of information dramatically reduces the 1.574 +amount of data that Mercurial needs to read, which yields large 1.575 +performance improvements compared to other revision control systems. 1.576 + 1.577 +%%% Local Variables: 1.578 +%%% mode: latex 1.579 +%%% TeX-master: "00book" 1.580 +%%% End:
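The order of checks described under ``Other contents of the dirstate'' can be sketched in Python. This is a simplified illustration: the \texttt{record} and \texttt{is\_modified} helpers are hypothetical, and passing the committed contents in as a byte string is an assumption made to keep the example self-contained.

```python
import os
import tempfile


def record(path):
    """Remember the size and modification time, as the dirstate does."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": st.st_mtime_ns}


def is_modified(path, entry, committed):
    """Apply the cheap checks first; read the file only as a last resort."""
    st = os.stat(path)
    if st.st_mtime_ns == entry["mtime"]:
        return False                    # time unchanged: treated as unmodified
    if st.st_size != entry["size"]:
        return True                     # size changed: certainly modified
    with open(path, "rb") as f:         # time changed, size equal: compare data
        return f.read() != committed


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hello.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")
    entry = record(path)
    unchanged = is_modified(path, entry, b"hello\n")

    with open(path, "wb") as f:
        f.write(b"jello\n")                 # same size, new contents
    later = entry["mtime"] + 10**9
    os.utime(path, ns=(later, later))       # force a different timestamp
    edited = is_modified(path, entry, b"hello\n")
```

Only the last case touches the file's contents, which is why the common situation of an untouched working directory costs one \texttt{stat} call per file.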