hgbook

changeset 110:75c076c7a374

More concepts stuff.
author Bryan O'Sullivan <bos@serpentine.com>
date Fri Nov 10 15:09:49 2006 -0800 (2006-11-10)
parents 1b67dc96f27a
children 34b8b7a15ea1
files en/Makefile en/concepts.tex en/metadata.svg
line diff
     1.1 --- a/en/Makefile	Fri Nov 10 12:42:00 2006 -0800
     1.2 +++ b/en/Makefile	Fri Nov 10 15:09:49 2006 -0800
     1.3 @@ -25,6 +25,7 @@
     1.4  	kdiff3.png \
     1.5  	metadata.svg \
     1.6  	mq-stack.svg \
     1.7 +	snapshot.svg \
     1.8  	tour-history.svg \
     1.9  	tour-merge-conflict.svg \
    1.10  	tour-merge-merge.svg \
     2.1 --- a/en/concepts.tex	Fri Nov 10 12:42:00 2006 -0800
     2.2 +++ b/en/concepts.tex	Fri Nov 10 15:09:49 2006 -0800
     2.3 @@ -77,15 +77,15 @@
     2.4    \label{fig:concepts:metadata}
     2.5  \end{figure}
     2.6  
     2.7 -Note that there is not a ``one to one'' relationship between revisions
     2.8 -in these different metadata files.  If the manifest hasn't changed
     2.9 -between two changesets, their changelog entries will point to the same
    2.10 -revision of the manifest.  If a file that Mercurial tracks hasn't
    2.11 -changed between two changesets, the entry for that file in the two
    2.12 -revisions of the manifest will point to the same revision of its
    2.13 -filelog.
    2.14 -
    2.15 -\section{An efficient, unified, safe storage mechanism}
    2.16 +As the illustration shows, there is \emph{not} a ``one to one''
    2.17 +relationship between revisions in the changelog, manifest, or filelog.
    2.18 +If the manifest hasn't changed between two changesets, the changelog
    2.19 +entries for those changesets will point to the same revision of the
    2.20 +manifest.  If a file that Mercurial tracks hasn't changed between two
    2.21 +changesets, the entry for that file in the two revisions of the
    2.22 +manifest will point to the same revision of its filelog.
    2.23 +
    2.24 +\section{Safe, efficient storage}
    2.25  
    2.26  The underpinnings of changelogs, manifests, and filelogs are provided
    2.27  by a single structure called the \emph{revlog}.
    2.28 @@ -136,13 +136,24 @@
    2.29  revisions you must read, hence the longer it takes to reconstruct a
    2.30  particular revision.
    2.31  
    2.32 +\begin{figure}[ht]
    2.33 +  \centering
    2.34 +  \grafix{snapshot}
    2.35 +  \caption{Snapshot of a revlog, with incremental deltas}
    2.36 +  \label{fig:concepts:snapshot}
    2.37 +\end{figure}
    2.38 +
    2.39  The innovation that Mercurial applies to this problem is simple but
    2.40  effective.  Once the cumulative amount of delta information stored
    2.41  since the last snapshot exceeds a fixed threshold, it stores a new
    2.42  snapshot (compressed, of course), instead of another delta.  This
    2.43  makes it possible to reconstruct \emph{any} revision of a file
    2.44 -quickly.  This approach works so well that it has subsequently been
    2.45 -copied by several other revision control systems.
    2.46 +quickly.  This approach works so well that it has since been copied by
    2.47 +several other revision control systems.
    2.48 +
    2.49 +Figure~\ref{fig:concepts:snapshot} illustrates the idea.  In an entry
    2.50 +in a revlog's index file, Mercurial stores the range of entries from
    2.51 +the data file that it must read to reconstruct a particular revision.
    2.52  
    2.53  \subsubsection{Aside: the influence of video compression}
    2.54  
    2.55 @@ -163,26 +174,6 @@
    2.56  the next key frame is received.  Also, the accumulation of encoding
    2.57  errors restarts anew with each key frame.
    2.58  
    2.59 -\subsection{Clever compression}
    2.60 -
    2.61 -When appropriate, Mercurial will store both snapshots and deltas in
    2.62 -compressed form.  It does this by always \emph{trying to} compress a
    2.63 -snapshot or delta, but only storing the compressed version if it's
    2.64 -smaller than the uncompressed version.
    2.65 -
    2.66 -This means that Mercurial does ``the right thing'' when storing a file
    2.67 -whose native form is compressed, such as a \texttt{zip} archive or a
    2.68 -JPEG image.  When these types of files are compressed a second time,
    2.69 -the resulting file is usually bigger than the once-compressed form,
    2.70 -and so Mercurial will store the plain \texttt{zip} or JPEG.
    2.71 -
    2.72 -Deltas between revisions of a compressed file are usually larger than
    2.73 -snapshots of the file, and Mercurial again does ``the right thing'' in
    2.74 -these cases.  It finds that such a delta exceeds the threshold at
    2.75 -which it should store a complete snapshot of the file, so it stores
    2.76 -the snapshot, again saving space compared to a naive delta-only
    2.77 -approach.
    2.78 -
    2.79  \subsection{Strong integrity}
    2.80  
    2.81  Along with delta or snapshot information, a revlog entry contains a
    2.82 @@ -202,6 +193,83 @@
    2.83  after the corrupted section.  This would not be possible with a
    2.84  delta-only storage model.
    2.85  
    2.86 +\section{The working directory}
    2.87 +
    2.88 +Mercurial's good ideas are not confined to the repository; it also
    2.89 +needs to manage the working directory.  The \emph{dirstate} contains
    2.90 +Mercurial's knowledge of the working directory.  This details which
    2.91 +revision(s) the working directory is updated to, and all files that
    2.92 +Mercurial is tracking in the working directory.
    2.93 +
    2.94 +Because Mercurial doesn't force you to tell it when you're modifying a
    2.95 +file, it uses the dirstate to store some extra information so it can
    2.96 +determine efficiently whether you have modified a file.  For each file
    2.97 +in the working directory, it stores the time that it last modified the
    2.98 +file itself, and the size of the file at that time.  
    2.99 +
   2.100 +When Mercurial is checking the states of files in the working
   2.101 +directory, it first checks a file's modification time.  If that has
   2.102 +not changed, the file must not have been modified.  If the file's size
   2.103 +has changed, the file must have been modified.  If the modification
   2.104 +time has changed, but the size has not, only then does Mercurial need
   2.105 +to read the actual contents of the file to see if they've changed.
   2.106 +Storing these few extra pieces of information dramatically reduces the
   2.107 +amount of data that Mercurial needs to read, which yields large
   2.108 +performance improvements compared to other revision control systems.
   2.109 +
   2.110 +\section{Other interesting design features}
   2.111 +
   2.112 +In the sections above, I've tried to highlight some of the most
   2.113 +important aspects of Mercurial's design, to illustrate that it pays
   2.114 +careful attention to reliability and performance.  However, the
   2.115 +attention to detail doesn't stop there.  There are a number of other
   2.116 +aspects of Mercurial's construction that I personally find
   2.117 +interesting.  I'll detail a few of them here, separate from the ``big
   2.118 +ticket'' items above, so that if you're interested, you can gain a
   2.119 +better idea of the amount of thinking that goes into a well-designed
   2.120 +system.
   2.121 +
   2.122 +\subsection{Clever compression}
   2.123 +
   2.124 +When appropriate, Mercurial will store both snapshots and deltas in
   2.125 +compressed form.  It does this by always \emph{trying to} compress a
   2.126 +snapshot or delta, but only storing the compressed version if it's
   2.127 +smaller than the uncompressed version.
   2.128 +
   2.129 +This means that Mercurial does ``the right thing'' when storing a file
   2.130 +whose native form is compressed, such as a \texttt{zip} archive or a
   2.131 +JPEG image.  When these types of files are compressed a second time,
   2.132 +the resulting file is usually bigger than the once-compressed form,
   2.133 +and so Mercurial will store the plain \texttt{zip} or JPEG.
   2.134 +
   2.135 +Deltas between revisions of a compressed file are usually larger than
   2.136 +snapshots of the file, and Mercurial again does ``the right thing'' in
   2.137 +these cases.  It finds that such a delta exceeds the threshold at
   2.138 +which it should store a complete snapshot of the file, so it stores
   2.139 +the snapshot, again saving space compared to a naive delta-only
   2.140 +approach.
   2.141 +
   2.142 +\subsubsection{Network recompression}
   2.143 +
   2.144 +When storing revisions on disk, Mercurial uses the ``deflate''
   2.145 +compression algorithm (the same one used by the popular \texttt{zip}
   2.146 +archive format), which balances good speed with a respectable
   2.147 +compression ratio.  However, when transmitting revision data over a
   2.148 +network connection, Mercurial uncompresses the compressed revision
   2.149 +data.
   2.150 +
   2.151 +If the connection is over HTTP, Mercurial recompresses the entire
   2.152 +stream of data using a compression algorithm that gives a etter
   2.153 +compression ratio (the Burrows-Wheeler algorithm from the widely used
   2.154 +\texttt{bzip2} compression package).  This combination of algorithm
   2.155 +and compression of the entire stream (instead of a revision at a time)
   2.156 +substantially reduces the number of bytes to be transferred, yielding
   2.157 +better network performance over almost all kinds of network.
   2.158 +
   2.159 +(If the connection is over \command{ssh}, Mercurial \emph{doesn't}
   2.160 +recompress the stream, because \command{ssh} can already do this
   2.161 +itself.)
   2.162 +
   2.163  \subsection{Read/write ordering and atomicity}
   2.164  
   2.165  Appending to files isn't the whole story when it comes to guaranteeing
   2.166 @@ -241,15 +309,25 @@
   2.167  makes for all kinds of nasty and annoying security and administrative
   2.168  problems.)
   2.169  
   2.170 -Mercurial uses a locking mechanism to ensure that only one process can
   2.171 -write to a repository at a time.  This locking mechanism is safe even
   2.172 -over filesystems that are notoriously unsafe for locking, such as NFS.
   2.173 -If a repository is locked, a writer will wait for a while to retry if
   2.174 -the repository becomes unlocked, but if the repository remains locked
   2.175 -for too long, the process attempting to write will time out after a
   2.176 -while.  This means that your daily automated scripts won't get stuck
   2.177 -forever and pile up if a system crashes unnoticed, for example.  (Yes,
   2.178 -the timeout is configurable, from zero to infinity.)
   2.179 +Mercurial uses locks to ensure that only one process can write to a
   2.180 +repository at a time (the locking mechanism is safe even over
   2.181 +filesystems that are notoriously hostile to locking, such as NFS).  If
   2.182 +a repository is locked, a writer will wait for a while to retry if the
   2.183 +repository becomes unlocked, but if the repository remains locked for
   2.184 +too long, the process attempting to write will time out after a while.
   2.185 +This means that your daily automated scripts won't get stuck forever
   2.186 +and pile up if a system crashes unnoticed, for example.  (Yes, the
   2.187 +timeout is configurable, from zero to infinity.)
   2.188 +
   2.189 +\subsubsection{Safe dirstate access}
   2.190 +
   2.191 +As with revision data, Mercurial doesn't take a lock to read the
   2.192 +dirstate file; it does acquire a lock to write it.  To avoid the
   2.193 +possibility of reading a partially written copy of the dirstate file,
   2.194 +Mercurial writes to a file with a unique name in the same directory as
   2.195 +the dirstate file, then renames the temporary file atomically to
   2.196 +\filename{dirstate}.  The file named \filename{dirstate} is thus
   2.197 +guaranteed to be complete, not partially written.
   2.198  
   2.199  
   2.200  
     3.1 --- a/en/metadata.svg	Fri Nov 10 12:42:00 2006 -0800
     3.2 +++ b/en/metadata.svg	Fri Nov 10 15:09:49 2006 -0800
     3.3 @@ -67,7 +67,7 @@
     3.4       id="layer1">
     3.5      <path
     3.6         style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#a7a7a7;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1;display:inline"
     3.7 -       d="M 326.94646,471.18359 L 326.94646,510.98123"
     3.8 +       d="M 326.94646,467.18359 L 326.94646,510.98123"
     3.9         id="path1910"
    3.10         inkscape:connector-type="polyline"
    3.11         inkscape:connection-end="#rect2962"
    3.12 @@ -87,8 +87,8 @@
    3.13         inkscape:connection-end="#rect3038"
    3.14         inkscape:connection-start="#rect2962" />
    3.15      <path
    3.16 -       style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5,1.5;stroke-dashoffset:0"
    3.17 -       d="M 254.23217,471.18359 L 254.23216,510.98123"
    3.18 +       style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1"
    3.19 +       d="M 254.23217,467.18359 L 254.23216,510.98123"
    3.20         id="path3088"
    3.21         inkscape:connector-type="polyline"
    3.22         inkscape:connection-start="#rect1872"
    3.23 @@ -113,52 +113,52 @@
    3.24         width="51.42857"
    3.25         height="20"
    3.26         x="228.51788"
    3.27 -       y="450.68359" />
    3.28 +       y="446.68359" />
    3.29      <rect
    3.30         style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
    3.31         id="rect2764"
    3.32         width="51.42857"
    3.33         height="20"
    3.34         x="301.23218"
    3.35 -       y="450.68359" />
    3.36 +       y="446.68359" />
    3.37      <rect
    3.38         style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
    3.39         id="rect2766"
    3.40         width="51.42857"
    3.41         height="20"
    3.42         x="155.80359"
    3.43 -       y="450.68359" />
    3.44 +       y="446.68359" />
    3.45      <rect
    3.46         style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
    3.47         id="rect2768"
    3.48         width="51.42857"
    3.49         height="20"
    3.50         x="83.089294"
    3.51 -       y="450.68359" />
    3.52 -    <path
    3.53 -       style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
    3.54 -       d="M 135.01786,460.68359 L 155.30359,460.68359"
    3.55 +       y="446.68359" />
    3.56 +    <path
    3.57 +       style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
    3.58 +       d="M 135.01786,456.68359 L 155.30359,456.68359"
    3.59         id="path2770"
    3.60         inkscape:connector-type="polyline"
    3.61         inkscape:connection-start="#rect2768"
    3.62         inkscape:connection-end="#rect2766" />
    3.63      <path
    3.64         style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
    3.65 -       d="M 207.73216,460.68359 L 228.01788,460.68359"
    3.66 +       d="M 207.73216,456.68359 L 228.01788,456.68359"
    3.67         id="path2772"
    3.68         inkscape:connector-type="polyline"
    3.69         inkscape:connection-start="#rect2766"
    3.70         inkscape:connection-end="#rect1872" />
    3.71      <path
    3.72         style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
    3.73 -       d="M 280.44645,460.68359 L 300.73218,460.68359"
    3.74 +       d="M 280.44645,456.68359 L 300.73218,456.68359"
    3.75         id="path2774"
    3.76         inkscape:connector-type="polyline"
    3.77         inkscape:connection-start="#rect1872"
    3.78         inkscape:connection-end="#rect2764" />
    3.79      <path
    3.80         style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:3, 3;stroke-dashoffset:0;stroke-opacity:1"
    3.81 -       d="M 62.303571,460.68359 L 82.589293,460.68359"
    3.82 +       d="M 62.303571,456.68359 L 82.589294,456.68359"
    3.83         id="path2778"
    3.84         inkscape:connector-type="polyline"
    3.85         inkscape:connection-end="#rect2768" />
    3.86 @@ -298,12 +298,12 @@
    3.87         xml:space="preserve"
    3.88         style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman"
    3.89         x="82.072548"
    3.90 -       y="436.64789"
    3.91 +       y="432.64789"
    3.92         id="text3094"><tspan
    3.93           sodipodi:role="line"
    3.94           id="tspan3096"
    3.95           x="82.072548"
    3.96 -         y="436.64789">Changelog</tspan></text>
    3.97 +         y="432.64789">Changelog</tspan></text>
    3.98      <text
    3.99         xml:space="preserve"
   3.100         style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman"