hgbook
diff en/filenames.tex @ 146:65f6f9d18fa1
Oops! I forgot that I need the undoctored output files in the book!
Now they're named "*.lxo", instead of "*.out". Ugh.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Tue Mar 06 21:55:48 2007 -0800 (2007-03-06) |
parents | |
children | 7f07aca44938 |
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/en/filenames.tex	Tue Mar 06 21:55:48 2007 -0800
@@ -0,0 +1,306 @@
+\chapter{File names and pattern matching}
+\label{chap:names}
+
+Mercurial provides mechanisms that let you work with file names in a
+consistent and expressive way.
+
+\section{Simple file naming}
+
+Mercurial uses a unified piece of machinery ``under the hood'' to
+handle file names. Every command behaves uniformly with respect to
+file names. The way in which commands work with file names is as
+follows.
+
+If you explicitly name real files on the command line, Mercurial works
+with exactly those files, as you would expect.
+\interaction{filenames.files}
+
+When you provide a directory name, Mercurial will interpret this as
+``operate on every file in this directory and its subdirectories''.
+Mercurial traverses the files and subdirectories in a directory in
+alphabetical order. When it encounters a subdirectory, it will
+traverse that subdirectory before continuing with the current
+directory.
+\interaction{filenames.dirs}
+
+\section{Running commands without any file names}
+
+Mercurial's commands that work with file names have useful default
+behaviours when you invoke them without providing any file names or
+patterns. What kind of behaviour you should expect depends on what
+the command does. Here are a few rules of thumb you can use to
+predict what a command is likely to do if you don't give it any names
+to work with.
+\begin{itemize}
+\item Most commands will operate on the entire working directory.
+  This is what the \hgcmd{add} command does, for example.
+\item If the command has effects that are difficult or impossible to
+  reverse, it will force you to explicitly provide at least one name
+  or pattern (see below). This protects you from accidentally
+  deleting files by running \hgcmd{remove} with no arguments, for
+  example.
+\end{itemize}
+
+It's easy to work around these default behaviours if they don't suit
+you. If a command normally operates on the whole working directory,
+you can invoke it on just the current directory and its subdirectories
+by giving it the name ``\dirname{.}''.
+\interaction{filenames.wdir-subdir}
+
+Along the same lines, some commands normally print file names relative
+to the root of the repository, even if you're invoking them from a
+subdirectory. Such a command will print file names relative to your
+subdirectory if you give it explicit names. Here, we're going to run
+\hgcmd{status} from a subdirectory, and get it to operate on the
+entire working directory while printing file names relative to our
+subdirectory, by passing it the output of the \hgcmd{root} command.
+\interaction{filenames.wdir-relname}
+
+\section{Telling you what's going on}
+
+The \hgcmd{add} example in the preceding section illustrates something
+else that's helpful about Mercurial commands. If a command operates
+on a file that you didn't name explicitly on the command line, it will
+usually print the name of the file, so that you will not be surprised
+by what's going on.
+
+The principle here is of \emph{least surprise}. If you've exactly
+named a file on the command line, there's no point in repeating it
+back at you. If Mercurial is acting on a file \emph{implicitly},
+because you provided no names, or a directory, or a pattern (see
+below), it's safest to tell you what it's doing.
+
+For commands that behave this way, you can silence them using the
+\hggopt{-q} option. You can also get them to print the name of every
+file, even those you've named explicitly, using the \hggopt{-v}
+option.
+
+\section{Using patterns to identify files}
+
+In addition to working with file and directory names, Mercurial lets
+you use \emph{patterns} to identify files. Mercurial's pattern
+handling is expressive.
+
+On Unix-like systems (Linux, MacOS, etc.), the job of matching file
+names to patterns normally falls to the shell. On these systems, you
+must explicitly tell Mercurial that a name is a pattern. On Windows,
+the shell does not expand patterns, so Mercurial will automatically
+identify names that are patterns, and expand them for you.
+
+To provide a pattern in place of a regular name on the command line,
+the mechanism is simple:
+\begin{codesample2}
+  syntax:patternbody
+\end{codesample2}
+That is, a pattern is identified by a short text string that says what
+kind of pattern this is, followed by a colon, followed by the actual
+pattern.
+
+Mercurial supports two kinds of pattern syntax. The most frequently
+used is called \texttt{glob}; this is the same kind of pattern
+matching used by the Unix shell, and should be familiar to Windows
+command prompt users, too.
+
+When Mercurial does automatic pattern matching on Windows, it uses
+\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix
+on Windows, but it's safe to use it, too.
+
+The \texttt{re} syntax is more powerful; it lets you specify patterns
+using regular expressions, also known as regexps.
+
+By the way, in the examples that follow, notice that I'm careful to
+wrap all of my patterns in quote characters, so that they won't get
+expanded by the shell before Mercurial sees them.
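The ``syntax:patternbody'' convention described above can be illustrated with a small Python sketch. This is not Mercurial's own code: the function name `split_pattern`, the `default` parameter, and the `KNOWN_SYNTAXES` tuple are hypothetical names made up for illustration. The sketch only shows how a name might be separated into a syntax kind and a pattern body, with unprefixed names falling back to a caller-chosen default (for instance `glob`, to mimic the Windows behaviour mentioned in the text).

```python
# Illustrative sketch only; these names are assumptions, not Mercurial internals.
KNOWN_SYNTAXES = ("glob", "re")  # the two pattern kinds named in the text


def split_pattern(name, default=None):
    """Split 'syntax:patternbody' into a (syntax, body) pair.

    Names without a recognised prefix are returned unchanged, paired
    with `default` (None meaning "treat as a plain file name").
    """
    prefix, sep, body = name.partition(":")
    if sep and prefix in KNOWN_SYNTAXES:
        return prefix, body
    return default, name


print(split_pattern("glob:*.py"))
print(split_pattern("re:.*\\.c$"))
print(split_pattern("src/main.c"))
print(split_pattern("*.py", default="glob"))
```

Checking the prefix against a list of known kinds, rather than splitting on any colon, keeps a name such as `C:\src` from being mistaken for a pattern.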
+
+\subsection{Shell-style \texttt{glob} patterns}
+
+This is an overview of the kinds of patterns you can use when you're
+matching on glob patterns.
+
+The ``\texttt{*}'' character matches any string, within a single
+directory.
+\interaction{filenames.glob.star}
+
+The ``\texttt{**}'' pattern matches any string, and crosses directory
+boundaries. It's not a standard Unix glob token, but it's accepted by
+several popular Unix shells, and is very useful.
+\interaction{filenames.glob.starstar}
+
+The ``\texttt{?}'' pattern matches any single character.
+\interaction{filenames.glob.question}
+
+The ``\texttt{[}'' character begins a \emph{character class}. This
+matches any single character within the class. The class ends with a
+``\texttt{]}'' character. A class may contain multiple \emph{range}s
+of the form ``\texttt{a-f}'', which is shorthand for
+``\texttt{abcdef}''.
+\interaction{filenames.glob.range}
+If the first character after the ``\texttt{[}'' in a character class
+is a ``\texttt{!}'', it \emph{negates} the class, making it match any
+single character not in the class.
+
+A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
+matches if any subpattern in the group matches. The ``\texttt{,}''
+character separates subpatterns, and ``\texttt{\}}'' ends the group.
+\interaction{filenames.glob.group}
+
+\subsubsection{Watch out!}
+
+Don't forget that if you want to match a pattern in any directory, you
+should not be using the ``\texttt{*}'' match-any token, as this will
+only match within one directory. Instead, use the ``\texttt{**}''
+token. This small example illustrates the difference between the two.
+\interaction{filenames.glob.star-starstar}
+
+\subsection{Regular expression matching with \texttt{re} patterns}
+
+Mercurial accepts the same regular expression syntax as the Python
+programming language (it uses Python's regexp engine internally).
+This is based on the Perl language's regexp syntax, which is the most
+popular dialect in use (it's also used in Java, for example).
+
+I won't discuss Mercurial's regexp dialect in any detail here, as
+regexps are not often used. Perl-style regexps are in any case
+already exhaustively documented on a multitude of web sites, and in
+many books. Instead, I will focus here on a few things you should
+know if you find yourself needing to use regexps with Mercurial.
+
+A regexp is matched against an entire file name, relative to the root
+of the repository. In other words, even if you're already in
+subdirectory \dirname{foo}, if you want to match files under this
+directory, your pattern must start with ``\texttt{foo/}''.
+
+One thing to note, if you're familiar with Perl-style regexps, is that
+Mercurial's are \emph{rooted}. That is, a regexp starts matching
+against the beginning of a string; it doesn't look for a match
+anywhere within the string. To match anywhere in a string, start
+your pattern with ``\texttt{.*}''.
+
+\section{Filtering files}
+
+Not only does Mercurial give you a variety of ways to specify files;
+it lets you further winnow those files using \emph{filters}. Commands
+that work with file names accept two filtering options.
+\begin{itemize}
+\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
+  that file names must match in order to be processed.
+\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
+  \emph{avoid} processing files, if they match this pattern.
+\end{itemize}
+You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
+command line, and intermix them as you please. Mercurial interprets
+the patterns you provide using glob syntax by default (but you can use
+regexps if you need to).
+
+You can read a \hggopt{-I} filter as ``process only the files that
+match this filter''.
+\interaction{filenames.filter.include}
+The \hggopt{-X} filter is best read as ``process only the files that
+don't match this pattern''.
+\interaction{filenames.filter.exclude}
+
+\section{Ignoring unwanted files and directories}
+
+XXX.
+
+\section{Case sensitivity}
+\label{sec:names:case}
+
+If you're working in a mixed development environment that contains
+both Linux (or other Unix) systems and Macs or Windows systems, you
+should keep in the back of your mind the knowledge that they treat the
+case (``N'' versus ``n'') of file names in incompatible ways. This is
+not very likely to affect you, and it's easy to deal with if it does,
+but it could surprise you if you don't know about it.
+
+Operating systems and filesystems differ in the way they handle the
+\emph{case} of characters in file and directory names. There are
+three common ways to handle case in names.
+\begin{itemize}
+\item Completely case insensitive. Uppercase and lowercase versions
+  of a letter are treated as identical, both when creating a file and
+  during subsequent accesses. This is common on older DOS-based
+  systems.
+\item Case preserving, but insensitive. When a file or directory is
+  created, the case of its name is stored, and can be retrieved and
+  displayed by the operating system. When an existing file is being
+  looked up, its case is ignored. This is the standard arrangement on
+  Windows and MacOS. The names \filename{foo} and \filename{FoO}
+  identify the same file. This treatment of uppercase and lowercase
+  letters as interchangeable is also referred to as \emph{case
+    folding}.
+\item Case sensitive. The case of a name is significant at all times.
+  The names \filename{foo} and \filename{FoO} identify different files. This
+  is the way Linux and Unix systems normally work.
+\end{itemize}
+
+On Unix-like systems, it is possible to have any or all of the above
+ways of handling case in action at once. For example, if you use a
+USB thumb drive formatted with a FAT32 filesystem on a Linux system,
+Linux will handle names on that filesystem in a case preserving, but
+insensitive, way.
+
+\subsection{Safe, portable repository storage}
+
+Mercurial's repository storage mechanism is \emph{case safe}. It
+translates file names so that they can be safely stored on both case
+sensitive and case insensitive filesystems. This means that you can
+use normal file copying tools to transfer a Mercurial repository onto,
+for example, a USB thumb drive, and safely move that drive and
+repository back and forth between a Mac, a PC running Windows, and a
+Linux box.
+
+\subsection{Detecting case conflicts}
+
+When operating in the working directory, Mercurial honours the naming
+policy of the filesystem where the working directory is located. If
+the filesystem is case preserving, but insensitive, Mercurial will
+treat names that differ only in case as the same.
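To make ``names that differ only in case'' concrete, here is a minimal Python sketch that groups a list of file names by their lowercased form and reports any group with more than one member. It is only an illustration under stated assumptions: the function name `find_case_conflicts` is hypothetical, and `str.lower()` is a rough stand-in for whatever case folding a real filesystem applies. It does not show how Mercurial itself detects conflicts.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of names that collide when case is ignored.

    Assumption: lowercasing approximates a case insensitive
    filesystem's folding rules; real filesystems may differ.
    """
    groups = defaultdict(list)
    for name in set(names):  # ignore exact duplicates
        groups[name.lower()].append(name)
    # Keep only groups where two or more distinct names fold together.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


print(find_case_conflicts(["foo", "FoO", "bar"]))
```

Running a check like this over a project's file list on a Linux box is one way to spot names that would collide on Windows or MacOS.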
+
+An important aspect of this approach is that it is possible to commit
+a changeset on a case sensitive (typically Linux or Unix) filesystem
+that will cause trouble for users on case insensitive (usually Windows
+and MacOS) filesystems. If a Linux user commits changes to two files, one
+named \filename{myfile.c} and the other named \filename{MyFile.C},
+they will be stored correctly in the repository. And in the working
+directories of other Linux users, they will be correctly represented
+as separate files.
+
+If a Windows or Mac user pulls this change, they will not initially
+have a problem, because Mercurial's repository storage mechanism is
+case safe. However, once they try to \hgcmd{update} the working
+directory to that changeset, or \hgcmd{merge} with that changeset,
+Mercurial will spot the conflict between the two file names that the
+filesystem would treat as the same, and forbid the update or merge
+from occurring.
+
+\subsection{Fixing a case conflict}
+
+If you are using Windows or a Mac in a mixed environment where some of
+your collaborators are using Linux or Unix, and Mercurial reports a
+case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
+the procedure to fix the problem is simple.
+
+Just find a nearby Linux or Unix box, clone the problem repository
+onto it, and use Mercurial's \hgcmd{rename} command to change the
+names of any offending files or directories so that they will no
+longer cause case folding conflicts. Commit this change, \hgcmd{pull}
+or \hgcmd{push} it across to your Windows or MacOS system, and
+\hgcmd{update} to the revision with the non-conflicting names.
+
+The changeset with case-conflicting names will remain in your
+project's history, and you still won't be able to \hgcmd{update} your
+working directory to that changeset on a Windows or MacOS system, but
+you can continue development unimpeded.
+
+\begin{note}
+  Prior to version~0.9.3, Mercurial did not use a case safe repository
+  storage mechanism, and did not detect case folding conflicts. If
+  you are using an older version of Mercurial on Windows or MacOS, I
+  strongly recommend that you upgrade.
+\end{note}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End: