% hgbook: annotate en/filenames.tex @ 146:65f6f9d18fa1
% Oops! I forgot that I need the undoctored output files in the book!
% Now they're named "*.lxo", instead of "*.out". Ugh.
% Author: Bryan O'Sullivan <bos@serpentine.com>
% Date: Tue Mar 06 21:55:48 2007 -0800 (2007-03-06)
\chapter{File names and pattern matching}
\label{chap:names}

Mercurial provides mechanisms that let you work with file names in a
consistent and expressive way.

\section{Simple file naming}

Mercurial uses a unified piece of machinery ``under the hood'' to
handle file names. Every command behaves uniformly with respect to
file names. The way in which commands work with file names is as
follows.

If you explicitly name real files on the command line, Mercurial works
with exactly those files, as you would expect.
\interaction{filenames.files}

When you provide a directory name, Mercurial will interpret this as
``operate on every file in this directory and its subdirectories''.
Mercurial traverses the files and subdirectories in a directory in
alphabetical order. When it encounters a subdirectory, it will
traverse that subdirectory before continuing with the current
directory.
\interaction{filenames.dirs}
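The traversal order just described can be modelled in a few lines of Python. This is a sketch, not Mercurial's own code. The function name \texttt{walk\_sorted} is hypothetical, and Python's \texttt{sorted} orders names by code point, which only approximates alphabetical order. For example, uppercase names sort before lowercase ones.

```python
import os

def walk_sorted(top):
    """Yield file paths under `top`, relative to `top`.

    Entries in each directory are visited in sorted order, and a
    subdirectory is traversed completely before the walk continues
    with the remaining entries of its parent directory.
    """
    def visit(reldir):
        for name in sorted(os.listdir(os.path.join(top, reldir))):
            relpath = os.path.join(reldir, name) if reldir else name
            if os.path.isdir(os.path.join(top, relpath)):
                # Descend right away, before the parent's later entries.
                yield from visit(relpath)
            else:
                yield relpath

    yield from visit("")
```

For a directory holding \filename{a.txt}, a subdirectory \dirname{b} containing \filename{c.txt}, and \filename{d.txt}, the walk yields \filename{a.txt}, then \filename{b/c.txt}, then \filename{d.txt}, because \dirname{b} sorts between the two top-level files.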

\section{Running commands without any file names}

Mercurial's commands that work with file names have useful default
behaviours when you invoke them without providing any file names or
patterns. What kind of behaviour you should expect depends on what
the command does. Here are a few rules of thumb you can use to
predict what a command is likely to do if you don't give it any names
to work with.
\begin{itemize}
\item Most commands will operate on the entire working directory.
  This is what the \hgcmd{add} command does, for example.
\item If the command has effects that are difficult or impossible to
  reverse, it will force you to explicitly provide at least one name
  or pattern (see below). This protects you from accidentally
  deleting files by running \hgcmd{remove} with no arguments, for
  example.
\end{itemize}

It's easy to work around these default behaviours if they don't suit
you. If a command normally operates on the whole working directory,
you can invoke it on just the current directory and its subdirectories
by giving it the name ``\dirname{.}''.
\interaction{filenames.wdir-subdir}

Along the same lines, some commands normally print file names relative
to the root of the repository, even if you're invoking them from a
subdirectory. Such a command will print file names relative to your
subdirectory if you give it explicit names. Here, we're going to run
\hgcmd{status} from a subdirectory, and get it to operate on the
entire working directory while printing file names relative to our
subdirectory, by passing it the output of the \hgcmd{root} command.
\interaction{filenames.wdir-relname}

\section{Telling you what's going on}

The \hgcmd{add} example in the preceding section illustrates something
else that's helpful about Mercurial commands. If a command operates
on a file that you didn't name explicitly on the command line, it will
usually print the name of the file, so that you won't be surprised by
what's going on.

The principle here is one of \emph{least surprise}. If you've named a
file exactly on the command line, there's no point in repeating it
back at you. If Mercurial is acting on a file \emph{implicitly},
because you provided no names, or a directory, or a pattern (see
below), it's safest to tell you what it's doing.

For commands that behave this way, you can silence them using the
\hggopt{-q} option. You can also get them to print the name of every
file, even those you've named explicitly, using the \hggopt{-v}
option.
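The rule for whether to echo a file name fits in a small predicate. The sketch below is hypothetical: \texttt{should\_echo} and its parameters are invented for illustration. Its behaviour when both \hggopt{-q} and \hggopt{-v} are given is also an assumption; here \hggopt{-q} wins.

```python
def should_echo(explicit, quiet=False, verbose=False):
    """Decide whether a command should print a file's name.

    explicit: True if the user named this exact file on the command
    line; False if it was reached through a directory, a pattern, or
    because no names were given at all.
    """
    if quiet:
        # -q silences the echo (assumed to take priority over -v).
        return False
    if verbose:
        # -v echoes every file, even ones named explicitly.
        return True
    # Default: only mention files the user did not name directly.
    return not explicit
```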

\section{Using patterns to identify files}

In addition to working with file and directory names, Mercurial lets
you use \emph{patterns} to identify files. Mercurial's pattern
handling is expressive.

On Unix-like systems (Linux, MacOS, etc.), the job of matching file
names to patterns normally falls to the shell. On these systems, you
must explicitly tell Mercurial that a name is a pattern. On Windows,
the shell does not expand patterns, so Mercurial will automatically
identify names that are patterns, and expand them for you.

To provide a pattern in place of a regular name on the command line,
the mechanism is simple:
\begin{codesample2}
  syntax:patternbody
\end{codesample2}
That is, a pattern is identified by a short text string that says what
kind of pattern this is, followed by a colon, followed by the actual
pattern.
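Here is a minimal sketch of how an argument might be split into its syntax and pattern body. The helper \texttt{split\_pattern} is hypothetical. It recognises only the two syntaxes discussed in this chapter and returns anything else unchanged, to be treated as an ordinary name.

```python
# Only the two syntaxes covered in this chapter are recognised here.
KNOWN_SYNTAXES = ("glob", "re")

def split_pattern(arg):
    """Split 'syntax:patternbody' into (syntax, patternbody).

    Returns (None, arg) when arg has no recognised prefix, meaning it
    should be handled as a plain file or directory name.
    """
    syntax, sep, body = arg.partition(":")
    if sep and syntax in KNOWN_SYNTAXES:
        return syntax, body
    return None, arg
```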

Mercurial supports two kinds of pattern syntax. The most frequently
used is called \texttt{glob}; this is the same kind of pattern
matching used by the Unix shell, and should be familiar to Windows
command prompt users, too.

When Mercurial does automatic pattern matching on Windows, it uses
\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix
on Windows, but it's safe to use it, too.
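The automatic behaviour on Windows can be sketched as follows. The sketch assumes that a bare name counts as a pattern whenever it contains one of the glob metacharacters ``\texttt{*}'', ``\texttt{?}'', ``\texttt{[}'' or ``\texttt{\{}''. Both the character set and the helper names \texttt{looks\_like\_glob} and \texttt{windows\_arg\_to\_pattern} are assumptions made for illustration.

```python
# Assumed set of characters that mark a bare name as a glob pattern.
GLOB_CHARS = frozenset("*?[{")

def looks_like_glob(name):
    """Guess whether a bare command-line name is really a glob pattern."""
    return any(ch in GLOB_CHARS for ch in name)

def windows_arg_to_pattern(arg):
    """Turn a Windows command-line argument into an explicit pattern.

    An explicit 'glob:' or 're:' prefix is always respected. Otherwise
    a name containing glob metacharacters is treated as if 'glob:' had
    been written in front of it.
    """
    if arg.startswith(("glob:", "re:")):
        return arg
    if looks_like_glob(arg):
        return "glob:" + arg
    return arg
```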

The \texttt{re} syntax is more powerful; it lets you specify patterns
using regular expressions, also known as regexps.

By the way, in the examples that follow, notice that I'm careful to
wrap all of my patterns in quote characters, so that they won't get
expanded by the shell before Mercurial sees them.

\subsection{Shell-style \texttt{glob} patterns}

This is an overview of the kinds of patterns you can use when you're
matching on glob patterns.

The ``\texttt{*}'' character matches any string, within a single
directory.
\interaction{filenames.glob.star}

The ``\texttt{**}'' pattern matches any string, and crosses directory
boundaries. It's not a standard Unix glob token, but it's accepted by
several popular Unix shells, and is very useful.
\interaction{filenames.glob.starstar}
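One way to see the difference between the two wildcards is to write down regexps that behave like them. This sketch uses an assumed translation, not one taken from Mercurial's source. It maps ``\texttt{*}'' to \texttt{[\^{}/]*}, which cannot cross a ``\texttt{/}'', and ``\texttt{**}'' to \texttt{.*}, which can. The helper \texttt{whole\_match} is hypothetical.

```python
import re

# Assumed regex equivalents of the two glob wildcards.
STAR = r"[^/]*"    # '*': any string, but never a '/'
STARSTAR = r".*"   # '**': any string, including '/'

def whole_match(regex, path):
    # A glob must account for the entire path, so use fullmatch.
    return re.fullmatch(regex, path) is not None

star_py = STAR + r"\.py"          # behaves like glob '*.py'
starstar_py = STARSTAR + r"\.py"  # behaves like glob '**.py'

for path in ["setup.py", "lib/util.py"]:
    print(path, whole_match(star_py, path), whole_match(starstar_py, path))
# setup.py True True
# lib/util.py False True
```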

The ``\texttt{?}'' pattern matches any single character.
\interaction{filenames.glob.question}

The ``\texttt{[}'' character begins a \emph{character class}. This
matches any single character within the class. The class ends with a
``\texttt{]}'' character. A class may contain multiple \emph{range}s
of the form ``\texttt{a-f}'', which is shorthand for
``\texttt{abcdef}''.
\interaction{filenames.glob.range}
If the first character after the ``\texttt{[}'' in a character class
is a ``\texttt{!}'', it \emph{negates} the class, making it match any
single character not in the class.
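Character classes translate almost directly into regexp classes. The main difference is that glob negates with ``\texttt{!}'', while regexps use ``\texttt{\^{}}''. The hypothetical helper \texttt{class\_to\_regex} sketches that translation for a single class that has already been isolated from the rest of a pattern. Treating a leading ``\texttt{\^{}}'' as literal is an assumption.

```python
import re

def class_to_regex(glob_class):
    """Translate one glob character class, e.g. '[a-f]' or '[!a-f]',
    into an equivalent regex character class."""
    if not (glob_class.startswith("[") and glob_class.endswith("]")):
        raise ValueError("not a character class: %r" % glob_class)
    body = glob_class[1:-1]
    if body.startswith("!"):
        # Glob negates with a leading '!'; regexps use '^'.
        body = "^" + body[1:]
    elif body.startswith("^"):
        # Assumed to be literal in glob, so escape it for the regex.
        body = "\\" + body
    return "[" + body + "]"
```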

A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
matches if any subpattern in the group matches. The ``\texttt{,}''
character separates subpatterns, and ``\texttt{\}}'' ends the group.
\interaction{filenames.glob.group}
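The hypothetical function \texttt{glob\_to\_regex} puts these tokens together. It sketches how a complete glob pattern, including ``\texttt{\{}\ldots\texttt{\}}'' groups, could be turned into a regexp that must match a whole file name. It is a simplified illustration, not Mercurial's implementation. It assumes that ``\texttt{?}'' matches any single character, as described above, and it rejects an unclosed group rather than guessing what was meant.

```python
import re

def glob_to_regex(pat):
    """Translate a glob pattern into a regex that must match a whole path.

    Supports '*', '**', '?', '[...]', '[!...]' and '{a,b}' groups.
    """
    out = []
    i, n = 0, len(pat)
    depth = 0  # number of '{' groups currently open
    while i < n:
        c = pat[i]
        i += 1
        if c == "*":
            if i < n and pat[i] == "*":
                i += 1
                out.append(".*")      # '**' crosses directory boundaries
            else:
                out.append("[^/]*")   # '*' stays within one directory
        elif c == "?":
            out.append(".")           # any single character
        elif c == "[":
            j = i
            if j < n and pat[j] == "!":
                j += 1
            if j < n and pat[j] == "]":
                j += 1                # a leading ']' is part of the class
            j = pat.find("]", j)
            if j == -1:
                out.append(re.escape(c))  # no closing ']': literal '['
            else:
                body = pat[i:j].replace("\\", "\\\\")
                if body.startswith("!"):
                    body = "^" + body[1:]  # glob '!' negates; regex uses '^'
                elif body.startswith("^"):
                    body = "\\" + body     # keep a leading '^' literal
                out.append("[" + body + "]")
                i = j + 1
        elif c == "{":
            depth += 1
            out.append("(?:")
        elif c == "," and depth:
            out.append("|")
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        else:
            out.append(re.escape(c))
    if depth:
        raise ValueError("unclosed '{' in glob pattern: %r" % pat)
    return "".join(out)

def glob_matches(pat, path):
    """Return True if the glob `pat` matches the whole of `path`."""
    return re.fullmatch(glob_to_regex(pat), path) is not None
```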

\subsubsection{Watch out!}

Don't forget that if you want to match a pattern in any directory, you
should not be using the ``\texttt{*}'' match-any token, as this will
only match within one directory. Instead, use the ``\texttt{**}''
token. This small example illustrates the difference between the two.
\interaction{filenames.glob.star-starstar}

\subsection{Regular expression matching with \texttt{re} patterns}

Mercurial accepts the same regular expression syntax as the Python
programming language (it uses Python's regexp engine internally).
This is based on the Perl language's regexp syntax, which is the most
popular dialect in use (it's also used in Java, for example).

I won't discuss Mercurial's regexp dialect in any detail here, as
regexps are not often used. Perl-style regexps are in any case
already exhaustively documented on a multitude of web sites, and in
many books. Instead, I will focus here on a few things you should
know if you find yourself needing to use regexps with Mercurial.

A regexp is matched against an entire file name, relative to the root
of the repository. In other words, even if you're already in
subdirectory \dirname{foo}, if you want to match files under this
directory, your pattern must start with ``\texttt{foo/}''.

One thing to note, if you're familiar with Perl-style regexps, is that
Mercurial's are \emph{rooted}. That is, a regexp starts matching
against the beginning of a string; it doesn't look for a match
anywhere within the string. To match anywhere in a string, start
your pattern with ``\texttt{.*}''.
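Rooted matching corresponds to the difference between Python's \texttt{re.match} and \texttt{re.search}. The first only tries a match at the start of a string, while the second looks anywhere in it. The sketch below assumes the pattern is applied to a file name relative to the repository root, as described above. The name \texttt{re\_pattern\_matches} is hypothetical.

```python
import re

def re_pattern_matches(regex, repo_path):
    """Test an 're:' pattern against a repository-relative file name.

    re.match anchors at the start of the string, modelling the rooted
    matching described above. Like that description, it does not
    require the match to reach the end of the name.
    """
    return re.match(regex, repo_path) is not None
```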

\section{Filtering files}

Not only does Mercurial give you a variety of ways to specify files,
it lets you further winnow those files using \emph{filters}. Commands
that work with file names accept two filtering options.
\begin{itemize}
\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  that file names must match in order to be processed.
\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  \emph{avoid} processing files, if they match this pattern.
\end{itemize}
You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
command line, and intermix them as you please. Mercurial interprets
the patterns you provide using glob syntax by default (but you can use
regexps if you need to).

You can read a \hggopt{-I} filter as ``process only the files that
match this filter''.
\interaction{filenames.filter.include}
The \hggopt{-X} filter is best read as ``process only the files that
don't match this pattern''.
\interaction{filenames.filter.exclude}
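The combined effect of several \hggopt{-I} and \hggopt{-X} options can be sketched as below. For brevity the hypothetical \texttt{select\_files} takes rooted regexps rather than globs. The combination rule is an assumption: a file is kept if it matches at least one include pattern (or none were given) and no exclude pattern.

```python
import re

def select_files(paths, includes=(), excludes=()):
    """Filter repository-relative paths with include/exclude patterns.

    Patterns are regexes matched from the start of each path. With no
    includes, every path is a candidate; a path is kept if it matches
    at least one include and none of the excludes.
    """
    inc = [re.compile(p) for p in includes]
    exc = [re.compile(p) for p in excludes]
    kept = []
    for path in paths:
        if inc and not any(r.match(path) for r in inc):
            continue  # not selected by any -I pattern
        if any(r.match(path) for r in exc):
            continue  # rejected by a -X pattern
        kept.append(path)
    return kept
```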

\section{Ignoring unwanted files and directories}

XXX.

\section{Case sensitivity}
\label{sec:names:case}

If you're working in a mixed development environment that contains
both Linux (or other Unix) systems and Macs or Windows systems, you
should keep in mind that they treat the case (``N'' versus ``n'') of
file names in incompatible ways. This is not very likely to affect
you, and it's easy to deal with if it does, but it could surprise you
if you don't know about it.

Operating systems and filesystems differ in the way they handle the
\emph{case} of characters in file and directory names. There are
three common ways to handle case in names.
\begin{itemize}
\item Completely case insensitive. Uppercase and lowercase versions
  of a letter are treated as identical, both when creating a file and
  during subsequent accesses. This is common on older DOS-based
  systems.
\item Case preserving, but insensitive. When a file or directory is
  created, the case of its name is stored, and can be retrieved and
  displayed by the operating system. When an existing file is being
  looked up, its case is ignored. This is the standard arrangement on
  Windows and MacOS. The names \filename{foo} and \filename{FoO}
  identify the same file. This treatment of uppercase and lowercase
  letters as interchangeable is also referred to as \emph{case
    folding}.
\item Case sensitive. The case of a name is significant at all times.
  The names \filename{foo} and \filename{FoO} identify different
  files. This is the way Linux and Unix systems normally work.
\end{itemize}

On Unix-like systems, it is possible to have any or all of the above
ways of handling case in action at once. For example, if you use a
USB thumb drive formatted with a FAT32 filesystem on a Linux system,
Linux will handle names on that filesystem in a case preserving, but
insensitive, way.

\subsection{Safe, portable repository storage}

Mercurial's repository storage mechanism is \emph{case safe}. It
translates file names so that they can be safely stored on both case
sensitive and case insensitive filesystems. This means that you can
use normal file copying tools to transfer a Mercurial repository onto,
for example, a USB thumb drive, and safely move that drive and
repository back and forth between a Mac, a PC running Windows, and a
Linux box.
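Consider a simple reversible encoding to see how a name translation can make storage case safe. Every uppercase letter is written as an underscore followed by its lowercase form, and a literal underscore is doubled. Mercurial's store uses an encoding along these lines. The functions \texttt{encode\_case\_safe} and \texttt{decode\_case\_safe} below are a simplified, hypothetical sketch that handles only ASCII letters and underscores.

```python
def encode_case_safe(path):
    """Encode a name so that names differing only in case stay distinct.

    Uppercase ASCII letters become '_' plus the lowercase letter, and
    '_' becomes '__', so the result is all lowercase and reversible.
    """
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)

def decode_case_safe(encoded):
    """Invert encode_case_safe."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "_" and i + 1 < len(encoded):
            nxt = encoded[i + 1]
            # '__' was a literal underscore; '_x' was an uppercase 'X'.
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this scheme, \filename{myfile.c} is stored as \filename{myfile.c}, while \filename{MyFile.C} is stored as \filename{\_my\_file.\_c}. The two stored names remain distinct even on a case insensitive filesystem.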

\subsection{Detecting case conflicts}

When operating in the working directory, Mercurial honours the naming
policy of the filesystem where the working directory is located. If
the filesystem is case preserving, but insensitive, Mercurial will
treat names that differ only in case as the same.

An important aspect of this approach is that it is possible to commit
a changeset on a case sensitive (typically Linux or Unix) filesystem
that will cause trouble for users on case insensitive (usually Windows
and MacOS) filesystems. If a Linux user commits changes to two files,
one named \filename{myfile.c} and the other named \filename{MyFile.C},
they will be stored correctly in the repository. And in the working
directories of other Linux users, they will be correctly represented
as separate files.

If a Windows or Mac user pulls this change, they will not initially
have a problem, because Mercurial's repository storage mechanism is
case safe. However, once they try to \hgcmd{update} the working
directory to that changeset, or \hgcmd{merge} with that changeset,
Mercurial will spot the conflict between the two file names that the
filesystem would treat as the same, and forbid the update or merge
from occurring.
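To detect such a conflict, group file names by a case-folded key and look for groups with more than one member. The sketch below uses Python's \texttt{str.casefold} as a stand-in for the filesystem's folding rules, which is only an approximation. The name \texttt{case\_conflicts} is hypothetical and not part of Mercurial.

```python
from collections import defaultdict

def case_conflicts(paths):
    """Find names that a case-folding filesystem would treat as one file.

    Returns a list of groups; each group is a sorted list of two or more
    paths that differ only in case.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[p.casefold()].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```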

\subsection{Fixing a case conflict}

If you are using Windows or a Mac in a mixed environment where some of
your collaborators are using Linux or Unix, and Mercurial reports a
case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
the procedure to fix the problem is simple.

Just find a nearby Linux or Unix box, clone the problem repository
onto it, and use Mercurial's \hgcmd{rename} command to change the
names of any offending files or directories so that they will no
longer cause case folding conflicts. Commit this change, \hgcmd{pull}
or \hgcmd{push} it across to your Windows or MacOS system, and
\hgcmd{update} to the revision with the non-conflicting names.

The changeset with case-conflicting names will remain in your
project's history, and you still won't be able to \hgcmd{update} your
working directory to that changeset on a Windows or MacOS system, but
you can continue development unimpeded.

\begin{note}
  Prior to version~0.9.3, Mercurial did not use a case safe repository
  storage mechanism, and did not detect case folding conflicts. If
  you are using an older version of Mercurial on Windows or MacOS, I
  strongly recommend that you upgrade.
\end{note}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: