hgbook
diff en/ch07-filenames.xml @ 1079:58fefdf069c5
reformated
author | Zhaoping Sun <zhaopingsun@gmail.com> |
---|---|
date | Fri Nov 20 22:53:55 2009 -0500 (2009-11-20) |
parents | 477d6a3e5023 |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/ch07-filenames.xml Fri Nov 20 22:53:55 2009 -0500
@@ -0,0 +1,451 @@
+<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
+
+<chapter id="chap:names">
+  <?dbhtml filename="file-names-and-pattern-matching.html"?>
+  <title>File names and pattern matching</title>
+
+  <para id="x_543">Mercurial provides mechanisms that let you work with file
+    names in a consistent and expressive way.</para>
+
+  <sect1>
+    <title>Simple file naming</title>
+
+    <para id="x_544">Mercurial uses a unified piece of machinery <quote>under the
+      hood</quote> to handle file names. Every command behaves
+      uniformly with respect to file names. The way in which commands
+      work with file names is as follows.</para>
+
+    <para id="x_545">If you explicitly name real files on the command line,
+      Mercurial works with exactly those files, as you would expect.
+      &interaction.filenames.files;</para>
+
+    <para id="x_546">When you provide a directory name, Mercurial will interpret
+      this as <quote>operate on every file in this directory and its
+      subdirectories</quote>. Mercurial traverses the files and
+      subdirectories in a directory in alphabetical order. When it
+      encounters a subdirectory, it will traverse that subdirectory
+      before continuing with the current directory.</para>
+
+    &interaction.filenames.dirs;
+  </sect1>
+
+  <sect1>
+    <title>Running commands without any file names</title>
+
+    <para id="x_547">Mercurial's commands that work with file names have useful
+      default behaviors when you invoke them without providing any
+      file names or patterns. What kind of behavior you should
+      expect depends on what the command does. Here are a few rules
+      of thumb you can use to predict what a command is likely to do
+      if you don't give it any names to work with.</para>
+    <itemizedlist>
+      <listitem><para id="x_548">Most commands will operate on the entire working
+          directory. This is what the <command role="hg-cmd">hg
+            add</command> command does, for example.</para>
+      </listitem>
+      <listitem><para id="x_549">If the command has effects that are difficult or
+          impossible to reverse, it will force you to explicitly
+          provide at least one name or pattern (see below). This
+          protects you from accidentally deleting files by running
+          <command role="hg-cmd">hg remove</command> with no
+          arguments, for example.</para>
+      </listitem></itemizedlist>
+
+    <para id="x_54a">It's easy to work around these default behaviors if they
+      don't suit you. If a command normally operates on the whole
+      working directory, you can invoke it on just the current
+      directory and its subdirectories by giving it the name
+      <quote><filename class="directory">.</filename></quote>.</para>
+
+    &interaction.filenames.wdir-subdir;
+
+    <para id="x_54b">Along the same lines, some commands normally print file
+      names relative to the root of the repository, even if you're
+      invoking them from a subdirectory. Such a command will print
+      file names relative to your subdirectory if you give it explicit
+      names. Here, we're going to run <command role="hg-cmd">hg
+        status</command> from a subdirectory, and get it to operate on
+      the entire working directory while printing file names relative
+      to our subdirectory, by passing it the output of the <command
+        role="hg-cmd">hg root</command> command.</para>
+
+    &interaction.filenames.wdir-relname;
+  </sect1>
+
+  <sect1>
+    <title>Telling you what's going on</title>
+
+    <para id="x_54c">The <command role="hg-cmd">hg add</command> example in the
+      preceding section illustrates something else that's helpful
+      about Mercurial commands. If a command operates on a file that
+      you didn't name explicitly on the command line, it will usually
+      print the name of the file, so that you will not be surprised
+      by what's going on.</para>
+
+    <para id="x_54d">The principle here is of <emphasis>least
+        surprise</emphasis>. If you've exactly named a file on the
+      command line, there's no point in repeating it back at you. If
+      Mercurial is acting on a file <emphasis>implicitly</emphasis>, e.g.
+      because you provided no names, or a directory, or a pattern (see
+      below), it is safest to tell you what files it's operating on.</para>
+
+    <para id="x_54e">For commands that behave this way, you can silence them
+      using the <option role="hg-opt-global">-q</option> option. You
+      can also get them to print the name of every file, even those
+      you've named explicitly, using the <option
+        role="hg-opt-global">-v</option> option.</para>
+  </sect1>
+
+  <sect1>
+    <title>Using patterns to identify files</title>
+
+    <para id="x_54f">In addition to working with file and directory names,
+      Mercurial lets you use <emphasis>patterns</emphasis> to identify
+      files. Mercurial's pattern handling is expressive.</para>
+
+    <para id="x_550">On Unix-like systems (Linux, MacOS, etc.), the job of
+      matching file names to patterns normally falls to the shell. On
+      these systems, you must explicitly tell Mercurial that a name is
+      a pattern. On Windows, the shell does not expand patterns, so
+      Mercurial will automatically identify names that are patterns,
+      and expand them for you.</para>
+
+    <para id="x_551">To provide a pattern in place of a regular name on the
+      command line, the mechanism is simple:</para>
+    <programlisting>syntax:patternbody</programlisting>
+    <para id="x_552">That is, a pattern is identified by a short text string that
+      says what kind of pattern this is, followed by a colon, followed
+      by the actual pattern.</para>
+
+    <para id="x_553">Mercurial supports two kinds of pattern syntax. The most
+      frequently used is called <literal>glob</literal>; this is the
+      same kind of pattern matching used by the Unix shell, and should
+      be familiar to Windows command prompt users, too.</para>
+
+    <para id="x_554">When Mercurial does automatic pattern matching on Windows,
+      it uses <literal>glob</literal> syntax. You can thus omit the
+      <quote><literal>glob:</literal></quote> prefix on Windows, but
+      it's safe to use it, too.</para>
+
+    <para id="x_555">The <literal>re</literal> syntax is more powerful; it lets
+      you specify patterns using regular expressions, also known as
+      regexps.</para>
+
+    <para id="x_556">By the way, in the examples that follow, notice that I'm
+      careful to wrap all of my patterns in quote characters, so that
+      they won't get expanded by the shell before Mercurial sees
+      them.</para>
+
+    <sect2>
+      <title>Shell-style <literal>glob</literal> patterns</title>
+
+      <para id="x_557">This is an overview of the kinds of patterns you can use
+        when you're matching on glob patterns.</para>
+
+      <para id="x_558">The <quote><literal>*</literal></quote> character matches
+        any string, within a single directory.</para>
+
+      &interaction.filenames.glob.star;
+
+      <para id="x_559">The <quote><literal>**</literal></quote> pattern matches
+        any string, and crosses directory boundaries. It's not a
+        standard Unix glob token, but it's accepted by several popular
+        Unix shells, and is very useful.</para>
+
+      &interaction.filenames.glob.starstar;
+
+      <para id="x_55a">The <quote><literal>?</literal></quote> pattern matches
+        any single character.</para>
+
+      &interaction.filenames.glob.question;
+
+      <para id="x_55b">The <quote><literal>[</literal></quote> character begins a
+        <emphasis>character class</emphasis>. This matches any single
+        character within the class. The class ends with a
+        <quote><literal>]</literal></quote> character. A class may
+        contain multiple <emphasis>range</emphasis>s of the form
+        <quote><literal>a-f</literal></quote>, which is shorthand for
+        <quote><literal>abcdef</literal></quote>.</para>
+
+      &interaction.filenames.glob.range;
+
+      <para id="x_55c">If the first character after the
+        <quote><literal>[</literal></quote> in a character class is a
+        <quote><literal>!</literal></quote>, it
+        <emphasis>negates</emphasis> the class, making it match any
+        single character not in the class.</para>
+
+      <para id="x_55d">A <quote><literal>{</literal></quote> begins a group of
+        subpatterns, where the whole group matches if any subpattern
+        in the group matches. The <quote><literal>,</literal></quote>
+        character separates subpatterns, and
+        <quote><literal>}</literal></quote> ends the group.</para>
+
+      &interaction.filenames.glob.group;
+
+      <sect3>
+        <title>Watch out!</title>
+
+        <para id="x_55e">Don't forget that if you want to match a pattern in any
+          directory, you should not be using the
+          <quote><literal>*</literal></quote> match-any token, as this
+          will only match within one directory. Instead, use the
+          <quote><literal>**</literal></quote> token. This small
+          example illustrates the difference between the two.</para>
+
+        &interaction.filenames.glob.star-starstar;
+      </sect3>
+    </sect2>
+
+    <sect2>
+      <title>Regular expression matching with <literal>re</literal>
+        patterns</title>
+
+      <para id="x_55f">Mercurial accepts the same regular expression syntax as
+        the Python programming language (it uses Python's regexp
+        engine internally). This is based on the Perl language's
+        regexp syntax, which is the most popular dialect in use (it's
+        also used in Java, for example).</para>
+
+      <para id="x_560">I won't discuss Mercurial's regexp dialect in any detail
+        here, as regexps are not often used. Perl-style regexps are
+        in any case already exhaustively documented on a multitude of
+        web sites, and in many books. Instead, I will focus here on a
+        few things you should know if you find yourself needing to use
+        regexps with Mercurial.</para>
+
+      <para id="x_561">A regexp is matched against an entire file name, relative
+        to the root of the repository. In other words, even if you're
+        already in subdirectory <filename
+          class="directory">foo</filename>, if you want to match files
+        under this directory, your pattern must start with
+        <quote><literal>foo/</literal></quote>.</para>
+
+      <para id="x_562">One thing to note, if you're familiar with Perl-style
+        regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
+        That is, a regexp starts matching against the beginning of a
+        string; it doesn't look for a match anywhere within the
+        string. To match anywhere in a string, start your pattern
+        with <quote><literal>.*</literal></quote>.</para>
+    </sect2>
+  </sect1>
+
+  <sect1>
+    <title>Filtering files</title>
+
+    <para id="x_563">Not only does Mercurial give you a variety of ways to
+      specify files; it lets you further winnow those files using
+      <emphasis>filters</emphasis>. Commands that work with file
+      names accept two filtering options.</para>
+    <itemizedlist>
+      <listitem><para id="x_564"><option role="hg-opt-global">-I</option>, or
+          <option role="hg-opt-global">--include</option>, lets you
+          specify a pattern that file names must match in order to be
+          processed.</para>
+      </listitem>
+      <listitem><para id="x_565"><option role="hg-opt-global">-X</option>, or
+          <option role="hg-opt-global">--exclude</option>, gives you a
+          way to <emphasis>avoid</emphasis> processing files, if they
+          match this pattern.</para>
+      </listitem></itemizedlist>
+    <para id="x_566">You can provide multiple <option
+        role="hg-opt-global">-I</option> and <option
+        role="hg-opt-global">-X</option> options on the command line,
+      and intermix them as you please. Mercurial interprets the
+      patterns you provide using glob syntax by default (but you can
+      use regexps if you need to).</para>
+
+    <para id="x_567">You can read a <option role="hg-opt-global">-I</option>
+      filter as <quote>process only the files that match this
+      filter</quote>.</para>
+
+    &interaction.filenames.filter.include;
+
+    <para id="x_568">The <option role="hg-opt-global">-X</option> filter is best
+      read as <quote>process only the files that don't match this
+      pattern</quote>.</para>
+
+    &interaction.filenames.filter.exclude;
+  </sect1>
+
+  <sect1>
+    <title>Permanently ignoring unwanted files and directories</title>
+
+    <para id="x_569">When you create a new repository, the chances are
+      that over time it will grow to contain files that ought to
+      <emphasis>not</emphasis> be managed by Mercurial, but which you
+      don't want to see listed every time you run <command>hg
+        status</command>. For instance, <quote>build products</quote>
+      are files that are created as part of a build but which should
+      not be managed by a revision control system. The most common
+      build products are output files produced by software tools such
+      as compilers. As another example, many text editors litter a
+      directory with lock files, temporary working files, and backup
+      files, which it also makes no sense to manage.</para>
+
+    <para id="x_6b4">To have Mercurial permanently ignore such files, create a
+      file named <filename>.hgignore</filename> in the root of your
+      repository. You <emphasis>should</emphasis> <command>hg
+        add</command> this file so that it gets tracked with the rest of
+      your repository contents, since your collaborators will probably
+      find it useful too.</para>
+
+    <para id="x_6b5">By default, the <filename>.hgignore</filename> file should
+      contain a list of regular expressions, one per line. Empty
+      lines are skipped. Most people prefer to describe the files they
+      want to ignore using the <quote>glob</quote> syntax that we
+      described above, so a typical <filename>.hgignore</filename>
+      file will start with this directive:</para>
+
+    <programlisting>syntax: glob</programlisting>
+
+    <para id="x_6b6">This tells Mercurial to interpret the lines that follow as
+      glob patterns, not regular expressions.</para>
+
+    <para id="x_6b7">Here is a typical-looking <filename>.hgignore</filename>
+      file.</para>
+
+    <programlisting>syntax: glob
+# This line is a comment, and will be skipped.
+# Empty lines are skipped too.
+
+# Backup files left behind by the Emacs editor.
+*~
+
+# Lock files used by the Emacs editor.
+# Notice that the "#" character is quoted with a backslash.
+# This prevents it from being interpreted as starting a comment.
+.\#*
+
+# Temporary files used by the vim editor.
+.*.swp
+
+# A hidden file created by the Mac OS X Finder.
+.DS_Store
+</programlisting>
+  </sect1>
+
+  <sect1 id="sec:names:case">
+    <title>Case sensitivity</title>
+
+    <para id="x_56a">If you're working in a mixed development environment that
+      contains both Linux (or other Unix) systems and Macs or Windows
+      systems, you should keep in the back of your mind the knowledge
+      that they treat the case (<quote>N</quote> versus
+      <quote>n</quote>) of file names in incompatible ways. This is
+      not very likely to affect you, and it's easy to deal with if it
+      does, but it could surprise you if you don't know about
+      it.</para>
+
+    <para id="x_56b">Operating systems and filesystems differ in the way they
+      handle the <emphasis>case</emphasis> of characters in file and
+      directory names. There are three common ways to handle case in
+      names.</para>
+    <itemizedlist>
+      <listitem><para id="x_56c">Completely case insensitive. Uppercase and
+          lowercase versions of a letter are treated as identical,
+          both when creating a file and during subsequent accesses.
+          This is common on older DOS-based systems.</para>
+      </listitem>
+      <listitem><para id="x_56d">Case preserving, but insensitive. When a file
+          or directory is created, the case of its name is stored, and
+          can be retrieved and displayed by the operating system.
+          When an existing file is being looked up, its case is
+          ignored. This is the standard arrangement on Windows and
+          MacOS. The names <filename>foo</filename> and
+          <filename>FoO</filename> identify the same file. This
+          treatment of uppercase and lowercase letters as
+          interchangeable is also referred to as <emphasis>case
+            folding</emphasis>.</para>
+      </listitem>
+      <listitem><para id="x_56e">Case sensitive. The case of a name
+          is significant at all times. The names
+          <filename>foo</filename> and <filename>FoO</filename>
+          identify different files. This is the way Linux and Unix
+          systems normally work.</para>
+      </listitem></itemizedlist>
+
+    <para id="x_56f">On Unix-like systems, it is possible to have any or all of
+      the above ways of handling case in action at once. For example,
+      if you use a USB thumb drive formatted with a FAT32 filesystem
+      on a Linux system, Linux will handle names on that filesystem in
+      a case preserving, but insensitive, way.</para>
+
+    <sect2>
+      <title>Safe, portable repository storage</title>
+
+      <para id="x_570">Mercurial's repository storage mechanism is <emphasis>case
+          safe</emphasis>. It translates file names so that they can
+        be safely stored on both case sensitive and case insensitive
+        filesystems. This means that you can use normal file copying
+        tools to transfer a Mercurial repository onto, for example, a
+        USB thumb drive, and safely move that drive and repository
+        back and forth between a Mac, a PC running Windows, and a
+        Linux box.</para>
+
+    </sect2>
+    <sect2>
+      <title>Detecting case conflicts</title>
+
+      <para id="x_571">When operating in the working directory, Mercurial honours
+        the naming policy of the filesystem where the working
+        directory is located. If the filesystem is case preserving,
+        but insensitive, Mercurial will treat names that differ only
+        in case as the same.</para>
+
+      <para id="x_572">An important aspect of this approach is that it is
+        possible to commit a changeset on a case sensitive (typically
+        Linux or Unix) filesystem that will cause trouble for users on
+        case insensitive (usually Windows and MacOS) filesystems. If a
+        Linux user commits changes to two files, one named
+        <filename>myfile.c</filename> and the other named
+        <filename>MyFile.C</filename>, they will be stored correctly
+        in the repository. And in the working directories of other
+        Linux users, they will be correctly represented as separate
+        files.</para>
+
+      <para id="x_573">If a Windows or Mac user pulls this change, they will not
+        initially have a problem, because Mercurial's repository
+        storage mechanism is case safe. However, once they try to
+        <command role="hg-cmd">hg update</command> the working
+        directory to that changeset, or <command role="hg-cmd">hg
+          merge</command> with that changeset, Mercurial will spot the
+        conflict between the two file names that the filesystem would
+        treat as the same, and forbid the update or merge from
+        occurring.</para>
+    </sect2>
+
+    <sect2>
+      <title>Fixing a case conflict</title>
+
+      <para id="x_574">If you are using Windows or a Mac in a mixed environment
+        where some of your collaborators are using Linux or Unix, and
+        Mercurial reports a case folding conflict when you try to
+        <command role="hg-cmd">hg update</command> or <command
+          role="hg-cmd">hg merge</command>, the procedure to fix the
+        problem is simple.</para>
+
+      <para id="x_575">Just find a nearby Linux or Unix box, clone the problem
+        repository onto it, and use Mercurial's <command
+          role="hg-cmd">hg rename</command> command to change the
+        names of any offending files or directories so that they will
+        no longer cause case folding conflicts. Commit this change,
+        <command role="hg-cmd">hg pull</command> or <command
+          role="hg-cmd">hg push</command> it across to your Windows or
+        MacOS system, and <command role="hg-cmd">hg update</command>
+        to the revision with the non-conflicting names.</para>
+
+      <para id="x_576">The changeset with case-conflicting names will remain in
+        your project's history, and you still won't be able to
+        <command role="hg-cmd">hg update</command> your working
+        directory to that changeset on a Windows or MacOS system, but
+        you can continue development unimpeded.</para>
+    </sect2>
+  </sect1>
+</chapter>
+
+<!--
+local variables:
+sgml-parent-document: ("00book.xml" "book" "chapter")
+end:
+-->
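
The case conflict described in the chapter's closing sections (two names such as myfile.c and MyFile.C that a case-insensitive filesystem would treat as one file) can be sketched in Python. This is only an illustration under stated assumptions: `find_case_conflicts` is a hypothetical helper, not part of Mercurial, and `str.casefold()` is assumed as a rough stand-in for whatever folding rule a particular filesystem applies.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of names that differ only by case.

    Hypothetical helper for illustration; not Mercurial's implementation.
    """
    groups = defaultdict(list)
    for name in names:
        # Assumption: casefold() approximates the filesystem's folding rule.
        groups[name.casefold()].append(name)
    # Keep only folded keys that are claimed by more than one distinct name.
    return sorted(sorted(set(g)) for g in groups.values() if len(set(g)) > 1)


if __name__ == "__main__":
    tracked = ["myfile.c", "MyFile.C", "README", "src/main.c"]
    for group in find_case_conflicts(tracked):
        print("case conflict:", ", ".join(group))
```

Running a check like this over the tracked file names on a Linux or Unix machine, before committing, would flag the names to change with hg rename so that Windows and Mac users never meet the conflict.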