hgbook
annotate en/collab.tex @ 281:a880d07f2d29
Fix repository paths of data/index files in filelog diagram.
Data/index files are stored in the repository at .hg/store/data, not .hg/data.
Modify the filelog diagram to reflect this.
author | Arun Thomas <arun.thomas@gmail.com> |
---|---|
date | Mon Dec 17 23:16:59 2007 -0500 (2007-12-17) |
parents | 699771d085c6 |
children | 97e929385442 |
rev | line source |
---|---|
bos@159 | 1 \chapter{Collaborating with other people} |
bos@159 | 2 \label{cha:collab} |
bos@159 | 3 |
bos@159 | 4 As a completely decentralised tool, Mercurial doesn't impose any |
bos@159 | 5 policy on how people ought to work with each other. However, if |
bos@159 | 6 you're new to distributed revision control, it helps to have some |
bos@159 | 7 tools and examples in mind when you're thinking about possible |
bos@159 | 8 workflow models. |
bos@159 | 9 |
bos@209 | 10 \section{Mercurial's web interface} |
bos@209 | 11 |
bos@209 | 12 Mercurial has a powerful web interface that provides several |
bos@209 | 13 useful capabilities. |
bos@209 | 14 |
bos@209 | 15 For interactive use, the web interface lets you browse a single |
bos@209 | 16 repository or a collection of repositories. You can view the history |
bos@209 | 17 of a repository, examine each change (comments and diffs), and view |
bos@209 | 18 the contents of each directory and file. |
bos@209 | 19 |
bos@209 | 20 Also for human consumption, the web interface provides an RSS feed of |
bos@209 | 21 the changes in a repository. This lets you ``subscribe'' to a |
bos@209 | 22 repository using your favourite feed reader, and be automatically |
bos@209 | 23 notified of activity in that repository as soon as it happens. I find |
bos@209 | 24 this capability much more convenient than the model of subscribing to |
bos@209 | 25 a mailing list to which notifications are sent, as it requires no |
bos@209 | 26 additional configuration on the part of whoever is serving the |
bos@209 | 27 repository. |
bos@209 | 28 |
bos@209 | 29 The web interface also lets remote users clone a repository, pull |
bos@209 | 30 changes from it, and (when the server is configured to permit it) push |
bos@209 | 31 changes back to it. Mercurial's HTTP tunneling protocol aggressively |
bos@209 | 32 compresses data, so that it works efficiently even over low-bandwidth |
bos@209 | 33 network connections. |
bos@209 | 34 |
bos@209 | 35 The easiest way to get started with the web interface is to use your |
bos@209 | 36 web browser to visit an existing repository, such as the master |
bos@209 | 37 Mercurial repository at |
bos@209 | 38 \url{http://www.selenic.com/repo/hg?style=gitweb}. |
bos@209 | 39 |
bos@209 | 40 If you're interested in providing a web interface to your own |
bos@209 | 41 repositories, Mercurial provides two ways to do this. The first is |
bos@209 | 42 using the \hgcmd{serve} command, which is best suited to short-term |
bos@209 | 43 ``lightweight'' serving. See section~\ref{sec:collab:serve} below for |
bos@209 | 44 details of how to use this command. If you have a long-lived |
bos@209 | 45 repository that you'd like to make permanently available, Mercurial |
bos@209 | 46 has built-in support for the CGI (Common Gateway Interface) standard, |
bos@209 | 47 which all common web servers support. See |
bos@209 | 48 section~\ref{sec:collab:cgi} for details of CGI configuration. |
bos@209 | 49 |
bos@159 | 50 \section{Collaboration models} |
bos@159 | 51 |
bos@159 | 52 With a suitably flexible tool, making decisions about workflow is much |
bos@159 | 53 more of a social engineering challenge than a technical one. |
bos@159 | 54 Mercurial imposes few limitations on how you can structure the flow of |
bos@159 | 55 work in a project, so it's up to you and your group to set up and live |
bos@159 | 56 with a model that matches your own particular needs. |
bos@159 | 57 |
bos@159 | 58 \subsection{Factors to keep in mind} |
bos@159 | 59 |
bos@159 | 60 The most important aspect of any model that you must keep in mind is |
bos@159 | 61 how well it matches the needs and capabilities of the people who will |
bos@159 | 62 be using it. This might seem self-evident; even so, you still can't |
bos@159 | 63 afford to forget it for a moment. |
bos@159 | 64 |
bos@159 | 65 I once put together a workflow model that seemed to make perfect sense |
bos@159 | 66 to me, but that caused a considerable amount of consternation and |
bos@159 | 67 strife within my development team. In spite of my attempts to explain |
bos@159 | 68 why we needed a complex set of branches, and how changes ought to flow |
bos@159 | 69 between them, a few team members revolted. Even though they were |
bos@159 | 70 smart people, they didn't want to pay attention to the constraints we |
bos@159 | 71 were operating under, or face the consequences of those constraints in |
bos@159 | 72 the details of the model that I was advocating. |
bos@159 | 73 |
bos@159 | 74 Don't sweep foreseeable social or technical problems under the rug. |
bos@159 | 75 Whatever scheme you put into effect, you should plan for mistakes and |
bos@159 | 76 problem scenarios. Consider adding automated machinery to prevent, or |
bos@159 | 77 quickly recover from, trouble that you can anticipate. As an example, |
bos@159 | 78 if you intend to have a branch with not-for-release changes in it, |
bos@159 | 79 you'd do well to think early about the possibility that someone might |
bos@159 | 80 accidentally merge those changes into a release branch. You could |
bos@159 | 81 avoid this particular problem by writing a hook that prevents changes |
bos@159 | 82 from being merged from an inappropriate branch. |
bos@159 | 83 |
bos@159 | 84 \subsection{Informal anarchy} |
bos@159 | 85 |
bos@159 | 86 I wouldn't suggest an ``anything goes'' approach as something |
bos@159 | 87 sustainable, but it's a model that's easy to grasp, and it works |
bos@159 | 88 perfectly well in a few unusual situations. |
bos@159 | 89 |
bos@159 | 90 As one example, many projects have a loose-knit group of collaborators |
bos@159 | 91 who rarely physically meet each other. Some groups like to overcome |
bos@159 | 92 the isolation of working at a distance by organising occasional |
bos@159 | 93 ``sprints''. In a sprint, a number of people get together in a single |
bos@159 | 94 location (a company's conference room, a hotel meeting room, that kind |
bos@159 | 95 of place) and spend several days more or less locked in there, hacking |
bos@159 | 96 intensely on a handful of projects. |
bos@159 | 97 |
bos@159 | 98 A sprint is the perfect place to use the \hgcmd{serve} command, since |
bos@159 | 99 \hgcmd{serve} does not require any fancy server infrastructure. You |
bos@159 | 100 can get started with \hgcmd{serve} in moments, by reading |
bos@159 | 101 section~\ref{sec:collab:serve} below. Then simply tell the person |
bos@159 | 102 next to you that you're running a server, send the URL to them in an |
bos@159 | 103 instant message, and you immediately have a quick-turnaround way to |
bos@159 | 104 work together. They can type your URL into their web browser and |
bos@159 | 105 quickly review your changes; or they can pull a bugfix from you and |
bos@159 | 106 verify it; or they can clone a branch containing a new feature and try |
bos@159 | 107 it out. |
bos@159 | 108 |
bos@159 | 109 The charm, and the problem, with doing things in an ad hoc fashion |
bos@159 | 110 like this is that only people who know about your changes, and where |
bos@159 | 111 they are, can see them. Such an informal approach simply doesn't |
bos@159 | 112 scale beyond a handful of people, because each individual needs to know |
bos@159 | 113 about $n$ different repositories to pull from. |
bos@159 | 114 |
bos@159 | 115 \subsection{A single central repository} |
bos@159 | 116 |
bos@179 | 117 For smaller projects migrating from a centralised revision control |
bos@159 | 118 tool, perhaps the easiest way to get started is to have changes flow |
bos@159 | 119 through a single shared central repository. This is also the |
bos@159 | 120 most common ``building block'' for more ambitious workflow schemes. |
bos@159 | 121 |
bos@159 | 122 Contributors start by cloning a copy of this repository. They can |
bos@159 | 123 pull changes from it whenever they need to, and some (perhaps all) |
bos@159 | 124 developers have permission to push a change back when they're ready |
bos@159 | 125 for other people to see it. |
bos@159 | 126 |
bos@179 | 127 Under this model, it can still often make sense for people to pull |
bos@159 | 128 changes directly from each other, without going through the central |
bos@159 | 129 repository. Consider a case in which I have a tentative bug fix, but |
bos@159 | 130 I am worried that if I were to publish it to the central repository, |
bos@159 | 131 it might subsequently break everyone else's trees as they pull it. To |
bos@159 | 132 reduce the potential for damage, I can ask you to clone my repository |
bos@159 | 133 into a temporary repository of your own and test it. This lets us put |
bos@159 | 134 off publishing the potentially unsafe change until it has had a little |
bos@159 | 135 testing. |
bos@159 | 136 |
bos@159 | 137 In this kind of scenario, people usually use the \command{ssh} |
bos@159 | 138 protocol to securely push changes to the central repository, as |
bos@159 | 139 documented in section~\ref{sec:collab:ssh}. It's also usual to |
bos@159 | 140 publish a read-only copy of the repository over HTTP using CGI, as in |
bos@159 | 141 section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the |
bos@159 | 142 needs of people who don't have push access, and those who want to use |
bos@159 | 143 web browsers to browse the repository's history. |
bos@159 | 144 |
bos@179 | 145 \subsection{Working with multiple branches} |
bos@179 | 146 |
bos@179 | 147 Projects of any significant size naturally tend to make progress on |
bos@179 | 148 several fronts simultaneously. In the case of software, it's common |
bos@179 | 149 for a project to go through periodic official releases. A release |
bos@179 | 150 might then go into ``maintenance mode'' for a while after its first |
bos@179 | 151 publication; maintenance releases tend to contain only bug fixes, not |
bos@179 | 152 new features. In parallel with these maintenance releases, one or |
bos@179 | 153 more future releases may be under development. People normally use |
bos@179 | 154 the word ``branch'' to refer to one of these many slightly different |
bos@179 | 155 directions in which development is proceeding. |
bos@179 | 156 |
bos@179 | 157 Mercurial is particularly well suited to managing a number of |
bos@179 | 158 simultaneous, but not identical, branches. Each ``development |
bos@179 | 159 direction'' can live in its own central repository, and you can merge |
bos@179 | 160 changes from one to another as the need arises. Because repositories |
bos@179 | 161 are independent of each other, unstable changes in a development |
bos@179 | 162 branch will never affect a stable branch unless someone explicitly |
bos@179 | 163 merges those changes in. |
bos@179 | 164 |
bos@179 | 165 Here's an example of how this can work in practice. Let's say you |
bos@179 | 166 have one ``main branch'' on a central server. |
bos@179 | 167 \interaction{branching.init} |
bos@179 | 168 People clone it, make changes locally, test them, and push them back. |
bos@179 | 169 |
bos@179 | 170 Once the main branch reaches a release milestone, you can use the |
bos@179 | 171 \hgcmd{tag} command to give a permanent name to the milestone |
bos@179 | 172 revision. |
bos@179 | 173 \interaction{branching.tag} |
bos@179 | 174 Let's say some ongoing development occurs on the main branch. |
bos@179 | 175 \interaction{branching.main} |
bos@179 | 176 Using the tag that was recorded at the milestone, people who clone |
bos@179 | 177 that repository at any time in the future can use \hgcmd{update} to |
bos@179 | 178 get a copy of the working directory exactly as it was when that tagged |
bos@179 | 179 revision was committed. |
bos@179 | 180 \interaction{branching.update} |
bos@179 | 181 |
bos@179 | 182 In addition, immediately after the main branch is tagged, someone can |
bos@179 | 183 then clone the main branch on the server to a new ``stable'' branch, |
bos@179 | 184 also on the server. |
bos@179 | 185 \interaction{branching.clone} |
bos@179 | 186 |
bos@179 | 187 Someone who needs to make a change to the stable branch can then clone |
bos@179 | 188 \emph{that} repository, make their changes, commit, and push their |
bos@179 | 189 changes back there. |
bos@179 | 190 \interaction{branching.stable} |
bos@179 | 191 Because Mercurial repositories are independent, and Mercurial doesn't |
bos@179 | 192 move changes around automatically, the stable and main branches are |
bos@179 | 193 \emph{isolated} from each other. The changes that you made on the |
bos@179 | 194 main branch don't ``leak'' to the stable branch, and vice versa. |
bos@179 | 195 |
bos@179 | 196 You'll often want all of your bugfixes on the stable branch to show up |
bos@179 | 197 on the main branch, too. Rather than rewrite a bugfix on the main |
bos@179 | 198 branch, you can simply pull and merge changes from the stable to the |
bos@179 | 199 main branch, and Mercurial will bring those bugfixes in for you. |
bos@179 | 200 \interaction{branching.merge} |
bos@179 | 201 The main branch will still contain changes that are not on the stable |
bos@179 | 202 branch, but it will also contain all of the bugfixes from the stable |
bos@179 | 203 branch. The stable branch remains unaffected by these changes. |
bos@179 | 204 |
bos@179 | 205 \subsection{Feature branches} |
bos@179 | 206 |
bos@179 | 207 For larger projects, an effective way to manage change is to break up |
bos@179 | 208 a team into smaller groups. Each group has a shared branch of its |
bos@179 | 209 own, cloned from a single ``master'' branch used by the entire |
bos@179 | 210 project. People working on an individual branch are typically quite |
bos@179 | 211 isolated from developments on other branches. |
bos@179 | 212 |
bos@179 | 213 \begin{figure}[ht] |
bos@179 | 214 \centering |
bos@179 | 215 \grafix{feature-branches} |
bos@179 | 216 \caption{Feature branches} |
bos@179 | 217 \label{fig:collab:feature-branches} |
bos@179 | 218 \end{figure} |
bos@179 | 219 |
bos@179 | 220 When a particular feature is deemed to be in suitable shape, someone |
bos@179 | 221 on that feature team pulls and merges from the master branch into the |
bos@179 | 222 feature branch, then pushes back up to the master branch. |
bos@179 | 223 |
bos@179 | 224 \subsection{The release train} |
bos@179 | 225 |
bos@179 | 226 Some projects are organised on a ``train'' basis: a release is |
bos@179 | 227 scheduled to happen every few months, and whatever features are ready |
bos@179 | 228 when the ``train'' is ready to leave are allowed in. |
bos@179 | 229 |
bos@179 | 230 This model resembles working with feature branches. The difference is |
bos@179 | 231 that when a feature branch misses a train, someone on the feature team |
bos@184 | 232 pulls and merges the changes that went out on that train release into |
bos@184 | 233 the feature branch, and the team continues its work on top of that |
bos@184 | 234 release so that their feature can make the next release. |
bos@179 | 235 |
bos@159 | 236 \subsection{The Linux kernel model} |
bos@159 | 237 |
bos@159 | 238 The development of the Linux kernel has a shallow hierarchical |
bos@159 | 239 structure, surrounded by a cloud of apparent chaos. Because most |
bos@159 | 240 Linux developers use \command{git}, a distributed revision control |
bos@159 | 241 tool with capabilities similar to Mercurial, it's useful to describe |
bos@159 | 242 the way work flows in that environment; if you like the ideas, the |
bos@159 | 243 approach translates well across tools. |
bos@159 | 244 |
bos@159 | 245 At the center of the community sits Linus Torvalds, the creator of |
bos@159 | 246 Linux. He publishes a single source repository that is considered the |
bos@159 | 247 ``authoritative'' current tree by the entire developer community. |
bos@159 | 248 Anyone can clone Linus's tree, but he is very choosy about whose trees |
bos@159 | 249 he pulls from. |
bos@159 | 250 |
bos@159 | 251 Linus has a number of ``trusted lieutenants''. As a general rule, he |
bos@159 | 252 pulls whatever changes they publish, in most cases without even |
bos@159 | 253 reviewing those changes. Some of those lieutenants are generally |
bos@159 | 254 agreed to be ``maintainers'', responsible for specific subsystems |
bos@159 | 255 within the kernel. If a random kernel hacker wants to make a change |
bos@159 | 256 to a subsystem that they want to end up in Linus's tree, they must |
bos@159 | 257 find out who the subsystem's maintainer is, and ask that maintainer to |
bos@159 | 258 take their change. If the maintainer reviews their changes and agrees |
bos@159 | 259 to take them, they'll pass them along to Linus in due course. |
bos@159 | 260 |
bos@159 | 261 Individual lieutenants have their own approaches to reviewing, |
bos@159 | 262 accepting, and publishing changes; and for deciding when to feed them |
bos@159 | 263 to Linus. In addition, there are several well known branches that |
bos@159 | 264 people use for different purposes. For example, a few people maintain |
bos@159 | 265 ``stable'' repositories of older versions of the kernel, to which they |
bos@184 | 266 apply critical fixes as needed. Some maintainers publish multiple |
bos@184 | 267 trees: one for experimental changes; one for changes that they are |
bos@184 | 268 about to feed upstream; and so on. Others just publish a single |
bos@184 | 269 tree. |
bos@159 | 270 |
bos@159 | 271 This model has two notable features. The first is that it's ``pull |
bos@159 | 272 only''. You have to ask, convince, or beg another developer to take a |
bos@184 | 273 change from you, because there are almost no trees to which more than |
bos@184 | 274 one person can push, and there's no way to push changes into a tree |
bos@184 | 275 that someone else controls. |
bos@159 | 276 |
bos@159 | 277 The second is that it's based on reputation and acclaim. If you're an |
bos@159 | 278 unknown, Linus will probably ignore changes from you without even |
bos@159 | 279 responding. But a subsystem maintainer will probably review them, and |
bos@159 | 280 will likely take them if they pass their criteria for suitability. |
bos@159 | 281 The more ``good'' changes you contribute to a maintainer, the more |
bos@159 | 282 likely they are to trust your judgment and accept your changes. If |
bos@159 | 283 you're well-known and maintain a long-lived branch for something Linus |
bos@159 | 284 hasn't yet accepted, people with similar interests may pull your |
bos@159 | 285 changes regularly to keep up with your work. |
bos@159 | 286 |
bos@159 | 287 Reputation and acclaim don't necessarily cross subsystem or ``people'' |
bos@159 | 288 boundaries. If you're a respected but specialised storage hacker, and |
bos@159 | 289 you try to fix a networking bug, that change will receive a level of |
bos@159 | 290 scrutiny from a network maintainer comparable to a change from a |
bos@159 | 291 complete stranger. |
bos@159 | 292 |
bos@159 | 293 To people who come from more orderly project backgrounds, the |
bos@159 | 294 comparatively chaotic Linux kernel development process often seems |
bos@159 | 295 completely insane. It's subject to the whims of individuals; people |
bos@159 | 296 make sweeping changes whenever they deem it appropriate; and the pace |
bos@159 | 297 of development is astounding. And yet Linux is a highly successful, |
bos@159 | 298 well-regarded piece of software. |
bos@159 | 299 |
bos@187 | 300 \subsection{Pull-only versus shared-push collaboration} |
bos@187 | 301 |
bos@187 | 302 A perpetual source of heat in the open source community is whether a |
bos@187 | 303 development model in which people only ever pull changes from others |
bos@187 | 304 is ``better than'' one in which multiple people can push changes to a |
bos@187 | 305 shared repository. |
bos@187 | 306 |
bos@187 | 307 Typically, the backers of the shared-push model use tools that |
bos@187 | 308 actively enforce this approach. If you're using a centralised |
bos@187 | 309 revision control tool such as Subversion, there's no way to make a |
bos@187 | 310 choice over which model you'll use: the tool gives you shared-push, |
bos@187 | 311 and if you want to do anything else, you'll have to roll your own |
bos@187 | 312 approach on top (such as applying a patch by hand). |
bos@187 | 313 |
bos@187 | 314 A good distributed revision control tool, such as Mercurial, will |
bos@187 | 315 support both models. You and your collaborators can then structure |
bos@187 | 316 how you work together based on your own needs and preferences, not on |
bos@187 | 317 what contortions your tools force you into. |
bos@187 | 318 |
bos@187 | 319 \subsection{Where collaboration meets branch management} |
bos@187 | 320 |
bos@187 | 321 Once you and your team set up some shared repositories and start |
bos@187 | 322 propagating changes back and forth between local and shared repos, you |
bos@187 | 323 begin to face a related, but slightly different challenge: that of |
bos@187 | 324 managing the multiple directions in which your team may be moving at |
bos@187 | 325 once. Even though this subject is intimately related to how your team |
bos@187 | 326 collaborates, it's dense enough to merit treatment of its own, in |
bos@187 | 327 chapter~\ref{chap:branch}. |
bos@187 | 328 |
bos@159 | 329 \section{The technical side of sharing} |
bos@159 | 330 |
bos@210 | 331 The remainder of this chapter is devoted to the question of serving |
bos@210 | 332 data to your collaborators. |
bos@210 | 333 |
bos@210 | 334 \section{Informal sharing with \hgcmd{serve}} |
bos@159 | 335 \label{sec:collab:serve} |
bos@159 | 336 |
bos@159 | 337 Mercurial's \hgcmd{serve} command is wonderfully suited to small, |
bos@159 | 338 tight-knit, and fast-paced group environments. It also provides a |
bos@159 | 339 great way to get a feel for using Mercurial commands over a network. |
bos@159 | 340 |
bos@159 | 341 Run \hgcmd{serve} inside a repository, and in under a second it will |
bos@159 | 342 bring up a specialised HTTP server; this will accept connections from |
bos@159 | 343 any client, and serve up data for that repository until you terminate |
bos@159 | 344 it. Anyone who knows the URL of the server you just started, and can |
bos@159 | 345 talk to your computer over the network, can then use a web browser or |
bos@159 | 346 Mercurial to read data from that repository. A URL for a |
bos@159 | 347 \hgcmd{serve} instance running on a laptop is likely to look something |
bos@159 | 348 like \Verb|http://my-laptop.local:8000/|. |
bos@159 | 349 |
bos@159 | 350 The \hgcmd{serve} command is \emph{not} a general-purpose web server. |
bos@159 | 351 It can do only two things: |
bos@159 | 352 \begin{itemize} |
bos@159 | 353 \item Allow people to browse the history of the repository it's |
bos@159 | 354 serving, from their normal web browsers. |
bos@159 | 355 \item Speak Mercurial's wire protocol, so that people can |
bos@159 | 356 \hgcmd{clone} or \hgcmd{pull} changes from that repository. |
bos@159 | 357 \end{itemize} |
bos@159 | 358 In particular, \hgcmd{serve} won't allow remote users to \emph{modify} |
bos@159 | 359 your repository. It's intended for read-only use. |
bos@159 | 360 |
bos@159 | 361 If you're getting started with Mercurial, there's nothing to prevent |
bos@159 | 362 you from using \hgcmd{serve} to serve up a repository on your own |
bos@159 | 363 computer, then use commands like \hgcmd{clone}, \hgcmd{incoming}, and |
bos@159 | 364 so on to talk to that server as if the repository was hosted remotely. |
bos@159 | 365 This can help you to quickly get acquainted with using commands on |
bos@159 | 366 network-hosted repositories. |
bos@159 | 367 |
bos@210 | 368 \subsection{A few things to keep in mind} |
bos@159 | 369 |
bos@159 | 370 Because it provides unauthenticated read access to all clients, you |
bos@159 | 371 should only use \hgcmd{serve} in an environment where you either don't |
bos@159 | 372 care, or have complete control over, who can access your network and |
bos@159 | 373 pull data from your repository. |
bos@159 | 374 |
bos@159 | 375 The \hgcmd{serve} command knows nothing about any firewall software |
bos@159 | 376 you might have installed on your system or network. It cannot detect |
bos@159 | 377 or control your firewall software. If other people are unable to talk |
bos@159 | 378 to a running \hgcmd{serve} instance, the second thing you should do |
bos@159 | 379 (\emph{after} you make sure that they're using the correct URL) is |
bos@159 | 380 check your firewall configuration. |
bos@159 | 381 |
bos@159 | 382 By default, \hgcmd{serve} listens for incoming connections on |
bos@159 | 383 port~8000. If another process is already listening on the port you |
bos@159 | 384 want to use, you can specify a different port to listen on using the |
bos@159 | 385 \hgopt{serve}{-p} option. |
bos@159 | 386 |
bos@159 | 387 Normally, when \hgcmd{serve} starts, it prints no output, which can be |
bos@159 | 388 a bit unnerving. If you'd like to confirm that it is indeed running |
bos@159 | 389 correctly, and find out what URL you should send to your |
bos@159 | 390 collaborators, start it with the \hggopt{-v} option. |
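The port advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names \texttt{port\_is\_free} and \texttt{serve\_command} are hypothetical, not part of Mercurial, and the check is a best-effort probe, since another process could grab the port between the probe and the moment \hgcmd{serve} starts.

```python
import socket

DEFAULT_PORT = 8000  # the default hg serve port mentioned in the text


def port_is_free(port, host=""):
    """Return True if nothing is bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def serve_command(preferred=DEFAULT_PORT, attempts=10):
    """Build an `hg serve` argument list for the first free port found.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    last = min(preferred + attempts, 65536)  # stay inside the valid port range
    for port in range(preferred, last):
        if port_is_free(port):
            # -p picks the port; -v asks hg serve to report what it is doing.
            return ["hg", "serve", "-p", str(port), "-v"]
    raise RuntimeError("no free port found near %d" % preferred)
```

You could hand the resulting list to \texttt{subprocess.run} from inside a repository; building the command separately keeps the port probing easy to check on its own.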
bos@159 | 391 |
bos@210 | 392 \section{Using the Secure Shell (ssh) protocol} |
bos@159 | 393 \label{sec:collab:ssh} |
bos@159 | 394 |
bos@184 | 395 You can pull and push changes securely over a network connection using |
bos@184 | 396 the Secure Shell (\texttt{ssh}) protocol. To use this successfully, |
bos@184 | 397 you may have to do a little bit of configuration on the client or |
bos@184 | 398 server sides. |
bos@184 | 399 |
bos@184 | 400 If you're not familiar with ssh, it's a network protocol that lets you |
bos@184 | 401 securely communicate with another computer. To use it with Mercurial, |
bos@184 | 402 you'll be setting up one or more user accounts on a server so that |
bos@184 | 403 remote users can log in and execute commands. |
bos@184 | 404 |
bos@184 | 405 (If you \emph{are} familiar with ssh, you'll probably find some of the |
bos@184 | 406 material that follows to be elementary in nature.) |
bos@184 | 407 |
bos@210 | 408 \subsection{How to read and write ssh URLs} |
bos@184 | 409 |
bos@184 | 410 An ssh URL tends to look like this: |
bos@184 | 411 \begin{codesample2} |
bos@184 | 412 ssh://bos@hg.serpentine.com:22/hg/hgbook |
bos@184 | 413 \end{codesample2} |
bos@184 | 414 \begin{enumerate} |
bos@184 | 415 \item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh |
bos@184 | 416 protocol. |
bos@184 | 417 \item The ``\texttt{bos@}'' component indicates what username to log |
bos@184 | 418 into the server as. You can leave this out if the remote username |
bos@184 | 419 is the same as your local username. |
bos@184 | 420 \item The ``\texttt{hg.serpentine.com}'' gives the hostname of the |
bos@184 | 421 server to log into. |
bos@184 | 422 \item The ``\texttt{:22}'' identifies the port number to connect to the server |
bos@184 | 423 on. The default port is~22, so you only need to specify this part |
bos@184 | 424 if you're \emph{not} using port~22. |
bos@184 | 425 \item The remainder of the URL is the local path to the repository on |
bos@184 | 426 the server. |
bos@184 | 427 \end{enumerate} |
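To make the five parts listed above concrete, here is a small Python sketch that splits the sample URL into its components. It uses Python's standard \texttt{urllib.parse} module rather than Mercurial's own URL handling, so treat it as an approximation; the function name \texttt{split\_ssh\_url} is made up for this illustration.

```python
from urllib.parse import urlsplit

DEFAULT_SSH_PORT = 22  # used when the URL does not name a port


def split_ssh_url(url):
    """Break an ssh URL into the components described in the list above."""
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError("not an ssh URL: %r" % url)
    return {
        "user": parts.username,  # None means "same as the local username"
        "host": parts.hostname,
        "port": parts.port or DEFAULT_SSH_PORT,
        "path": parts.path,      # interpreted by the server side
    }
```

Leaving out the username and port, as in \texttt{ssh://hg.serpentine.com/hg/hgbook}, gives a \texttt{user} of \texttt{None} and a \texttt{port} of 22.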
bos@184 | 428 |
bos@184 | 429 There's plenty of scope for confusion with the path component of ssh |
bos@184 | 430 URLs, as there is no standard way for tools to interpret it. Some |
bos@184 | 431 programs behave differently than others when dealing with these paths. |
bos@184 | 432 This isn't an ideal situation, but it's unlikely to change. Please |
bos@184 | 433 read the following paragraphs carefully. |
bos@184 | 434 |
bos@184 | 435 Mercurial treats the path to a repository on the server as relative to |
bos@184 | 436 the remote user's home directory. For example, if user \texttt{foo} |
bos@184 | 437 on the server has a home directory of \dirname{/home/foo}, then an ssh |
bos@184 | 438 URL that contains a path component of \dirname{bar} |
bos@184 | 439 \emph{really} refers to the directory \dirname{/home/foo/bar}. |
bos@184 | 440 |
bos@184 | 441 If you want to specify a path relative to another user's home |
bos@184 | 442 directory, you can use a path that starts with a tilde character |
bos@184 | 443 followed by the user's name (let's call them \texttt{otheruser}), like |
bos@184 | 444 this. |
bos@184 | 445 \begin{codesample2} |
bos@184 | 446 ssh://server/~otheruser/hg/repo |
bos@184 | 447 \end{codesample2} |
bos@184 | 448 |
bos@184 | 449 And if you really want to specify an \emph{absolute} path on the |
bos@184 | 450 server, begin the path component with two slashes, as in this example. |
bos@184 | 451 \begin{codesample2} |
bos@184 | 452 ssh://server//absolute/path |
bos@184 | 453 \end{codesample2} |
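The three path rules can be summarised in a short Python sketch. It restates the behaviour described in this section and is not code taken from Mercurial; the function name \texttt{describe\_remote\_path} and the labels it returns are assumptions made for the example.

```python
from urllib.parse import urlsplit


def describe_remote_path(url):
    """Classify the path of an ssh URL using the three rules in the text."""
    path = urlsplit(url).path
    # Drop the single slash that separates the host from the path.
    rest = path[1:] if path.startswith("/") else path
    if rest.startswith("/"):
        # A second slash marks an absolute path on the server.
        return ("absolute", rest)
    if rest.startswith("~"):
        # A tilde followed by a name selects that user's home directory.
        user, _, tail = rest[1:].partition("/")
        return ("home of " + user, tail)
    # Anything else is relative to the home directory of the login user.
    return ("home of login user", rest)
```

For instance, \texttt{ssh://server//absolute/path} is classified as \texttt{absolute} with the path \texttt{/absolute/path}, while \texttt{ssh://foo@server/bar} is relative to the login user's home directory.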
bos@184 | 454 |
bos@210 | 455 \subsection{Finding an ssh client for your system} |
bos@184 | 456 |
bos@184 | 457 Almost every Unix-like system comes with OpenSSH preinstalled. If |
bos@184 | 458 you're using such a system, run \Verb|which ssh| to find out if |
bos@184 | 459 the \command{ssh} command is installed (it's usually in |
bos@184 | 460 \dirname{/usr/bin}). In the unlikely event that it isn't present, |
bos@184 | 461 take a look at your system documentation to figure out how to install |
bos@184 | 462 it. |
bos@184 | 463 |
bos@184 | 464 On Windows, you'll first need to download a suitable ssh |
bos@184 | 465 client. There are two alternatives. |
bos@184 | 466 \begin{itemize} |
bos@184 | 467 \item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides |
bos@184 | 468 a complete suite of ssh client commands. |
bos@184 | 469 \item If you have a high tolerance for pain, you can use the Cygwin |
bos@184 | 470 port of OpenSSH. |
bos@184 | 471 \end{itemize} |
bos@184 | 472 In either case, you'll need to edit your \hgini\ file to tell |
bos@184 | 473 Mercurial where to find the actual client command. For example, if |
bos@184 | 474 you're using PuTTY, you'll need to use the \command{plink} command as |
bos@184 | 475 a command-line ssh client. |
bos@184 | 476 \begin{codesample2} |
bos@184 | 477 [ui] |
bos@184 | 478 ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key" |
bos@184 | 479 \end{codesample2} |
bos@184 | 480 |
bos@184 | 481 \begin{note} |
bos@184 | 482 The path to \command{plink} shouldn't contain any whitespace |
bos@184 | 483 characters, or Mercurial may not be able to run it correctly (so |
arne@264 | 484 putting it in \dirname{C:\\Program Files} is probably not a good |
bos@184 | 485 idea). |
bos@184 | 486 \end{note} |
bos@184 | 487 |
bos@210 | 488 \subsection{Generating a key pair} |
bos@184 | 489 |
bos@184 | 490 To avoid the need to repeatedly type a password every time you need |
bos@184 | 491 to use your ssh client, I recommend generating a key pair. On a |
bos@184 | 492 Unix-like system, the \command{ssh-keygen} command will do the trick. |
bos@184 | 493 On Windows, if you're using PuTTY, the \command{puttygen} command is |
bos@184 | 494 what you'll need. |
bos@184 | 495 |
bos@184 | 496 When you generate a key pair, it's usually \emph{highly} advisable to |
bos@184 | 497 protect it with a passphrase. (The only time that you might not want |
bos@184 | 498 to do this is when you're using the ssh protocol for automated tasks |
bos@184 | 499 on a secure network.) |
bos@184 | 500 |
bos@184 | 501 Simply generating a key pair isn't enough, however. You'll need to |
bos@184 | 502 add the public key to the set of authorised keys for whatever user |
bos@184 | 503 you're logging in remotely as. For servers using OpenSSH (the vast |
bos@184 | 504 majority), this will mean adding the public key to a list in a file |
bos@184 | 505 called \sfilename{authorized\_keys} in their \sdirname{.ssh} |
bos@184 | 506 directory. |
bos@184 | 507 |
bos@184 | 508 On a Unix-like system, your public key will have a \filename{.pub} |
bos@184 | 509 extension. If you're using \command{puttygen} on Windows, you can |
bos@184 | 510 save the public key to a file of your choosing, or paste it from the |
bos@184 | 511 window it's displayed in straight into the |
bos@184 | 512 \sfilename{authorized\_keys} file. |
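
As a rough sketch of the whole procedure on a Unix-like client with
OpenSSH, the commands below generate an RSA key pair and append the
public key to the remote user's \sfilename{authorized\_keys} file.
The names \texttt{myuser} and \texttt{myserver} are placeholders, and
\sfilename{id\_rsa.pub} is simply the file name that
\command{ssh-keygen} suggests for an RSA key; adjust both to match
your own setup.
\begin{codesample2}
  ssh-keygen -t rsa
  cat ~/.ssh/id_rsa.pub | ssh myuser@myserver 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
\end{codesample2}
Expect to be asked for the remote user's password when the second
command runs, since the key has not yet been authorised at that point.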
bos@184 | 513 |
bos@210 | 514 \subsection{Using an authentication agent} |
bos@184 | 515 |
bos@184 | 516 An authentication agent is a daemon that stores passphrases in memory |
bos@184 | 517 (so it will forget passphrases if you log out and log back in again). |
bos@184 | 518 An ssh client will notice if it's running, and query it for a |
bos@184 | 519 passphrase. If there's no authentication agent running, or the agent |
bos@184 | 520 doesn't store the necessary passphrase, you'll have to type your |
bos@184 | 521 passphrase every time Mercurial tries to communicate with a server on |
bos@184 | 522 your behalf (e.g.~whenever you pull or push changes). |
bos@184 | 523 |
bos@184 | 524 The downside of storing passphrases in an agent is that it's possible |
bos@184 | 525 for a well-prepared attacker to recover the plain text of your |
bos@184 | 526 passphrases, in some cases even if your system has been power-cycled. |
bos@184 | 527 You should make your own judgment as to whether this is an acceptable |
bos@184 | 528 risk. It certainly saves a lot of repeated typing. |
bos@184 | 529 |
bos@184 | 530 On Unix-like systems, the agent is called \command{ssh-agent}, and |
bos@184 | 531 it's often run automatically for you when you log in. You'll need to |
bos@184 | 532 use the \command{ssh-add} command to add passphrases to the agent's |
bos@184 | 533 store. On Windows, if you're using PuTTY, the \command{pageant} |
bos@184 | 534 command acts as the agent. It adds an icon to your system tray that |
bos@184 | 535 will let you manage stored passphrases. |
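
If no agent is started for you at login, a session along the
following lines is one way to start one by hand on a Unix-like system
and hand it a key. The key file name \sfilename{id\_rsa} is an
assumption; use whatever name you chose when generating your key.
\begin{codesample2}
  eval $(ssh-agent)
  ssh-add ~/.ssh/id_rsa
  ssh-add -l
\end{codesample2}
The second command prompts once for the key's passphrase, and the
third lists the keys the agent currently holds, which is a quick way
to confirm that the right key was added.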
bos@184 | 536 |
bos@210 | 537 \subsection{Configuring the server side properly} |
bos@184 | 538 |
bos@184 | 539 Because ssh can be fiddly to set up if you're new to it, there's a |
bos@184 | 540 variety of things that can go wrong. Add Mercurial on top, and |
bos@184 | 541 there's plenty more scope for head-scratching. Most of these |
bos@184 | 542 potential problems occur on the server side, not the client side. The |
bos@184 | 543 good news is that once you've gotten a configuration working, it will |
bos@184 | 544 usually continue to work indefinitely. |
bos@184 | 545 |
bos@184 | 546 Before you try using Mercurial to talk to an ssh server, it's best to |
bos@184 | 547 make sure that you can use the normal \command{ssh} or \command{putty} |
bos@184 | 548 command to talk to the server first. If you run into problems with |
bos@184 | 549 using these commands directly, Mercurial surely won't work. Worse, it |
bos@184 | 550 will obscure the underlying problem. Any time you want to debug |
bos@184 | 551 ssh-related Mercurial problems, you should drop back to making sure |
bos@184 | 552 that plain ssh client commands work first, \emph{before} you worry |
bos@184 | 553 about whether there's a problem with Mercurial. |
bos@184 | 554 |
bos@184 | 555 The first thing to be sure of on the server side is that you can |
bos@184 | 556 actually log in from another machine at all. If you can't use |
bos@184 | 557 \command{ssh} or \command{putty} to log in, the error message you get |
bos@184 | 558 may give you a few hints as to what's wrong. The most common problems |
bos@184 | 559 are as follows. |
bos@184 | 560 \begin{itemize} |
bos@184 | 561 \item If you get a ``connection refused'' error, either there isn't an |
bos@184 | 562 SSH daemon running on the server at all, or it's inaccessible due to |
bos@184 | 563 firewall configuration. |
bos@184 | 564 \item If you get a ``no route to host'' error, you either have an |
bos@184 | 565 incorrect address for the server or a seriously locked down firewall |
bos@184 | 566 that won't admit its existence at all. |
bos@184 | 567 \item If you get a ``permission denied'' error, you may have mistyped |
bos@184 | 568 the username on the server, or you could have mistyped your key's |
bos@184 | 569 passphrase or the remote user's password. |
bos@184 | 570 \end{itemize} |
bos@184 | 571 In summary, if you're having trouble talking to the server's ssh |
bos@184 | 572 daemon, first make sure that one is running at all. On many systems |
bos@184 | 573 it will be installed, but disabled, by default. Once you're done with |
bos@184 | 574 this step, you should then check that the server's firewall is |
bos@184 | 575 configured to allow incoming connections on the port the ssh daemon is |
bos@184 | 576 listening on (usually~22). Don't worry about more exotic |
bos@184 | 577 possibilities for misconfiguration until you've checked these two |
bos@184 | 578 first. |
bos@184 | 579 |
bos@184 | 580 If you're using an authentication agent on the client side to store |
bos@184 | 581 passphrases for your keys, you ought to be able to log into the server |
bos@184 | 582 without being prompted for a passphrase or a password. If you're |
bos@184 | 583 prompted for a passphrase, there are a few possible culprits. |
bos@184 | 584 \begin{itemize} |
bos@184 | 585 \item You might have forgotten to use \command{ssh-add} or |
bos@184 | 586 \command{pageant} to store the passphrase. |
bos@184 | 587 \item You might have stored the passphrase for the wrong key. |
bos@184 | 588 \end{itemize} |
bos@184 | 589 If you're being prompted for the remote user's password, there are |
bos@184 | 590 another few possible problems to check. |
bos@184 | 591 \begin{itemize} |
bos@184 | 592 \item Either the user's home directory or their \sdirname{.ssh} |
bos@184 | 593 directory might have excessively liberal permissions. As a result, |
bos@184 | 594 the ssh daemon will not trust or read their |
bos@184 | 595 \sfilename{authorized\_keys} file. For example, a group-writable |
bos@184 | 596 home or \sdirname{.ssh} directory will often cause this symptom. |
bos@184 | 597 \item The user's \sfilename{authorized\_keys} file may have a problem. |
bos@184 | 598 If anyone other than the user owns or can write to that file, the |
bos@184 | 599 ssh daemon will not trust or read it. |
bos@184 | 600 \end{itemize} |
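
A minimal sketch of tightening these permissions, to be run on the
server as the remote user, might look like this. The modes shown are
a conservative choice of mine rather than the only ones the ssh daemon
will accept.
\begin{codesample2}
  chmod go-w ~
  chmod 700 ~/.ssh
  chmod 600 ~/.ssh/authorized_keys
\end{codesample2}
If you are still prompted for a password afterwards, running
\Verb|ssh -v myserver date| on the client prints a trace of each
authentication method the client attempts, which often narrows down
where things go wrong.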
bos@184 | 601 |
bos@184 | 602 In an ideal world, you should be able to run the following command |
bos@184 | 603 successfully, and it should print exactly one line of output, the |
bos@184 | 604 current date and time. |
bos@184 | 605 \begin{codesample2} |
bos@184 | 606 ssh myserver date |
bos@184 | 607 \end{codesample2} |
bos@184 | 608 |
bos@209 | 609 If, on your server, you have login scripts that print banners or other |
bos@184 | 610 junk even when running non-interactive commands like this, you should |
bos@184 | 611 fix them before you continue, so that they only print output if |
bos@184 | 612 they're run interactively. Otherwise these banners will at least |
bos@184 | 613 clutter up Mercurial's output. Worse, they could potentially cause |
bos@209 | 614 problems with running Mercurial commands remotely. Mercurial |
bos@209 | 615 tries to detect and ignore banners in non-interactive \command{ssh} |
bos@209 | 616 sessions, but it is not foolproof. (If you're editing your login |
bos@209 | 617 scripts on your server, the usual way to see if a login script is |
bos@209 | 618 running in an interactive shell is to check the return code from the |
bos@209 | 619 command \Verb|tty -s|.) |
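
A guard of the following shape, placed in a Bourne-style login script
on the server, is one way to apply that check. The banner text is a
placeholder, and which file the guard belongs in depends on your
shell.
\begin{codesample2}
  if tty -s; then
    echo "Welcome to myserver"
  fi
\end{codesample2}
With this in place, the banner appears only when standard input is a
terminal, so a non-interactive command such as
\Verb|ssh myserver date| should print nothing but the date.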
bos@184 | 620 |
bos@184 | 621 Once you've verified that plain old ssh is working with your server, |
bos@184 | 622 the next step is to ensure that Mercurial runs on the server. The |
bos@184 | 623 following command should run successfully: |
bos@184 | 624 \begin{codesample2} |
bos@184 | 625 ssh myserver hg version |
bos@184 | 626 \end{codesample2} |
bos@184 | 627 If you see an error message instead of normal \hgcmd{version} output, |
bos@184 | 628 this is usually because you haven't installed Mercurial to |
bos@184 | 629 \dirname{/usr/bin}. Don't worry if this is the case; you don't need |
bos@184 | 630 to do that. But you should check for a few possible problems. |
bos@184 | 631 \begin{itemize} |
bos@184 | 632 \item Is Mercurial really installed on the server at all? I know this |
bos@184 | 633 sounds trivial, but it's worth checking! |
bos@184 | 634 \item Maybe your shell's search path (usually set via the \envar{PATH} |
bos@184 | 635 environment variable) is simply misconfigured. |
bos@184 | 636 \item Perhaps your \envar{PATH} environment variable is only being set |
bos@184 | 637 to point to the location of the \command{hg} executable if the login |
bos@184 | 638 session is interactive. This can happen if you're setting the path |
bos@184 | 639 in the wrong shell login script. See your shell's documentation for |
bos@184 | 640 details. |
bos@184 | 641 \item The \envar{PYTHONPATH} environment variable may need to contain |
bos@184 | 642 the path to the Mercurial Python modules. It might not be set at |
bos@184 | 643 all; it could be incorrect; or it may be set only if the login is |
bos@184 | 644 interactive. |
bos@184 | 645 \end{itemize} |
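
To see what a non-interactive session on the server actually gets,
you can ask it directly. These two commands are only a diagnostic
sketch; \texttt{myserver} is a placeholder, and \command{which} may
not be available on every system.
\begin{codesample2}
  ssh myserver 'echo $PATH'
  ssh myserver 'which hg'
\end{codesample2}
If fixing the server's search path is not practical, another option
is to tell the client where the remote \command{hg} executable lives,
using the \texttt{remotecmd} item in the \texttt{ui} section of your
\hgrc. The path below is purely hypothetical.
\begin{codesample2}
  [ui]
  remotecmd = /home/myuser/bin/hg
\end{codesample2}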
bos@184 | 646 |
bos@184 | 647 If you can run \hgcmd{version} over an ssh connection, well done! |
bos@184 | 648 You've got the server and client sorted out. You should now be able |
bos@184 | 649 to use Mercurial to access repositories hosted by that username on |
bos@184 | 650 that server. If you run into problems with Mercurial and ssh at this |
bos@184 | 651 point, try using the \hggopt{--debug} option to get a clearer picture |
bos@184 | 652 of what's going on. |
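
For instance, a debugging run of a clone might look like the
following, where \texttt{myuser}, \texttt{myserver} and
\texttt{myrepo} are all placeholder names. Note that the path in an
\texttt{ssh} URL is interpreted relative to the remote user's home
directory.
\begin{codesample2}
  hg --debug clone ssh://myuser@myserver/myrepo
\end{codesample2}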
bos@184 | 653 |
bos@210 | 654 \subsection{Using compression with ssh} |
bos@184 | 655 |
bos@184 | 656 Mercurial does not compress data when it uses the ssh protocol, |
bos@184 | 657 because the ssh protocol can transparently compress data. However, |
bos@184 | 658 the default behaviour of ssh clients is \emph{not} to request |
bos@184 | 659 compression. |
bos@184 | 660 |
bos@184 | 661 Over any network other than a fast LAN (even a wireless network), |
bos@184 | 662 using compression is likely to significantly speed up Mercurial's |
bos@184 | 663 network operations. For example, over a WAN, someone measured |
bos@184 | 664 compression as reducing the amount of time required to clone a |
bos@184 | 665 particularly large repository from~51 minutes to~17 minutes. |
bos@184 | 666 |
bos@184 | 667 Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} |
bos@184 | 668 option which turns on compression. You can easily edit your \hgrc\ to |
bos@184 | 669 enable compression for all of Mercurial's uses of the ssh protocol. |
bos@184 | 670 \begin{codesample2} |
bos@184 | 671 [ui] |
bos@184 | 672 ssh = ssh -C |
bos@184 | 673 \end{codesample2} |
bos@184 | 674 |
bos@209 | 675 If you use \command{ssh}, you can configure it to always use |
bos@209 | 676 compression when talking to your server. To do this, edit your |
bos@209 | 677 \sfilename{.ssh/config} file (which may not yet exist), as follows. |
bos@209 | 678 \begin{codesample2} |
bos@209 | 679 Host hg |
bos@209 | 680 Compression yes |
bos@209 | 681 HostName hg.example.com |
bos@209 | 682 \end{codesample2} |
bos@209 | 683 This defines an alias, \texttt{hg}. When you use it on the |
bos@209 | 684 \command{ssh} command line or in a Mercurial \texttt{ssh}-protocol |
bos@209 | 685 URL, it will cause \command{ssh} to connect to \texttt{hg.example.com} |
bos@209 | 686 and use compression. This gives you both a shorter name to type and |
bos@209 | 687 compression, each of which is a good thing in its own right. |
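
With that alias defined, a clone over ssh could be written as
follows. The repository name \texttt{myrepo} is hypothetical, and is
looked up relative to the remote user's home directory.
\begin{codesample2}
  hg clone ssh://hg/myrepo
\end{codesample2}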
bos@209 | 688 |
bos@210 | 689 \section{Serving over HTTP using CGI} |
bos@159 | 690 \label{sec:collab:cgi} |
bos@159 | 691 |
bos@210 | 692 Depending on how ambitious you are, configuring Mercurial's CGI |
bos@210 | 693 interface can take anything from a few moments to several hours. |
bos@210 | 694 |
bos@210 | 695 We'll begin with the simplest of examples, and work our way towards a |
bos@210 | 696 more complex configuration. Even for the most basic case, you're |
bos@210 | 697 almost certainly going to need to read and modify your web server's |
bos@210 | 698 configuration. |
bos@210 | 699 |
bos@210 | 700 \begin{note} |
bos@210 | 701 Configuring a web server is a complex, fiddly, and highly |
bos@210 | 702 system-dependent activity. I can't possibly give you instructions |
bos@210 | 703 that will cover anything like all of the cases you will encounter. |
bos@210 | 704 Please use your discretion and judgment in following the sections |
bos@210 | 705 below. Be prepared to make plenty of mistakes, and to spend a lot |
bos@210 | 706 of time reading your server's error logs. |
bos@210 | 707 \end{note} |
bos@210 | 708 |
bos@210 | 709 \subsection{Web server configuration checklist} |
bos@210 | 710 |
bos@210 | 711 Before you continue, do take a few moments to check a few aspects of |
bos@210 | 712 your system's setup. |
bos@210 | 713 |
bos@210 | 714 \begin{enumerate} |
bos@210 | 715 \item Do you have a web server installed at all? Mac OS X ships with |
bos@210 | 716 Apache, but many other systems may not have a web server installed. |
bos@210 | 717 \item If you have a web server installed, is it actually running? On |
bos@210 | 718 most systems, even if one is present, it will be disabled by |
bos@210 | 719 default. |
bos@210 | 720 \item Is your server configured to allow you to run CGI programs in |
bos@210 | 721 the directory where you plan to do so? Most servers default to |
bos@210 | 722 explicitly disabling the ability to run CGI programs. |
bos@210 | 723 \end{enumerate} |
bos@210 | 724 |
bos@210 | 725 If you don't have a web server installed, and don't have substantial |
bos@210 | 726 experience configuring Apache, you should consider using the |
bos@210 | 727 \texttt{lighttpd} web server instead of Apache. Apache has a |
bos@210 | 728 well-deserved reputation for baroque and confusing configuration. |
bos@210 | 729 While \texttt{lighttpd} is less capable in some ways than Apache, most |
bos@210 | 730 of these capabilities are not relevant to serving Mercurial |
bos@210 | 731 repositories. And \texttt{lighttpd} is undeniably \emph{much} easier |
bos@210 | 732 to get started with than Apache. |
bos@210 | 733 |
bos@210 | 734 \subsection{Basic CGI configuration} |
bos@210 | 735 |
bos@210 | 736 On Unix-like systems, it's common for users to have a subdirectory |
bos@210 | 737 named something like \dirname{public\_html} in their home directory, |
bos@210 | 738 from which they can serve up web pages. A file named \filename{foo} |
bos@210 | 739 in this directory will be accessible at a URL of the form |
bos@210 | 740 \texttt{http://www.example.com/\~{}username/foo}. |
bos@210 | 741 |
bos@210 | 742 To get started, find the \sfilename{hgweb.cgi} script that should be |
bos@210 | 743 present in your Mercurial installation. If you can't quickly find a |
bos@210 | 744 local copy on your system, simply download one from the master |
bos@210 | 745 Mercurial repository at |
bos@210 | 746 \url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}. |
bos@210 | 747 |
bos@210 | 748 You'll need to copy this script into your \dirname{public\_html} |
bos@210 | 749 directory, and ensure that it's executable. |
bos@210 | 750 \begin{codesample2} |
bos@210 | 751 cp .../hgweb.cgi ~/public_html |
bos@211 | 752 chmod 755 ~/public_html/hgweb.cgi |
bos@211 | 753 \end{codesample2} |
bos@211 | 754 The \texttt{755} argument to \command{chmod} is a little more general |
bos@211 | 755 than just making the script executable: it ensures that the script is |
bos@211 | 756 executable by anyone, and that ``group'' and ``other'' write |
bos@211 | 757 permissions are \emph{not} set. If you were to leave those write |
bos@211 | 758 permissions enabled, Apache's \texttt{suexec} subsystem would likely |
bos@211 | 759 refuse to execute the script. In fact, \texttt{suexec} also insists |
bos@211 | 760 that the \emph{directory} in which the script resides must not be |
bos@211 | 761 writable by others. |
bos@211 | 762 \begin{codesample2} |
bos@211 | 763 chmod 755 ~/public_html |
bos@210 | 764 \end{codesample2} |
bos@210 | 765 |
bos@210 | 766 \subsubsection{What could \emph{possibly} go wrong?} |
bos@211 | 767 \label{sec:collab:wtf} |
bos@210 | 768 |
bos@210 | 769 Once you've copied the CGI script into place, go into a web browser, |
bos@210 | 770 and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi}, |
bos@210 | 771 \emph{but} brace yourself for instant failure. There's a high |
bos@210 | 772 probability that trying to visit this URL will fail, and there are |
bos@210 | 773 many possible reasons for this. In fact, you're likely to stumble |
bos@210 | 774 over almost every one of the possible errors below, so please read |
bos@210 | 775 carefully. The following are all of the problems I ran into on a |
bos@210 | 776 system running Fedora~7, with a fresh installation of Apache, and a |
bos@211 | 777 user account that I created specially to perform this exercise. |
bos@210 | 778 |
bos@210 | 779 Your web server may have per-user directories disabled. If you're |
bos@210 | 780 using Apache, search your config file for a \texttt{UserDir} |
bos@210 | 781 directive. If there's none present, per-user directories will be |
bos@210 | 782 disabled. If one exists, but its value is \texttt{disabled}, then |
bos@210 | 783 per-user directories will be disabled. Otherwise, the string after |
bos@210 | 784 \texttt{UserDir} gives the name of the subdirectory that Apache will |
bos@210 | 785 look in under your home directory, for example \dirname{public\_html}. |
bos@210 | 786 |
bos@210 | 787 Your file access permissions may be too restrictive. The web server |
bos@210 | 788 must be able to traverse your home directory and directories under |
bos@210 | 789 your \dirname{public\_html} directory, and read files under the latter |
bos@210 | 790 too. Here's a quick recipe to help you to make your permissions more |
bos@210 | 791 appropriate. |
bos@210 | 792 \begin{codesample2} |
bos@210 | 793 chmod 755 ~ |
bos@210 | 794 find ~/public_html -type d -print0 | xargs -0r chmod 755 |
bos@210 | 795 find ~/public_html -type f -print0 | xargs -0r chmod 644 |
bos@210 | 796 \end{codesample2} |
bos@210 | 797 |
bos@210 | 798 The other possibility with permissions is that you might get a |
bos@210 | 799 completely empty window when you try to load the script. In this |
bos@210 | 800 case, it's likely that your access permissions are \emph{too |
bos@210 | 801 permissive}. Apache's \texttt{suexec} subsystem won't execute a |
bos@210 | 802 script that's group-~or world-writable, for example. |
bos@210 | 803 |
bos@210 | 804 Your web server may be configured to disallow execution of CGI |
bos@210 | 805 programs in your per-user web directory. Here's Apache's |
bos@210 | 806 default per-user configuration from my Fedora system. |
bos@210 | 807 \begin{codesample2} |
bos@210 | 808 <Directory /home/*/public_html> |
bos@210 | 809 AllowOverride FileInfo AuthConfig Limit |
bos@210 | 810 Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec |
bos@210 | 811 <Limit GET POST OPTIONS> |
bos@210 | 812 Order allow,deny |
bos@210 | 813 Allow from all |
bos@210 | 814 </Limit> |
bos@210 | 815 <LimitExcept GET POST OPTIONS> |
bos@210 | 816 Order deny,allow |
bos@210 | 817 Deny from all |
bos@210 | 818 </LimitExcept> |
bos@210 | 819 </Directory> |
bos@210 | 820 \end{codesample2} |
bos@210 | 821 If you find a similar-looking \texttt{Directory} group in your Apache |
bos@210 | 822 configuration, the directive to look at inside it is \texttt{Options}. |
bos@210 | 823 Add \texttt{ExecCGI} to the end of this list if it's missing, and |
bos@210 | 824 restart the web server. |
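
Applied to the Fedora configuration shown above, the edited line
would read as follows. Treat this as an illustration only, since the
other options on your system may well differ.
\begin{codesample2}
  Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec ExecCGI
\end{codesample2}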
bos@210 | 825 |
bos@210 | 826 If you find that Apache serves you the text of the CGI script instead |
bos@210 | 827 of executing it, you may need to either uncomment (if already present) |
bos@210 | 828 or add a directive like this. |
bos@210 | 829 \begin{codesample2} |
bos@210 | 830 AddHandler cgi-script .cgi |
bos@210 | 831 \end{codesample2} |
bos@210 | 832 |
bos@210 | 833 The next possibility is that you might be served with a colourful |
bos@210 | 834 Python backtrace claiming that it can't import a |
bos@210 | 835 \texttt{mercurial}-related module. This is actually progress! The |
bos@210 | 836 server is now capable of executing your CGI script. This error is |
bos@210 | 837 only likely to occur if you're running a private installation of |
bos@210 | 838 Mercurial, instead of a system-wide version. Remember that the web |
bos@210 | 839 server runs the CGI program without any of the environment variables |
bos@210 | 840 that you take for granted in an interactive session. If this error |
bos@210 | 841 happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the |
bos@210 | 842 directions inside it to correctly set your \envar{PYTHONPATH} |
bos@210 | 843 environment variable. |
bos@210 | 844 |
bos@210 | 845 Finally, you are \emph{certain} to be served with another colourful |
bos@210 | 846 Python backtrace: this one will complain that it can't find |
bos@210 | 847 \dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script |
bos@210 | 848 and replace the \dirname{/path/to/repository} string with the complete |
bos@210 | 849 path to the repository you want to serve up. |
bos@210 | 850 |
bos@210 | 851 At this point, when you try to reload the page, you should be |
bos@210 | 852 presented with a nice HTML view of your repository's history. Whew! |
bos@210 | 853 |
bos@210 | 854 \subsubsection{Configuring lighttpd} |
bos@210 | 855 |
bos@210 | 856 To be exhaustive in my experiments, I tried configuring the |
bos@210 | 857 increasingly popular \texttt{lighttpd} web server to serve the same |
bos@210 | 858 repository as I described with Apache above. I had already overcome |
bos@210 | 859 all of the problems I outlined with Apache, many of which are not |
bos@210 | 860 server-specific. As a result, I was fairly sure that my file and |
bos@210 | 861 directory permissions were good, and that my \sfilename{hgweb.cgi} |
bos@210 | 862 script was properly edited. |
bos@210 | 863 |
bos@210 | 864 Once I had Apache running, getting \texttt{lighttpd} to serve the |
bos@211 | 865 repository was a snap (in other words, even if you're trying to use |
bos@211 | 866 \texttt{lighttpd}, you should read the Apache section). I first had |
bos@211 | 867 to edit the \texttt{mod\_access} section of its config file to enable |
bos@211 | 868 \texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were |
bos@211 | 869 disabled by default on my system. I then added a few lines to the end |
bos@211 | 870 of the config file, to configure these modules. |
bos@210 | 871 \begin{codesample2} |
bos@210 | 872 userdir.path = "public_html" |
bos@210 | 873 cgi.assign = ( ".cgi" => "" ) |
bos@210 | 874 \end{codesample2} |
bos@210 | 875 With this done, \texttt{lighttpd} ran immediately for me. If I had |
bos@210 | 876 configured \texttt{lighttpd} before Apache, I'd almost certainly have |
bos@210 | 877 run into many of the same system-level configuration problems as I did |
bos@210 | 878 with Apache. However, I found \texttt{lighttpd} to be noticeably |
bos@210 | 879 easier to configure than Apache, even though I've used Apache for over |
bos@210 | 880 a decade, and this was my first exposure to \texttt{lighttpd}. |
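
For reference, modules in \texttt{lighttpd} are enabled through the
\texttt{server.modules} list in its config file. The sketch below
assumes that this is the list described above; the modules already
listed in your own file may differ, and you should keep them.
\begin{codesample2}
  server.modules = (
    "mod_access",
    "mod_userdir",
    "mod_cgi"
  )
\end{codesample2}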
bos@159 | 881 |
bos@211 | 882 \subsection{Sharing multiple repositories with one CGI script} |
bos@211 | 883 |
bos@211 | 884 The \sfilename{hgweb.cgi} script only lets you publish a single |
bos@211 | 885 repository, which is an annoying restriction. If you want to publish |
bos@211 | 886 more than one without burdening yourself with multiple copies of the |
bos@211 | 887 same script, each with different names, a better choice is to use the |
bos@211 | 888 \sfilename{hgwebdir.cgi} script. |
bos@211 | 889 |
bos@211 | 890 The procedure to configure \sfilename{hgwebdir.cgi} is only a little |
bos@211 | 891 more involved than for \sfilename{hgweb.cgi}. First, you must obtain |
bos@211 | 892 a copy of the script. If you don't have one handy, you can download a |
bos@211 | 893 copy from the master Mercurial repository at |
bos@211 | 894 \url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}. |
bos@211 | 895 |
bos@211 | 896 You'll need to copy this script into your \dirname{public\_html} |
bos@211 | 897 directory, and ensure that it's executable. |
bos@211 | 898 \begin{codesample2} |
bos@211 | 899 cp .../hgwebdir.cgi ~/public_html |
bos@211 | 900 chmod 755 ~/public_html ~/public_html/hgwebdir.cgi |
bos@211 | 901 \end{codesample2} |
bos@211 | 902 With basic configuration out of the way, try to visit |
bos@211 | 903 \url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It |
bos@211 | 904 should display an empty list of repositories. If you get a blank |
bos@211 | 905 window or error message, try walking through the list of potential |
bos@211 | 906 problems in section~\ref{sec:collab:wtf}. |
bos@211 | 907 |
bos@211 | 908 The \sfilename{hgwebdir.cgi} script relies on an external |
bos@211 | 909 configuration file. By default, it searches for a file named |
bos@211 | 910 \sfilename{hgweb.config} in the same directory as itself. You'll need |
bos@211 | 911 to create this file, and make it world-readable. The format of the |
bos@211 | 912 file is similar to a Windows ``ini'' file, as understood by Python's |
bos@211 | 913 \texttt{ConfigParser}~\cite{web:configparser} module. |
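
Assuming the script lives in your \dirname{public\_html} directory as
described above, creating an empty, world-readable configuration file
could be as simple as this.
\begin{codesample2}
  touch ~/public_html/hgweb.config
  chmod 644 ~/public_html/hgweb.config
\end{codesample2}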
bos@211 | 914 |
bos@211 | 915 The easiest way to configure \sfilename{hgwebdir.cgi} is with a |
bos@211 | 916 section named \texttt{collections}. This will automatically publish |
bos@211 | 917 \emph{every} repository under the directories you name. The section |
bos@211 | 918 should look like this: |
bos@211 | 919 \begin{codesample2} |
bos@211 | 920 [collections] |
bos@211 | 921 /my/root = /my/root |
bos@211 | 922 \end{codesample2} |
bos@211 | 923 Mercurial interprets this by looking at the directory name on the |
bos@211 | 924 \emph{right} hand side of the ``\texttt{=}'' sign; finding |
bos@211 | 925 repositories in that directory hierarchy; and using the text on the |
bos@211 | 926 \emph{left} to strip off matching text from the names it will actually |
bos@212 | 927 list in the web interface. The remaining component of a path after |
bos@212 | 928 this stripping has occurred is called a ``virtual path''. |
bos@211 | 929 |
bos@211 | 930 Given the example above, if we have a repository whose local path is |
bos@211 | 931 \dirname{/my/root/this/repo}, the CGI script will strip the leading |
bos@212 | 932 \dirname{/my/root} from the name, and publish the repository with a |
bos@212 | 933 virtual path of \dirname{this/repo}. If the base URL for our CGI |
bos@212 | 934 script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete |
bos@212 | 935 URL for that repository will be |
bos@211 | 936 \url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}. |
bos@211 | 937 |
bos@211 | 938 If we replace \dirname{/my/root} on the left hand side of this example |
bos@211 | 939 with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off |
bos@212 | 940 \dirname{/my} from the repository name, and will give us a virtual |
bos@212 | 941 path of \dirname{root/this/repo} instead of \dirname{this/repo}. |
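
Written out, that variant of the configuration would look like this.
\begin{codesample2}
  [collections]
  /my = /my/root
\end{codesample2}
With it, the repository at \dirname{/my/root/this/repo} would be
published at a URL ending in \texttt{hgwebdir.cgi/root/this/repo}.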
bos@211 | 942 |
bos@211 | 943 The \sfilename{hgwebdir.cgi} script will recursively search each |
bos@211 | 944 directory listed in the \texttt{collections} section of its |
bos@211 | 945 configuration file, but it will \emph{not} recurse into the |
bos@211 | 946 repositories it finds. |
bos@211 | 947 |
bos@211 | 948 The \texttt{collections} mechanism makes it easy to publish many |
bos@211 | 949 repositories in a ``fire and forget'' manner. You only need to set up |
bos@211 | 950 the CGI script and configuration file one time. Afterwards, you can |
bos@211 | 951 publish or unpublish a repository at any time by simply moving it |
bos@211 | 952 into, or out of, the directory hierarchy in which you've configured |
bos@211 | 953 \sfilename{hgwebdir.cgi} to look. |
bos@159 | 954 |
bos@215 | 955 \subsubsection{Explicitly specifying which repositories to publish} |
bos@212 | 956 |
bos@212 | 957 In addition to the \texttt{collections} mechanism, the |
bos@212 | 958 \sfilename{hgwebdir.cgi} script allows you to publish a specific list |
bos@212 | 959 of repositories. To do so, create a \texttt{paths} section, with |
bos@212 | 960 contents of the following form. |
bos@212 | 961 \begin{codesample2} |
bos@212 | 962 [paths] |
bos@212 | 963 repo1 = /my/path/to/some/repo |
bos@212 | 964 repo2 = /some/path/to/another |
bos@212 | 965 \end{codesample2} |
bos@212 | 966 In this case, the virtual path (the component that will appear in a |
bos@212 | 967 URL) is on the left hand side of each definition, while the path to |
bos@212 | 968 the repository is on the right. Notice that there does not need to be |
bos@212 | 969 any relationship between the virtual path you choose and the location |
bos@212 | 970 of a repository in your filesystem. |
bos@212 | 971 |
bos@212 | 972 If you wish, you can use both the \texttt{collections} and |
bos@212 | 973 \texttt{paths} mechanisms simultaneously in a single configuration |
bos@212 | 974 file. |
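
A combined file, reusing the example paths from above, might look
like the following sketch.
\begin{codesample2}
  [collections]
  /my/root = /my/root

  [paths]
  repo1 = /my/path/to/some/repo
\end{codesample2}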
bos@212 | 975 |
bos@212 | 976 \begin{note} |
bos@212 | 977 If multiple repositories have the same virtual path, |
bos@212 | 978 \sfilename{hgwebdir.cgi} will not report an error. Instead, it will |
bos@212 | 979 behave unpredictably. |
bos@212 | 980 \end{note} |
bos@212 | 981 |
bos@215 | 982 \subsection{Downloading source archives} |
bos@215 | 983 |
bos@215 | 984 Mercurial's web interface lets users download an archive of any |
bos@215 | 985 revision. This archive will contain a snapshot of the working |
bos@215 | 986 directory as of that revision, but it will not contain a copy of the |
bos@215 | 987 repository data. |
bos@215 | 988 |
bos@215 | 989 By default, this feature is not enabled. To enable it, you'll need to |
bos@215 | 990 add an \rcitem{web}{allow\_archive} item to the \rcsection{web} |
bos@215 | 991 section of your \hgrc. |
bos@215 | 992 |
bos@215 | 993 \subsection{Web configuration options} |
bos@215 | 994 |
bos@215 | 995 Mercurial's web interfaces (the \hgcmd{serve} command, and the |
bos@215 | 996 \sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a |
bos@215 | 997 number of configuration options that you can set. These belong in a |
bos@215 | 998 section named \rcsection{web}. |
bos@215 | 999 \begin{itemize} |
bos@215 | 1000 \item[\rcitem{web}{allow\_archive}] Determines which (if any) archive |
bos@215 | 1001 download mechanisms Mercurial supports. If you enable this |
bos@215 | 1002 feature, users of the web interface will be able to download an |
bos@215 | 1003 archive of whatever revision of a repository they are viewing. |
bos@215 | 1004 To enable the archive feature, this item must take the form of a |
bos@215 | 1005 sequence of words drawn from the list below. |
bos@215 | 1006 \begin{itemize} |
bos@215 | 1007 \item[\texttt{bz2}] A \command{tar} archive, compressed using |
bos@215 | 1008 \texttt{bzip2} compression. This has the best compression ratio, |
bos@215 | 1009 but uses the most CPU time on the server. |
bos@215 | 1010 \item[\texttt{gz}] A \command{tar} archive, compressed using |
bos@215 | 1011 \texttt{gzip} compression. |
bos@215 | 1012 \item[\texttt{zip}] A \command{zip} archive, compressed using DEFLATE |
bos@215 | 1013 compression. This format has the worst compression ratio, but is |
bos@215 | 1014 widely used in the Windows world. |
bos@215 | 1015 \end{itemize} |
bos@215 | 1016 If you provide an empty list, or don't have an |
bos@215 | 1017 \rcitem{web}{allow\_archive} entry at all, this feature will be |
bos@215 | 1018 disabled. Here is an example of how to enable all three supported |
bos@215 | 1019 formats. |
bos@215 | 1020 \begin{codesample4} |
bos@215 | 1021 [web] |
bos@215 | 1022 allow_archive = bz2 gz zip |
bos@215 | 1023 \end{codesample4} |
bos@215 | 1024 \item[\rcitem{web}{allowpull}] Boolean. Determines whether the web |
bos@215 | 1025 interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this |
bos@215 | 1026 repository over~HTTP. If set to \texttt{no} or \texttt{false}, only |
bos@215 | 1027 the ``human-oriented'' portion of the web interface is available. |
bos@215 | 1028 \item[\rcitem{web}{contact}] String. A free-form (but preferably |
bos@215 | 1029 brief) string identifying the person or group in charge of the |
bos@215 | 1030 repository. This often contains the name and email address of a |
bos@216 | 1031 person or mailing list. It often makes sense to place this entry in |
bos@216 | 1032 a repository's own \sfilename{.hg/hgrc} file, but it can make sense |
bos@216 | 1033 to use in a global \hgrc\ if every repository has a single |
bos@216 | 1034 maintainer. |
bos@215 | 1035 \item[\rcitem{web}{maxchanges}] Integer. The default maximum number |
bos@215 | 1036 of changesets to display in a single page of output. |
bos@215 | 1037 \item[\rcitem{web}{maxfiles}] Integer. The default maximum number |
bos@215 | 1038 of modified files to display in a single page of output. |
bos@215 | 1039 \item[\rcitem{web}{stripes}] Integer. If the web interface displays |
bos@215 | 1040 alternating ``stripes'' to make it easier to visually align rows |
bos@215 | 1041 when you are looking at a table, this number controls the number of |
bos@215 | 1042 rows in each stripe. |
bos@215 | 1043 \item[\rcitem{web}{style}] Controls the template Mercurial uses to |
bos@215 | 1044 display the web interface. Mercurial ships with two web templates, |
bos@215 | 1045 named \texttt{default} and \texttt{gitweb} (the latter is much more |
bos@215 | 1046 visually attractive). You can also specify a custom template of |
bos@215 | 1047 your own; see chapter~\ref{chap:template} for details. Here, you |
bos@215 | 1048 can see how to enable the \texttt{gitweb} style. |
bos@215 | 1049 \begin{codesample4} |
bos@215 | 1050 [web] |
bos@215 | 1051 style = gitweb |
bos@215 | 1052 \end{codesample4} |
bos@215 | 1053 \item[\rcitem{web}{templates}] Path. The directory in which to search |
bos@215 | 1054 for template files. By default, Mercurial searches in the directory |
bos@215 | 1055 in which it was installed. |
bos@215 | 1056 \end{itemize} |
bos@215 | 1057 If you are using \sfilename{hgwebdir.cgi}, you can place a few |
bos@215 | 1058 configuration items in a \rcsection{web} section of the |
bos@215 | 1059 \sfilename{hgweb.config} file instead of a \hgrc\ file, for |
bos@215 | 1060 convenience. These items are \rcitem{web}{motd} and |
bos@215 | 1061 \rcitem{web}{style}. |
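
As a rough illustration of how several of the items above combine, here
is a sketch of a \rcsection{web} section. The contact string and the
numeric values are made-up placeholders, not recommended settings.
\begin{codesample4}
  [web]
  # hypothetical values, for illustration only
  allowpull = true
  contact = Example Maintainers <maintainers@example.com>
  maxchanges = 20
  maxfiles = 20
  stripes = 1
\end{codesample4}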
bos@215 | 1062 |
bos@216 | 1063 \subsubsection{Options specific to an individual repository} |
bos@216 | 1064 |
bos@216 | 1065 A few \rcsection{web} configuration items ought to be placed in a |
bos@216 | 1066 repository's local \sfilename{.hg/hgrc}, rather than a user's or |
bos@216 | 1067 global \hgrc. |
bos@216 | 1068 \begin{itemize} |
bos@216 | 1069 \item[\rcitem{web}{description}] String. A free-form (but preferably |
bos@216 | 1070 brief) string that describes the contents or purpose of the |
bos@216 | 1071 repository. |
bos@216 | 1072 \item[\rcitem{web}{name}] String. The name to use for the repository |
bos@216 | 1073 in the web interface. This overrides the default name, which is the |
bos@216 | 1074 last component of the repository's path. |
bos@216 | 1075 \end{itemize} |
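
For instance, a repository's \sfilename{.hg/hgrc} might contain
something like the following. The name and description here are
invented purely for illustration.
\begin{codesample4}
  [web]
  # hypothetical per-repository settings
  name = myproject
  description = Main development repository for myproject
\end{codesample4}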
bos@216 | 1076 |
bos@215 | 1077 \subsubsection{Options specific to the \hgcmd{serve} command} |
bos@215 | 1078 |
bos@215 | 1079 Some of the items in the \rcsection{web} section of a \hgrc\ file are |
bos@215 | 1080 only for use with the \hgcmd{serve} command. |
bos@215 | 1081 \begin{itemize} |
bos@215 | 1082 \item[\rcitem{web}{accesslog}] Path. The name of a file into which to |
bos@215 | 1083 write an access log. By default, the \hgcmd{serve} command writes |
bos@215 | 1084 this information to standard output, not to a file. Log entries are |
bos@215 | 1085 written in the standard ``combined'' file format used by almost all |
bos@215 | 1086 web servers. |
bos@215 | 1087 \item[\rcitem{web}{address}] String. The local address on which the |
bos@215 | 1088 server should listen for incoming connections. By default, the |
bos@215 | 1089 server listens on all addresses. |
bos@215 | 1090 \item[\rcitem{web}{errorlog}] Path. The name of a file into which to |
bos@215 | 1091 write an error log. By default, the \hgcmd{serve} command writes this |
bos@215 | 1092 information to standard error, not to a file. |
bos@215 | 1093 \item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol. |
bos@215 | 1094 By default, IPv6 is not used. |
bos@215 | 1095 \item[\rcitem{web}{port}] Integer. The TCP~port number on which the |
bos@215 | 1096 server should listen. The default port number used is~8000. |
bos@215 | 1097 \end{itemize} |
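
As a sketch, the following \rcsection{web} section would ask
\hgcmd{serve} to listen only on the loopback address, use a non-default
port, and write its logs to files. The port number and file paths are
assumptions chosen for the example, not defaults.
\begin{codesample4}
  [web]
  # hypothetical address, port and log locations
  address = 127.0.0.1
  port = 8080
  accesslog = /tmp/hg-access.log
  errorlog = /tmp/hg-error.log
\end{codesample4}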
bos@215 | 1098 |
bos@216 | 1099 \subsubsection{Choosing the right \hgrc\ file to add \rcsection{web} |
bos@216 | 1100 items to} |
bos@216 | 1101 |
bos@216 | 1102 It is important to remember that a web server like Apache or |
bos@216 | 1103 \texttt{lighttpd} will run under a user~ID that is different to yours. |
bos@216 | 1104 CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will |
bos@216 | 1105 usually also run under that user~ID. |
bos@216 | 1106 |
bos@216 | 1107 If you add \rcsection{web} items to your own personal \hgrc\ file, CGI |
bos@216 | 1108 scripts won't read that \hgrc\ file. Those settings will thus only |
bos@216 | 1109 affect the behaviour of the \hgcmd{serve} command when you run it. To |
bos@216 | 1110 cause CGI scripts to see your settings, either create a \hgrc\ file in |
bos@216 | 1111 the home directory of the user ID that runs your web server, or add |
bos@216 | 1112 those settings to a system-wide \hgrc\ file. |
bos@216 | 1113 |
bos@216 | 1114 |
bos@159 | 1115 %%% Local Variables: |
bos@159 | 1116 %%% mode: latex |
bos@159 | 1117 %%% TeX-master: "00book" |
bos@159 | 1118 %%% End: |