bos@159: \chapter{Collaborating with other people}
bos@159: \label{cha:collab}
bos@159: 
bos@159: As a completely decentralised tool, Mercurial doesn't impose any
bos@159: policy on how people ought to work with each other.  However, if
bos@159: you're new to distributed revision control, it helps to have some
bos@159: tools and examples in mind when you're thinking about possible
bos@159: workflow models.
bos@159: 
bos@209: \section{Mercurial's web interface}
bos@209: 
bos@209: Mercurial has a powerful web interface that provides several 
bos@209: useful capabilities.
bos@209: 
bos@209: For interactive use, the web interface lets you browse a single
bos@209: repository or a collection of repositories.  You can view the history
bos@209: of a repository, examine each change (comments and diffs), and view
bos@209: the contents of each directory and file.
bos@209: 
bos@209: Also for human consumption, the web interface provides an RSS feed of
bos@209: the changes in a repository.  This lets you ``subscribe'' to a
bos@209: repository using your favourite feed reader, and be automatically
bos@209: notified of activity in that repository as soon as it happens.  I find
bos@209: this capability much more convenient than the model of subscribing to
bos@209: a mailing list to which notifications are sent, as it requires no
bos@209: additional configuration on the part of whoever is serving the
bos@209: repository.
bos@209: 
bos@209: The web interface also lets remote users clone a repository, pull
bos@209: changes from it, and (when the server is configured to permit it) push
bos@209: changes back to it.  Mercurial's HTTP tunneling protocol aggressively
bos@209: compresses data, so that it works efficiently even over low-bandwidth
bos@209: network connections.
bos@209: 
bos@209: The easiest way to get started with the web interface is to use your
bos@209: web browser to visit an existing repository, such as the master
bos@209: Mercurial repository at
bos@209: \url{http://www.selenic.com/repo/hg?style=gitweb}.
bos@209: 
bos@209: If you're interested in providing a web interface to your own
bos@209: repositories, Mercurial provides two ways to do this.  The first is
bos@209: using the \hgcmd{serve} command, which is best suited to short-term
bos@209: ``lightweight'' serving.  See section~\ref{sec:collab:serve} below for
bos@209: details of how to use this command.  The second suits a long-lived
bos@209: repository that you'd like to make permanently available: Mercurial
bos@209: has built-in support for the CGI (Common Gateway Interface) standard,
bos@209: which all common web servers support.  See
bos@209: section~\ref{sec:collab:cgi} for details of CGI configuration.
bos@209: 
bos@159: \section{Collaboration models}
bos@159: 
bos@159: With a suitably flexible tool, making decisions about workflow is much
bos@159: more of a social engineering challenge than a technical one.
bos@159: Mercurial imposes few limitations on how you can structure the flow of
bos@159: work in a project, so it's up to you and your group to set up and live
bos@159: with a model that matches your own particular needs.
bos@159: 
bos@159: \subsection{Factors to keep in mind}
bos@159: 
bos@159: The most important aspect of any model that you must keep in mind is
bos@159: how well it matches the needs and capabilities of the people who will
bos@159: be using it.  This might seem self-evident; even so, you still can't
bos@159: afford to forget it for a moment.
bos@159: 
bos@159: I once put together a workflow model that seemed to make perfect sense
bos@159: to me, but that caused a considerable amount of consternation and
bos@159: strife within my development team.  In spite of my attempts to explain
bos@159: why we needed a complex set of branches, and how changes ought to flow
bos@159: between them, a few team members revolted.  Even though they were
bos@159: smart people, they didn't want to pay attention to the constraints we
bos@159: were operating under, or face the consequences of those constraints in
bos@159: the details of the model that I was advocating.
bos@159: 
bos@159: Don't sweep foreseeable social or technical problems under the rug.
bos@159: Whatever scheme you put into effect, you should plan for mistakes and
bos@159: problem scenarios.  Consider adding automated machinery to prevent, or
bos@159: quickly recover from, trouble that you can anticipate.  As an example,
bos@159: if you intend to have a branch with not-for-release changes in it,
bos@159: you'd do well to think early about the possibility that someone might
bos@159: accidentally merge those changes into a release branch.  You could
bos@159: avoid this particular problem by writing a hook that prevents changes
bos@159: from being merged from an inappropriate branch.
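
As a minimal sketch of what such machinery might look like, assuming
the release branch lives in its own repository, you could register a
check in that repository's \sfilename{.hg/hgrc} file.  Mercurial runs
a \texttt{pretxnchangegroup} hook after a group of changesets has
arrived but before they become permanent, and rejects the whole group
if the hook fails.  The name \texttt{no\_unstable} and the script path
below are made up for illustration; the script itself, which would
have to decide whether any incoming changeset came from the wrong
branch, is left for you to write.
\begin{codesample2}
  [hooks]
  # "no_unstable" and the script path are hypothetical
  pretxnchangegroup.no_unstable = /path/to/reject-unstable-changes
\end{codesample2}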
bos@159: 
bos@159: \subsection{Informal anarchy}
bos@159: 
bos@159: I wouldn't suggest an ``anything goes'' approach as something
bos@159: sustainable, but it's a model that's easy to grasp, and it works
bos@159: perfectly well in a few unusual situations.
bos@159: 
bos@159: As one example, many projects have a loose-knit group of collaborators
bos@159: who rarely physically meet each other.  Some groups like to overcome
bos@159: the isolation of working at a distance by organising occasional
bos@159: ``sprints''.  In a sprint, a number of people get together in a single
bos@159: location (a company's conference room, a hotel meeting room, that kind
bos@159: of place) and spend several days more or less locked in there, hacking
bos@159: intensely on a handful of projects.
bos@159: 
bos@159: A sprint is the perfect place to use the \hgcmd{serve} command, since
bos@159: \hgcmd{serve} does not require any fancy server infrastructure.  You
bos@159: can get started with \hgcmd{serve} in moments, by reading
bos@159: section~\ref{sec:collab:serve} below.  Then simply tell the person
bos@159: next to you that you're running a server, send the URL to them in an
bos@159: instant message, and you immediately have a quick-turnaround way to
bos@159: work together.  They can type your URL into their web browser and
bos@159: quickly review your changes; or they can pull a bugfix from you and
bos@159: verify it; or they can clone a branch containing a new feature and try
bos@159: it out.
bos@159: 
bos@159: The charm, and the problem, with doing things in an ad hoc fashion
bos@159: like this is that only people who know about your changes, and where
bos@159: they are, can see them.  Such an informal approach simply doesn't
bos@159: scale beyond a handful of people, because each individual needs to know
bos@159: about $n$ different repositories to pull from.
bos@159: 
bos@159: \subsection{A single central repository}
bos@159: 
bos@179: For smaller projects migrating from a centralised revision control
bos@159: tool, perhaps the easiest way to get started is to have changes flow
bos@159: through a single shared central repository.  This is also the
bos@159: most common ``building block'' for more ambitious workflow schemes.
bos@159: 
bos@159: Contributors start by cloning a copy of this repository.  They can
bos@159: pull changes from it whenever they need to, and some (perhaps all)
bos@159: developers have permission to push a change back when they're ready
bos@159: for other people to see it.
bos@159: 
bos@179: Under this model, it can still often make sense for people to pull
bos@159: changes directly from each other, without going through the central
bos@159: repository.  Consider a case in which I have a tentative bug fix, but
bos@159: I am worried that if I were to publish it to the central repository,
bos@159: it might subsequently break everyone else's trees as they pull it.  To
bos@159: reduce the potential for damage, I can ask you to clone my repository
bos@159: into a temporary repository of your own and test it.  This lets us put
bos@159: off publishing the potentially unsafe change until it has had a little
bos@159: testing.
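
Here is a rough sketch of what you might type in that situation,
assuming I've made my repository reachable at the hypothetical URL
shown (for instance with \hgcmd{serve}); the directory name
\dirname{bugfix-test} is likewise just an example.
\begin{codesample2}
  hg clone http://my-laptop.local:8000/ bugfix-test
  cd bugfix-test
  # build and run the project's tests here
\end{codesample2}
If the fix misbehaves, you can simply delete \dirname{bugfix-test};
nothing has touched the central repository or your usual working
clone.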
bos@159: 
bos@159: In this kind of scenario, people usually use the \command{ssh}
bos@159: protocol to securely push changes to the central repository, as
bos@159: documented in section~\ref{sec:collab:ssh}.  It's also usual to
bos@159: publish a read-only copy of the repository over HTTP using CGI, as in
bos@159: section~\ref{sec:collab:cgi}.  Publishing over HTTP satisfies the
bos@159: needs of people who don't have push access, and those who want to use
bos@159: web browsers to browse the repository's history.
bos@159: 
bos@179: \subsection{Working with multiple branches}
bos@179: 
bos@179: Projects of any significant size naturally tend to make progress on
bos@179: several fronts simultaneously.  In the case of software, it's common
bos@179: for a project to go through periodic official releases.  A release
bos@179: might then go into ``maintenance mode'' for a while after its first
bos@179: publication; maintenance releases tend to contain only bug fixes, not
bos@179: new features.  In parallel with these maintenance releases, one or
bos@179: more future releases may be under development.  People normally use
bos@179: the word ``branch'' to refer to one of these many slightly different
bos@179: directions in which development is proceeding.
bos@179: 
bos@179: Mercurial is particularly well suited to managing a number of
bos@179: simultaneous, but not identical, branches.  Each ``development
bos@179: direction'' can live in its own central repository, and you can merge
bos@179: changes from one to another as the need arises.  Because repositories
bos@179: are independent of each other, unstable changes in a development
bos@179: branch will never affect a stable branch unless someone explicitly
bos@179: merges those changes in.
bos@179: 
bos@179: Here's an example of how this can work in practice.  Let's say you
bos@179: have one ``main branch'' on a central server.
bos@179: \interaction{branching.init}
bos@179: People clone it, make changes locally, test them, and push them back.
bos@179: 
bos@179: Once the main branch reaches a release milestone, you can use the
bos@179: \hgcmd{tag} command to give a permanent name to the milestone
bos@179: revision.
bos@179: \interaction{branching.tag}
bos@179: Let's say some ongoing development occurs on the main branch.
bos@179: \interaction{branching.main}
bos@179: Using the tag that was recorded at the milestone, people who clone
bos@179: that repository at any time in the future can use \hgcmd{update} to
bos@179: get a copy of the working directory exactly as it was when that tagged
bos@179: revision was committed.  
bos@179: \interaction{branching.update}
bos@179: 
bos@179: In addition, immediately after the main branch is tagged, someone can
bos@179: then clone the main branch on the server to a new ``stable'' branch,
bos@179: also on the server.
bos@179: \interaction{branching.clone}
bos@179: 
bos@179: Someone who needs to make a change to the stable branch can then clone
bos@179: \emph{that} repository, make their changes, commit, and push their
bos@179: changes back there.
bos@179: \interaction{branching.stable}
bos@179: Because Mercurial repositories are independent, and Mercurial doesn't
bos@179: move changes around automatically, the stable and main branches are
bos@179: \emph{isolated} from each other.  The changes that you made on the
bos@179: main branch don't ``leak'' to the stable branch, and vice versa.
bos@179: 
bos@179: You'll often want all of your bugfixes on the stable branch to show up
bos@179: on the main branch, too.  Rather than rewrite a bugfix on the main
bos@179: branch, you can simply pull and merge changes from the stable to the
bos@179: main branch, and Mercurial will bring those bugfixes in for you.
bos@179: \interaction{branching.merge}
bos@179: The main branch will still contain changes that are not on the stable
bos@179: branch, but it will also contain all of the bugfixes from the stable
bos@179: branch.  The stable branch remains unaffected by these changes.
bos@179: 
bos@179: \subsection{Feature branches}
bos@179: 
bos@179: For larger projects, an effective way to manage change is to break up
bos@179: a team into smaller groups.  Each group has a shared branch of its
bos@179: own, cloned from a single ``master'' branch used by the entire
bos@179: project.  People working on an individual branch are typically quite
bos@179: isolated from developments on other branches.
bos@179: 
bos@179: \begin{figure}[ht]
bos@179:   \centering
bos@179:   \grafix{feature-branches}
bos@179:   \caption{Feature branches}
bos@179:   \label{fig:collab:feature-branches}
bos@179: \end{figure}
bos@179: 
bos@179: When a particular feature is deemed to be in suitable shape, someone
bos@179: on that feature team pulls and merges from the master branch into the
bos@179: feature branch, then pushes back up to the master branch.
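
In terms of commands, that hand-off might look something like the
following sketch.  It assumes the feature and master branches are
sibling directories named \dirname{feature} and \dirname{master},
which are made-up names; on a real project they would more likely be
URLs on a shared server.
\begin{codesample2}
  cd feature
  hg pull ../master
  hg merge
  hg commit -m 'Merge with master'
  hg push ../master
\end{codesample2}
The \hgcmd{merge} and \hgcmd{commit} steps are only needed if the pull
actually brought in new changes from the master branch.  Merging in
the feature branch first means that any conflicts are resolved, and
can be tested, before the master branch sees the result.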
bos@179: 
bos@179: \subsection{The release train}
bos@179: 
bos@179: Some projects are organised on a ``train'' basis: a release is
bos@179: scheduled to happen every few months, and whatever features are ready
bos@179: when the ``train'' is ready to leave are allowed in.
bos@179: 
bos@179: This model resembles working with feature branches.  The difference is
bos@179: that when a feature branch misses a train, someone on the feature team
bos@184: pulls and merges the changes that went out on that train release into
bos@184: the feature branch, and the team continues its work on top of that
bos@184: release so that their feature can make the next release.
bos@179: 
bos@159: \subsection{The Linux kernel model}
bos@159: 
bos@159: The development of the Linux kernel has a shallow hierarchical
bos@159: structure, surrounded by a cloud of apparent chaos.  Because most
bos@159: Linux developers use \command{git}, a distributed revision control
bos@159: tool with capabilities similar to Mercurial, it's useful to describe
bos@159: the way work flows in that environment; if you like the ideas, the
bos@159: approach translates well across tools.
bos@159: 
bos@159: At the center of the community sits Linus Torvalds, the creator of
bos@159: Linux.  He publishes a single source repository that is considered the
bos@159: ``authoritative'' current tree by the entire developer community.
bos@159: Anyone can clone Linus's tree, but he is very choosy about whose trees
bos@159: he pulls from.
bos@159: 
bos@159: Linus has a number of ``trusted lieutenants''.  As a general rule, he
bos@159: pulls whatever changes they publish, in most cases without even
bos@159: reviewing those changes.  Some of those lieutenants are generally
bos@159: agreed to be ``maintainers'', responsible for specific subsystems
bos@159: within the kernel.  If a random kernel hacker wants to make a change
bos@159: to a subsystem that they want to end up in Linus's tree, they must
bos@159: find out who the subsystem's maintainer is, and ask that maintainer to
bos@159: take their change.  If the maintainer reviews their changes and agrees
bos@159: to take them, they'll pass them along to Linus in due course.
bos@159: 
bos@159: Individual lieutenants have their own approaches to reviewing,
bos@159: accepting, and publishing changes; and for deciding when to feed them
bos@159: to Linus.  In addition, there are several well known branches that
bos@159: people use for different purposes.  For example, a few people maintain
bos@159: ``stable'' repositories of older versions of the kernel, to which they
bos@184: apply critical fixes as needed.  Some maintainers publish multiple
bos@184: trees: one for experimental changes; one for changes that they are
bos@184: about to feed upstream; and so on.  Others just publish a single
bos@184: tree.
bos@159: 
bos@159: This model has two notable features.  The first is that it's ``pull
bos@159: only''.  You have to ask, convince, or beg another developer to take a
bos@184: change from you, because there are almost no trees to which more than
bos@184: one person can push, and there's no way to push changes into a tree
bos@184: that someone else controls.
bos@159: 
bos@159: The second is that it's based on reputation and acclaim.  If you're an
bos@159: unknown, Linus will probably ignore changes from you without even
bos@159: responding.  But a subsystem maintainer will probably review them, and
bos@159: will likely take them if they pass their criteria for suitability.
bos@159: The more ``good'' changes you contribute to a maintainer, the more
bos@159: likely they are to trust your judgment and accept your changes.  If
bos@159: you're well-known and maintain a long-lived branch for something Linus
bos@159: hasn't yet accepted, people with similar interests may pull your
bos@159: changes regularly to keep up with your work.
bos@159: 
bos@159: Reputation and acclaim don't necessarily cross subsystem or ``people''
bos@159: boundaries.  If you're a respected but specialised storage hacker, and
bos@159: you try to fix a networking bug, that change will receive a level of
bos@159: scrutiny from a network maintainer comparable to a change from a
bos@159: complete stranger.
bos@159: 
bos@159: To people who come from more orderly project backgrounds, the
bos@159: comparatively chaotic Linux kernel development process often seems
bos@159: completely insane.  It's subject to the whims of individuals; people
bos@159: make sweeping changes whenever they deem it appropriate; and the pace
bos@159: of development is astounding.  And yet Linux is a highly successful,
bos@159: well-regarded piece of software.
bos@159: 
bos@187: \subsection{Pull-only versus shared-push collaboration}
bos@187: 
bos@187: A perpetual source of heat in the open source community is whether a
bos@187: development model in which people only ever pull changes from others
bos@187: is ``better than'' one in which multiple people can push changes to a
bos@187: shared repository.
bos@187: 
bos@187: Typically, the backers of the shared-push model use tools that
bos@187: actively enforce this approach.  If you're using a centralised
bos@187: revision control tool such as Subversion, there's no way to make a
bos@187: choice over which model you'll use: the tool gives you shared-push,
bos@187: and if you want to do anything else, you'll have to roll your own
bos@187: approach on top (such as applying a patch by hand).
bos@187: 
bos@187: A good distributed revision control tool, such as Mercurial, will
bos@187: support both models.  You and your collaborators can then structure
bos@187: how you work together based on your own needs and preferences, not on
bos@187: what contortions your tools force you into.
bos@187: 
bos@187: \subsection{Where collaboration meets branch management}
bos@187: 
bos@187: Once you and your team set up some shared repositories and start
bos@187: propagating changes back and forth between local and shared repos, you
bos@187: begin to face a related, but slightly different challenge: that of
bos@187: managing the multiple directions in which your team may be moving at
bos@187: once.  Even though this subject is intimately related to how your team
bos@187: collaborates, it's dense enough to merit treatment of its own, in
bos@187: chapter~\ref{chap:branch}.
bos@187: 
bos@159: \section{The technical side of sharing}
bos@159: 
bos@210: The remainder of this chapter is devoted to the question of serving
bos@210: data to your collaborators.
bos@210: 
bos@210: \section{Informal sharing with \hgcmd{serve}}
bos@159: \label{sec:collab:serve}
bos@159: 
bos@159: Mercurial's \hgcmd{serve} command is wonderfully suited to small,
bos@159: tight-knit, and fast-paced group environments.  It also provides a
bos@159: great way to get a feel for using Mercurial commands over a network.
bos@159: 
bos@159: Run \hgcmd{serve} inside a repository, and in under a second it will
bos@159: bring up a specialised HTTP server; this will accept connections from
bos@159: any client, and serve up data for that repository until you terminate
bos@159: it.  Anyone who knows the URL of the server you just started, and can
bos@159: talk to your computer over the network, can then use a web browser or
bos@159: Mercurial to read data from that repository.  A URL for a
bos@159: \hgcmd{serve} instance running on a laptop is likely to look something
bos@159: like \Verb|http://my-laptop.local:8000/|.
bos@159: 
bos@159: The \hgcmd{serve} command is \emph{not} a general-purpose web server.
bos@159: It can do only two things:
bos@159: \begin{itemize}
bos@159: \item Allow people to browse the history of the repository it's
bos@159:   serving, from their normal web browsers.
bos@159: \item Speak Mercurial's wire protocol, so that people can
bos@159:   \hgcmd{clone} or \hgcmd{pull} changes from that repository.
bos@159: \end{itemize}
bos@159: In particular, \hgcmd{serve} won't allow remote users to \emph{modify}
bos@159: your repository.  It's intended for read-only use.
bos@159: 
bos@159: If you're getting started with Mercurial, there's nothing to prevent
bos@159: you from using \hgcmd{serve} to serve up a repository on your own
bos@159: computer, then using commands like \hgcmd{clone}, \hgcmd{incoming}, and
bos@159: so on to talk to that server as if the repository were hosted remotely.
bos@159: This can help you to quickly get acquainted with using commands on
bos@159: network-hosted repositories.
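
A minimal session along these lines might look like this, assuming a
repository at the hypothetical path \dirname{/path/to/my-repo} and the
default port.  The first two commands run in one terminal, and the
rest in a second terminal while the server is still running.
\begin{codesample2}
  # first terminal: start the server and leave it running
  cd /path/to/my-repo
  hg serve

  # second terminal: treat the local server as a remote repository
  hg clone http://localhost:8000/ my-clone
  cd my-clone
  hg incoming
\end{codesample2}
Since the clone is fresh, \hgcmd{incoming} should have nothing new to
report until you commit something in the original repository.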
bos@159: 
bos@210: \subsection{A few things to keep in mind}
bos@159: 
bos@159: Because it provides unauthenticated read access to all clients, you
bos@159: should only use \hgcmd{serve} in an environment where you either don't
bos@159: care about, or have complete control over, who can access your network
bos@159: and pull data from your repository.
bos@159: 
bos@159: The \hgcmd{serve} command knows nothing about any firewall software
bos@159: you might have installed on your system or network.  It cannot detect
bos@159: or control your firewall software.  If other people are unable to talk
bos@159: to a running \hgcmd{serve} instance, the second thing you should do
bos@159: (\emph{after} you make sure that they're using the correct URL) is
bos@159: check your firewall configuration.
bos@159: 
bos@159: By default, \hgcmd{serve} listens for incoming connections on
bos@159: port~8000.  If another process is already listening on the port you
bos@159: want to use, you can specify a different port to listen on using the
bos@159: \hgopt{serve}{-p} option.
bos@159: 
bos@159: Normally, when \hgcmd{serve} starts, it prints no output, which can be
bos@159: a bit unnerving.  If you'd like to confirm that it is indeed running
bos@159: correctly, and find out what URL you should send to your
bos@159: collaborators, start it with the \hggopt{-v} option.
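
For example, the following would serve the current repository on
port~8080 (an arbitrary choice) and ask \hgcmd{serve} to be verbose
about what it's doing.
\begin{codesample2}
  hg serve -v -p 8080
\end{codesample2}
Stop the server with Ctrl-C when you're finished.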
bos@159: 
bos@210: \section{Using the Secure Shell (ssh) protocol}
bos@159: \label{sec:collab:ssh}
bos@159: 
bos@184: You can pull and push changes securely over a network connection using
bos@184: the Secure Shell (\texttt{ssh}) protocol.  To use this successfully,
bos@184: you may have to do a little bit of configuration on the client side,
bos@184: the server side, or both.
bos@184: 
bos@184: If you're not familiar with ssh, it's a network protocol that lets you
bos@184: securely communicate with another computer.  To use it with Mercurial,
bos@184: you'll be setting up one or more user accounts on a server so that
bos@184: remote users can log in and execute commands.
bos@184: 
bos@184: (If you \emph{are} familiar with ssh, you'll probably find some of the
bos@184: material that follows to be elementary in nature.)
bos@184: 
bos@210: \subsection{How to read and write ssh URLs}
bos@184: 
bos@184: An ssh URL tends to look like this:
bos@184: \begin{codesample2}
bos@184:   ssh://bos@hg.serpentine.com:22/hg/hgbook
bos@184: \end{codesample2}
bos@184: \begin{enumerate}
bos@184: \item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh
bos@184:   protocol.
bos@184: \item The ``\texttt{bos@}'' component indicates what username to log
bos@184:   into the server as.  You can leave this out if the remote username
bos@184:   is the same as your local username.
bos@184: \item The ``\texttt{hg.serpentine.com}'' gives the hostname of the
bos@184:   server to log into.
bos@184: \item The ``\texttt{:22}'' identifies the port number to connect to the server
bos@184:   on.  The default port is~22, so you only need to specify this part
bos@184:   if you're \emph{not} using port~22.
bos@184: \item The remainder of the URL is the local path to the repository on
bos@184:   the server.
bos@184: \end{enumerate}
bos@184: 
bos@184: There's plenty of scope for confusion with the path component of ssh
bos@184: URLs, as there is no standard way for tools to interpret it.  Some
bos@184: programs behave differently than others when dealing with these paths.
bos@184: This isn't an ideal situation, but it's unlikely to change.  Please
bos@184: read the following paragraphs carefully.
bos@184: 
bos@184: Mercurial treats the path to a repository on the server as relative to
bos@184: the remote user's home directory.  For example, if user \texttt{foo}
bos@184: on the server has a home directory of \dirname{/home/foo}, then an ssh
bos@184: URL that contains a path component of \dirname{bar}
bos@184: \emph{really} refers to the directory \dirname{/home/foo/bar}.
bos@184: 
bos@184: If you want to specify a path relative to another user's home
bos@184: directory, you can use a path that starts with a tilde character
bos@184: followed by the user's name (let's call them \texttt{otheruser}), like
bos@184: this.
bos@184: \begin{codesample2}
bos@184:   ssh://server/~otheruser/hg/repo
bos@184: \end{codesample2}
bos@184: 
bos@184: And if you really want to specify an \emph{absolute} path on the
bos@184: server, begin the path component with two slashes, as in this example.
bos@184: \begin{codesample2}
bos@184:   ssh://server//absolute/path
bos@184: \end{codesample2}
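
To pull these rules together, here is how a few paths would be
interpreted on the server, assuming a remote user \texttt{foo} whose
home directory is \dirname{/home/foo}.  The host name \texttt{server}
and the directory \dirname{/srv/hg} are placeholders.
\begin{codesample2}
  ssh://foo@server/bar               /home/foo/bar
  ssh://foo@server/~otheruser/bar    bar, under otheruser's home directory
  ssh://foo@server//srv/hg/bar       /srv/hg/bar
\end{codesample2}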
bos@184: 
bos@210: \subsection{Finding an ssh client for your system}
bos@184: 
bos@184: Almost every Unix-like system comes with OpenSSH preinstalled.  If
bos@184: you're using such a system, run \Verb|which ssh| to find out if
bos@184: the \command{ssh} command is installed (it's usually in
bos@184: \dirname{/usr/bin}).  In the unlikely event that it isn't present,
bos@184: take a look at your system documentation to figure out how to install
bos@184: it.
bos@184: 
bos@184: On Windows, you'll first need to download a suitable ssh
bos@184: client.  There are two alternatives.
bos@184: \begin{itemize}
bos@184: \item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides
bos@184:   a complete suite of ssh client commands.
bos@184: \item If you have a high tolerance for pain, you can use the Cygwin
bos@184:   port of OpenSSH.
bos@184: \end{itemize}
bos@184: In either case, you'll need to edit your \hgini\ file to tell
bos@184: Mercurial where to find the actual client command.  For example, if
bos@184: you're using PuTTY, you'll need to use the \command{plink} command as
bos@184: a command-line ssh client.
bos@184: \begin{codesample2}
bos@184:   [ui]
bos@184:   ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
bos@184: \end{codesample2}
bos@184: 
bos@184: \begin{note}
bos@184:   The path to \command{plink} shouldn't contain any whitespace
bos@184:   characters, or Mercurial may not be able to run it correctly (so
bos@184:   putting it in \dirname{C:\\Program Files} is probably not a good
bos@184:   idea).
bos@184: \end{note}
bos@184: 
bos@210: \subsection{Generating a key pair}
bos@184: 
bos@184: To avoid the need to repeatedly type a password every time you need
bos@184: to use your ssh client, I recommend generating a key pair.  On a
bos@184: Unix-like system, the \command{ssh-keygen} command will do the trick.
bos@184: On Windows, if you're using PuTTY, the \command{puttygen} command is
bos@184: what you'll need.
bos@184: 
bos@184: When you generate a key pair, it's usually \emph{highly} advisable to
bos@184: protect it with a passphrase.  (The only time that you might not want
bos@184: to do this is when you're using the ssh protocol for automated tasks
bos@184: on a secure network.)
bos@184: 
bos@184: Simply generating a key pair isn't enough, however.  You'll need to
bos@184: add the public key to the set of authorised keys for whatever user
bos@184: you're logging in remotely as.  For servers using OpenSSH (the vast
bos@184: majority), this will mean adding the public key to a list in a file
bos@184: called \sfilename{authorized\_keys} in their \sdirname{.ssh}
bos@184: directory.
bos@184: 
bos@184: On a Unix-like system, your public key will have a \filename{.pub}
bos@184: extension.  If you're using \command{puttygen} on Windows, you can
bos@184: save the public key to a file of your choosing, or paste it from the
bos@184: window it's displayed in straight into the
bos@184: \sfilename{authorized\_keys} file.
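
On a Unix-like client talking to an OpenSSH server, the whole process
might go roughly as follows.  This is only a sketch: it assumes an RSA
key saved under \command{ssh-keygen}'s default name, and \texttt{user}
and \texttt{server} stand in for your own account and host.
\begin{codesample2}
  # on your own machine: generate the key pair (you'll be asked
  # where to save it and for a passphrase)
  ssh-keygen -t rsa

  # copy the public half to your home directory on the server
  scp ~/.ssh/id_rsa.pub user@server:

  # on the server: append it to the list of authorised keys
  mkdir -p ~/.ssh
  cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
\end{codesample2}
Only the \filename{.pub} file ever leaves your machine; the private
key stays where \command{ssh-keygen} put it.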
bos@184: 
bos@210: \subsection{Using an authentication agent}
bos@184: 
bos@184: An authentication agent is a daemon that stores passphrases in memory
bos@184: (so it will forget passphrases if you log out and log back in again).
bos@184: An ssh client will notice if it's running, and query it for a
bos@184: passphrase.  If there's no authentication agent running, or the agent
bos@184: doesn't store the necessary passphrase, you'll have to type your
bos@184: passphrase every time Mercurial tries to communicate with a server on
bos@184: your behalf (e.g.~whenever you pull or push changes).
bos@184: 
bos@184: The downside of storing passphrases in an agent is that it's possible
bos@184: for a well-prepared attacker to recover the plain text of your
bos@184: passphrases, in some cases even if your system has been power-cycled.
bos@184: You should make your own judgment as to whether this is an acceptable
bos@184: risk.  It certainly saves a lot of repeated typing.
bos@184: 
bos@184: On Unix-like systems, the agent is called \command{ssh-agent}, and
bos@184: it's often run automatically for you when you log in.  You'll need to
bos@184: use the \command{ssh-add} command to add passphrases to the agent's
bos@184: store.  On Windows, if you're using PuTTY, the \command{pageant}
bos@184: command acts as the agent.  It adds an icon to your system tray that
bos@184: will let you manage stored passphrases.
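
With OpenSSH, a typical exchange with the agent is short.  This sketch
assumes that \command{ssh-agent} is already running in your session
and that your key is stored under one of the default file names.
\begin{codesample2}
  ssh-add       # prompts once for the key's passphrase
  ssh-add -l    # lists the keys the agent is currently holding
\end{codesample2}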
bos@184: 
bos@210: \subsection{Configuring the server side properly}
bos@184: 
bos@184: Because ssh can be fiddly to set up if you're new to it, there's a
bos@184: variety of things that can go wrong.  Add Mercurial on top, and
bos@184: there's plenty more scope for head-scratching.  Most of these
bos@184: potential problems occur on the server side, not the client side.  The
bos@184: good news is that once you've gotten a configuration working, it will
bos@184: usually continue to work indefinitely.
bos@184: 
bos@184: Before you try using Mercurial to talk to an ssh server, it's best to
bos@184: make sure that you can use the normal \command{ssh} or \command{putty}
bos@184: command to talk to the server first.  If you run into problems with
bos@184: using these commands directly, Mercurial surely won't work.  Worse, it
bos@184: will obscure the underlying problem.  Any time you want to debug
bos@184: ssh-related Mercurial problems, you should drop back to making sure
bos@184: that plain ssh client commands work first, \emph{before} you worry
bos@184: about whether there's a problem with Mercurial.
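
A quick check of this kind, with \texttt{user} and \texttt{server}
standing in for your own account and host, might be:
\begin{codesample2}
  ssh user@server
  ssh -v user@server
\end{codesample2}
The first command should leave you at a shell prompt on the server.
If it doesn't, the second repeats the attempt with OpenSSH's
\Verb|-v| option, which prints debugging messages about each stage of
the connection.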
bos@184: 
bos@184: The first thing to be sure of on the server side is that you can
bos@184: actually log in from another machine at all.  If you can't use
bos@184: \command{ssh} or \command{putty} to log in, the error message you get
bos@184: may give you a few hints as to what's wrong.  The most common problems
bos@184: are as follows.
bos@184: \begin{itemize}
bos@184: \item If you get a ``connection refused'' error, either there isn't an
bos@184:   SSH daemon running on the server at all, or it's inaccessible due to
bos@184:   firewall configuration.
bos@184: \item If you get a ``no route to host'' error, you either have an
bos@184:   incorrect address for the server or a seriously locked down firewall
bos@184:   that won't admit its existence at all.
bos@184: \item If you get a ``permission denied'' error, you may have mistyped
bos@184:   the username on the server, or you could have mistyped your key's
bos@184:   passphrase or the remote user's password.
bos@184: \end{itemize}
bos@184: In summary, if you're having trouble talking to the server's ssh
bos@184: daemon, first make sure that one is running at all.  On many systems
bos@184: it will be installed, but disabled, by default.  Once you're done with
bos@184: this step, you should then check that the server's firewall is
bos@184: configured to allow incoming connections on the port the ssh daemon is
bos@184: listening on (usually~22).  Don't worry about more exotic
bos@184: possibilities for misconfiguration until you've checked these two
bos@184: first.
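
When the error message alone doesn't tell you enough, the OpenSSH
client's \cmdopt{ssh}{-v} option makes it report each step of
connecting and authenticating, which usually shows where things stop.
Here, \texttt{myserver} is only a placeholder host name.
\begin{codesample2}
  ssh -v myserver
\end{codesample2}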
bos@184: 
bos@184: If you're using an authentication agent on the client side to store
bos@184: passphrases for your keys, you ought to be able to log into the server
bos@184: without being prompted for a passphrase or a password.  If you're
bos@184: prompted for a passphrase, there are a few possible culprits.
bos@184: \begin{itemize}
bos@184: \item You might have forgotten to use \command{ssh-add} or
bos@184:   \command{pageant} to store the passphrase.
bos@184: \item You might have stored the passphrase for the wrong key.
bos@184: \end{itemize}
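If you use OpenSSH's agent, you can ask it to list the keys it
currently holds.  If the key you expect is missing, add it with
\command{ssh-add} and try again.
\begin{codesample2}
  ssh-add -l
\end{codesample2}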
bos@184: If you're being prompted for the remote user's password, there are
bos@184: another few possible problems to check.
bos@184: \begin{itemize}
bos@184: \item Either the user's home directory or their \sdirname{.ssh}
bos@184:   directory might have excessively liberal permissions.  As a result,
bos@184:   the ssh daemon will not trust or read their
bos@184:   \sfilename{authorized\_keys} file.  For example, a group-writable
bos@184:   home or \sdirname{.ssh} directory will often cause this symptom.
bos@184: \item The user's \sfilename{authorized\_keys} file may have a problem.
bos@184:   If anyone other than the user owns or can write to that file, the
bos@184:   ssh daemon will not trust or read it.
bos@184: \end{itemize}
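If permissions turn out to be the culprit, tightening them is usually
enough.  The following is a minimal sketch, to be run on the server as
the affected user; it assumes the \sfilename{authorized\_keys} file is
in its default location.
\begin{codesample2}
  chmod go-w ~ ~/.ssh
  chmod 600 ~/.ssh/authorized_keys
\end{codesample2}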
bos@184: 
bos@184: In the ideal world, you should be able to run the following command
bos@184: successfully, and it should print exactly one line of output, the
bos@184: current date and time.
bos@184: \begin{codesample2}
bos@184:   ssh myserver date
bos@184: \end{codesample2}
bos@184: 
bos@209: If, on your server, you have login scripts that print banners or other
bos@184: junk even when running non-interactive commands like this, you should
bos@184: fix them before you continue, so that they only print output if
bos@184: they're run interactively.  Otherwise these banners will at least
bos@184: clutter up Mercurial's output.  Worse, they could potentially cause
bos@209: problems with running Mercurial commands remotely.  Mercurial
bos@209: tries to detect and ignore banners in non-interactive \command{ssh}
bos@209: sessions, but it is not foolproof.  (If you're editing your login
bos@209: scripts on your server, the usual way to see if a login script is
bos@209: running in an interactive shell is to check the return code from the
bos@209: command \Verb|tty -s|.)
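
As a rough sketch of that check, a login script written for a
Bourne-style shell could guard its banner as follows.  The
\command{echo} line is a hypothetical stand-in for whatever your
script actually prints.
\begin{codesample2}
  if tty -s; then
      echo "Welcome to myserver"
  fi
\end{codesample2}
With a guard like this in place, the banner should no longer appear
when you run a command such as \Verb|ssh myserver date|.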
bos@184: 
bos@184: Once you've verified that plain old ssh is working with your server,
bos@184: the next step is to ensure that Mercurial runs on the server.  The
bos@184: following command should run successfully:
bos@184: \begin{codesample2}
bos@184:   ssh myserver hg version
bos@184: \end{codesample2}
bos@184: If you see an error message instead of normal \hgcmd{version} output,
bos@184: this is usually because you haven't installed Mercurial to
bos@184: \dirname{/usr/bin}.  Don't worry if this is the case; you don't need
bos@184: to do that.  But you should check for a few possible problems.
bos@184: \begin{itemize}
bos@184: \item Is Mercurial really installed on the server at all?  I know this
bos@184:   sounds trivial, but it's worth checking!
bos@184: \item Maybe your shell's search path (usually set via the \envar{PATH}
bos@184:   environment variable) is simply misconfigured.
bos@184: \item Perhaps your \envar{PATH} environment variable is only being set
bos@184:   to point to the location of the \command{hg} executable if the login
bos@184:   session is interactive.  This can happen if you're setting the path
bos@184:   in the wrong shell login script.  See your shell's documentation for
bos@184:   details.
bos@184: \item The \envar{PYTHONPATH} environment variable may need to contain
bos@184:   the path to the Mercurial Python modules.  It might not be set at
bos@184:   all; it could be incorrect; or it may be set only if the login is
bos@184:   interactive.
bos@184: \end{itemize}
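As an illustration of the last two points, here is a sketch that
assumes \command{bash} is the login shell on the server and that
Mercurial is installed privately under your home directory.  Both
directory names are hypothetical, and which startup file your shell
reads for non-interactive sessions is something to confirm in its
documentation.
\begin{codesample2}
  # in ~/.bashrc on the server; both directories are hypothetical
  export PATH="$HOME/bin:$PATH"
  export PYTHONPATH="$HOME/lib/python"
\end{codesample2}
Some stock \sfilename{.bashrc} files return early when the shell is
not interactive, so lines like these need to appear above any such
check.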
bos@184: 
bos@184: If you can run \hgcmd{version} over an ssh connection, well done!
bos@184: You've got the server and client sorted out.  You should now be able
bos@184: to use Mercurial to access repositories hosted by that username on
bos@184: that server.  If you run into problems with Mercurial and ssh at this
bos@184: point, try using the \hggopt{--debug} option to get a clearer picture
bos@184: of what's going on.
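
For instance, a clone with debugging output turned on might be run as
follows.  The repository path is hypothetical; the extra output should
include details of the \command{ssh} session that Mercurial starts.
\begin{codesample2}
  hg --debug clone ssh://myserver/path/to/repo
\end{codesample2}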
bos@184: 
bos@210: \subsection{Using compression with ssh}
bos@184: 
bos@184: Mercurial does not compress data when it uses the ssh protocol,
bos@184: because the ssh protocol can transparently compress data.  However,
bos@184: the default behaviour of ssh clients is \emph{not} to request
bos@184: compression.
bos@184: 
bos@184: Over any network other than a fast LAN (even a wireless network),
bos@184: using compression is likely to significantly speed up Mercurial's
bos@184: network operations.  For example, over a WAN, someone measured
bos@184: compression as reducing the amount of time required to clone a
bos@184: particularly large repository from~51 minutes to~17 minutes.
bos@184: 
bos@184: Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C}
bos@184: option which turns on compression.  You can easily edit your \hgrc\ to
bos@184: enable compression for all of Mercurial's uses of the ssh protocol.
bos@184: \begin{codesample2}
bos@184:   [ui]
bos@184:   ssh = ssh -C
bos@184: \end{codesample2}
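If you use PuTTY's \command{plink} instead, the equivalent setting
might look like the following sketch, which assumes that
\command{plink} can be found on your search path.
\begin{codesample2}
  [ui]
  ssh = plink -C
\end{codesample2}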
bos@184: 
bos@209: If you use \command{ssh}, you can configure it to always use
bos@209: compression when talking to your server.  To do this, edit your
bos@209: \sfilename{.ssh/config} file (which may not yet exist), as follows.
bos@209: \begin{codesample2}
bos@209:   Host hg
bos@209:     Compression yes
bos@209:     HostName hg.example.com
bos@209: \end{codesample2}
bos@209: This defines an alias, \texttt{hg}.  When you use it on the
bos@209: \command{ssh} command line or in a Mercurial \texttt{ssh}-protocol
bos@209: URL, it will cause \command{ssh} to connect to \texttt{hg.example.com}
bos@209: and use compression.  This gives you both a shorter name to type and
bos@209: compression, each of which is a good thing in its own right.
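
With that alias in place, commands along these lines should work.  The
repository path in the second command is hypothetical.
\begin{codesample2}
  ssh hg date
  hg clone ssh://hg/path/to/repo
\end{codesample2}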
bos@209: 
bos@210: \section{Serving over HTTP using CGI}
bos@159: \label{sec:collab:cgi}
bos@159: 
bos@210: Depending on how ambitious you are, configuring Mercurial's CGI
bos@210: interface can take anything from a few moments to several hours.
bos@210: 
bos@210: We'll begin with the simplest of examples, and work our way towards a
bos@210: more complex configuration.  Even for the most basic case, you're
bos@210: almost certainly going to need to read and modify your web server's
bos@210: configuration.
bos@210: 
bos@210: \begin{note}
bos@210:   Configuring a web server is a complex, fiddly, and highly
bos@210:   system-dependent activity.  I can't possibly give you instructions
bos@210:   that will cover anything like all of the cases you will encounter.
bos@210:   Please use your discretion and judgment in following the sections
bos@210:   below.  Be prepared to make plenty of mistakes, and to spend a lot
bos@210:   of time reading your server's error logs.
bos@210: \end{note}
bos@210: 
bos@210: \subsection{Web server configuration checklist}
bos@210: 
bos@210: Before you continue, do take a few moments to check a few aspects of
bos@210: your system's setup.
bos@210: 
bos@210: \begin{enumerate}
bos@210: \item Do you have a web server installed at all?  Mac OS X ships with
bos@210:   Apache, but many other systems may not have a web server installed.
bos@210: \item If you have a web server installed, is it actually running?  On
bos@210:   most systems, even if one is present, it will be disabled by
bos@210:   default.
bos@210: \item Is your server configured to allow you to run CGI programs in
bos@210:   the directory where you plan to do so?  Most servers default to
bos@210:   explicitly disabling the ability to run CGI programs.
bos@210: \end{enumerate}
bos@210: 
bos@210: If you don't have a web server installed, and don't have substantial
bos@210: experience configuring Apache, you should consider using the
bos@210: \texttt{lighttpd} web server instead of Apache.  Apache has a
bos@210: well-deserved reputation for baroque and confusing configuration.
bos@210: While \texttt{lighttpd} is less capable in some ways than Apache, most
bos@210: of the capabilities it lacks are not relevant to serving Mercurial
bos@210: repositories.  And \texttt{lighttpd} is undeniably \emph{much} easier
bos@210: to get started with than Apache.
bos@210: 
bos@210: \subsection{Basic CGI configuration}
bos@210: 
bos@210: On Unix-like systems, it's common for users to have a subdirectory
bos@210: named something like \dirname{public\_html} in their home directory,
bos@210: from which they can serve up web pages.  A file named \filename{foo}
bos@210: in this directory will be accessible at a URL of the form
bos@210: \texttt{http://www.example.com/\~{}username/foo}.
bos@210: 
bos@210: To get started, find the \sfilename{hgweb.cgi} script that should be
bos@210: present in your Mercurial installation.  If you can't quickly find a
bos@210: local copy on your system, simply download one from the master
bos@210: Mercurial repository at
bos@210: \url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}.
bos@210: 
bos@210: You'll need to copy this script into your \dirname{public\_html}
bos@210: directory, and ensure that it's executable.
bos@210: \begin{codesample2}
bos@210:   cp .../hgweb.cgi ~/public_html
bos@210:   chmod +x ~/public_html/hgweb.cgi
bos@210: \end{codesample2}
bos@210: 
bos@210: \subsubsection{What could \emph{possibly} go wrong?}
bos@210: 
bos@210: Once you've copied the CGI script into place, go into a web browser,
bos@210: and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi},
bos@210: \emph{but} brace yourself for instant failure.  There's a high
bos@210: probability that trying to visit this URL will fail, and there are
bos@210: many possible reasons for this.  In fact, you're likely to stumble
bos@210: over almost every one of the possible errors below, so please read
bos@210: carefully.  The following are all of the problems I ran into on a
bos@210: system running Fedora~7, with a fresh installation of Apache, and a
bos@210: user account that I created specially.
bos@210: 
bos@210: Your web server may have per-user directories disabled.  If you're
bos@210: using Apache, search your config file for a \texttt{UserDir}
bos@210: directive.  If there's none present, per-user directories will be
bos@210: disabled.  If one exists, but its value is \texttt{disabled}, then
bos@210: per-user directories will be disabled.  Otherwise, the string after
bos@210: \texttt{UserDir} gives the name of the subdirectory that Apache will
bos@210: look in under your home directory, for example \dirname{public\_html}.
bos@210: 
bos@210: Your file access permissions may be too restrictive.  The web server
bos@210: must be able to traverse your home directory and directories under
bos@210: your \dirname{public\_html} directory, and read files under the latter
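bos@210: too.  Here's a quick recipe to help you to make your permissions more
bos@210: appropriate.  The last command restores the execute permission on the
bos@210: CGI script, which the blanket \Verb|chmod 644| would otherwise remove.
bos@210: \begin{codesample2}
bos@210:   chmod 755 ~
bos@210:   find ~/public_html -type d -print0 | xargs -0r chmod 755
bos@210:   find ~/public_html -type f -print0 | xargs -0r chmod 644
bos@210:   chmod 755 ~/public_html/hgweb.cgi
bos@210: \end{codesample2}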
bos@210: 
bos@210: The other possibility with permissions is that you might get a
bos@210: completely empty window when you try to load the script.  In this
bos@210: case, it's likely that your access permissions are \emph{too
bos@210:   permissive}.  Apache's \texttt{suexec} subsystem won't execute a
bos@210: script that's group-~or world-writable, for example.
bos@210: 
bos@210: Your web server may be configured to disallow execution of CGI
bos@210: programs in your per-user web directory.  Here's Apache's
bos@210: default per-user configuration from my Fedora system.
bos@210: \begin{codesample2}
bos@210:   <Directory /home/*/public_html>
bos@210:       AllowOverride FileInfo AuthConfig Limit
bos@210:       Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
bos@210:       <Limit GET POST OPTIONS>
bos@210:           Order allow,deny
bos@210:           Allow from all
bos@210:       </Limit>
bos@210:       <LimitExcept GET POST OPTIONS>
bos@210:           Order deny,allow
bos@210:           Deny from all
bos@210:       </LimitExcept>
bos@210:   </Directory>
bos@210: \end{codesample2}
bos@210: If you find a similar-looking \texttt{Directory} group in your Apache
bos@210: configuration, the directive to look at inside it is \texttt{Options}.
bos@210: Add \texttt{ExecCGI} to the end of this list if it's missing, and
bos@210: restart the web server.
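
Applied to the Fedora configuration shown above, the edited directive
would read as follows; the rest of the \texttt{Directory} group stays
unchanged.
\begin{codesample2}
  Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec ExecCGI
\end{codesample2}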
bos@210: 
bos@210: If you find that Apache serves you the text of the CGI script instead
bos@210: of executing it, you may need to either uncomment (if already present)
bos@210: or add a directive like this.
bos@210: \begin{codesample2}
bos@210:   AddHandler cgi-script .cgi
bos@210: \end{codesample2}
bos@210: 
bos@210: The next possibility is that you might be served with a colourful
bos@210: Python backtrace claiming that it can't import a
bos@210: \texttt{mercurial}-related module.  This is actually progress!  The
bos@210: server is now capable of executing your CGI script.  This error is
bos@210: only likely to occur if you're running a private installation of
bos@210: Mercurial, instead of a system-wide version.  Remember that the web
bos@210: server runs the CGI program without any of the environment variables
bos@210: that you take for granted in an interactive session.  If this error
bos@210: happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
bos@210: directions inside it to correctly set your \envar{PYTHONPATH}
bos@210: environment variable.
bos@210: 
bos@210: Finally, you are \emph{certain} to be served with another colourful
bos@210: Python backtrace: this one will complain that it can't find
bos@210: \dirname{/path/to/repository}.  Edit your \sfilename{hgweb.cgi} script
bos@210: and replace the \dirname{/path/to/repository} string with the complete
bos@210: path to the repository you want to serve up.
bos@210: 
bos@210: At this point, when you try to reload the page, you should be
bos@210: presented with a nice HTML view of your repository's history.  Whew!
bos@210: 
bos@210: \subsubsection{Configuring lighttpd}
bos@210: 
bos@210: To be exhaustive in my experiments, I tried configuring the
bos@210: increasingly popular \texttt{lighttpd} web server to serve the same
bos@210: repository as I described with Apache above.  I had already overcome
bos@210: all of the problems I outlined with Apache, many of which are not
bos@210: server-specific.  As a result, I was fairly sure that my file and
bos@210: directory permissions were good, and that my \sfilename{hgweb.cgi}
bos@210: script was properly edited.
bos@210: 
bos@210: Once I had Apache running, getting \texttt{lighttpd} to serve the
bos@210: repository was a snap.  I first had to edit the \texttt{mod\_access}
bos@210: section of the config file to enable \texttt{mod\_cgi} and
bos@210: \texttt{mod\_userdir}, both of which were disabled by default on my
bos@210: system.  I then added a few lines to the end of the config file, to
bos@210: configure these modules.
bos@210: \begin{codesample2}
bos@210:   userdir.path = "public_html"
bos@210:   cgi.assign = ( ".cgi" => "" )
bos@210: \end{codesample2}
bos@210: With this done, \texttt{lighttpd} ran immediately for me.  If I had
bos@210: configured \texttt{lighttpd} before Apache, I'd almost certainly have
bos@210: run into many of the same system-level configuration problems as I did
bos@210: with Apache.  However, I found \texttt{lighttpd} to be noticeably
bos@210: easier to configure than Apache, even though I've used Apache for over
bos@210: a decade, and this was my first exposure to \texttt{lighttpd}.
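
For reference, here is a sketch of what enabling the two modules can
look like.  It assumes a stock configuration file in which modules are
listed in \texttt{server.modules}; your file may well list other
modules too, and those should stay as they are.
\begin{codesample2}
  server.modules = (
      "mod_access",
      "mod_userdir",
      "mod_cgi"
  )
\end{codesample2}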
bos@159: 
bos@159: 
bos@159: %%% Local Variables: 
bos@159: %%% mode: latex
bos@159: %%% TeX-master: "00book"
bos@159: %%% End: