Converting from CVS, Darcs and Mercurial to Git

2014-03-01 · in Tech Notes

I've been using version control for all my software and writing projects — and for other things, like the config files in my home directory — for well over ten years now. I've tried out various new systems as they've become available. As a result I've had projects using a mix of CVS, Darcs, Mercurial and Git repositories, some of which have been converted several times.

Git appears to have more or less won the VCS wars at this point, and I'm generally happy with its features; in particular, it's now got usable versions of the Darcs incremental commit and revert tools, and I really like its history-rewriting features. So I wanted to convert everything to Git — importing as much history as possible, and tidying up the messes created by previous conversions.

Conversion

I started by creating an empty Git repository:

git init project-git
cd project-git

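For projects in CVS, I used git cvsimport. Its documentation now steers one-shot imports towards cvs2git or cvs-fast-export, because it depends on the deprecated cvsps version 2, but it did the job for my repos.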
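As a rough sketch of what such an import might look like, the wrapper below passes an author-conversion file with -A, so that bare CVS usernames are mapped to full names during the import. The function name import_cvs, the paths and the file name authors.txt are hypothetical and not taken from my actual setup; -d, -A and -C are documented git cvsimport options.

```shell
# Hypothetical wrapper around git cvsimport; all arguments are placeholders.
#   $1  CVSROOT, e.g. ~/cvsroot
#   $2  CVS module name
#   $3  author-conversion file, one mapping per line, e.g.:
#         asmith=Alice Smith <alice@example.org>
#   $4  target Git repository (defaults to the current directory)
import_cvs() {
    git cvsimport -d "$1" -A "$3" -C "${4:-.}" "$2"
}

# Example (not run here):
#   import_cvs ~/cvsroot project authors.txt .
```

Mapping the authors at import time means there is less to fix up afterwards, although the usernames can also be rewritten later.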

For projects in Darcs, I've found that darcs-fastconvert does the best job of conversion:

darcs-fastconvert export ~/darcs/project | git fast-import

For projects in Mercurial, I used hg-fast-export:

hg-fast-export.sh -r ~/hg/project

git fast-import doesn't touch the working tree, so to see the result, check out the imported branch:

git checkout master

Where possible, I also imported tarball releases into the history.
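This post doesn't record exactly how the tarballs were imported, so the following is only one plausible approach: replace the working tree with the unpacked release, commit it, and tag it. The function names, the project-1.2.3.tar.gz naming scheme and the commit message are assumptions, and the sketch simply appends each release to the current branch rather than slotting it into existing history.

```shell
# Derive a v1.2.3-style tag from a tarball named like project-1.2.3.tar.gz
# (assumed naming scheme: the version follows the last hyphen).
tag_for_tarball() {
    base=${1##*/}                  # strip any directory part
    base=${base%.tar.*}            # strip .tar.gz, .tar.bz2, ...
    printf 'v%s\n' "${base##*-}"   # keep what follows the last hyphen
}

# Replace the tracked files with the tarball's contents, then commit and tag.
# Assumes the tarball has a single top-level directory.
import_tarball() {
    tag=$(tag_for_tarball "$1") &&
    git rm -r -q --ignore-unmatch -- . &&
    tar -xf "$1" --strip-components=1 &&
    git add -A &&
    git commit -q -m "Import release tarball ${1##*/}" &&
    git tag "$tag"
}

# Example (not run here), oldest release first:
#   import_tarball ~/releases/project-1.0.0.tar.gz
```

The commits get the current date unless GIT_AUTHOR_DATE and GIT_COMMITTER_DATE are set, so it may be worth setting those to the release dates if they are known.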

Tidying

Most of the problems in my projects resulted from limitations of the previous version control systems I'd used, or were artefacts from previous conversions:

CVS only stored usernames, not full author information.
CVS didn't support moving files; to preserve history you needed to do a repocopy, duplicating the file's ,v history file inside the repository by hand. (I didn't find a way of fixing this automatically, and decided not to worry about it since it only affected a couple of files in one project.)
The Darcs-to-Mercurial conversion synthesised extra commits for merges, with convert-repo as a username.
The CVS-to-Darcs conversion I'd done had duplicated the first lines of long commit messages.
Later versions of Darcs added an Ignore-this: line to commit messages, which Darcs itself hid, but the Darcs-to-Mercurial converter had preserved.
Earlier systems used different conventions for release tag naming, whereas the Git world has more-or-less standardised on v1.2.3.

I wrote tidy-imported-git, which uses git filter-branch to fix most of these problems. It needs to know about the usernames and tag styles used in the project; you can list the usernames by doing:

git log --pretty='format:Author: %an <%ae>%nCommitter: %cn <%ce>%n' | sort -u
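tidy-imported-git itself isn't reproduced here. As a minimal sketch of the kind of git filter-branch invocation such a script might wrap, the following rewrites one username, drops Ignore-this: lines from commit messages, and renames tags. The username asmith, the name and email it maps to, and the release-1.2.3 tag style are all hypothetical.

```shell
# Filters are kept in variables so each can be tried on its own first.

# Map a bare CVS username to full author and committer details
# (hypothetical mapping).
env_filter='
if [ "$GIT_AUTHOR_NAME" = "asmith" ]; then
    GIT_AUTHOR_NAME="Alice Smith"
    GIT_AUTHOR_EMAIL="alice@example.org"
fi
if [ "$GIT_COMMITTER_NAME" = "asmith" ]; then
    GIT_COMMITTER_NAME="Alice Smith"
    GIT_COMMITTER_EMAIL="alice@example.org"
fi
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
'

# Drop the Ignore-this: lines that Darcs added to commit messages.
msg_filter='sed -e "/^Ignore-this: /d"'

# Rename release-1.2.3 style tags to v1.2.3 (assumed old tag style).
tag_filter='sed -e "s/^release-/v/"'

# Apply all three filters to every branch and tag.
tidy_history() {
    git filter-branch -f \
        --env-filter "$env_filter" \
        --msg-filter "$msg_filter" \
        --tag-name-filter "$tag_filter" \
        -- --all
}
```

Since filter-branch rewrites every commit, it is safest to try this on a scratch clone first, for example by piping a sample message through the message filter with eval "$msg_filter" before running tidy_history.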

I also took this opportunity to filter out irrelevant history from some projects — for example, where I'd started with a shared repository for several projects which I'd later split, or where I'd imported a generated or temporary file by accident. To list the files that show up in a repository's history, you can do:

git log --numstat | awk '/^[0-9]/ { print $3 }' | sort -u
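That one-liner skips binary files (which --numstat reports with - in place of line counts) and truncates paths containing spaces. A slightly more careful variant, offered as a sketch rather than something I've run on every repository, splits on tabs instead; the function names are made up for illustration.

```shell
# Extract the path column from `git log --numstat` output on stdin.
# numstat lines have exactly three tab-separated fields: added, deleted, path.
numstat_paths() {
    awk -F '\t' 'NF == 3 { print $3 }' | sort -u
}

# List every path that appears in history. --no-renames stops renames being
# shown in "old => new" form; the empty --pretty=format: hides commit headers.
list_history_paths() {
    git log --no-renames --numstat --pretty=format: "$@" | numstat_paths
}

# Example (not run here):
#   list_history_paths --all
```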

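And then use git filter-branch to remove the ones you don't want, replacing UNWANTED with the paths to drop (the pass-through --tag-name-filter cat is needed so that the tags are rewritten to point at the new commits rather than left on the old ones):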

git filter-branch -f --prune-empty \
    --index-filter 'git rm -r --cached --ignore-unmatch UNWANTED' \
    --tag-name-filter cat \
    HEAD
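git filter-branch keeps the old commits reachable through backup refs under refs/original/, so the removed files still take up space until those are discarded. The following sketch, based on the cleanup steps in the filter-branch documentation, drops the backups and prunes the unreferenced objects; the function name is hypothetical, and it should only be run once you're happy with the rewritten history.

```shell
# Delete filter-branch's backup refs, expire the reflogs, and prune the
# objects that are no longer referenced. This is not reversible.
discard_filter_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    git reflog expire --expire=now --all &&
    git gc --prune=now
}
```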

Publishing

When pushing your converted repo, don't forget to include the tags:

git init --bare ~/pub/project
git push --tags ~/pub/project master

I publish my Git repositories by plain HTTP, using git update-server-info and rsync.
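Roughly, that publishing step could look like the sketch below: git update-server-info regenerates the index files (info/refs and objects/info/packs) that clients need when fetching over plain HTTP, and rsync then mirrors the bare repository to the web server. The function name, host and destination path are placeholders, not my real setup.

```shell
# $1: local bare repository, $2: rsync destination (hypothetical host and path)
publish_repo() {
    git --git-dir="$1" update-server-info &&
    rsync -a --delete "$1/" "$2/"
}

# Example (not run here):
#   publish_repo ~/pub/project webhost:/var/www/git/project.git
```

update-server-info has to be rerun after every push, before syncing, or HTTP clients won't see the new refs.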