2014-03-01 · in Tech Notes

I've been using version control for all my software and writing projects — and for other things, like the config files in my home directory — for well over ten years now. I've tried out various new systems as they've become available. As a result I've had projects using a mix of CVS, Darcs, Mercurial and Git repositories, some of which have been converted several times.

Git appears to have more or less won the VCS wars at this point, and I'm generally happy with its features; in particular, it's now got usable versions of the Darcs incremental commit and revert tools, and I really like its history-rewriting features. So I wanted to convert everything to Git — importing as much history as possible, and tidying up the messes created by previous conversions.

Conversion

I started by creating an empty Git repository:

git init project-git
cd project-git

For projects in CVS, I used git cvsimport. This is no longer recommended (its own manual page suggests cvs2git or cvs-fast-export for one-off imports), but it did the job for my repos.

For projects in Darcs, I've found that darcs-bridge (whose command-line tool is called darcs-fastconvert) does the best job of conversion:

darcs-fastconvert export ~/darcs/project | git fast-import

For projects in Mercurial, I used hg-fast-export:

hg-fast-export.sh -r ~/hg/project

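The fast-import based tools write commits straight into the repository without touching the working tree, so to see the result, check out the imported branch: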

git checkout master
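
Before tidying anything, a quick sanity check of the import can be compared against what the old system reported. This is a minimal sketch, assuming it is run inside the new repository; the function name summarise_import is made up for illustration:

```shell
# Hypothetical helper: print a short summary of the repository in the
# current directory, for comparison against the old VCS.
summarise_import() {
    # Check the object database for corruption; give up if it is damaged.
    git fsck --no-progress >/dev/null 2>&1 || return 1
    # Count the commits reachable from any ref, and count the tags.
    echo "commits: $(git rev-list --all --count)"
    echo "tags: $(git tag -l | wc -l | tr -d ' ')"
}
```

If the commit count is wildly different from the number of revisions in the original repository, that is usually a sign that a branch was missed or that the import stopped early.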

Where possible, I also imported tarball releases into the history.
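
One simple way to add a release tarball is to unpack it over the working tree and commit the result with the release's date. The following is a sketch under assumptions, not necessarily the method used for these repos: the function name is hypothetical, the tarball is assumed to contain a single top-level directory, and the commit is added to the tip of the current branch, so it suits releases that predate the imported history only if you import them first.

```shell
# Hypothetical helper: record the contents of a release tarball as a
# commit on the current branch, dated as the release was.
# Run from the top of the working tree.
# Usage: import_tarball ../project-1.0.tar.gz '2003-05-01 12:00:00 +0000' 'Import 1.0'
import_tarball() {
    tarball=$1 date=$2 message=$3
    # Remove all tracked files first, so that files dropped in this
    # release disappear from the commit too.
    git rm -r -q --ignore-unmatch . >/dev/null || return 1
    # Unpack, discarding the top-level "project-1.0/" directory.
    tar -xf "$tarball" --strip-components=1 || return 1
    git add -A . || return 1
    # Use the release date for both author and committer timestamps.
    GIT_AUTHOR_DATE=$date GIT_COMMITTER_DATE=$date git commit -q -m "$message"
}
```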

Tidying

Most of the problems in my projects resulted from limitations of the previous version control systems I'd used, or were artefacts from previous conversions.

I wrote tidy-imported-git, which uses git filter-branch to fix most of these problems. It needs to know about the usernames and tag styles used in the project; you can list the usernames by doing:

git log --pretty='format:Author: %an <%ae>%nCommitter: %cn <%ce>%n' | sort -u
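
As a rough illustration of the kind of username fix involved (a sketch of the general technique, not the actual tidy-imported-git code), the --env-filter option of git filter-branch can rewrite identities. The function name and the FIX_* variable names are made up here, and since this rewrites every branch and tag, it is best tried on a copy first:

```shell
# Hypothetical helper: replace one author/committer identity throughout
# history. Usage: fix_identity OLD_EMAIL NEW_NAME NEW_EMAIL
fix_identity() {
    # The filter snippet is evaluated by git filter-branch once per
    # commit, so the arguments are handed over through the environment.
    FIX_OLD_EMAIL=$1 FIX_NEW_NAME=$2 FIX_NEW_EMAIL=$3 \
    git filter-branch -f --env-filter '
        if [ "$GIT_AUTHOR_EMAIL" = "$FIX_OLD_EMAIL" ]; then
            GIT_AUTHOR_NAME=$FIX_NEW_NAME
            GIT_AUTHOR_EMAIL=$FIX_NEW_EMAIL
            export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
        fi
        if [ "$GIT_COMMITTER_EMAIL" = "$FIX_OLD_EMAIL" ]; then
            GIT_COMMITTER_NAME=$FIX_NEW_NAME
            GIT_COMMITTER_EMAIL=$FIX_NEW_EMAIL
            export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
        fi
    ' --tag-name-filter cat -- --all
}
```

For example, fix_identity root@localhost 'A. Hacker' hacker@example.org would replace a placeholder identity left behind by an importer; both the address and the name here are invented.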

I also took this opportunity to filter out irrelevant history from some projects — for example, where I'd started with a shared repository for several projects which I'd later split, or where I'd imported a generated or temporary file by accident. To list the files that show up in a repository's history, you can do:

git log --numstat | awk '/^[0-9]/ { print $3 }' | sort -u
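
The awk version above skips binary files (which --numstat reports with "-" in place of line counts) and truncates paths that contain spaces. A variant that avoids both, as a minimal sketch with a made-up function name:

```shell
# Hypothetical helper: list every path that appears in any commit
# reachable from any ref, one per line.
list_history_paths() {
    # --pretty=format: suppresses the commit headers, leaving only file
    # names and blank separator lines, which grep then drops.
    # --no-renames reports a rename as a delete plus an add, so both the
    # old and the new name are listed.
    git log --all --no-renames --name-only --pretty=format: |
        grep -v '^$' | sort -u
}
```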

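And then use git filter-branch to remove the ones you don't want. Passing cat as the --tag-name-filter keeps the tag names unchanged but makes the tags point at the rewritten commits, and the final HEAD limits the rewrite to the current branch: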

git filter-branch -f --prune-empty \
    --index-filter 'git rm -r --cached --ignore-unmatch UNWANTED' \
    --tag-name-filter cat \
    HEAD
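
To confirm that the filter did what was intended, it helps to check that no remaining commit touches the path. A small sketch, with a hypothetical function name:

```shell
# Hypothetical helper: succeed if any commit reachable from HEAD still
# touches the given path.
path_in_history() {
    # Ask about HEAD rather than --all: git filter-branch keeps the old
    # commits reachable under refs/original/ as a backup, and those
    # would still mention the path.
    [ -n "$(git rev-list HEAD -- "$1")" ]
}
```

Usage would be something like: path_in_history UNWANTED && echo "still present".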

Publishing

When pushing your converted repo, don't forget to include the tags:

git init --bare ~/pub/project
git push --tags ~/pub/project master
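
A quick way to check that nothing was left behind is to compare the local tags with what the published repository reports. This is a sketch with a made-up function name; it asks the remote once per tag, which is fine for a local path but slow over a network:

```shell
# Hypothetical helper: print the tags that exist locally but not in the
# repository at the given path or URL.
missing_remote_tags() {
    remote=$1
    git tag -l | sort | while read -r tag; do
        # ls-remote prints nothing if the remote has no such ref.
        # Redirect stdin so a network transport cannot swallow the
        # list of tags that the loop is reading.
        [ -n "$(git ls-remote --tags "$remote" "refs/tags/$tag" </dev/null)" ] ||
            echo "$tag"
    done
}
```

Running missing_remote_tags ~/pub/project after the push should print nothing.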

I publish my Git repositories by plain HTTP, using git update-server-info and rsync.
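
The publishing step can be sketched as follows. The function name and the destination in the usage line are placeholders rather than my actual setup:

```shell
# Hypothetical helper: refresh the files that plain-HTTP clients need,
# then mirror the bare repository into a web server's document tree.
# Usage: publish_repo ~/pub/project user@example.org:/var/www/git/project.git
publish_repo() {
    repo=$1 dest=$2
    # Regenerate info/refs and objects/info/packs, which clients fetch
    # because a plain web server cannot list the repository's contents.
    git --git-dir="$repo" update-server-info || return 1
    # Copy the repository, deleting remote files that no longer exist
    # locally (for example, loose objects that have since been packed).
    rsync -a --delete "$repo/" "$dest/"
}
```

git update-server-info has to be rerun whenever the refs change; enabling the sample post-update hook that Git ships in the repository's hooks directory does this automatically after each push.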