For my oldest projects, I had source snapshots and releases from before they were first imported into a version control system. With Git, it's possible to insert these into the history of the project.
I did this by creating a new branch that's not attached to the existing history, and then pasting the existing history onto the end of it. Since this requires rewriting the existing history, it's best to do this as part of a conversion, rather than messing up an existing published repository (grafting would be a better option in that case).
First, start a new history branch within the repository by creating an empty orphan commit, dated as early as possible:
git checkout --orphan history
git rm -rf .
GIT_AUTHOR_DATE="1970-01-01T00:00:00 +0000" \
GIT_COMMITTER_DATE="1970-01-01T00:00:00 +0000" \
git commit --allow-empty -m 'Initial empty commit.'
(We could start with a commit that imports our first tarball instead, but there's no harm in having an empty commit, and it makes editing the history again later a bit easier.)
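Before importing anything, it can be worth confirming that the new branch really is detached from the existing history. A minimal sketch of such a check follows; is_root_commit is a name made up for this example, not a Git command.

```shell
# Hypothetical helper: succeeds if the tip of the given branch has no parents.
is_root_commit() {
    # %P expands to the parent hashes, which is an empty string for a root commit.
    [ -z "$(git log -1 --format=%P "$1")" ]
}
```

Running `is_root_commit history && echo "history is a fresh root"` straight after the empty commit should print the message; once snapshots have been imported, the tip of the branch has a parent and the check no longer applies.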
Next we need to extract each of our tarballs, remove anything we don't want added to Git, commit the rest, and create any release tags we want. Doing this by hand gets tedious very quickly, so I wrote git-import-snapshots:
git-import-snapshots $(ls -Str ~/snapshots/project*.gz)
You'll definitely want to edit the script if you use it yourself — it needs to understand how you've named your snapshots.
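To give an idea of what such a script has to do for each tarball, here is a rough sketch of a single import step. It is not the actual git-import-snapshots script: import_snapshot and its arguments are invented for illustration, and it assumes GNU date, a tarball that unpacks into one top-level directory, and that you are at the top of the work tree with the history branch checked out.

```shell
# Hypothetical helper, not the author's script.
# Usage: import_snapshot <tarball> <tag>
import_snapshot() {
    tarball=$1
    tag=$2

    # Assumption: the tarball's own modification time is a good commit date.
    # "date -r FILE" is the GNU form; BSD date treats -r differently.
    when=$(date -u -r "$tarball" '+%Y-%m-%dT%H:%M:%S +0000')

    # Empty the index and work tree so that files dropped between
    # snapshots are recorded as deletions.
    git rm -rf -q --ignore-unmatch .

    # Unpack, dropping the leading project-x.y/ directory.
    tar -xzf "$tarball" --strip-components=1

    # Project-specific cleanup (removing files you don't want in Git)
    # would go here.

    git add -A .
    GIT_AUTHOR_DATE="$when" GIT_COMMITTER_DATE="$when" \
        git commit -q --allow-empty -m "Import $(basename "$tarball")"

    # Lightweight tag on the commit just created.
    git tag "$tag"
}
```

A call might look like `import_snapshot ~/snapshots/project-1.0.tar.gz v1.0`, where both the file name and the tag name are placeholders. Working out the tag from the file name is exactly the part that depends on your own naming scheme.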
In most cases I was able to use the modification date of the snapshot file to identify an appropriate commit date. In a few cases I couldn't do that; instead, I needed to look at the contents of the tarball and find the latest modification date of the files inside it:
tar tzvf snapshot.tar.gz | grep -v '/$' | sort -k 4
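If you only want the newest timestamp rather than the whole sorted listing, the same pipeline can be trimmed down to one line of output. This is a sketch that assumes GNU tar's verbose listing format (mode, owner/group, size, date, time, name) and owner names without spaces; newest_in_tarball is a made-up name.

```shell
# Hypothetical helper: print the date and time of the newest
# non-directory entry in a gzipped tarball.
newest_in_tarball() {
    tar -tzvf "$1" |
        grep -v '/$' |            # skip directory entries
        LC_ALL=C sort -k 4,5 |    # sort on the date and time columns
        tail -n 1 |
        awk '{ print $4, $5 }'
}
```

Note that tar prints these times in the local time zone and, by default, only to the minute, so treat the result as approximate if you use it as a commit date.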
Viewing the branch with gitk history should now show an appropriate series of commits and tags. If not, tweak the script's rules and do it again.
Now we need to join the two histories together, by rewriting the master branch's first commit so that it follows the latest commit on the history branch:
ref=$(git rev-parse history)
git filter-branch -f \
--parent-filter 'sed "s/^\$/-p '$ref'/"' \
--tag-name-filter cat \
master
While we're only really changing the first commit, all the subsequent ones will need to be rewritten too, in order to catch up with the hash changes.
The --tag-name-filter cat option is required in order to preserve tags (i.e. rewrite them to point at the rewritten commits).
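Once the rewrite has finished, a quick sanity check is to confirm that the old snapshots are now ancestors of the main branch and that only one root commit remains. The following is a sketch of one way to do that; history_is_joined is a hypothetical helper, and it assumes the repository had a single root commit before the join.

```shell
# Hypothetical check. Usage: history_is_joined <old-branch> <rewritten-branch>
history_is_joined() {
    # The tip of the old branch must be reachable from the rewritten branch...
    git merge-base --is-ancestor "$1" "$2" &&
    # ...and the rewritten branch must lead back to exactly one root commit.
    [ "$(git rev-list --count --max-parents=0 "$2")" -eq 1 ]
}
```

For the branches used here, that would be `history_is_joined history master && echo joined`.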
We can now check out the rewritten master branch, and throw away the history branch:
git checkout master
git branch -d history