Migrating a large Subversion repo to Git in small chunks

I recently needed to migrate a relatively large repository from SVN to Git. For simple repositories, this migration is done easily with git svn clone https://...

However, whether because of the large repo size, a flaky network, or the fact that I was running on Windows, the clone failed repeatedly. The workaround is to pull the history in pieces.

Start by initializing the project without fetching the history.

mkdir my-project && cd my-project

# the -s indicates a 'standard subversion layout'
# i.e. trunk is in the trunk folder, branches=branches, tags=tags
git svn init -s $URL

Optionally, you can set up a mapping for your users. You’ll need to check out the existing SVN repo, then run one of these commands inside that working copy to list its authors.

cd path/to/svn/folder

# Linux/OSX
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt

# Windows
svn log -q | awk -F "|" "/^r/ {sub(\"^ \", \"\", $2); sub(\" $\", \"\", $2); print $2\" = \"$2\" ^<\"$2^>\"}" | sort > authors-transform.txt

Next you’ll need to update the file by hand with your user transformations.

# authors-transform.txt format:
# svn-user-id = git-full-name <git-email-address>

# example:
# hpaddock = Heath Paddock <heathpaddock@heathpaddock.com>
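
Since the generated file starts out with every entry in the placeholder form "id = id <id>", it is easy to miss one while editing by hand. Here is a rough sketch of a check that lists the entries still left untouched. The function name list_unedited is made up for illustration, and it assumes the file uses the exact " = " separator produced by the commands above.

```shell
# list_unedited FILE
# Print the SVN user ids whose entry still looks like "id = id <id>",
# i.e. the placeholder form written by the svn log | awk command.
# (list_unedited is a hypothetical helper name, not part of git or svn.)
list_unedited() {
    awk -F ' = ' '$2 == ($1 " <" $1 ">") { print $1 }' "$1"
}

# Example: list_unedited authors-transform.txt
```

An empty result suggests every author has been given a real name and email address.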

Tell your git repo about the authors file.

git config svn.authorsfile /path/to/authors-transform.txt

With your project initialized and, optionally, your authors-transform file set up, you are ready to start pulling commits.

git svn fetch

In my case, this ran for only a few commits before it locked up due to a misalignment of the planets, or gremlins, or maybe even PEBKAC.

Fortunately, git-svn is pretty smart about not re-fetching revisions you already have, so if/when your system crashes, you can simply re-run git svn fetch and it will continue from the last good commit.
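
Because re-running the fetch is safe, the babysitting can be handed to a small retry wrapper. This is only a sketch: the retry function and the RETRY_DELAY variable are made-up names for illustration, and the attempt count and delay are arbitrary.

```shell
# retry MAX CMD [ARGS...]
# Run CMD until it succeeds, trying at most MAX times.
# Waits RETRY_DELAY seconds (default 5) between attempts.
# (retry and RETRY_DELAY are hypothetical names, not part of git-svn.)
retry() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example: retry 10 git svn fetch
```

This only helps when the fetch fails and exits. If it hangs outright, as it did for me, you will still have to kill it by hand before the next attempt can start.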

If your machine chokes like mine did, you can specify a sub-range of revisions:

# get revisions 1 through 10
git svn fetch -r 1:10
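
Typing the ranges by hand gets old quickly, so the chunked fetch can be wrapped in a loop. The sketch below is one way to do it, not the only one. The function names chunk_ranges and fetch_in_chunks are invented for this example, and the revision numbers and chunk size in the usage line are placeholders.

```shell
# chunk_ranges FIRST LAST SIZE
# Print "start:end" revision ranges covering FIRST..LAST, SIZE revisions each.
# (chunk_ranges and fetch_in_chunks are hypothetical helper names.)
chunk_ranges() {
    first=$1 last=$2 size=$3
    start=$first
    while [ "$start" -le "$last" ]; do
        end=$((start + size - 1))
        # Clamp the final chunk to the last revision.
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "$start:$end"
        start=$((end + 1))
    done
}

# fetch_in_chunks FIRST LAST SIZE
# Fetch each range in turn and stop at the first failure,
# so the script can be re-run after fixing whatever went wrong.
fetch_in_chunks() {
    for range in $(chunk_ranges "$1" "$2" "$3"); do
        echo "fetching r$range"
        git svn fetch -r "$range" || return 1
    done
}

# Example (placeholder numbers): revisions 1 through 5000, 500 at a time
# fetch_in_chunks 1 5000 500
```

The upper bound is the latest revision of the SVN repository, which svn info reports on its Revision line.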

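Once you’ve pulled down all revisions, you’ll run git log and be disappointed to find almost nothing. That’s because HEAD still points at the first commit. Move it forward onto the fetched trunk (git branch -r lists the remote refs if yours is not named origin/trunk; older versions of git-svn used plain trunk):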

git rebase origin/trunk

Now you should be up-to-date, as if you had run a successful git svn clone. The only thing left is to add a new origin remote and push the code. Note that $URL below is the address of the new Git repository, not the SVN URL used earlier.

git remote add origin $URL
git push -u origin master

 
