Notes to self

Whatever I feel like writing

Posts tagged with "git"

Why git-svn is not a real solution

For the past two months or so, my primary Subversion client has been git-svn. Obviously, I like Git. It is my favorite SCM tool, and cheap local branches, coupled with an excellent ability to merge them, have a lot to do with why that is.

So why git-svn? Well, we have historically standardized on CVS and Subversion at my workplace, and all of our code is versioned by one of these systems. I could mention many things that I dislike about these two SCMs, but their Eclipse integration is pretty good, and that has been politically decided to be a big plus. Therefore, all code I produce at work must be put in either CVS or Subversion, and Git is not under consideration until it can be shown that the EGit Eclipse plug-in has accrued an acceptable amount of awesome. This is why I decided to put my faith in git-svn two months back.

As you might have guessed from the title, things aren't all rosy. It's not that git-svn doesn't work; it's the way it works that bothers me. Git allows you to manipulate the history of your repository, and when you do a merge in Git, the commits from each branch end up chronologically interleaved in the log. This model does not map very well to Subversion. If I merge a branch into trunk in Subversion, I am in essence making one big fat commit to trunk that goes on top of all of the existing commits and contains all of the changes from the branch. There is no history rewriting, because the branch history is lost by flattening all of these commits into one.

Therefore, to shoe-horn itself into this model, git-svn uses rebasing instead of merging. When you rebase a branch A onto another branch B in Git, you take all commits from A that are not already in B and reapply them, in order, on top of the HEAD of B. This makes it look like these commits were written and committed after all other commits in B.
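To make the rebase step concrete, here is a minimal sketch in a throwaway repository. It uses plain Git only, not git-svn itself, and the branch names A and B simply mirror the description above; the file names, commit messages and the helper function are made up for illustration.

```shell
#!/bin/sh
# Sketch: rebase branch A onto branch B and watch A's commit ID change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/B      # name the main line "B", as in the text
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: create one new file and commit it.
commit() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt "base"
git checkout -q -b A                    # A forks from B here
commit a1.txt "A: first change"
commit a2.txt "A: second change"
git checkout -q B
commit b1.txt "B: unrelated change"

before=$(git rev-parse A)
git rebase -q B A                       # reapply A's commits on top of B's HEAD
after=$(git rev-parse A)

git log --oneline --graph A
echo "A before rebase: $before"
echo "A after rebase:  $after"
```

The log shows A's two commits sitting on top of B's latest commit, and the two printed IDs differ even though the changes themselves are untouched.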

This is a clever trick, but it comes at a pretty high cost. A commit that is retrofitted to have a different past is no longer the same commit. So what happens when you make some changes in B that you want back in A? With ordinary merging in Git, this problem is trivial - you simply merge B back into A. However, because of the rebasing, A and B now contain two sets of commits that are essentially the same changes, but with a different past and therefore different commit IDs. This means that merging breaks down. If it's just one or two commits we're talking about, then we can cherry-pick. But if we are talking about a larger set of changes, then we quickly find ourselves reaching for the rebase-katana.
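The "same changes, different commits" situation can be reproduced without Subversion. The sketch below is an approximation under stated assumptions: B receives rebased copies of A's commits (standing in for what a rebase-based sync leaves behind), while A keeps the originals. The branch name "copy" and the helper function are made up; git cherry is a standard Git command that compares commits by patch content rather than by ID.

```shell
#!/bin/sh
# Sketch: after a rebase-style sync, A and B hold the same patches twice.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/B
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: commit a new file named after its argument.
add() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

add base
git checkout -q -b A
add a1
add a2
git checkout -q B
add b1

# B gets rebased copies of A's commits; A itself is left untouched.
git checkout -q -b copy A
git rebase -q B
git checkout -q B
git merge -q --ff-only copy

# "-" marks a commit in A whose change already exists in B under another ID.
duplicates=$(git cherry B A)
echo "$duplicates"
echo "commits in A that B does not know by ID: $(git rev-list --count B..A)"
```

Every commit on A is reported as already present in B by content, yet none of them is reachable from B by ID, which is exactly what confuses a later merge.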

Rebasing is a powerful tool that allows us to rewrite history. However, if we use it too much, we can quickly rewrite ourselves into a hairy mess of unmergeable branches. Consider this: just as we can get merge failures when two commits are incompatible, we can likewise get rebase failures. But because rebasing reapplies all distinct commits in order, a rebase failure can affect all subsequent commits waiting in line. A merge failure is resolved once per merge, but if you are rebasing 10 commits onto a branch and the third commit fails, then that failure could potentially propagate to the seven remaining commits, if they depend on the changes in the failing commit.

What are the consequences of this? Because git-svn does a rebase on every git-svn dcommit and git-svn rebase, thus (more or less) rewriting history, I am pretty much left with a Git repository where cheap local branches and merging are somewhere between troublesome and non-functioning.

That's a pretty steep cost, in my humble opinion. But I still use it. Yes, despite these dragons and drawbacks, I still use git-svn as my primary Subversion client. I still like my ability to freely mold and modify unpushed history, I still like my git-bisect and I still like colored console output.

I have experienced these pitfalls on my own repositories and I have learned to avoid them. But if you think that you can reap the full benefits of Git while keeping ye olden Subversion around, then you're wrong. With git-svn, you are voluntarily pulling an SVN-branded straitjacket over your Git repository. To truly get the most out of Git, you absolutely have to sync with a real remote Git repository.

Keeping a tidy history with Git.

Git is a power tool that allows you to modify your history and be selective about what goes into each commit. I believe this power should be used to keep as tidy a history as practically possible.

Mind you that "modifying history" does not include history that has been pushed and is visible to others. While that is possible to do, it is also considered a cardinal sin among Gitters, because you are ruining the repositories of people who have pulled the history that you have since modified.

There are two qualities that I try to achieve when I prepare a commit: cohesion and consistency. Explanation follows.

Consistency is about not committing a broken build. This is important in any SCM, but with Git you might argue that only the latest commit in any push needs to build - I beg to differ.

The reason that every commit must at least build is for the sake of any future git-bisect. I was debugging a nasty memory leak in a Java application once - a server was leaking interned Strings. The project was kept in CVS, but for the purpose of debugging (among other things) I had exported the repository to Git. This allowed me to hunt for the commit that had introduced the memory leak with git-bisect, which I thought was pretty clever. That is, until I happened upon a commit that didn't build. If I could not build the software, then I could not test it for the presence of the bug that I was hunting. And making the software build means changing it, which introduces the risk of getting a skewed result from the test - what if my changes removed or reintroduced the bug? Or introduced some other bug that would mask the real bug that I was hunting? I don't recall exactly how I handled the situation, but I certainly wasn't happy about it.
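One way to live with unbuildable commits during a bisect, which is not necessarily what was done in the story above, is to let git bisect run drive a check script and have that script exit with code 125 when the commit cannot be built; Git then skips that commit instead of marking it good or bad. The sketch below fakes the build and the bug with two small text files, so the file names, tags and commit layout are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: bisect across commits that do not build by skipping them (exit 125).
set -e
repo=$(mktemp -d)
check=$(mktemp)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Eight commits. "status" stands in for whether the build works and "answer"
# for the behaviour under test: c4 and c5 do not build, c7 introduces the bug.
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > version
    case $i in 4|5) echo broken ;; *) echo builds ;; esac > status
    if [ "$i" -ge 7 ]; then echo 41; else echo 42; fi > answer
    git add version status answer
    git commit -q -m "commit $i"
    git tag "c$i"
done

# The check script lives outside the repository so checkouts cannot touch it.
cat > "$check" <<'EOF'
grep -q builds status || exit 125   # cannot build: ask bisect to skip this one
grep -q 42 answer                   # exit 0 means good, exit 1 means bad
EOF

git bisect start c8 c1 > /dev/null  # c8 is known bad, c1 is known good
git bisect run sh "$check" > /dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "first bad: $first_bad"
```

This only gives a clean answer when the skipped commits are not right next to the one that introduced the bug; if they are, Git can only report a range of suspects, which is one more reason to keep every commit buildable.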

So consistency is important, and foul-ups in this regard are the reason (or at least one of the reasons) why "git commit --amend" exists.
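As a small sketch of that repair, assuming the broken commit has not been pushed yet: the fix is staged and folded into the previous commit, so the history never contains the broken state. The file name and its contents are invented for the example.

```shell
#!/bin/sh
# Sketch: fold a build fix into the previous, still unpushed commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "int main() { return 0 }" > main.c    # missing semicolon: would not compile
git add main.c
git commit -q -m "Add main"
broken=$(git rev-parse HEAD)

echo "int main() { return 0; }" > main.c   # the fix
git add main.c
git commit -q --amend --no-edit            # replace the commit, keep its message
fixed=$(git rev-parse HEAD)
count=$(git rev-list --count HEAD)

echo "commits in history: $count"
echo "old ID: $broken"
echo "new ID: $fixed"
```

The amended commit gets a new ID, which is why this is only safe for history that nobody else has pulled.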

Cohesion is about not mixing unrelated changes in the same commit. Think of the git-bisect use-case above; now that you have finally found the bad commit, it turns out that it contains one new feature, two refactorings and indentation corrections in four files - good luck finding that bug.

But that's not even my primary concern with cohesion. My primary concern is actually code reviews: if you had to review the hypothetical commit mentioned above, then you'd have your work cut out for you, and it wouldn't be much fun. If things were instead properly split up into 4 to 7 commits (depending on how you cut the indentation changes), then the review would go much easier: you'd know that you can gloss over the indentation changes pretty quickly, verify the refactorings at a good speed and save your best brain-cycles for the new feature.

I find the following list of "themes" to be pretty good natural boundaries for commits:

  • Styling, indentation, spelling and grammar.
  • A new feature.
  • A bug fix.
  • Renaming, moving and adding files, or moving existing code chunks into their own files.
  • Refactoring or clean-up of cohesive code chunks.


If you happen to mix these changes in the same file, then you can use the interactive adding feature of Git ("git add -i") to split up the chunks of your changes and put them in different commits. For instance, I often correct indentation, style, grammar and spelling on sight and often while doing something else with the file. Then I break up the changes afterwards with interactive adding.

So, these are the qualities I reach for when I prepare to commit. They are just guidelines, though. Common sense is important: if I am certain that I will get a better history by breaking some of these rules, then I will do so. And I sometimes do, though it is rare.

From CVS to Git with pserver.

Because this is now the third time I have been unable to remember how to do this...

First some useful variables:
  • $USER: my username in CVS
  • $MODULE: the CVS module I want to export
  • $HOST: the domain name or IP of the CVS server
  • $CVSROOT: the CVS root directory
  • $TARGET: the name of the target directory/name of our new git repository

Alright, step one is to log into CVS or authenticate with it, or make it love you or whatever:
$ cvs -d :pserver:$USER@$HOST:$CVSROOT login

There... once the air clears of the hot love, we're good to go export our history:
$ git cvsimport -d :pserver:$USER@$HOST:$CVSROOT -C $TARGET -m -a $MODULE

The -a option makes it import all commits, including very recent ones that it would otherwise skip, and the -m option causes it to try to recognize branch merges based on commit messages, using some standard regular expressions. You can define your own regexes for that purpose with the -M option.

And that's it.