Monday, April 30, 2012

Git Rebase Across Many Commits

Not all git merge conflicts are real.


The Scenario

In both my personal and my work projects I prefer to use git rebase to keep my commit histories simple and readable. To make this work in a team setting, we never work on the master branch, instead always working on a feature branch in our local repositories. Our process flow looks something like this:
$ git branch feature           #create the working branch
$ git checkout feature          #do all development work on that branch
#Edit files, etc.
$ git commit -m "Implement Feature"
#Repeat the above as desired during development.
#When ready to merge to master, do the following:
$ git checkout master
$ git pull                      #update master from shared repository
$ git checkout feature
$ git rebase master             #optionally with -i if squashing is desired
$ git checkout master
$ git merge feature
$ git push origin master
$ git branch -d feature
Because we never use our local master branch for development, the git pull on master is always a fast-forward merge. Likewise, because we have just rebased the feature branch against the master right before we merge that feature branch back into master, that merge is also always a fast-forward merge. Looking at it another way, we don't have any merge conflicts when updating or merging master because we resolve all of the merge conflicts when we rebase the feature branch against the latest master.

The Problem

At work, we have a large codebase and a handful of active developers who typically merge feature branches to the master using the above workflow multiple times each day. Sometimes somebody has a feature branch that takes a long time to finish, so that between the time that branch was started and the time it is ready to go into master, there may have been 40 or 50 other commits made to master. In general in this situation we will occasionally rebase our local feature branch against the latest master a few times during feature development, but inevitably there are occasions when a large rebase across many commits ends up being done.

Even if there are many commits on the master branch, if none of those commits touched any of the same code as the commits on the feature branch, then there should be no merge conflicts when rebasing the feature branch against the updated main branch. However, in my experience this has not always been the case. Sometimes git rebase reports merge conflicts when I think there should not be any. Since I don't generally know exactly what code the other team members have edited, I can't immediately tell if the merge conflicts make sense.

The normal advice for how to handle merge conflicts is to edit the named file, look for the conflict markers, inspect the conflicting code fragments, determine what to keep, edit out what is not being kept along with the conflict markers, git add the repaired file, and git rebase --continue to let it tell you about the next merge conflict.

That's a lot of work, and it might all be completely unnecessary.

The Solution

It seems that git sometimes just gets confused when doing a rebase across a large number of commits. Sometimes if you rebase in smaller steps, git will happily rebase each smaller step with no merge conflicts, until you have stepped all the way up to the latest master, at which point your rebase is done.

You could rebase against every single commit and work your way up to master, but that, too, is a lot of work. Here's what I do when the initial rebase of the feature branch against the latest master tells me there are merge conflicts.

When the initial git rebase reports a merge conflict, I immediately do git rebase --abort to undo that rebase attempt. Using gitk --all to view the commit tree, which lets me see the master branch and the commit at which my feature branch branches off the master branch, I select a commit on the master branch about half way between those two commits. I copy the commit ID and paste it into a rebase command that looks something like this:
$ git rebase 8bc85584989e4435c2d98b13447bcab37648ba7f
If this rebase reports no merge conflicts, then I try rebasing against master and repeat the process.

If there are merge conflicts, then I abort the rebase and pick another commit half way again to the branch point. I repeat this until either the rebase succeeds or I am trying to rebase across a single commit. At that point, if there are still merge conflicts, they are real and I address them in the normal way. Since the conflict is only across a single commit, it is easier to see the cause of the conflict and to resolve it.

After resolving the conflict across that one commit, I go back to the first step and try rebasing against master again, repeating the process.

I have followed this process a number of times. I think that a majority of these times I binary-divide my commits a few times and end up piecemeal stepping through the commits until I have rebased against master without ever having to resolve any conflicts. The other times I typically have to resolve one or two small conflicts, after which I can rebase against master.

The next time you do a rebase across more than one commit and git tells you there are merge conflicts, try this approach. You might save yourself a lot of work.


basetta said...

Nice. I'll try it out. Did u try to activate also the rerere feature of git ?

It helped me a lot in the scenario described by you.

Jim McBeath said...

basetta: Because my typical process flow uses rebase rather than merge, I don't find myself in the situation where I have done a merge and then discarded it, so I have not had occasion to use rerere.