Most developers understand that it’s important to use branches in Git. In fact, I’ve written an entire article on branching strategies in Git, explaining Git’s powerful branching model, the different types of branches, and two of the most common branching workflows. To sum it up: having separate containers, i.e. branches, for your work is incredibly helpful and one of the main reasons for using a version control system.
In this article we’re going to look at integrating branches. How can you get new code back into an existing line of development? There are different ways to achieve this. The fifth episode of our “Advanced Git” series discusses integrating changes in Git, namely merging and rebasing.
Before we go into detail, it’s important to understand that both commands — git merge and git rebase — solve the same problem. They integrate changes from one Git branch into another branch; they just do it differently. Let’s start with merges and how they actually work.
To merge one branch into another, you can use the git merge command. Let’s say you have some new commits on one of your branches, branch-B, and you now want to merge this branch into another one, branch-A. To do so, you can type something like this:
$ git checkout branch-A
$ git merge branch-B
As a result, Git creates a new merge commit in your current working branch (branch-A in this example), connecting the histories of both branches. To pull this off, Git looks for three commits:
- The first one is the “common ancestor commit.” If you follow the history of two branches in a project, they always have at least one commit in common. At this point, both branches have the same content. After that, they evolved differently.
- The two other interesting commits are the endpoints of each branch, i.e. their current states. Remember that the goal of an integration is to combine the current states of two branches. So their latest revisions are important, of course.
Combining these three commits performs the integration that we’re aiming for.
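To make those three commits tangible, here is a minimal sketch that builds a throwaway repository and asks Git for each of them. It assumes git is installed; the temporary directory, file names, and commit messages are made up for illustration.

```shell
# Scratch repository (hypothetical names): two branches that diverge after one shared commit
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b branch-A
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "shared commit"
git checkout -q -b branch-B
echo b > b.txt && git add b.txt && git commit -q -m "work on branch-B"
git checkout -q branch-A
echo a > a.txt && git add a.txt && git commit -q -m "work on branch-A"

# The three commits a merge is based on
ancestor=$(git merge-base branch-A branch-B)   # common ancestor commit
tip_a=$(git rev-parse branch-A)                # endpoint of branch-A
tip_b=$(git rev-parse branch-B)                # endpoint of branch-B

git log -1 --format='common ancestor: %s' "$ancestor"
git log -1 --format='tip of branch-A: %s' "$tip_a"
git log -1 --format='tip of branch-B: %s' "$tip_b"
```

In a real project you would only need the `git merge-base` line; the rest just sets the stage.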
Admittedly, this is a simplified scenario: one of the two branches (branch-A) hasn’t seen any new commits since the branches split, which is very unlikely in most software projects. Its last commit in this example is, therefore, also the common ancestor.
In this case, the integration is dead simple: Git can just add all the new commits from branch-B on top of the common ancestor commit. In Git, this simplest form of integration is called a “fast-forward” merge. Both branches then share the exact same history (and no additional “merge commit” is necessary).
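If you would like to see a fast-forward happen, the following sketch reproduces the simplified scenario in a scratch repository. The directory, file names, and commit messages are invented for the demo; `--ff-only` is used here so that Git would refuse the merge if a fast-forward were not possible.

```shell
# Scratch repository (hypothetical names): only branch-B gets new commits
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b branch-A
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "shared commit"
git checkout -q -b branch-B
echo one > b.txt && git add b.txt && git commit -q -m "first commit on branch-B"
echo two >> b.txt && git commit -q -am "second commit on branch-B"

# branch-A has not moved, so the merge can simply advance its pointer
git checkout -q branch-A
git merge -q --ff-only branch-B

git rev-parse branch-A branch-B   # both branches now point at the same commit
git log --oneline --merges        # prints nothing: no merge commit was created
```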
In most cases, however, both branches would move forward with different commits. So let’s take a more realistic example:
To make an integration, Git has to create a new commit that contains all the changes and take care of the differences between the branches — this is what we call a merge commit.
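Here is a small sketch of that more realistic case, again in a throwaway repository with made-up file names and commit messages. Both branches receive a commit of their own, so Git has to create a merge commit, which you can recognize by its two parents.

```shell
# Scratch repository (hypothetical names): both branches move forward
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b branch-A
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "shared commit"
git checkout -q -b branch-B
echo b > b.txt && git add b.txt && git commit -q -m "work on branch-B"
git checkout -q branch-A
echo a > a.txt && git add a.txt && git commit -q -m "work on branch-A"

# A fast-forward is impossible here, so Git creates a merge commit
git merge -q --no-edit branch-B

# %P prints the parent hashes; count them via the positional parameters
parents=$(git log -1 --format=%P HEAD)
set -- $parents
parent_count=$#
echo "the merge commit has $parent_count parents"
```

The `--no-edit` flag only keeps Git from opening an editor for the merge message, which is convenient in a script.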
Human commits and merge commits
Normally, a commit is carefully created by a human being. It’s a meaningful unit that only includes related changes, plus a meaningful commit message which provides context and notes.
Now, a merge commit is a bit different: it’s not created by a developer, but automatically by Git. Also, a merge commit doesn’t necessarily contain a “semantic collection of related changes.” Instead, its purpose is simply to connect two (or more) branches and tie the knot.
If you want to understand such an automatic merge operation, you have to take a look at the history of all branches and their respective commit histories.
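One possible way to do that from the command line is sketched below, in a scratch repository with invented names. The graph view shows how the branches were tied together, and the range `HEAD^1..HEAD^2` lists the commits that came in through the merge’s second parent.

```shell
# Scratch repository (hypothetical names) with one merge commit
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b branch-A
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "shared commit"
git checkout -q -b branch-B
echo b > b.txt && git add b.txt && git commit -q -m "work on branch-B"
git checkout -q branch-A
echo a > a.txt && git add a.txt && git commit -q -m "work on branch-A"
git merge -q --no-edit branch-B

# Draw the history of all branches as a graph
git log --graph --oneline --all

# Commits reachable from the second parent (branch-B) but not from the first (branch-A)
merged_in=$(git log --format=%s HEAD^1..HEAD^2)
echo "brought in by the merge: $merged_in"
```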
Integrating with rebases
Before we talk about rebasing, let me make one thing clear: a rebase is not better or worse than a merge, it’s just different. You can live a happy (Git) life just by merging branches and never even think about rebasing. It does help to understand what a rebase does, though, and learn about the pros and cons that come with it. Maybe you’ll reach a point in a project when a rebase could be helpful…
Alright, let’s go! Remember that we just talked about automatic merge commits? Some people are not too keen on these and prefer to go without them. Besides, there are developers who like their project history to look like a straight line — without any indication that it had been split into multiple branches at some point, even after the branches have been integrated. This is basically what happens during a Git rebase.
Rebasing: Step by step
Let’s walk through a rebase operation step-by-step. The scenario is the same as in the previous examples, and this is what the starting point looks like:
We want to integrate the changes from branch-B into branch-A, but by rebasing, not merging. The actual Git command for this is very simple:
$ git checkout branch-A
$ git rebase branch-B
Similar to a git merge command, you tell Git which branch you want to integrate. Let’s take a look behind the scenes…
In this first step, Git will “remove” all commits on branch-A that happened after the common ancestor commit. Don’t worry, it will not throw them away: you can think of those commits as being “parked” or temporarily saved in a safe place.
In the second step, Git applies the new commits from branch-B. At this point, temporarily, both branches actually look exactly the same.
Finally, those “parked” commits (the new commits from branch-A) are included. Since they are positioned on top of the integrated commits from branch-B, they are rebased.
As a result, the project history looks like development happened in a straight line. There is no merge commit that contains all combined changes, and the original commit structure is preserved.
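The steps above can be replayed in a scratch repository. This is only a sketch: the file names are invented, and the commit messages C1, C3, and C4 are chosen as an assumption about how the commits in the walkthrough are labeled.

```shell
# Scratch repository (hypothetical names): C1 is shared, C4 is on branch-B, C3 on branch-A
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b branch-A
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "C1"
git checkout -q -b branch-B
echo b > b.txt && git add b.txt && git commit -q -m "C4"
git checkout -q branch-A
echo a > a.txt && git add a.txt && git commit -q -m "C3"

# Park C3, move branch-A onto the tip of branch-B, then reapply C3 on top
git rebase -q branch-B

git log --format=%s          # newest first: C3, C4, C1
git log --oneline --merges   # prints nothing: the history is a straight line
```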
Potential pitfalls of rebasing
One more thing that is important to understand about Git rebase: it rewrites the commit history. Take another look at our last diagram. Commit C3* has an asterisk. While C3* introduces the same changes as C3, it’s effectively a different commit. Why? Because it has a new parent commit after the rebase. Before the rebase, C1 was the parent. After the rebase, the parent is C4, the commit it was rebased onto.
A commit has only a handful of important properties, like the author, date, changeset, and its parent commit. Changing any of this information creates a completely new commit, with a new SHA-1 hash ID.
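You can observe this yourself with a short experiment. The sketch below uses a scratch repository with invented file names and the commit labels C1, C3, and C4 as stand-ins; it compares the commit hash and the patch ID (a hash of the changes alone, computed by `git patch-id`) before and after a rebase.

```shell
# Scratch repository (hypothetical names): C1 is shared, C4 is on branch-B, C3 on branch-A
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b branch-A
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "C1"
git checkout -q -b branch-B
echo b > b.txt && git add b.txt && git commit -q -m "C4"
git checkout -q branch-A
echo a > a.txt && git add a.txt && git commit -q -m "C3"

before=$(git rev-parse HEAD)   # hash of the original C3
git rebase -q branch-B
after=$(git rev-parse HEAD)    # hash of the rebased commit (C3*)

# The patch ID depends only on the changes, not on the parent or the dates
patch_before=$(git show "$before" | git patch-id | cut -d' ' -f1)
patch_after=$(git show "$after" | git patch-id | cut -d' ' -f1)

echo "commit hash: $before -> $after"
echo "patch id:    $patch_before -> $patch_after"
git log -1 --format='new parent: %s' "$after^"
```

The commit hashes differ while the patch IDs match: same changes, new commit.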
Rewriting history like that is not a problem for commits that haven’t been published yet. But you might be in trouble if you’re rewriting commits you’ve already pushed to a remote repository. Maybe someone else has based their work on the original C3 commit, and now it suddenly no longer exists…
To keep you out of trouble, here’s a simple rule for using rebase: Never use rebases on public branches, i.e. on commits which have already been pushed to a remote repository! Instead, use git rebase to clean up your local commit history before integrating it into a shared team branch.
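One way to check which commits are still local-only, and therefore safe to rewrite, is to compare your branch with its remote counterpart. The following is a sketch under assumptions: a local bare repository stands in for the remote, and the names `origin`, `main`, and the file names are picked purely for the demo.

```shell
# Scratch setup (hypothetical names): a bare repository plays the role of the remote
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git
git init -q local && cd local
git checkout -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git remote add origin ../remote.git

echo base > file.txt && git add file.txt && git commit -q -m "published commit"
git push -q -u origin main

echo wip > wip.txt && git add wip.txt && git commit -q -m "local-only commit"

# Commits on main that origin/main does not have yet: these have not been pushed
unpushed=$(git log --format=%s origin/main..main)
echo "not pushed yet: $unpushed"
```

Anything this range does not list has already been published and should be left alone.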
Integration is everything!
At the end of the day, merging and rebasing are both useful Git strategies, depending on what you want to achieve. Merging is non-destructive, since a merge doesn’t change existing history. Rebasing, on the other hand, can help clean up your project history by avoiding unnecessary merge commits. Just remember not to do it on a public branch, so you don’t rewrite history your fellow developers rely on.
If you want to dive deeper into advanced Git tools, feel free to check out my (free!) “Advanced Git Kit”: it’s a collection of short videos about topics like branching strategies, Interactive Rebase, Reflog, Submodules and much more.
Happy merging and rebasing — and see you soon for the next part in our “Advanced Git” series!