I use Git a lot, in my daily job as well as for this blog. When using it, I often rebase locally before pushing, to have a clean and readable history.
A sample workflow
For my blog, the branching model looks like the following:
o---o---o---o          master
             \
              \---o---o  feature/newposts
As expected, the master branch is the production site.
The feature/newposts branch is dedicated to new posts. There’s one post per commit.
Also, to speed up rendering, there are only a handful of the latest posts. The first commit after
To publish a new post, I cherry-pick it from feature/newposts to the master branch. Also, when I make changes to master, I rebase feature/newposts onto master, to have the latest updates.
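The publish-and-rebase cycle can be sketched in a throwaway repository. This is only an illustration under assumptions: the file names, commit messages and temporary directory are made up, and only the two branch names come from the workflow described here.

```shell
#!/bin/sh
# Illustrative sketch in a throwaway repository; file names and commit
# messages are hypothetical, only the branch names come from the post.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "layout v1" > layout.txt
git add layout.txt
git commit -q -m "Initial site"

# Drafts live on feature/newposts, one post per commit.
git checkout -q -b feature/newposts
echo "first post" > post1.txt
git add post1.txt
git commit -q -m "Post 1"
post1=$(git rev-parse HEAD)
echo "second post" > post2.txt
git add post2.txt
git commit -q -m "Post 2"

# Publish the first post by cherry-picking it onto master.
git checkout -q master
git cherry-pick "$post1" > /dev/null

# Change master, then replay the drafts on top of it.
echo "layout v2" > layout.txt
git commit -q -am "Update layout"
git checkout -q feature/newposts
git rebase -q master

git --no-pager log --oneline --graph --all
```

Because Post 1 has already been cherry-picked to master, the rebase skips it, so feature/newposts ends up holding only Post 2 on top of master.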
The impact of rebasing
Things start to get interesting when I rebase interactively on master:
- Initial state
A---B---C---D          master
             \
              \---a---b  feature/newposts
- Rebase interactively on master
A---B---C---D
     \       \
      \       \---a---b  feature/newposts
       \
        \---D'           master
- Rebase onto master
A---B---C---D
     \
      \--C'---D'         master
                \
                 \---a---b  feature/newposts
What about commits C and D? Notice they are not referenced by any branch, and they are not displayed with git log. Still, they can be displayed via
Likewise, those commits are not displayed in GUIs such as SourceTree.
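To see this for yourself, here is a minimal sketch in a throwaway repository. It rests on several assumptions: a plain, non-interactive rebase of a hypothetical topic branch stands in for the interactive rebase described here, and the file names and commit messages are made up. It compares what git log shows with what the reflog of HEAD still records.

```shell
#!/bin/sh
# Sketch: commits rewritten by a rebase disappear from `git log`
# but are still recorded in the reflog. All names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo a > a.txt; git add a.txt; git commit -q -m "A"

git checkout -q -b topic
echo c > c.txt; git add c.txt; git commit -q -m "C"
echo d > d.txt; git add d.txt; git commit -q -m "D"
old_tip=$(git rev-parse HEAD)    # the original D

git checkout -q master
echo b > b.txt; git add b.txt; git commit -q -m "B"

# Replaying topic on master creates new commits C' and D'.
git checkout -q topic
git rebase -q master
new_tip=$(git rev-parse HEAD)    # D'

echo "Reachable from branches:"
git --no-pager log --all --oneline
echo "Recorded in the reflog of HEAD:"
git --no-pager reflog
```

The original D no longer appears in the first listing, but its abbreviated hash still shows up in the second.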
Dangling and unreachable commits
Time for some definitions:
- unreachable object
An object which is not reachable from a branch, tag, or any other reference.
- dangling object
An unreachable object which is not reachable even from other unreachable objects; a dangling object has no references to it from any reference or object in the repository.
Using those definitions, commits C and D in the above diagrams are considered unreachable because no reference points to either of them. Moreover, commit D is also dangling, because no other object references it, while commit C is not because D points to it.
To list those dangling and unreachable objects, one can use the git fsck command:
git-fsck - Verifies the connectivity and validity of the objects in the database
For example, to display unreachable commits:
git fsck --unreachable
If an expected commit is not displayed, then perhaps it’s because it’s referenced by a reflog. In that case, there’s an option to ignore reflog references.
git fsck --unreachable --no-reflogs
The same command can be used to list dangling commits only. Replace
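The difference between the two listings can be tried out with a small sketch. It assumes a throwaway repository in which two commits are orphaned by deleting their branch, a simpler stand-in for the rebase described earlier; the branch name scratch, the file names and the commit messages are all hypothetical.

```shell
#!/bin/sh
# Sketch: two commits orphaned by deleting their branch, a simpler
# stand-in for a rebase. All names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo a > a.txt; git add a.txt; git commit -q -m "A"

git checkout -q -b scratch
echo c > c.txt; git add c.txt; git commit -q -m "C"
c=$(git rev-parse HEAD)
echo d > d.txt; git add d.txt; git commit -q -m "D"
d=$(git rev-parse HEAD)

git checkout -q master
git branch -q -D scratch    # C and D are now only known to the reflog of HEAD

# With reflogs taken into account, neither commit is reported.
git fsck --unreachable

# Ignoring reflogs: C and D (and their trees and blobs) are unreachable.
unreachable=$(git fsck --unreachable --no-reflogs)
echo "$unreachable"

# Dangling only: D is listed, C is not because D points to it.
dangling=$(git fsck --dangling --no-reflogs)
echo "$dangling"
```

Note that git fsck only reports these objects; it does not delete anything.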
Git is quite efficient at storing text. And yet, there’s no point in storing either reflogs or unreachable commits past a certain point.
Git has a garbage collector. It may run automatically alongside some commands. You know the GC has run when you see output like the following:
Counting objects: 9451, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4657/4657), done.
Writing objects: 100% (9451/9451), done.
Total 9451 (delta 3843), reused 8900 (delta 3584)
It’s also possible to run it explicitly:
git gc
Calling the GC will remove unreachable objects.
The GC not only removes unreachable objects but also compresses file revisions.
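The packing side of the GC can be observed with git count-objects. The following is a minimal sketch, assuming a throwaway repository with three hypothetical revisions of a single file; it prints the object counts before and after an explicit run.

```shell
#!/bin/sh
# Sketch: loose objects before and after an explicit `git gc`.
# Repository contents are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for i in 1 2 3; do
    echo "revision $i" > post.txt
    git add post.txt
    git commit -q -m "Revision $i"
done

echo "Before gc:"
git count-objects -v      # objects are stored loose, one file each

git gc -q                 # pack reachable objects, prune old unreachable ones

echo "After gc:"
git count-objects -v      # objects now live in a pack file
loose=$(git count-objects -v | sed -n 's/^count: //p')
packs=$(git count-objects -v | sed -n 's/^packs: //p')
```

In this small repository every object is still reachable, so nothing is deleted: the loose count should drop to zero while a single pack appears.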
However, remember that most unused objects are still referenced by reflogs. Thus, they are not considered unreachable, and therefore they are not garbage collected. The question now is how to expire reflogs so that those objects become unreachable.
To expire reflogs, run the following; the --all option processes the reflogs of all references:
git reflog expire --all
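Reflog entries are split into two kinds, standard and unreachable, each with its own default expiry:
| Reflog entries | Expired after (by default, days) |
|---|---|
| Standard | 90 |
| Unreachable | 30 |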
For example, to expire reflog entries older than two weeks instead of the default 90 days, for all references, use:
git reflog expire --expire=2.weeks.ago --all
Once the reflogs have been expired, the relevant commits truly become unreachable and can finally be removed by the garbage collector.
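The whole sequence can be checked end to end with a small sketch. It assumes a throwaway repository in which a hypothetical commit is rewritten with git commit --amend, and it deliberately uses --expire=now and --prune=now to skip every grace period; those values are chosen for the demonstration only and are far more aggressive than anything suggested here for a real repository.

```shell
#!/bin/sh
# Sketch: an amended-away commit survives `git gc` until the reflog
# entries that mention it are expired. Names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "draft" > post.txt
git add post.txt
git commit -q -m "Post"
old=$(git rev-parse HEAD)

# Rewrite the commit; the original is now only referenced by the reflog.
git commit -q --amend -m "Post (reworded)"

git gc -q
if git cat-file -e "$old" 2>/dev/null; then before=present; else before=pruned; fi

# Expire every reflog entry right away, then prune without a grace period.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then after=present; else after=pruned; fi

echo "after plain gc: $before"
echo "after expiring reflogs and gc: $after"
```

The first check should still find the old commit, since the reflog keeps it alive; only the second run, after the reflog entries are gone, removes it.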
This post has looked into how commits reference each other in Git, and how they can be cleaned up. In most cases, however, the default automated cleanup should be enough. Remember that by removing reflogs and commits, you make it harder on yourself to recover from your mistakes.