git rebase is a command I'd heard only the true git master ninjas use. After baffling me for years, it was finally demystified! Let's first set the scene. I wanted to read the source code of the site you are currently browsing, so like all programmers I went ahead and cloned the repo. It was SLOW. Why? Well, git was trying to clone ~150 MB of data. WHAT, why?
Note: I had already fixed the issue, force pushed and cleared my reflog, so I had to recreate the issue for this post. Some of it may seem a bit off as a result.
Large Objects: Why did I just clone ~150 MB?
So, as usual, I first did this:
git clone email@example.com:NavinShrinivas/homebrew-internethome.git
Note: The above repo is a fork of this website with this issue replicated.
And this is where we first notice something off:
One's first guess would be a large file, so let's go in search of it:
du | sort -n
What we see instead is that the .git folder (something we don't manage by hand) is the largest! Well, now I had to find out why the data stored inside .git was so large. After googling a bit, I found this nifty script that lists every blob in the repository's history sorted by size, so the ones consuming the most space show up at the bottom.
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
Note: the lines above form a single pipeline (one command). How it works isn't important for this blog.
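For the curious, here is a minimal sketch of what the core of that pipeline does, run against a throwaway repository. The temp directory, the file names and the 200000-byte dummy font are all made up for illustration, and awk stands in for the sed/cut/numfmt formatting steps.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repo; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

# One tiny file and one "large" dummy font of exactly 200000 bytes.
echo "hello" > README
mkdir 16
head -c 200000 /dev/urandom > 16/font.ttf
git add .
git commit -q -m "add files"

# 1. rev-list prints every object reachable from any ref, with its path.
# 2. cat-file annotates each object with its type and its size in bytes.
# 3. awk keeps only blobs (file contents) and prints "size path".
# sort -n puts the largest blob last and tail picks it.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $3, $4 }' |
  sort -n |
  tail -n 1)

echo "largest blob: $largest"
```

The sizes reported are the uncompressed sizes of the objects, which is why the dummy font shows up with the exact byte count it was created with.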
Well well well, it's not a single file making this happen. It's a bunch of font files that are bloating the repository. Phew, easy fix, let's just delete those files, no? Well, if one does an ls on the repo, there is no folder 16/ and none of those files!!! That's when it occurred to me: those folders are no longer part of the tracked files, but git still stores them in previous commits (one commit added those files and another deleted them).
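To convince yourself that deleted files really do stick around, here is a small sketch that recreates the situation in a throwaway repository. Everything in it (the temp directory, the dummy font.ttf, the commit messages) is invented for the demo; only the folder name 16 is borrowed from my case.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repo so the demo does not touch a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

# Commit 1: add a folder named 16/ with a dummy font in it.
mkdir 16
head -c 200000 /dev/urandom > 16/font.ttf
git add 16
git commit -q -m "add fonts"

# Commit 2: delete the folder again.
git rm -r -q 16
git commit -q -m "remove fonts"

# The working tree no longer has 16/ ...
[ ! -e 16 ] && echo "16/ is gone from the working tree"

# ... but both commits that touched it are still in the history.
touching=$(git log --all --format=%h -- 16 | wc -l | tr -d ' ')
echo "commits touching 16/: $touching"

# And the file's contents are still stored, reachable from the first commit.
blob_paths=$(git rev-list --objects --all | grep -c '16/font.ttf')
echo "stored objects named 16/font.ttf: $blob_paths"
```

Anyone cloning such a repository downloads that blob too, even though no checkout of the latest commit ever shows it.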
Solution: Rebase or git filter-branch
One of the other solutions that I came across was git filter-branch, which lets you run a command on every commit in a range (between 2 hashes). So one could have simply removed that folder from every commit between those two.
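I didn't go this way, so take the following as a rough sketch of what the filter-branch route could look like rather than something I ran on the real repo. It rebuilds the situation in a throwaway repository (all names are made up) and rewrites everything reachable from HEAD instead of a range between two hashes, to keep the demo short. Recent git versions print a notice recommending other tools before filter-branch runs; the environment variable below only silences that notice.

```shell
#!/bin/sh
set -eu

# Silence the notice recent git versions print before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical throwaway repo with the same add-then-delete history.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

echo "hello" > README
git add README
git commit -q -m "initial"

mkdir 16
head -c 200000 /dev/urandom > 16/font.ttf
git add 16
git commit -q -m "add fonts"

git rm -r -q 16
git commit -q -m "remove fonts"

# For every commit reachable from HEAD, drop 16/ from the index before the
# commit is re-created. --ignore-unmatch keeps commits without 16/ from failing.
git filter-branch -f --index-filter \
  'git rm -r -q --cached --ignore-unmatch 16' HEAD >/dev/null

# No commit on the rewritten branch touches 16/ any more.
remaining=$(git log --format=%h -- 16 | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits touching 16/: $remaining (out of $commits)"
```

Note that the two font commits are still there, just empty; filter-branch also keeps a backup of the old history under refs/original, so the blobs are not gone from the local .git until that backup is removed.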
But I went the rebase way. Now would be a good time to gain intuition on rebase.
Surprisingly, rebase says everything it needs to right in its name. It lets you change the base of a given branch. In my case that meant rewriting the earlier commits the branch is built on while the files at the head stay the same (the commit hashes do change)!
I used the "interactive" mode of git rebase, which can be started with:
git rebase -i hash_of_commit_you_want_to_rebase_from
In our case, I decided to rebase from here (it's the commit right before the one that added the files):
git rebase -i 886448910588f6332c665abd8af44cbca4fd7e2d
This opens up a nice document in your editor. Let's change those 2 commits to edit, then save and quit. Git, being the very elaborate tool it is, stops at each commit we've asked to edit. Now we can make whatever change we want and do git add . to stage it. Then let's simply follow what git tells us by doing amend and continue. In our case:
$ rm -rf 16
$ git add .
$ git commit --amend #save and quit
$ git rebase --continue
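If you'd like to replay this stop, amend, continue loop without risking a real repo, the sketch below does it end to end in a throwaway repository. It is an approximation of what I did, not a recording: the repo, the style.css file and the commit messages are invented, and a sed one-liner passed through GIT_SEQUENCE_EDITOR stands in for changing pick to edit by hand. The extra stylesheet is there so that neither commit ends up empty after the fonts are removed.

```shell
#!/bin/sh
set -eu

# Make sure no editor pops up while the rebase continues.
export GIT_EDITOR=true

# Hypothetical throwaway repo.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

echo "hello" > README
git add README
git commit -q -m "initial"
base=$(git rev-parse HEAD)   # the commit right before the fonts were added

mkdir 16
head -c 200000 /dev/urandom > 16/font.ttf
echo "body { font-family: demo; }" > style.css
git add .
git commit -q -m "add fonts and stylesheet"

git rm -r -q 16
echo "body { font-family: serif; }" > style.css
git add style.css
git commit -q -m "remove fonts"

# Stand-in for editing the todo list by hand: mark every commit as "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick /edit /'" git rebase -i "$base"

# First stop: the commit that added 16/. Remove the folder and amend.
rm -rf 16
git add .
git commit -q --amend --no-edit
git rebase --continue

# Second stop: 16/ never existed here, so there is nothing to change.
git commit -q --amend --no-edit
git rebase --continue

# The rewritten branch has the same number of commits, none touching 16/.
remaining=$(git log --format=%h -- 16 | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits touching 16/: $remaining (out of $commits)"
```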
For the second commit, it's as if we never had the folder called 16, so we don't have to do anything. Just amend and continue! So in theory we have gone into our commit history and removed every trace of those files. All we have to do is push these changes to the remote. Oh, uhh. The histories are now different, which means we need to force push. This also means others have to force pull :(
It was at this point I figured that this solution does fix my problem but is not the perfect way. A little googling tells me that there are better ways to rebase. Anyway, let's force push and clone again to see if our fix worked.
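To make the force push and the "force pull" concrete, here is a sketch with a local bare repository playing the remote. None of this is my actual setup: the directory names, page.html and the amended commit are placeholders for the rewritten history. I also use --force-with-lease here, a more careful variant of a plain force push that refuses to overwrite the remote if someone else pushed in the meantime.

```shell
#!/bin/sh
set -eu

# Hypothetical local setup: remote.git is the remote, "author" rewrites
# history and "other" is a collaborator holding an older clone.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

git init -q author
cd author
git config user.name "Author"
git config user.email "author@example.invalid"
echo "v1" > page.html
git add page.html
git commit -q -m "first version"
branch=$(git symbolic-ref --short HEAD)
git remote add origin "$work/remote.git"
git push -q origin "$branch"
cd ..

# The collaborator clones before the rewrite happens.
git clone -q "$work/remote.git" other

# The author rewrites history (an amend here, a rebase in my case).
cd author
echo "v2" > page.html
git add page.html
git commit -q --amend -m "first version, rewritten"
# A plain push would be rejected because the histories diverged.
git push -q --force-with-lease origin "$branch"
author_head=$(git rev-parse HEAD)
cd ..

# The "force pull": fetch the new history and move the local branch onto it,
# discarding the old local commits.
cd other
git fetch -q origin
git reset -q --hard "origin/$branch"
other_head=$(git rev-parse HEAD)

echo "author: $author_head"
echo "other:  $other_head"
```

The reset throws away whatever the collaborator had on that branch, so anyone with unpushed work needs to save it somewhere first.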
git clone firstname.lastname@example.org:homebrew-ec-foss/homebrew-internethome.git
Note: the repo I'm cloning here is the one that is fixed!
WOOT! Our clone size was 5 MB and it was much faster! This goes to show that git is a powerful tool; we simply don't know everything it can do. I conclude this blog with a quote from one of my friends: "The best way to learn git is to need it."