Jeff Palmer

Technology, and so on and such like.

Speeding Up Your Initial Git Clone

I’ve been working with an open source project called NetHunter. If you’re into the InfoSec side of things, you may have heard of Kali Linux. NetHunter is a project to bring Kali to select Android devices. It is run by Offensive Security, the same organization that develops and funds Kali Linux.

My goal was to set up Jenkins for continuous integration. While I was tweaking the configuration, the Jenkins installation ran in a virtual machine in my home lab. Unfortunately, my home internet is horrendously slow (I live in a sparsely populated area), and the initial git clone takes a fair amount of time. I have Jenkins configured to start with a clean environment each time, which means it has to do a full git clone for every job it runs. With my bandwidth, this quickly became painful.

While looking for a way to speed up this initial clone, I stumbled across git clone --reference in one of the git manpages. With --reference, git borrows any objects it already finds in a local repository and downloads only what is missing. It took a few minutes of experimentation to get it to work, but work it did! I was now able to do the initial clone from a local git cache on the machine’s hard drive!

To set up the git cache:

mkdir /home/gitcache
cd /home/gitcache
git init --bare

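# git remote add needs a name and a URL; <repository-url> stands for each repo's clone URL
git remote add offensive-security/kali-nethunter <repository-url>
git remote add offensive-security/gcc-arm-linux-gnueabihf-4.7 <repository-url>
git remote add binkybear/kernel_samsung_manta <repository-url>
git remote add binkybear/kangaroo <repository-url>
git remote add binkybear/kernel_msm <repository-url>
git remote add binkybear/flo <repository-url>
git remote add binkybear/furnace_kernel_lge_hammerhead <repository-url>
git remote add binkybear/KTSGS5 <repository-url>
git remote add binkybear/android_kernel_samsung_jf <repository-url>
git remote add binkybear/android_kernel_samsung_exynos5410 <repository-url>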

git fetch --all

Run git fetch --all occasionally to update the cache with the latest upstream commits. I do it daily, via crontab. You’ll notice I have multiple git remotes in the cache. This allows the same cache directory to serve multiple projects and repos at the same time. It’s not limited to just one!
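The daily refresh could be wrapped in a small helper along these lines. This is only a sketch: the function name update_gitcache is made up for illustration, and it assumes the cache lives in /home/gitcache as above and that your git is new enough to support -C.

```shell
#!/bin/sh
# Sketch: refresh every remote registered in the shared cache.
# update_gitcache is a hypothetical helper name, not a git command.
update_gitcache() {
    cache_dir="${1:-/home/gitcache}"   # defaults to the cache used in this post
    # -C makes git behave as if it had been started inside the cache directory
    git -C "$cache_dir" fetch --all --quiet
}
```

Saved in a script that ends by calling update_gitcache (say /usr/local/bin/update-gitcache.sh, a path I picked only as an example), a crontab entry such as 0 3 * * * /usr/local/bin/update-gitcache.sh would run it every day at 03:00.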

After the cache has been established, you can use git clone --reference /home/gitcache <repository-url> to do the initial clone while using the locally stored cache. After the clone, you are free to use other git commands like git pull or git push as you normally would.
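For a CI job, the clone step might look roughly like this. It is a sketch under assumptions: clone_with_cache is a hypothetical helper name, and its arguments (the upstream URL, the target directory, and optionally the cache path) are whatever your job supplies.

```shell
#!/bin/sh
# Sketch: clone a repository while borrowing objects from the local cache.
# clone_with_cache is a hypothetical helper; its arguments are illustrative.
clone_with_cache() {
    url="$1"                          # upstream repository URL
    dest="$2"                         # directory to create
    cache_dir="${3:-/home/gitcache}"  # cache directory from this post
    # Objects already in the cache are not downloaded again; the new clone
    # records the cache location in .git/objects/info/alternates.
    git clone --quiet --reference "$cache_dir" "$url" "$dest"
}
```

Called as clone_with_cache <repository-url> <directory>, it should leave a normal working copy whose origin still points at the upstream URL, not at the cache.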

The only drawback I’ve found is that the newly cloned repo requires the cache to always be available, because it keeps borrowing objects from it. If you’d like the resulting repository to be standalone and independent of the cache after it is cloned, cd into the new repo directory, run git repack -a -d, and then rm .git/objects/info/alternates
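Those two steps can be bundled so they always run in the right order. Again a sketch, with detach_from_cache being a name I made up; the important part is that the repack has to succeed before the alternates file is removed, otherwise the repo would lose access to objects it has not copied yet.

```shell
#!/bin/sh
# Sketch: make a --reference clone independent of the cache.
# detach_from_cache is a hypothetical helper name.
detach_from_cache() {
    repo="$1"   # working copy created with git clone --reference
    alternates="$repo/.git/objects/info/alternates"
    # First copy every borrowed object into the repo's own pack...
    git -C "$repo" repack -a -d -q || return 1
    # ...and only then drop the pointer to the cache.
    rm -f "$alternates"
}
```

As far as I know, Git 2.3 and later can also do this at clone time if you add --dissociate to the git clone --reference command, which would make the manual steps unnecessary.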