I recently had a far from fun morning. I had my website cloned on my desktop, with some commits which weren’t upstreamed. Having done some work on my laptop, upstream was ahead of my desktop, which needed rectifying.
git pull -r is a great feature of git: when pulling from a remote which has diverged, it rebases your local commits on top of the remote ones rather than creating a merge commit. Normally this would have left me with my 2 local commits, rebased and ready to push. For reasons beyond my understanding, I was instead left with 0 commits to push - the 2 commits I had locally had vanished.
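For reference, this is roughly what a well-behaved git pull -r looks like. It is a minimal sketch in throwaway directories, not a reproduction of what went wrong for me: the laptop / desktop / upstream.git names and the demo identity are made up for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so the sketch does not depend on global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A "laptop" repository, a bare "upstream" cloned from it, and a "desktop" clone
git init -q laptop
(cd laptop && echo base > base.txt && git add base.txt && git commit -qm "base")
git clone -q --bare laptop upstream.git
git clone -q upstream.git desktop

# Work done on the laptop and pushed, so upstream moves ahead of the desktop
(cd laptop && git remote add origin "$work/upstream.git" \
    && echo laptop > laptop.txt && git add laptop.txt \
    && git commit -qm "laptop work" && git push -q origin HEAD)

# Two local commits on the desktop which were never pushed
cd desktop
echo a > a.txt && git add a.txt && git commit -qm "desktop commit 1"
echo b > b.txt && git add b.txt && git commit -qm "desktop commit 2"

# Rebase the local commits on top of the diverged upstream
git pull -q -r

# Count the commits which exist locally but not on the upstream branch
unpushed=$(git rev-list --count '@{u}..HEAD')
echo "commits left to push: $unpushed"
```

Running git rev-list --count '@{u}..HEAD' (or git log '@{u}..HEAD') straight after a pull is a cheap way to notice missing commits before doing anything else.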
#Git object storage
Git does some pretty magical things behind the scenes. The important one here is that Git stores files in “objects”, referenced by a hash of their contents. If you move a file, it doesn’t duplicate the object, as the content hasn’t changed. When you create a commit, it references these objects. Commits themselves are also objects, which are referenced by branches. If you’re interested in more, check out A Plumber’s Guide to Git.
The most important thing about objects is that it’s very rare they actually get deleted. If you delete a file you’ve committed, the object is still there. Most importantly for my case, if you do some branch-fu and remove a commit, the commit may still exist.
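To make “referenced by a hash of their contents” concrete, here is a minimal sketch which computes a file’s object ID by hand. It assumes a repository using the default SHA-1 object format and a system with sha1sum (GNU coreutils); git_blob_sha is a made-up helper name, not a git command.

```shell
set -eu

# Compute the ID git would give a file's content when stored as a blob:
# SHA-1 over the header "blob <size in bytes>", a NUL byte, then the raw content.
git_blob_sha() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/a.txt"
cp "$tmp/a.txt" "$tmp/renamed.txt"   # same content under a different name

sha_a=$(git_blob_sha "$tmp/a.txt")
sha_renamed=$(git_blob_sha "$tmp/renamed.txt")
echo "$sha_a"
echo "$sha_renamed"
```

Both files produce the same ID because the file name isn’t part of the hash, which is why moving a file doesn’t create a new object. With git installed, git hash-object on either file should print the same value.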
#Finding commits / files
As mentioned before, commits and files are both “objects”. To find my missing commit, all I need to do is look through the object files for a string which I know to be in the commit message or file body, right?
Wrong! Sort of. Git objects are stored compressed, which means simply using grep (or rg if you’re cool) to search the files doesn’t work. git does have a command to search through files (git grep) but for this use case it wasn’t appropriate, as by default that only searches the tracked files in the current checkout. Instead, we need to use some git tooling to get at the data cleanly.
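A minimal sketch of why the raw files are no use, assuming git’s default compression settings and a throwaway repository (the “needle” text is made up for illustration). It writes a blob into the object database, tries plain grep on the raw object file, then reads the same object back through git cat-file.

```shell
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Store a blob directly in the object database and capture its ID
sha=$(printf 'a needle in a haystack\n' | git hash-object -w --stdin)

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining chars>
obj=".git/objects/$(printf '%s' "$sha" | cut -c1-2)/$(printf '%s' "$sha" | cut -c3-)"

# The file on disk is zlib-compressed, so plain grep is not expected to match
if grep -q needle "$obj"; then raw_match=yes; else raw_match=no; fi

# git cat-file -p decompresses the object and prints its real content
content=$(git cat-file -p "$sha")

echo "raw grep matched: $raw_match"
echo "cat-file content: $content"
```

git show, used later in this post, does the same decompression and adds nicer formatting for commits.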
The first step is to list all objects git knows about, including those not referenced by branches. StackOverflow to the rescue on this one. This script will list out SHAs of all objects, which can then be passed into git show to get the real content, rather than the compressed version. Piping that into a text file, I’ve now got an entire dump of everything git knows about my repository: commits, files, the lot.
bash ~/object-list.sh | xargs -n1 git show > ~/out.txt
It ended up being a lot more than I wanted (the file was around 79MB), but hey I’ll take having too much context over not enough!
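For a narrower search, the same idea can be sketched with git’s own plumbing rather than the StackOverflow script (which isn’t reproduced here). This assumes a git new enough to have cat-file --batch-all-objects, and uses a throwaway repository with made-up commit messages: it loses a commit on purpose, then looks through every commit object for a known string.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Hypothetical identity passed inline so the sketch ignores global config
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo one > post.md && git add post.md && commit -m "first post"
echo two >> post.md && commit -am "draft I am about to lose"
lost=$(git rev-parse HEAD)

# Simulate the accident: move the branch back, orphaning the second commit
git reset -q --hard HEAD~1

# List every object in the database, reachable or not, and keep the commits
found=$(git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' \
    | awk '$2 == "commit" { print $1 }' \
    | while read -r sha; do
        # Print the SHA only if the commit text contains the search string
        if git cat-file -p "$sha" | grep -q "about to lose"; then echo "$sha"; fi
      done)

echo "lost:  $lost"
echo "found: $found"
```

This only searches commit messages and headers. Searching file contents as well means running git show or git cat-file -p over every object, which is what the full dump above does.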
#Searching large files
For searching large files, I recommend using glogg. It’s pretty barebones, but it deals with huge files incredibly well (not that 79MB is very large).
Searching through the output file, I eventually found the commits I needed. Because the file contained the output of git show, it gave me 2 options. Either I could copy the content / diff out and store it for later use, or, because git show shows commit information, I could git cherry-pick the commit SHA onto my branch, and push it. I went for the former, because it was simpler, easier, and I decided I didn’t want to push those changes quite yet.
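Both options can be sketched in a throwaway repository. The file names, commit messages and demo identity below are made up for illustration, and the cherry-pick is assumed to apply cleanly, which isn’t guaranteed on a branch which has moved on.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Hypothetical identity so the sketch does not depend on global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > post.md && git add post.md && git commit -qm "first post"
echo two >> post.md && git commit -qam "lost draft"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1     # the commit is now unreferenced by any branch

# Option 1: save the commit's message and diff somewhere safe for later
patch=$(mktemp)
git show "$lost" > "$patch"

# Option 2: replay the commit onto the current branch by its SHA
git cherry-pick "$lost" > /dev/null
subject=$(git log -1 --format=%s)
echo "HEAD is now: $subject"
```

Option 1 leaves the branch untouched, which suits changes that aren’t ready to be pushed. Option 2 recreates the commit on the branch, ready for a normal git push.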
Whilst rather stressful in the moment, experiences like these aren’t without their lessons:
- Don’t delete your local copies of posts until they’re actually live, rather than just committed
- Be careful when relying on git magic and rebasing
- git is pretty damn good at making sure you don’t lose any data