This is a small update (1 year later) of a great article by Ilmari Kontulainen, first posted on blog.deveo.com.
I’ll quote the original article in blockquotes, with my notes in green.
> Storing large binary files in Git repositories seems to be a bottleneck for many Git users. Because of the decentralized nature of Git, every developer has the full change history on his or her computer, so every committed change to a large binary file grows the repository by the size of that file. This growth directly affects the amount of data end users need to retrieve when they clone the repository. Storing a snapshot of a virtual machine image, changing its state, and committing the new state to a Git repository would grow the repository by approximately the size of each snapshot. If this is a day-to-day operation for your team, you may already be feeling the pain of overly swollen Git repositories.
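My note: you can see this growth for yourself with a quick experiment. The sketch below (repository name and file size are made up for illustration) commits two versions of a 100 MB binary; because opaque binary data neither delta-compresses nor zlib-compresses well, the object store ends up holding roughly the sum of both snapshots.

```bash
# Illustration only: watch a repo grow by the full size of each
# binary snapshot committed.
git init blob-demo && cd blob-demo

# First snapshot: ~100 MB of incompressible data standing in for a VM image.
dd if=/dev/urandom of=vm-image.bin bs=1M count=100
git add vm-image.bin
git commit -m "Add VM image snapshot"

# Second snapshot: the "changed state" of the same image.
dd if=/dev/urandom of=vm-image.bin bs=1M count=100
git commit -am "Update VM image snapshot"

# Repack and measure: expect roughly 200 MB of history for a 100 MB file.
git gc --quiet
git count-objects -vH   # see size-pack
du -sh .git
```

Every clone of this repository now downloads both snapshots, even though only the latest one is checked out.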
> Luckily, there are multiple third-party implementations that try to solve the problem, many of them using a similar paradigm as a solution. In this blog post I will go through seven alternative approaches for handling large binary files in Git repositories, with their respective pros and cons. I will conclude the post with some personal thoughts on choosing an appropriate solution.