Managing large Git repositories
Because of its distributed nature, Git includes the full change history in each copy of the repository. Every clone gets not only all the files, but every revision of every file ever committed. This allows for efficient development (local operations that do not involve the network are usually fast enough not to be a bottleneck) and efficient collaboration (the distributed model supports many collaborative workflows).
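You can observe this locally after any clone: the entire history is present on disk and can be inspected without touching the network. A minimal sketch, with a placeholder repository URL:

```
# Clone a repository (the URL is hypothetical; any repository behaves the same):
git clone https://example.com/project.git
cd project

# The full history is now available locally, no network required:
git log --oneline | wc -l   # count every commit in the clone
git count-objects -vH       # disk space used by the local object database
```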
But what happens when the repository you want to work on is really huge? Can we avoid devoting large amounts of disk space to version control storage? Is it possible to reduce the amount of data that end users need to retrieve when cloning the repository?
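Git itself does offer clone-time options aimed at exactly these questions; as a brief preview (both flags are standard Git, though partial clone needs a server that supports it):

```
# Shallow clone: fetch only the most recent commit instead of the full history:
git clone --depth 1 https://example.com/project.git

# Partial (blobless) clone: fetch commits and trees up front, but download
# file contents on demand (requires server-side partial clone support):
git clone --filter=blob:none https://example.com/project.git
```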
Broadly speaking, there are two main reasons for repositories to grow massive: they can accumulate a very long history (the "every revision" direction), they can include huge binary assets that need to be managed together with the code (the "every file" direction), or both.
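To tell which direction dominates in a given repository, a couple of standard Git commands give a quick diagnosis; a rough sketch:

```
# The "every revision" direction: how many commits does the history contain?
git rev-list --count HEAD

# The "every file" direction: list the ten largest objects in the repository:
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob"' |
  sort -k3 -n -r |
  head -10
```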