
How to use git to store large files without keeping track of versions?

+1 vote
459 views

I have some data files that need to be stored along with source code. These data files are large, but I don't need to keep their versions. I only need to keep the versions of the source code.

git-annex is mainly for versioning large files, so it is not suitable for my situation.

Does anybody know whether there is a way to use git to manage source code (with versions) as well as data files (without versions)?

posted Feb 23, 2015 by Tarun Singhal


1 Answer

0 votes

Have a look at bup, a git extension for smartly storing large binaries in a git repo.

You'd want to have it as a submodule but you won't have to worry about the repo getting hard to handle. One of their sample use cases is storing VM images in git.
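
For example, a minimal bup workflow might look like the sketch below (the data path and branch name are made up for illustration):

    bup init                    # create the bup repository (defaults to ~/.bup)
    bup index data/             # index the large data files
    bup save -n data-snap data/ # store a snapshot under the branch "data-snap"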

Technique 1: sparse checkout
A mild help for the binary-assets problem is sparse checkout (available since Git 1.7.0). This technique lets you keep the working directory clean by explicitly detailing which folders you want to populate. Unfortunately it does not reduce the size of the overall local repository, but it can be helpful if you have a huge tree of folders.

  1. Clone the full repository once: git clone
  2. Activate the feature: git config core.sparseCheckout true
  3. Explicitly add the folders you need, ignoring the assets folders:

    echo src/ > .git/info/sparse-checkout

  4. Read the tree as specified: git read-tree -m -u HEAD

After the above you can go back to using your normal git commands, but your working directory will contain only the folders you specified above.
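
If you later need to populate another folder, append it to the sparse-checkout file and re-read the tree (the docs/ folder here is just an example):

    echo docs/ >> .git/info/sparse-checkout
    git read-tree -m -u HEAD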

Technique 2: Use of submodules
Another way to handle huge binary asset folders is to split them into a separate repository and pull the assets into your main project using submodules. This gives you a way to control when you update the assets. See more on submodules in these posts: core concept and tips, and alternatives.

If you go the submodules way, you might want to check out the complexities of handling project dependencies, since some of the possible approaches to the huge-binaries problem might be helped by the approaches I mention there.
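
As a minimal sketch of that setup (the repository URLs and the assets path are hypothetical):

    # in the main project, reference the assets repository as a submodule
    git submodule add https://example.com/project-assets.git assets
    git commit -m "Track binary assets as a submodule"

    # collaborators fetch the assets only when they actually need them
    git submodule update --init assets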

answer Feb 27, 2015 by Amit Kumar Pandey
Similar Questions
+2 votes

I just want to know why Git doesn't track read/write permissions? What I want is just for Git to keep whatever I checked in. (I am OK with the executable permission control.)

+2 votes

I wanted to know if Git supports large binary files of gigabyte/terabyte size. What is the cost of using Git for approximately 10 users (this would certainly increase to a larger number later)?

+1 vote

I just had the bare vs non-bare repo concept smack me in the face. Painful way to learn things, but I won't forget it any time soon. Since my remote repos are no longer work trees, how can I keep two bare repos in sync? This is primarily for DR purposes.

Here's more detail in case it'll help:
I have two rhel6 systems running git 1.7.1 that will be maintaining OS and web configuration files for a variety of teams, once I get the bugs in my understanding ironed out. One git server is in datacenter A (prod) where most of the updates will be occurring. Appropriate people will clone the bare repo, make their updates and push it back. The other git server is at our warm DR site. While rare, updates to this server should be possible.

I need to be able to fetch changes from the production git server and apply them to the DR one. When I tried it straight, I got the expected "fatal: This operation must be run in a work tree"

I suppose I could hack out a script to pull the configs down to a temp repo and push them back up to the DR one (and vice versa), but that seems like a kludge. As flexible and seemingly well thought out as git appears to be, I have to believe there's a better approach.

Could someone clue me in on what I'm missing or how a generic DR process is typically set up?
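
One common pattern is to make the DR repository a mirror of the production one; the following is a sketch with hypothetical hostnames and paths:

    # on the DR server: create a mirror of the production bare repo
    git clone --mirror ssh://prod-git/srv/git/configs.git /srv/git/configs.git

    # on each sync (e.g. from cron): refresh all refs from production
    cd /srv/git/configs.git
    git remote update

Note that a mirror fetch force-updates refs, so any changes pushed directly to the DR repo would need to be pushed back to production before the next sync.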

+1 vote

I want to avoid a push if any of the files has been deleted from the local git clone area. Can anyone please help me with that?

I am using Stash for repository management.
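
One client-side option is a pre-push hook (available since git 1.8.2). The sketch below is an assumption about what such a check could look like, not an established recipe; Stash could also enforce the same rule server-side with a pre-receive hook:

    #!/bin/sh
    # Hypothetical .git/hooks/pre-push sketch: reject the push when the
    # outgoing commits delete any file relative to the remote branch.
    zero=0000000000000000000000000000000000000000
    while read local_ref local_sha remote_ref remote_sha; do
        # skip branch deletions and brand-new remote branches
        [ "$local_sha" = "$zero" ] && continue
        [ "$remote_sha" = "$zero" ] && continue
        if git diff --name-only --diff-filter=D "$remote_sha" "$local_sha" | grep -q .; then
            echo "push rejected: $local_ref deletes files" >&2
            exit 1
        fi
    done
    exit 0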

+1 vote

When cloning a large repo stalls, hitting Ctrl+C cleans up what has been downloaded, and the process needs a restart.

Is there a way to recover or continue from the already downloaded files during cloning? Please point me to an archive URL if a solution exists (though I continue to search through the archives as I email this).

Can there be something like: git clone --use-method=rsync
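
git clone cannot resume an interrupted transfer, but a commonly suggested workaround is to split the download into retryable steps with a shallow clone (the URL below is a placeholder; --unshallow needs git 1.8.3+):

    # clone just the tip commit first; a failure here loses little work
    git clone --depth 1 https://example.com/big-repo.git
    cd big-repo
    # deepen gradually so each fetch transfers a manageable amount
    git fetch --depth 1000
    git fetch --unshallow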

...