
Multi-threaded 'git clone'

+2 votes
1,107 views

Cloning huge repositories like the Linux kernel takes a considerable amount of time. Is it possible to add multi-threaded cloning that uses several simultaneous connections? How much of the current architecture would need to change, and how large would the scope of the work be? It seems like an interesting idea to me, and I would like to share it with the community.

posted Feb 16, 2015 by anonymous


1 Answer

+1 vote

The key question is what takes the time in cloning, and whether that part can be multi-threaded.

If it's the network traffic that takes the most time, where is the bottleneck?

Is it in the server software assembling what will be sent? Is it in the receiving software processing it? If so, multiple threads could help.

Is it in network bandwidth? If so, multiple connections won't help much.
TCP favours a few connections passing a lot of data over many connections passing a little. The one place where multiple connections can help is when you have packet loss that is not caused by congestion, because a lost packet causes the throughput of that connection to drop (if the loss is due to congestion, this is TCP working as designed, throttling back to match the available bandwidth). This can be a significant effect on a very high bandwidth, high latency connection (think multiple Gb/s on international connections), but for lower bandwidth connections it's much less of a factor. You can look at projects like bbcp for an example of transferring data over multiple parallel streams.
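To put rough numbers on the packet-loss point, here is a minimal sketch using the well-known Mathis approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) * (C / sqrt(p)). This is a back-of-the-envelope model, not a description of what git or any particular TCP stack does. The function names and the example link figures are my own assumptions for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Approximate throughput (bits/s) of ONE TCP connection under random loss.

    Mathis approximation: rate ~= (MSS / RTT) * (C / sqrt(p)).
    It is only a rough estimate and ignores window limits and timeouts.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


def aggregate_throughput_bps(connections, link_bps, mss_bytes, rtt_s, loss_rate):
    """Estimate for N parallel connections, capped by the link capacity.

    Assumes (hypothetically) that losses on each connection are independent
    and not caused by the connections congesting the link themselves.
    """
    per_conn = mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)
    return min(link_bps, connections * per_conn)


# Hypothetical long-distance link: 1 Gbit/s, 150 ms RTT, 0.01% random loss.
long_haul_1 = aggregate_throughput_bps(1, 1e9, 1460, 0.150, 1e-4)
long_haul_8 = aggregate_throughput_bps(8, 1e9, 1460, 0.150, 1e-4)

# Hypothetical nearby link: 100 Mbit/s, 10 ms RTT, same loss rate.
nearby_1 = aggregate_throughput_bps(1, 100e6, 1460, 0.010, 1e-4)
nearby_8 = aggregate_throughput_bps(8, 100e6, 1460, 0.010, 1e-4)
```

Under these assumed figures, a single connection on the long-haul link is held far below the link capacity by loss, so extra connections add throughput. On the nearby link one connection already reaches the cap, so more connections gain nothing.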

I think it's an interesting question to look at, but before you start changing the architecture of the current code, I would suggest doing a bit more analysis of the problem to see if the bottleneck is really where you think it is.
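As a starting point for that analysis, here is a small sketch that times a command from the outside and, for git, turns on the GIT_TRACE_PERFORMANCE environment variable so git prints its own timing lines to stderr. The helper names timed_run and clone_with_trace are hypothetical, and the repository URL and destination are whatever you pass in. It only shows where wall-clock time goes and does not separate server, network and client costs by itself.

```python
import os
import subprocess
import time


def timed_run(argv, extra_env=None):
    """Run a command and return (elapsed_seconds, return_code)."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    start = time.perf_counter()
    completed = subprocess.run(argv, env=env)
    return time.perf_counter() - start, completed.returncode


def clone_with_trace(url, dest):
    """Time a full 'git clone' while git logs its own performance trace.

    GIT_TRACE_PERFORMANCE=1 makes git write timing lines to stderr.
    """
    return timed_run(["git", "clone", url, dest],
                     {"GIT_TRACE_PERFORMANCE": "1"})
```

Comparing a clone over the network with a clone of the same repository from a local path on the same machine gives a first hint of how much of the time is network and how much is local processing.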

answer Feb 16, 2015 by anonymous
Similar Questions
0 votes

We recently upgraded from Git 2.8 to 2.9 and saw an issue when there are multiple keys added to my ssh-agent.

I have two keys.
- KeyA (my company that has access to the repository I want to clone)
- KeyB (just my personal key with access to my personal stuff)

With both keys loaded and listed in ssh-add -L, cloning the repository fails. I tried changing the order of the keys in the agent, but neither KeyA, KeyB nor KeyB, KeyA works. The only case that works is when KeyA is loaded and no other key is added to the ssh-agent.

Having multiple Keys loaded works with Git 2.8 and Git 2.7 (I didn't try older versions)
Cloning fails with 'Unauthorized Access' from our Git provider. (It's Bitbucket in this case)

I read the Changelog for 2.9 and couldn't find any reference to changed key handling. Is there anything that I can add to the git clone command to get the old behavior?

+1 vote

I can get the latest revision number by command "git describe --tags", but how can I display a list of revisions or a particular revision based on the date of my commit id?

+1 vote

I want to prevent a push if any of the files have been deleted from the local git clone area. Can anyone please help me with that?

I am using Stash for repository management.

+3 votes

When we clone a remote Git repository, all folders/files will be cloned. This will consume a lot of disk space on our local machine.
Is there a way to clone only a few folders & exclude the others?

This is possible in a ClearCase snapshot view by changing load rules.

+2 votes

I normally use git on linux, though I have an installation on a Windows 7 laptop. When using it yesterday to clone a repository on my linux machine, the clone would open with one of the files modified. I could not undo the modification. Repeated cloning would do the same thing. I then tried cloning from the original repo, but using a linux partition on that laptop. It worked fine. Any idea why the Windows git doesn't work?

...