Why does Git calculate a checksum (SHA-1) of a file?

+1 vote
798 views

I would like to know why Git calculates a checksum of a file. Typically, a checksum is used to verify integrity. An example would really help.

posted Aug 7, 2016 by anonymous

2 Answers

0 votes

You have it in one.
Yes, that is the reason Git computes the SHA-1 of a file's contents: it provides integrity, veracity and non-repudiation. (The last one still holds, though cryptanalysis is getting better, so SHA-1 is no longer recommended, and Git is looking at how to progress to newer cryptographic hashes.) Once Git has the SHA-1s of the files in a directory, it does the same again for the 'file' that lists the file names, mode bits and their contents' SHA-1s, and so on up the trees to the commit, which also lists the SHA-1s of its parents. So if you have the SHA-1 of the tip of a branch, such as master, and you have a repo that holds that SHA-1, then you have full cryptographic assurance that your copy (with all its history) is identical to the originator's: your own Dalí, Rembrandt or Gauguin hanging in your hall... and it isn't even a replica, it's the real thing!
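
As a rough illustration of the scheme above, here is a minimal Python sketch of how blob and tree IDs can be computed. The helper names (`git_object_id`, `blob_id`, `tree_id`) are made up for this example, and the tree sorting is simplified (it ignores the special ordering Git applies to subdirectories), so treat it as a sketch rather than Git's actual implementation.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    # Git hashes a small header ("<type> <size>" plus a NUL byte) followed by the body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is just the file's contents.
    return git_object_id("blob", content)


def tree_id(entries) -> str:
    # entries: iterable of (mode, name, hex_id), e.g. ("100644", "a.txt", "<40 hex chars>").
    # Each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the child's SHA-1.
    # Simplification: entries are sorted by name only (flat directory of files).
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode())
    )
    return git_object_id("tree", body)


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty blob)

before = tree_id([("100644", "a.txt", blob_id(b"one\n"))])
after = tree_id([("100644", "a.txt", blob_id(b"two\n"))])
print(before != after)  # changing a file's contents changes the tree ID as well
```

The same idea repeats one level up: a commit records its tree's ID and its parents' IDs, so the commit ID depends on everything beneath it.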

answer Aug 7, 2016 by Jai Prakash
0 votes

An example? OK. Back when something else was using a simple CRC, someone tried to replace a file with another, bypassing the normal history system. The CRC was not good enough to detect it, so something was needed that was good enough to detect and stop this.

But more importantly: the hash is the filename of the file. It is critical that the hash be good enough that you won't get duplicate filenames. A CRC doesn't do that. SHA-1 does.

The checksum has to be good enough to make a unique filename in normal use.
It does not have to be good enough to guarantee non-alteration, but that's a really good secondary; it does have to be good enough to detect accidental damage (such as memory/disk/network/driver/etc corruption).
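
To make the "hash is the filename" point concrete, here is a small Python sketch. It assumes the loose-object layout, where an object lives under `.git/objects/` in a directory named after the first two hex digits of its ID (on disk the object is also zlib-compressed, which is skipped here). The helper names are made up for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Same ID Git gives a file's contents: SHA-1 over "blob <size>", a NUL byte, then the bytes.
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(hex_id: str) -> str:
    # First two hex digits name the directory, the remaining 38 name the file.
    return f".git/objects/{hex_id[:2]}/{hex_id[2:]}"


def is_intact(expected_id: str, content: bytes) -> bool:
    # Re-hash the content and compare with the name it was stored under.
    return blob_id(content) == expected_id


original = b"hello world\n"
oid = blob_id(original)
print(loose_object_path(oid))

# Flip a single bit to simulate accidental corruption.
damaged = bytes([original[0] ^ 0x01]) + original[1:]
print(is_intact(oid, original))  # True
print(is_intact(oid, damaged))   # False
```

Because the name is derived from the contents, two files with identical contents share one object, and any damaged object no longer matches its own name.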

Now, a secondary benefit of the whole "layer upon layer" approach: The hash of the last commit is only valid if every file and commit to date is accurate. If you know the hash of your last commit (20 bytes, I think), and you can validate all the hashes in the past, then you know that nothing has altered any file outside of the git mechanism.
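
The "layer upon layer" effect can be shown with a toy chain in Python. This is a deliberately simplified model, not Git's real commit format: the `commit_id` and `tip_of` helpers and the snapshot encoding are assumptions made up for this sketch.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def commit_id(files: dict, parent: str, message: str) -> str:
    # Toy snapshot: one "name:hash" line per file, in name order.
    snapshot = "".join(
        f"{name}:{sha1_hex(content)}\n" for name, content in sorted(files.items())
    )
    # The commit covers the snapshot hash AND the parent's ID, which chains history together.
    record = f"tree {sha1_hex(snapshot.encode())}\nparent {parent}\n\n{message}"
    return sha1_hex(record.encode())


def tip_of(history) -> str:
    # history: list of (files, message), oldest first. Returns the ID of the last commit.
    parent = ""
    for files, message in history:
        parent = commit_id(files, parent, message)
    return parent


history = [
    ({"index.html": b"<p>v1</p>"}, "first"),
    ({"index.html": b"<p>v2</p>"}, "second"),
    ({"index.html": b"<p>v3</p>"}, "third"),
]
trusted_tip = tip_of(history)

# Quietly alter a file in the OLDEST commit and recompute.
tampered = [({"index.html": b"<p>v1!</p>"}, "first")] + history[1:]
print(tip_of(tampered) == trusted_tip)      # False: the change propagates to the tip
print(len(bytes.fromhex(trusted_tip)))      # 20 bytes for a SHA-1 digest
```

So remembering one 20-byte value for the tip is enough to notice an alteration anywhere in the history, as long as every hash along the way is re-checked.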

answer Aug 7, 2016 by Deepak Dasgupta
Similar Questions
+1 vote

I am new to Git and I am trying to understand it.

I have this case:

a. I developed an HTML file over several days, committing daily.
b. Some weeks later I noticed that I had lost part of the code.
c. I located the code in a commit from 3 commits ago.

How can I fetch from the remote repository the HTML file as it was 3 commits ago (the whole file)?

0 votes

I am trying to convert my SVN repo to Git using git-svn, and after a few commits have been processed I am running into the following error:

: git svn fetch
...
r1878 = 79e09734fdb4916276da8273f25ecfbff37954a6 (refs/remotes/svn/nt/notecards)
Checksum mismatch: dashboard-merged/_images/bg-header-public.jpg 2969d1b5f818325a7516ea392be9564e1da2a3e7
expected: 42865291d24451e2dc7be44a60f3f692
got: 679bc6fd19e3879fed0c17e1b6735161

Here is what I did to verify the issue with the file:

$ svn export -r1878 bg-header-public.jpg
A bg-header-public.jpg
Export complete.
$ openssl md5 bg-header-public.jpg
MD5(bg-header-public.jpg)= 42865291d24451e2dc7be44a60f3f692
$ svn export -r1879 bg-header-public.jpg
A bg-header-public.jpg
Export complete.
$ openssl md5 bg-header-public.jpg
MD5(bg-header-public.jpg)= 679bc6fd19e3879fed0c17e1b6735161
:_images ashah$ 

It is evident that the file changed from r1878 to r1879, but why are the changes from r1879 showing up while migrating r1878? Or am I missing something here?

+1 vote

I'm getting this warning:

warning: Could not find section in .gitmodules where path=XXX

whenever I use "git mv" to move a file in a repository containing a submodule. The file is outside the submodule and is completely unrelated, so I do not understand the intent of the warning.

My understanding (without looking at the code in detail) is that Git tries to be clever about submodule renames, hence checks whether the source file is a submodule. But then if the lookup fails, it should just silently move on to "normal file move" mode I guess...

+1 vote

I've been trying to put the filesystem for a very small busybox-based distro into a Git repository, and with success. The only strange thing I cannot get my head around is the following:

When making a compressed tarball from the files in the repository (after clone/checkout), I get a much larger tar.gz file. The size goes up from 16M to 21M (!?)

0 votes

I want to retrieve the commit history of a given file. What command should I issue? I expect a command like the one below.

D:\GitTest> git show --commit-history test.txt
8194aaa
c419234
...
...