summaryrefslogtreecommitdiff
path: root/sha1dc
AgeCommit message (Collapse)AuthorFilesLines
2017-05-22sha1dc: update from upstreamLibravatar Ævar Arnfjörð Bjarmason4-87/+121
Update sha1dc from the latest version by the upstream maintainer[1]. This version includes a commit of mine which allows for replacing the local modifications done to the upstream files in git.git with macro definitions to monkeypatch it in place. It also brings in a change[2] upstream made for the breakage 2.13.0 introduced on SPARC and other platforms that forbid unaligned access[3]. This means that the code customizations done since the initial import in commit 28dc98e343 ("sha1dc: add collision-detecting sha1 implementation", 2017-03-16) can be done purely via Makefile definitions and by including the content of our own sha1dc_git.[ch] in sha1dc/sha1.c via a macro. 1. https://github.com/cr-marcstevens/sha1collisiondetection/commit/cc465543b310e5f59a1d534381690052e8509b22 2. https://github.com/cr-marcstevens/sha1collisiondetection/commit/33a694a9ee1b79c24be45f9eab5ac0e1aeeaf271 3. "Git 2.13.0 segfaults on Solaris SPARC due to DC_SHA1=YesPlease being on by default" (https://public-inbox.org/git/CACBZZX6nmKK8af0-UpjCKWV4R+hV-uk2xWXVA5U+_UQ3VXU03g@mail.gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26sha1dc: avoid CPP macro collisionsLibravatar Junio C Hamano1-5/+6
In an early part of sha1dc/sha1.c, the code checks the endianness of the target platform by inspecting common CPP macros defined on big-endian boxes, and sets BIGENDIAN macro to 1. If these common CPP macros are not defined, the code declares that the target platform is little endian and does nothing (most notably, it does not #undef its BIGENDIAN macro). The code that does so even has this comment Note that all MSFT platforms are little endian, so none of these will be defined under the MSC compiler. and later, the defined-ness of the BIGENDIAN macro is used to switch the implementation of sha1_load() macro. One thing the code did not anticipate is that somebody might define BIGENDIAN macro in some header it includes to 0 on a little-endian target platform. Because the auto-detection based on common macros do not touch BIGENDIAN macro when it detects a little-endian target, such a definition is still valid and then defined-ness test will say "Ah, BIGENDIAN is defined" and takes the wrong sha1_load(). As this auto-detection logic pretends as if it owns the BIGENDIAN macro by ignoring the setting that may come from the outside and by not explicitly unsetting when it decides that it is working for a little-endian target, solve this problem without breaking that assumption. Namely, we can rename BIGENDIAN this code uses to something much less generic, i.e. SHA1DC_BIGENDIAN. For extra protection, undef the macro on a little-endian target. It is possible to work it around by instead #undef BIGENDIAN in the auto-detection code, but a macro (or include) that happens later in the code can be implemented in terms of BIGENDIAN on Windows and it is possible that the implementation gets upset when it sees the CPP macro undef'ed (instead of set to 0). Renaming the private macro intended to be used only in this file to a less generic name relieves us from having to worry about that kind of breakage. Noticed-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17Makefile: add DC_SHA1 knobLibravatar Jeff King2-0/+35
This knob lets you use the sha1dc implementation from: https://github.com/cr-marcstevens/sha1collisiondetection which can detect certain types of collision attacks (even when we only see half of the colliding pair). So it mitigates any attack which consists of getting the "good" half of a collision into a trusted repository, and then later replacing it with the "bad" half. The "good" half is rejected by the victim's version of Git (and even if they run an old version of Git, any sha1dc-enabled git will complain loudly if it ever has to interact with the object). The big downside is that it's slower than either the openssl or block-sha1 implementations. Here are some timings based off of linux.git: - compute sha1 over whole packfile sha1dc: 3.580s blk-sha1: 2.046s (-43%) openssl: 1.335s (-62%) - rev-list --all --objects sha1dc: 33.512s blk-sha1: 33.514s (+0.0%) openssl: 33.650s (+0.4%) - git log --no-merges -10000 -p sha1dc: 8.124s blk-sha1: 7.986s (-1.6%) openssl: 8.203s (+0.9%) - index-pack --verify sha1dc: 4m19s blk-sha1: 2m57s (-32%) openssl: 2m19s (-42%) So overall the sha1 computation with collision detection is about 1.75x slower than block-sha1, and 2.7x slower than sha1. But of course most operations do more than just sha1. Normal object access isn't really slowed at all (both the +/- changes there are well within the run-to-run noise); any changes are drowned out by the other work Git is doing. The most-affected operation is `index-pack --verify`, which is essentially just computing the sha1 on every object. This is similar to the `index-pack` invocation that the receiver of a push or fetch would perform. So clearly there's some extra CPU load here. There will also be some latency for the user, though keep in mind that such an operation will generally be network bound (this is about a 1.2GB packfile). Some of that extra CPU is "free" in the sense that we use it while the pack is streaming in anyway. But most of it comes during the delta-resolution phase, after the whole pack has been received. So we can imagine that for this (quite large) push, the user might have to wait an extra 100 seconds over openssl (which is what we use now). If we assume they can push to us at 20Mbit/s, that's 480s for a 1.2GB pack, which is only 20% slower. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16sha1dc: disable safe_hash featureLibravatar Jeff King1-1/+1
The safe_hash feature is designed to make sha1dc a drop-in replacement for sha1, where colliding entries will get a permuted hash to un-collide them. However, since we're handling the collision case ourselves, this isn't helpful (and is actually harmful, as it means you get the wrong object id if you want to show it in a log message). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16sha1dc: adjust header includes for gitLibravatar Jeff King4-13/+9
We can replace system includes with git-compat-util.h or cache.h (and should make sure it is included first in all C files). And we can drop includes from headers entirely, as every C file should include git-compat-util.h itself. We will add in new include guards around the header files, though (otherwise you get into trouble including both sha1dc/sha1.h and cache.h). And finally, we'll use the full "sha1dc/" path for including related files. This isn't strictly necessary, but makes the expected resolution more obvious. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16sha1dc: add collision-detecting sha1 implementationLibravatar Jeff King5-0/+2336
This is pulled straight from: https://github.com/cr-marcstevens/sha1collisiondetection with no modifications yet (though I've pulled in only the subset of files necessary for Git to use). This is commit 007905a93c973f55b2daed6585f9f6c23545bf66. Further updates can be done like: git checkout -b vendor-sha1dc $this_commit cp /path/to/sha1dc/{LICENSE.txt,lib/*} sha1dc/ git add -A sha1dc git commit -m "update sha1dc" git checkout -b update-sha1dc origin git merge vendor-sha1dc Thanks to both Marc and Dan for making the code fit our needs by doing both optimization work, cutting down on the object size, and doing some syntactic changes to work better with git. And to Linus for kicking off the "diet" work that removed some of the unused code. The license of the sha1dc code is the MIT license, which is obviously compatible with the GPLv2 of git. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>