tgif.git - Terin's Improved Git Fork

Age	Commit message	Author	Files	Lines
2022-03-10	block-sha1: remove use of obsolete x86 assembly	brian m. carlson	1	-17/+0
	In the block SHA-1 code, we have special assembly code for i386 and amd64 to perform rotations with assembly. This is supposed to help pick the correct rotation operation depending on which rotation is smaller, which can help some systems perform slightly better, since any circular rotation can be specified as either a rotate left or a rotate right. However, this isn't needed, so we should remove it. First, SHA-1, like SHA-2, uses fixed constant rotates. Thus, all rotation amounts are known at compile time and are in fact baked into the code. Fortunately, peephole optimizers recognize rotations specified in the normal way and automatically emit the correct code, including a preference for choosing a rotate left versus a rotate right. This has been the case for well over a decade, and is a standard example of the utility of a peephole optimizer. Moreover, all modern CPUs, with the exception of extremely limited embedded CPUs such as some Cortex-M processors, provide a barrel shifter, which lets the CPU perform rotates of any bit amount in constant time. This is valuable for many cryptographic algorithms to improve performance, and is required to prevent timing attacks in algorithms which use data-dependent rotations (which don't include the hash algorithms we use). As a result, even though the compiler does the correct optimization, it isn't even needed here and either a left or a right rotate is equally acceptable. In fact, the SHA-256 code already takes this into account and just writes the simple code using an inline function to let the compiler optimize it for us. The downside of using this code, however, is that it uses a GCC extension, which makes the compiler complain when using -pedantic unless it's prefixed with __extension__. We could fix that, but since it's not needed, let's just remove it. We haven't noticed this because almost everyone uses the SHA1DC code instead, but it still shows up for some people. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
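A rotation "specified in the normal way" can be sketched in plain C as below. The helper names `rol32` and `ror32` are illustrative assumptions, not the identifiers used in block-sha1/sha1.c; the sketch only shows the shift-and-or idiom that compilers are expected to recognize, and that a left rotate by n equals a right rotate by 32 - n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (names are illustrative, not git's).
 * n must be in 1..31, which holds for the fixed SHA-1 rotates (1, 5, 30). */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

static inline uint32_t ror32(uint32_t x, unsigned n)
{
	return (x >> n) | (x << (32 - n));
}

/* A left rotate by n is the same operation as a right rotate by 32 - n. */
void check_rotates(void)
{
	uint32_t x = 0x80000001u;

	assert(rol32(x, 1) == 0x00000003u);
	assert(rol32(x, 30) == ror32(x, 2));
	assert(rol32(x, 5) == ror32(x, 27));
}
```

Because the rotate amounts are compile-time constants, no inline assembly is needed to choose between the two directions.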
2021-03-17	block-sha1: drop trailing semicolon from macro definition	René Scharfe	1	-1/+1
	23119ffb4e (block-sha1: put expanded macro parameters in parentheses, 2012-07-22) added a trailing semicolon to the definition of SHA_MIX without explanation. It doesn't matter with the current code, but make sure to avoid potential surprises by removing it again. This allows the macro to be used almost like a function: Users can combine it with operators of their choice, but still must not pass an expression with side-effects as a parameter, as it would be evaluated multiple times. Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
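The effect of a trailing semicolon can be illustrated with a small sketch. `MIX` below is a hypothetical macro, not the real SHA_MIX; it only shows why a definition without the semicolon can be combined with other operators.

```c
#include <assert.h>

/* Hypothetical macro, not the real SHA_MIX: the body is fully
 * parenthesized and has no trailing semicolon. */
#define MIX(a, b) (((a) ^ (b)) + 1u)

unsigned int mix_demo(unsigned int a, unsigned int b)
{
	/* With "#define MIX(a, b) (...);" this line would expand to
	 * "return (...); * 2u;" and fail to compile. */
	return MIX(a, b) * 2u;
}

void check_mix(void)
{
	/* (6 ^ 3) + 1 = 6, then 6 * 2 = 12 */
	assert(mix_demo(6u, 3u) == 12u);
}
```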
2020-11-16	block-sha1: take a size_t length parameter	Jeff King	2	-2/+2
	The block-sha1 implementation takes an "unsigned long" for the length of a buffer to hash, but our hash algorithm wrappers take a size_t, as do other implementations we support like openssl or sha1dc. On many systems, including Linux, these two are equivalent, but they are not on Windows (where only a "long long" is 64 bits). As a result, passing large chunks to a single the_hash_algo->update_fn() would produce wrong answers there. Note that we don't need to update any other sizes outside of the function interface. We store the cumulative size in a "long long" (which we must do since we hash things bigger than 4GB, like packfiles, even on 32-bit platforms). And internally, we break that size_t len down into 64-byte blocks to feed into the guts of the algorithm. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
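The wrong answers come from silent narrowing of the length. A minimal sketch, assuming a 32-bit `unsigned long` modelled here by `uint32_t` so that it behaves the same on every platform; the type and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit "unsigned long" as found on 64-bit Windows. */
typedef uint32_t ulong32;

/* Hypothetical helper: what a callee declared with a 32-bit length sees
 * when the caller passes a 64-bit size. */
uint64_t length_seen_by_callee(uint64_t len)
{
	ulong32 narrowed = (ulong32)len;	/* high 32 bits are dropped */

	return narrowed;
}

void check_truncation(void)
{
	uint64_t five_gib = 5ull << 30;

	/* 5 GiB modulo 4 GiB leaves 1 GiB: too little data would be hashed. */
	assert(length_seen_by_callee(five_gib) == (1ull << 30));
	/* Lengths below 4 GiB pass through unchanged. */
	assert(length_seen_by_callee(1000) == 1000);
}
```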
2015-11-05	sha1: provide another level of indirection for the SHA-1 functions	Atousa Pahlevan Duprat	1	-4/+4
	The git source uses git_SHA1_Update() and friends to call into the code that computes the hashes. Traditionally, we used to map these directly to underlying implementation of the SHA-1 hash (e.g. SHA1_Update() from OpenSSL or blk_SHA1_Update() from block-sha1/). This arrangement however makes it hard to tweak behaviour of the underlying implementation without fully replacing. If we want to introduce a tweaked_SHA1_Update() wrapper to implement the "Update" in a slightly different way, for example, the implementation of the wrapper still would want to call into the underlying implementation, but tweaked_SHA1_Update() cannot call git_SHA1_Update() to get to the underlying implementation (often but not always SHA1_Update()). Add another level of indirection that maps platform_SHA1_Update() and friends to their underlying implementations, and by default make git_SHA1_Update() and friends map to platform_SHA1_* functions. Doing it this way will later allow us to map git_SHA1_Update() to tweaked_SHA1_Update(), and the latter can use platform_SHA1_Update() in its implementation. Signed-off-by: Atousa Pahlevan Duprat <apahlevan@ieee.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
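The two levels of mapping can be sketched as follows. The names `git_SHA1_Update`, `platform_SHA1_Update` and `tweaked_SHA1_Update` come from the commit message; `toy_ctx`, `toy_Update` and the 4-byte chunking are assumptions made up for this illustration and stand in for a real SHA-1 implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy "implementation" standing in for blk_SHA1_Update() or OpenSSL's
 * SHA1_Update(); it only counts bytes so the sketch stays small. */
typedef struct { unsigned long long size; } toy_ctx;

void toy_Update(toy_ctx *c, const void *data, size_t len)
{
	(void)data;
	c->size += len;
}

/* Level 1: the platform name maps to the underlying implementation. */
#define platform_SHA1_Update toy_Update

/* A wrapper can now reach the real code without calling itself.
 * Hypothetical tweak: feed the data in pieces of at most 4 bytes. */
void tweaked_SHA1_Update(toy_ctx *c, const void *data, size_t len)
{
	const char *p = data;

	while (len) {
		size_t n = len < 4 ? len : 4;

		platform_SHA1_Update(c, p, n);
		p += n;
		len -= n;
	}
}

/* Level 2: callers keep using the git_ name, which can point at either. */
#define git_SHA1_Update tweaked_SHA1_Update

void check_indirection(void)
{
	toy_ctx c = { 0 };

	git_SHA1_Update(&c, "hello world", 11);
	assert(c.size == 11);
}
```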
2014-02-27	Merge branch 'jk/pack-bitmap'	Junio C Hamano	1	-32/+0
	Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history.
	* jk/pack-bitmap: (26 commits)
		ewah: unconditionally ntohll ewah data
		ewah: support platforms that require aligned reads
		read-cache: use get_be32 instead of hand-rolled ntoh_l
		block-sha1: factor out get_be and put_be wrappers
		do not discard revindex when re-preparing packfiles
		pack-bitmap: implement optional name_hash cache
		t/perf: add tests for pack bitmaps
		t: add basic bitmap functionality tests
		count-objects: recognize .bitmap in garbage-checking
		repack: consider bitmaps when performing repacks
		repack: handle optional files created by pack-objects
		repack: turn exts array into array-of-struct
		repack: stop using magic number for ARRAY_SIZE(exts)
		pack-objects: implement bitmap writing
		rev-list: add bitmap mode to speed up object lists
		pack-objects: use bitmaps when packing objects
		pack-objects: split add_object_entry
		pack-bitmap: add support for bitmap indexes
		documentation: add documentation for the bitmap format
		ewah: compressed bitmap implementation
		...
2014-01-23	block-sha1: factor out get_be and put_be wrappers	Jeff King	1	-32/+0
	The BLK_SHA1 code has optimized wrappers for doing endian conversions on memory that may not be aligned. Let's pull them out so that we can use them elsewhere, especially the time-tested list of platforms that prefer each strategy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-16	block-sha1/sha1.c: have SP around arithmetic operators	Junio C Hamano	1	-2/+2
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-23	Merge branch 'jn/block-sha1'	Junio C Hamano	1	-4/+4
	The code to load a word one-byte-at-a-time was optimized into a word-wide load instruction even when the pointer was not aligned, which caused issues on architectures that do not like unaligned access.
	* jn/block-sha1:
		Makefile: BLK_SHA1 does not require fast htonl() and unaligned loads
		block-sha1: put expanded macro parameters in parentheses
		block-sha1: avoid pointer conversion that violates alignment constraints
2012-07-22	block-sha1: put expanded macro parameters in parentheses	Jonathan Nieder	1	-2/+2
	't' is currently always a numeric constant, but it can't hurt to prepare for the day that it becomes useful for a caller to pass in a more complex expression. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-22	block-sha1: avoid pointer conversion that violates alignment constraints	Jonathan Nieder	1	-3/+3
	With 660231aa (block-sha1: support for architectures with memory alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to access 32-bit chunks of memory one byte at a time on arches that prefer that:
		#define get_be32(p) ( \
			(*((unsigned char *)(p) + 0) << 24) | \
			(*((unsigned char *)(p) + 1) << 16) | \
			(*((unsigned char *)(p) + 2) <<  8) | \
			(*((unsigned char *)(p) + 3) <<  0) )
	The code previously accessed these values by just using htonl(*p). Unfortunately, Michael noticed on an Alpha machine that git was using plain 32-bit reads anyway. As soon as we convert a pointer to int *, the compiler can assume that the object pointed to is correctly aligned as an int (C99 section 6.3.2.3 "pointer conversions" paragraph 7), and gcc takes full advantage by using a single 32-bit load, resulting in a whole bunch of unaligned access traps. So we need to obey the alignment constraints even when only dealing with pointers instead of actual values. Do so by changing the type of 'data' to void *. This patch renames 'data' to 'block' at the same time to make sure all references are updated to reflect the new type. Reported-tested-and-explained-by: Michael Cree <mcree@orcon.net.nz> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
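The alignment-safe approach can be sketched as a function that receives a `void *` and only ever reads through `unsigned char *`. `load_be32` is a hypothetical name for this illustration, not the macro from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-at-a-time big-endian load. The pointer is only ever treated as
 * unsigned char *, so no alignment beyond one byte is assumed. */
uint32_t load_be32(const void *block)
{
	const unsigned char *p = block;

	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] <<  8) |
	       ((uint32_t)p[3] <<  0);
}

void check_load(void)
{
	/* buf + 1 need not be suitably aligned for a 32-bit access. */
	unsigned char buf[8] = { 0xff, 0x12, 0x34, 0x56, 0x78, 0, 0, 0 };

	assert(load_be32(buf + 1) == 0x12345678u);
}
```

Since the argument never passes through an `int *`, the compiler has no grounds to assume 4-byte alignment and merge the four reads into one word load on strict-alignment targets.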
2010-06-27	msvc: Select the "fast" definition of the {get,put}_be32() macros	Ramsay Jones	1	-0/+1
	On Intel machines, the msvc compiler defines the CPU architecture macros _M_IX86 and _M_X64 (equivalent to __i386__ and __x86_64__ respectively). Use these macros in the pre-processor expression to select the "fast" definition of the {get,put}_be32() macros. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-25	msvc: Fix some compiler warnings	Ramsay Jones	1	-4/+4
	In particular, using the normal (or production) compiler warning level (-W3), msvc complains as follows:
		.../sha1.c(244) : warning C4018: '<' : signed/unsigned mismatch
		.../sha1.c(270) : warning C4244: 'function' : conversion from 'unsigned __int64' to 'unsigned long', possible loss of data
		.../sha1.c(271) : warning C4244: 'function' : conversion from 'unsigned __int64' to 'unsigned long', possible loss of data
	Note that gcc issues a similar complaint about line 244 when compiling with -Wextra. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-18	remove ARM and Mozilla SHA1 implementations	Nicolas Pitre	2	-4/+8
	They are both slower than the new BLK_SHA1 implementation, so it is pointless to keep them around. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-18	block-sha1: guard gcc extensions with __GNUC__	Nicolas Pitre	1	-2/+2
	With this, the code should now be portable to any C compiler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-18	make sure byte swapping is optimal for git	Nicolas Pitre	1	-2/+2
	We rely on ntohl() and htonl() to perform byte swapping in many places. However, some platforms have libraries providing really poor implementations of those which might cause significant performance issues, especially with the block-sha1 code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-18	block-sha1: make the size member first in the context struct	Nicolas Pitre	1	-1/+1
	This is a 64-bit value, hence having it first provides a better alignment. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-14	block-sha1/sha1.c: silence compiler complaints by casting void * to char *	Brandon Casey	1	-2/+2
	Some compilers produce errors when arithmetic is attempted on pointers to void. We want computations done on byte addresses, so cast them to char * to work them around. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-13	block-sha1: more good unaligned memory access candidates	Nicolas Pitre	1	-1/+4
	In addition to X86, PowerPC and S390 are capable of unaligned memory accesses. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-12	block-sha1: support for architectures with memory alignment restrictions	Nicolas Pitre	1	-2/+30
	This is needed on architectures with poor or non-existent unaligned memory support and/or no fast byte swap instruction (such as ARM) by using byte accesses to memory and shifting the result together. This also makes the code portable, therefore the byte access methods are the defaults. Any architecture that properly supports unaligned word accesses in hardware simply has to enable the alternative methods. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
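For the store direction, "byte accesses to memory and shifting" could look like the sketch below. `store_be32` is a hypothetical name, and the per-architecture selection of a faster alternative is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Portable big-endian store: one byte per access, so it works for any
 * destination alignment and any host byte order. */
void store_be32(void *dst, uint32_t v)
{
	unsigned char *p = dst;

	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >>  8);
	p[3] = (unsigned char)(v >>  0);
}

void check_store(void)
{
	unsigned char out[5] = { 0 };

	store_be32(out + 1, 0xdeadbeefu);
	assert(out[0] == 0x00);	/* neighbouring byte is untouched */
	assert(out[1] == 0xde && out[2] == 0xad);
	assert(out[3] == 0xbe && out[4] == 0xef);
}
```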
2009-08-12	block-sha1: split the different "hacks" to be individually selected	Nicolas Pitre	1	-5/+18
	This is to make it easier for them to be selected individually depending on the architecture instead of the other way around i.e. having each architecture select a list of hacks up front. That makes for clearer documentation as well. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-12	block-sha1: move code around	Nicolas Pitre	1	-69/+60
	Move the code around so specific architecture hacks are defined first. Also make one line comments actually one line. No code change. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-10	block-sha1: improve code on large-register-set machines	Linus Torvalds	1	-1/+24
	For x86 performance (especially in 32-bit mode) I added that hack to write the SHA1 internal temporary hash using a volatile pointer, in order to get gcc to not try to cache the array contents. Because gcc will do all the wrong things, and then spill things in insane random ways. But on architectures like PPC, where you have 32 registers, it's actually perfectly reasonable to put the whole temporary array[] into the register set, and gcc can do so. So make the 'volatile unsigned int *' cast be dependent on a SMALL_REGISTER_SET preprocessor symbol, and enable it (currently) on just x86 and x86-64. With that, the routine is fairly reasonable even when compared to the hand-scheduled PPC version. Ben Herrenschmidt reports on a G5:
		* Paulus asm version: about 3.67s
		* Yours with no change: about 5.74s
		* Yours without "volatile": about 3.78s
	so with this the C version is within about 3% of the asm one. And add a lot of commentary on what the heck is going on. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
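A conditional volatile store of this kind might be sketched as below. The macro name `setW` and its three-argument shape are assumptions for the illustration; SMALL_REGISTER_SET is the symbol named in the commit message, and here it has to be defined by hand to try the volatile variant.

```c
#include <assert.h>

/* Assumption: define SMALL_REGISTER_SET manually to try both variants;
 * the commit enables it on x86 and x86-64 only. */
#ifdef SMALL_REGISTER_SET
/* Volatile store: every store is emitted, in program order. */
#define setW(W, x, val) (*(volatile unsigned int *)&(W)[(x) & 15] = (val))
#else
/* Plain store: the compiler may keep the whole array in registers. */
#define setW(W, x, val) ((W)[(x) & 15] = (val))
#endif

unsigned int fill_and_sum(void)
{
	unsigned int W[16];
	unsigned int t, sum = 0;

	for (t = 0; t < 16; t++)
		setW(W, t, t * 3u);
	for (t = 0; t < 16; t++)
		sum += W[t];
	return sum;
}

void check_setW(void)
{
	/* 3 * (0 + 1 + ... + 15) = 360, with or without the volatile cast */
	assert(fill_and_sum() == 360u);
}
```

Both variants compute the same values; only the generated code differs.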
2009-08-07	block-sha1: improved SHA1 hashing	Linus Torvalds	1	-3/+4
	I think I have found a way to avoid the gcc craziness. Lookie here:
		#           TIME[s]  SPEED[MB/s]
		rfc3174     5.094    119.8
		rfc3174     5.098    119.7
		linus       1.462    417.5
		linusas     2.008    304
		linusas2    1.878    325
		mozilla     5.566    109.6
		mozillaas   5.866    104.1
		openssl     1.609    379.3
		spelvin     1.675    364.5
		spelvina    1.601    381.3
		nettle      1.591    383.6
	notice? I outperform all the hand-tuned asm on 32-bit too. By quite a margin, in fact. Now, I didn't try a P4, and it's possible that it won't do that there, but the 32-bit code generation sure looks impressive on my Nehalem box. The magic? I force the stores to the 512-bit hash bucket to be done in order. That seems to help a lot. The diff is trivial (on top of the "rename registers with cpp" patch), as appended. And it does seem to fix the P4 issues too, although I can obviously (once again) only test Prescott, and only in 64-bit mode:
		#           TIME[s]  SPEED[MB/s]
		rfc3174     1.662    36.73
		rfc3174     1.64     37.22
		linus       0.2523   241.9
		linusas     0.4367   139.8
		linusas2    0.4487   136
		mozilla     0.9704   62.9
		mozillaas   0.9399   64.94
	that's some really impressive improvement. All from just saying "do the stores in the order I told you to, dammit!" to the compiler. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-07	block-sha1: perform register rotation using cpp	Linus Torvalds	1	-27/+90
	Instead of letting the compiler figure out the optimal way to rotate register usage, explicitly rotate the register names with cpp. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-06	block-sha1: get rid of redundant 'lenW' context	Linus Torvalds	2	-11/+7
	.. and simplify the ctx->size logic. We now count the size in bytes, which means that 'lenW' was always just the low 6 bits of the total size, so we don't carry it around separately any more. And we do the 'size in bits' shift at the end. Suggested by Nicolas Pitre and linux@horizon.com. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
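The size logic described here can be sketched with a toy context that tracks only the byte count. The struct, the function names and the `blocks` counter (a stand-in for the real 64-byte compression step) are assumptions for illustration; the point is that the buffered length is always `size & 63` and the conversion to bits happens once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
	unsigned long long size;	/* total bytes fed so far */
	unsigned char buf[64];		/* partial block */
	unsigned int blocks;		/* stand-in for the compression step */
} toy_sha_ctx;

void toy_update(toy_sha_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;
	unsigned int lenW = (unsigned int)(ctx->size & 63);	/* bytes buffered */

	ctx->size += len;

	if (lenW) {	/* top up the partial block first */
		size_t left = 64 - lenW;

		if (len < left)
			left = len;
		memcpy(ctx->buf + lenW, p, left);
		lenW = (unsigned int)((lenW + left) & 63);
		len -= left;
		p += left;
		if (lenW)
			return;	/* still not a full block */
		ctx->blocks++;
	}
	while (len >= 64) {	/* whole blocks straight from the input */
		ctx->blocks++;
		p += 64;
		len -= 64;
	}
	if (len)
		memcpy(ctx->buf, p, len);
}

/* The "size in bits" shift is done once, when finishing. */
unsigned long long toy_size_in_bits(const toy_sha_ctx *ctx)
{
	return ctx->size << 3;
}

void check_size_logic(void)
{
	static const unsigned char data[130];
	toy_sha_ctx c;

	memset(&c, 0, sizeof(c));
	toy_update(&c, data, 10);
	toy_update(&c, data, 60);
	toy_update(&c, data, 130);
	assert(c.blocks == 3);		/* 200 / 64 */
	assert((c.size & 63) == 8);	/* 200 % 64 bytes still buffered */
	assert(toy_size_in_bits(&c) == 1600);
}
```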
2009-08-06	block-sha1: Use '(B&C)+(D&(B^C))' instead of '(B&C)|(D&(B|C))' in round 3	Linus Torvalds	1	-1/+1
	It's an equivalent expression, but the '+' gives us some freedom in instruction selection (for example, we can use 'lea' rather than 'add'), and associates with the other additions around it to give some minor scheduling freedom. Suggested-by: linux@horizon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
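The equivalence holds because `B&C` and `D&(B^C)` can never have the same bit set, so the addition produces no carries and behaves like a bitwise OR. A small sketch that checks this for every combination of a single bit, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Round-3 function written with OR, as before the change. */
uint32_t maj_or(uint32_t b, uint32_t c, uint32_t d)
{
	return (b & c) | (d & (b | c));
}

/* The same function written with '+', as after the change. */
uint32_t maj_add(uint32_t b, uint32_t c, uint32_t d)
{
	return (b & c) + (d & (b ^ c));
}

void check_maj(void)
{
	unsigned i;

	/* All 8 combinations of one input bit, replicated across the word. */
	for (i = 0; i < 8; i++) {
		uint32_t b = (i & 1) ? 0xffffffffu : 0;
		uint32_t c = (i & 2) ? 0xffffffffu : 0;
		uint32_t d = (i & 4) ? 0xffffffffu : 0;

		assert(maj_or(b, c, d) == maj_add(b, c, d));
	}
	/* One mixed-bit sample as well. */
	assert(maj_or(0x12345678u, 0x9abcdef0u, 0x0f0f0f0fu) ==
	       maj_add(0x12345678u, 0x9abcdef0u, 0x0f0f0f0fu));
}
```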
2009-08-06	block-sha1: macroize the rounds a bit further	Linus Torvalds	1	-30/+26
	Avoid repeating the shared parts of the different rounds by adding a macro layer or two. It was already more cpp than C. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-06	block-sha1: re-use the temporary array as we calculate the SHA1	Linus Torvalds	1	-12/+16
	The mozilla-SHA1 code did this 80-word array for the 80 iterations. But the SHA1 state is really just 512 bits, and you can actually keep it in a kind of "circular queue" of just 16 words instead. This requires us to do the xor updates as we go along (rather than as a pre-phase), but that's really what we want to do anyway. This gets me really close to the OpenSSL performance on my Nehalem. Look ma, all C code (ok, there's the rol/ror hack, but that one doesn't strictly even matter on my Nehalem, it's just a local optimization). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
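The "circular queue" idea can be sketched by computing the SHA-1 message schedule twice: once with the full 80-word array and once with 16 words indexed modulo 16. The function names are hypothetical, and the `out` array in the second function exists only so the two results can be compared.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rol1(uint32_t x)
{
	return (x << 1) | (x >> 31);
}

/* Reference: expand the 16 input words into the full 80-word array. */
void schedule_80(const uint32_t in[16], uint32_t out[80])
{
	int t;

	for (t = 0; t < 16; t++)
		out[t] = in[t];
	for (t = 16; t < 80; t++)
		out[t] = rol1(out[t - 3] ^ out[t - 8] ^ out[t - 14] ^ out[t - 16]);
}

/* Circular variant: only 16 words of state, updated as we go along. */
void schedule_16(const uint32_t in[16], uint32_t out[80])
{
	uint32_t W[16];
	int t;

	for (t = 0; t < 16; t++)
		out[t] = W[t] = in[t];
	for (t = 16; t < 80; t++) {
		/* Modulo 16: t-3 == t+13, t-8 == t+8, t-14 == t+2, t-16 == t */
		uint32_t v = rol1(W[(t + 13) & 15] ^ W[(t + 8) & 15] ^
				  W[(t + 2) & 15] ^ W[t & 15]);

		W[t & 15] = v;	/* overwrite the oldest word */
		out[t] = v;	/* kept only for the comparison */
	}
}

void check_schedule(void)
{
	uint32_t in[16], a[80], b[80];
	int t;

	for (t = 0; t < 16; t++)
		in[t] = 0x01010101u * (uint32_t)t + 1u;
	schedule_80(in, a);
	schedule_16(in, b);
	for (t = 0; t < 80; t++)
		assert(a[t] == b[t]);
}
```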
2009-08-06	block-sha1: make the 'ntohl()' part of the first SHA1 loop	Linus Torvalds	1	-12/+16
	This helps a teeny bit. But what I -really- want to do is to avoid the whole 80-array loop, and do the xor updates as I go along.. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-06	block-sha1: minor fixups	Junio C Hamano	1	-2/+2
	Bert Wesarg noticed non-x86 version of SHA_ROT() had a typo. Also spell in-line assembly as __asm__(), otherwise I seem to get error: implicit declaration of function 'asm' from my compiler. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-06	block-sha1: try to use rol/ror appropriately	Linus Torvalds	1	-10/+22
	Use the one with the smaller constant. It _can_ generate slightly smaller code (a constant of 1 is special), but perhaps more importantly it's possibly faster on any uarch that does a rotate with a loop. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-06	block-sha1: undo ctx->size change	Junio C Hamano	1	-3/+3
	Undo the change I picked up from the mailing list discussion suggested by Nico, not because it is wrong, but it will be done at the end of the follow-up series. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-05	Add new optimized C 'block-sha1' routines	Linus Torvalds	2	-0/+166
	Based on the mozilla SHA1 routine, but doing the input data accesses a word at a time and with 'htonl()' instead of loading bytes and shifting. It requires an architecture that is ok with unaligned 32-bit loads and a fast htonl(). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>