summaryrefslogtreecommitdiff
path: root/diff-delta.c
AgeCommit message (Collapse)AuthorFilesLines
2019-09-06diff-delta: set size out-parameter to 0 for NULL deltaLibravatar Jeff King1-0/+2
When we cannot generate a delta, we return NULL but leave delta_size untouched. This is generally OK, as callers rely on NULL to decide if the output is usable or not. But it can confuse compilers; in particular, gcc 9.2.1 with "-flto -O3" complains in fast-import's store_object() that delta_len may be used uninitialized. Let's change the diff-delta code to set the size explicitly to 0 for a NULL return. That silences the compiler and makes it easier to reason about the result. Reported-by: Stephan Beyer <s-beyer@gmx.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-28Merge branch 'mk/diff-delta-avoid-large-offset'Libravatar Junio C Hamano1-0/+3
The delta format used in the packfile cannot reference data at offset larger than what can be expressed in 4-byte, but the generator for the data failed to make sure the offset does not overflow. This has been corrected. * mk/diff-delta-avoid-large-offset: diff-delta: do not allow delta offset truncation
2017-08-11diff-delta: do not allow delta offset truncationLibravatar Martin Koegler1-0/+3
Prevent generating delta offsets beyond 4G, as the xdelta used in the pack format cannot represent such large offset. Signed-off-by: Martin Koegler <martin.koegler@chello.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-10diff-delta: fix encoding size that would not fit in "unsigned int"Libravatar Martin Koegler1-11/+13
The current delta code produces incorrect pack objects for files > 4GB, because the size is copied from size_t field to "unsigned int" variables during the encoding process. Signed-off-by: Martin Koegler <martin.koegler@chello.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-18create_delta_index: simplify condition always evaluating to trueLibravatar Stefan Beller1-1/+1
The code sequence ' (1u << i) < hsize && i < 31 ' is a multi step process, whose first step requires that 'i' is already less that 31, otherwise the result (1u << i) is undefined (and 'undef_val < hsize' can therefore be assumed to be 'false'), and so the later test i < 31 can always be optimized away as dead code ('i' is already less than 31, or the short circuit 'and' applies). So we need to get rid of that code. One way would be to exchange the order of the conditions, so the expression 'i < 31 && (1u << i) < hsize' would remove that optimized unstable code already. However when checking the previous lines in that function, we can deduce that 'hsize' must always be smaller than (1u<<31), since 506049c7df2c6 (fix >4GiB source delta assertion failure), because 'entries' is capped at an upper bound of 0xfffffffeU, so 'hsize' contains a maximum value of 0x3fffffff, which is smaller than (1u<<31), so the value of 'i' will never be larger than 31 and we can remove that condition entirely. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Acked-by: Philip Oakley <philipoakley@iee.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-21fix >4GiB source delta assertion failureLibravatar Nicolas Pitre1-1/+8
When people try insane things such as delta-compressing 4GiB files, they get this assertion: diff-delta.c:285: create_delta_index: Assertion `packed_entry - (struct index_entry *)mem == entries' failed. This happens because: 1) the 'entries' variable is an unsigned int 2) it is assigned with entries = (bufsize - 1) / RABIN_WINDOW (that itself is not a problem unless bufsize > 4G * RABIN_WINDOW) 3) the buffer is indexed from top to bottom starting at "data = buffer + entries * RABIN_WINDOW" and the multiplication here does indeed overflows, making the resulting top of the buffer much lower than expected. This makes the number of actually produced index entries smaller than what was computed initially, hence the assertion. Furthermore, the current delta encoding format cannot represent offsets into a reference buffer with more than 32 bits anyway. So let's just limit the number of entries to what the delta format can encode. Reported-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi> Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-14Nicolas Pitre has a new email addressLibravatar Nicolas Pitre1-1/+1
Due to problems at cam.org, my nico@cam.org email address is no longer valid. From now on, nico@fluxnic.net should be used instead. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-18fix style of a few comments in diff-delta.cLibravatar Nicolas Pitre1-24/+24
Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-17Fix segfault in diff-delta.c when FLEX_ARRAY is 1Libravatar Pierre Habouzit1-1/+1
aka don't do pointer arithmetics on structs that have a FLEX_ARRAY member, or you'll end up believing your array is 1 cell off its real address. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-09diff-delta.c: Rationalize culling of hash bucketsLibravatar David Kastrup1-10/+31
The previous hash bucket culling resulted in a somewhat unpredictable number of hash bucket entries in the order of magnitude of HASH_LIMIT. Replace this with a Bresenham-like algorithm leaving us with exactly HASH_LIMIT entries by uniform culling. Signed-off-by: David Kastrup <dak@gnu.org> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-09diff-delta.c: pack the index structureLibravatar David Kastrup1-16/+58
In normal use cases, the performance wins are not overly impressive: we get something like 5-10% due to the slightly better locality of memory accesses using the packed structure. However, since the data structure for index entries saves 33% of memory on 32-bit platforms and 40% on 64-bit platforms, the behavior when memory gets limited should be nicer. This is a rather well-contained change. One obvious improvement would be sorting the elements in one bucket according to their hash, then using binary probing to find the elements with the right hash value. As it stands, the output should be strictly the same as previously unless one uses the option for limiting the amount of used memory, in which case the created packs might be better. Signed-off-by: David Kastrup <dak@gnu.org> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-23diff-delta.c: Fix broken skip calculation.Libravatar David Kastrup1-1/+1
A particularly bad case was HASH_LIMIT <= hash_count[i] < 2*HASH_LIMIT: in that case, only a single hash survived. For larger cases, 2*HASH_LIMIT was the actual limiting value after pruning. Signed-off-by: David Kastrup <dak@gnu.org> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-12Support fetching the memory usage of a delta indexLibravatar Brian Downing1-0/+10
Delta indices, at least on 64-bit platforms, tend to be larger than the actual uncompressed data. As such, keeping track of this storage is important if you want to successfully limit the memory size of your pack window. Squirrel away the total allocation size inside the delta_index struct, and add an accessor "sizeof_delta_index" to access it. Signed-off-by: Brian Downing <bdowning@lavos.net> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-05-31diff-delta: use realloc instead of xreallocLibravatar Martin Koegler1-1/+1
Commit 83572c1a914d3f7a8dd66d954c11bbc665b7b923 changed many realloc to xrealloc. This change was made in diff-delta.c too, although the code can handle an out of memory failure. This patch reverts this change in diff-delta.c. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-26update diff-delta.c copyrightLibravatar Nicolas Pitre1-13/+6
There is actually nothing left from the original LibXDiff code I used over 2 years ago, and even the GIT implementation has diverged quite a bit from LibXDiff's at this point. Let's update the copyright notice to better reflect that fact. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-26improve delta long block matching with big filesLibravatar Nicolas Pitre1-51/+57
Martin Koegler noted that create_delta() performs a new hash lookup after every block copy encoding which are currently limited to 64KB. In case of larger identical blocks, the next hash lookup would normally point to the next 64KB block in the reference buffer and multiple block copy operations will be consecutively encoded. It is however possible that the reference buffer be sparsely indexed if hash buckets have been trimmed down in create_delta_index() when hashing of the reference buffer isn't well balanced. In that case the hash lookup following a block copy might fail to match anything and the fact that the reference buffer still matches beyond the previous 64KB block will be missed. Let's rework the code so that buffer comparison isn't bounded to 64KB anymore. The match size should be as large as possible up front and only then should multiple block copy be encoded to cover it all. Also, fewer hash lookups will be performed in the end. According to Martin, this patch should reduce his 92MB pack down to 75MB with the dataset he has. Tests performed on the Linux kernel repo show a slightly smaller pack and a slightly faster repack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20simplify inclusion of system header files.Libravatar Junio C Hamano1-4/+1
This is a mechanical clean-up of the way *.c files include system header files. (1) sources under compat/, platform sha-1 implementations, and xdelta code are exempt from the following rules; (2) the first #include must be "git-compat-util.h" or one of our own header file that includes it first (e.g. config.h, builtin.h, pkt-line.h); (3) system headers that are included in "git-compat-util.h" need not be included in individual C source files. (4) "git-compat-util.h" does not have to include subsystem specific header files (e.g. expat.h). Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-14Revert "move pack creation to version 3"Libravatar Junio C Hamano1-6/+2
This reverts commit 16854571aae6302f457c5fbee41ac64669b09595. Git as recent as v1.1.6 do not understand version 3 delta. v1.2.0 is Ok and I personally would say it is old enough, but the improvement between version 2 and version 3 delta is not bit enough to justify breaking older clients. We should resurrect this later, but when we do so we shold make it conditional. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-22move pack creation to version 3Libravatar Nicolas Pitre1-2/+6
It's been quite a while now that GIT is able to read version 3 packs. Let's create them at last. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-26Use xrealloc instead of reallocLibravatar Jonas Fonseca1-1/+1
Change places that use realloc, without a proper error path, to instead use xrealloc. Drop an erroneous error path in the daemon code that used errno in the die message in favour of the simpler xrealloc. Signed-off-by: Jonas Fonseca <fonseca@diku.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23Fix a comparison bug in diff-delta.cLibravatar Pierre Habouzit1-1/+1
(1 << i) < hspace is compared in the `int` space rather that in the unsigned one. the result will be wrong if hspace is between 0x40000000 and 0x80000000. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-10Fix more typos, primarily in the codeLibravatar Pavel Roskin1-3/+3
The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20Remove all void-pointer arithmetic.Libravatar Florian Forster1-1/+1
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-18Initialize FAMs using `FLEX_ARRAY'.Libravatar Florian Forster1-1/+2
When initializing a `flexible array member' the macro `FLEX_ARRAY' should be used. This was forgotten in `diff-delta.c'. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-10fix diff-delta bad memory accessLibravatar Nicolas Pitre1-5/+0
It cannot be assumed that the given buffer will never be moved when shrinking the allocated memory size with realloc(). So let's ignore that optimization for now. This patch makes Electric Fence happy on Linux. Signed-off-by: Nicolas Pitre <nico@cam.org> Acked-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02improve diff-delta with sparse and/or repetitive dataLibravatar Nicolas Pitre1-13/+27
It is useless to preserve multiple hash entries for consecutive blocks with the same hash. Keeping only the first one will allow for matching the longest string of identical bytes while subsequent blocks will only allow for shorter matches. The backward matching code will match the end of it as necessary. This improves both performances (no repeated string compare with long successions of identical bytes, or even small group of bytes), as well as compression (less likely to need random hash bucket entry culling), especially with sparse files. With well behaved data sets this patch doesn't change much. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-02tiny optimization to diff-deltaLibravatar Nicolas Pitre1-3/+2
This is my assembly freak side looking at generated code again. And since create_delta() is certainly pretty high on the radar every bits count. In this case shorter code is generated if hash_mask is not copied to a local variable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-28replace adler32 with Rabin's polynomial in diff-deltaLibravatar Nicolas Pitre1-34/+150
This brings another small repacking speedup for sensibly the same pack size. On the Linux kernel repo, git-repack -a -f is 3.7% faster for a 0.4% larger pack. Credits to Geert Bosch who brought the Rabin's polynomial idea to my attention. This also eliminate the issue of adler32() reading past the data buffer, as noticed by Johannes Schindelin. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-24split the diff-delta interfaceLibravatar Nicolas Pitre1-83/+85
This patch splits the diff-delta interface into index creation and delta generation. A wrapper is provided to preserve the diff-delta() call. This will allow for an optimization in pack-objects.c where the source object could be fixed and a full window of objects tentatively tried against that same source object without recomputing the source index each time. This patch only restructure things, plus a couple cleanups for good measure. There is no performance change yet. Signed-off-by: Nicolas Pitre <nico@cam.org>
2006-03-173% tighter packs for freeLibravatar Nicolas Pitre1-1/+16
This patch makes for 3.4% smaller pack with the git repository, and a bit more than 3% smaller pack with the kernel repository. And so with _no_ measurable CPU difference. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-09diff-delta: bound hash list length to avoid O(m*n) behaviorLibravatar Nicolas Pitre1-30/+71
The diff-delta code can exhibit O(m*n) behavior with some patological data set where most hash entries end up in the same hash bucket. To prevent this, a limit is imposed to the number of entries that can exist in the same hash bucket. Because of the above the code is a tiny bit more expensive on average, even if some small optimizations were added as well to atenuate the overhead. But the problematic samples used to diagnoze the issue are now orders of magnitude less expensive to process with only a slight loss in compression. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22diff-delta: big code simplificationLibravatar Nicolas Pitre1-146/+85
This is much smaller and hopefully clearer code now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22diff-delta: fold two special tests into one plus cleanupsLibravatar Nicolas Pitre1-10/+14
Testing for realloc and size limit can be done with only one test per loop. Make it so and fix a theoretical off-by-one comparison error in the process. The output buffer memory allocation is also bounded by max_size when specified. Finally make some variable unsigned to allow the handling of files up to 4GB in size instead of 2GB. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-05Use adler32() from zlib instead of defining our own.Libravatar Peter Eriksen1-38/+1
Since we already depend on zlib, we don't need to define our own adler32(). Spotted by oprofile. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-15small cleanup for diff-delta.cLibravatar Nicolas Pitre1-22/+12
This patch removes unused remnants of the original xdiff source. No functional change. Possible tiny speed improvement. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-12Revert "diff-delta.c: allow delta with empty blob."Libravatar Junio C Hamano1-1/+1
This reverts 962537a3eb03a118cf27d9d0da365a3216ed1caa commit to play safe.
2005-12-12diff-delta.c: allow delta with empty blob.Libravatar Junio C Hamano1-1/+1
Delta computation with an empty blob used to punt and returned NULL. This commit allows creation with empty blob; all combination of empty->empty, empty->something, and something->empty are allowed. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-06-29[PATCH] assorted delta code cleanupLibravatar Nicolas Pitre1-1/+2
This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28[PATCH] denser delta header encodingLibravatar Nicolas Pitre1-20/+14
Since the delta data format is not tied to any actual git object anymore, now is the time to add a small improvement to the delta data header as it is been done for packed object header. This patch allows for reducing the delta header of about 2 bytes and makes for simpler code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25Add a "max_size" parameter to diff_delta()Libravatar Linus Torvalds1-1/+7
Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens.
2005-05-19[PATCH] Deltification library work by Nicolas Pitre.Libravatar Nicolas Pitre1-0/+333
This patch adds the basic library functions to create and replay delta information. Also included is a test-delta utility to validate the code. diff-delta was based on LibXDiff written by Davide Libenzi Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>