summaryrefslogtreecommitdiff
path: root/builtin/pack-objects.c
AgeCommit message (Collapse)AuthorFilesLines
2012-01-12thin-pack: try harder to use preferred base objects as baseLibravatar Jeff King1-2/+7
When creating a pack using objects that reside in existing packs, we try to avoid recomputing futile delta between an object (trg) and a candidate for its base object (src) if they are stored in the same packfile, and trg is not recorded as a delta already. This heuristics makes sense because it is likely that we tried to express trg as a delta based on src but it did not produce a good delta when we created the existing pack. As the pack heuristics prefer producing delta to remove data, and Linus's law dictates that the size of a file grows over time, we tend to record the newest version of the file as inflated, and older ones as delta against it. When creating a thin-pack to transfer recent history, it is likely that we will try to send an object that is recorded in full, as it is newer. But the heuristics to avoid recomputing futile delta effectively forbids us from attempting to express such an object as a delta based on another object. Sending an object in full is often more expensive than sending a suboptimal delta based on other objects, and it is even more so if we could use an object we know the receiving end already has (i.e. preferred base object) as the delta base. Tweak the recomputation avoidance logic, so that we do not punt on computing delta against a preferred base object. The effect of this change can be seen on two simulated upload-pack workloads. The first is based on 44 reflog entries from my git.git origin/master reflog, and represents the packs that kernel.org sent me git updates for the past month or two. The second workload represents much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to v1.2.0, and so on. The table below shows the average generated pack size and the average CPU time consumed for each dataset, both before and after the patch: dataset | reflog | tags --------------------------------- before | 53358 | 2750977 size after | 32398 | 2668479 change | -39% | -3% --------------------------------- before | 0.18 | 1.12 CPU after | 0.18 | 1.15 change | +0% | +3% This patch makes a much bigger difference for packs with a shorter slice of history (since its effect is seen at the boundaries of the pack) though it has some benefit even for larger packs. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: zlib can only process 4GB at a timeLibravatar Junio C Hamano1-5/+5
The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: wrap deflateBound() tooLibravatar Junio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: wrap deflate side of the APILibravatar Junio C Hamano1-3/+3
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-05Teach core.bigfilethreashold to pack-objectsLibravatar Junio C Hamano1-2/+6
The pack-objects command should take notice of the object file and refrain from attempting to delta large ones, to be consistent with the fast-import command. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-10thread-utils.h: simplify the inclusionLibravatar Junio C Hamano1-4/+0
All files that include this header file use the same four line incantation: #ifndef NO_PTHREADS #include <pthread.h> #include "thread-utils.h" #endif Move the responsibility for that gymnastics to the header file from the files that include it. This approach makes it easier to later declare new services that are related to threading in thread-utils.h and have them available to all the threading code. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-03Merge branch 'jn/thinner-wrapper'Libravatar Junio C Hamano1-1/+1
* jn/thinner-wrapper: Remove pack file handling dependency from wrapper.o pack-objects: mark file-local variable static wrapper: give zlib wrappers their own translation unit strbuf: move strbuf_branchname to sha1_name.c path helpers: move git_mkstemp* to wrapper.c wrapper: move odb_* to environment.c wrapper: move xmmap() to sha1_file.c
2010-11-10pack-objects: mark file-local variable staticLibravatar Jonathan Nieder1-1/+1
old_try_to_free_routine is not meant for use from other files. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-22make pack-objects a bit more resilient to repo corruptionLibravatar Nicolas Pitre1-1/+15
Right now, packing valid objects could fail when creating a thin pack simply because a pack edge object used as a preferred base is corrupted. Since preferred base objects are not strictly needed to produce a valid pack, let's not consider the inability to read them as a fatal error. Delta compression may well be attempted against other objects in the search window. To avoid warning storms (we are in the inner loop of the delta search window) a warning is emitted only on the first occurrence. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-08Put a space between `<' and argument in pack-objects usage stringLibravatar Štěpán Němec1-1/+1
This makes it cosistent with other places (including the git-pack-objects(1) manpage itself) and avoids possible confusion (I, for one, mistook `<object-list' for a `<object-list>' typo at first when preparing this series). Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-08Use parentheses and `...' where appropriateLibravatar Štěpán Němec1-1/+1
Remove some stray usage of other bracket types and asterisks for the same purpose. Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-08Use angles for placeholders consistentlyLibravatar Štěpán Němec1-3/+3
Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06do not depend on signed integer overflowLibravatar Erik Faye-Lund1-1/+1
Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-30Fix typo in pack-objects' usageLibravatar Johannes Schindelin1-1/+1
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-13Merge branch 'js/try-to-free-stackable'Libravatar Junio C Hamano1-2/+4
* js/try-to-free-stackable: Do not call release_pack_memory in malloc wrappers when GIT_TRACE is used Have set_try_to_free_routine return the previous routine
2010-05-21Merge branch 'np/malloc-threading'Libravatar Junio C Hamano1-2/+11
* np/malloc-threading: Thread-safe xmalloc and xrealloc needs a recursive mutex Make xmalloc and xrealloc thread-safe
2010-03-10Merge branch 'lt/deepen-builtin-source'Libravatar Junio C Hamano1-0/+2334
* lt/deepen-builtin-source: Move 'builtin-*' into a 'builtin/' subdirectory Conflicts: Makefile
2010-02-22Move 'builtin-*' into a 'builtin/' subdirectoryLibravatar Linus Torvalds1-0/+2375
This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '*.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>