tgif.git - Terin's Improved Git Fork

Age	Commit message (Collapse)	Author	Files	Lines
2018-07-23	pack-objects: fix performance issues on packing large deltas	Nguyễn Thái Ngọc Duy	1	-0/+1
	Let's start with some background about oe_delta_size() and oe_set_delta_size(). If you already know, skip the next paragraph. These two are added in 0aca34e826 (pack-objects: shrink delta_size field in struct object_entry - 2018-04-14) to help reduce 'struct object_entry' size. The delta size field in this struct is reduced to only contain max 1MB. So if any new delta is produced and larger than 1MB, it's dropped because we can't really save such a large size anywhere. Fallback is provided in case existing packfiles already have large deltas, then we can retrieve it from the pack. While this should help small machines repacking large repos without large deltas (i.e. less memory pressure), dropping large deltas during the delta selection process could end up with worse pack files. And if existing packfiles already have >1MB delta and pack-objects is instructed to not reuse deltas, all of them will be dropped on the floor, and the resulting pack would be definitely bigger. There is also a regression in terms of CPU/IO if we have large on-disk deltas because fallback code needs to parse the pack every time the delta size is needed and just access to the mmap'd pack data is enough for extra page faults when memory is under pressure. Both of these issues were reported on the mailing list. Here's some numbers for comparison. Version Pack (MB) MaxRSS(kB) Time (s) ------- --------- ---------- -------- 2.17.0 5498 43513628 2494.85 2.18.0 10531 40449596 4168.94 This patch provides a better fallback that is - cheaper in terms of cpu and io because we won't have to read existing pack files as much - better in terms of pack size because the pack heuristics is back to 2.17.0 time, we do not drop large deltas at all If we encounter any delta (on-disk or created during try_delta phase) that is larger than the 1MB limit, we stop using delta_size_ field for this because it can't contain such size anyway. A new array of delta size is dynamically allocated and can hold all the deltas that 2.17.0 can. This array only contains delta sizes that delta_size_ can't contain. With this, we do not have to drop deltas in try_delta() anymore. Of course the downside is we use slightly more memory, even compared to 2.17.0. But since this is considered an uncommon case, a bit more memory consumption should not be a problem. Delta size limit is also raised from 1MB to 16MB to better cover common case and avoid that extra memory consumption (99.999% deltas in this reported repo are under 12MB; Jeff noted binary artifacts topped out at about 3MB in some other private repos). Other fields are shuffled around to keep this struct packed tight. We don't use more memory in common case even with this limit update. A note about thread synchronization. Since this code can be run in parallel during delta searching phase, we need a mutex. The realloc part in packlist_alloc() is not protected because it only happens during the object counting phase, which is always single-threaded. Access to e->delta_size_ (and by extension pack->delta_size[e - pack->objects]) is unprotected as before, the thread scheduler in pack-objects must make sure "e" is never updated by two different threads. The area under the new lock is as small as possible, avoiding locking at all in common case, since lock contention with high thread count could be expensive (most blobs are small enough that delta compute time is short and we end up taking the lock very often). The previous attempt to always hold a lock in oe_delta_size() and oe_set_delta_size() increases execution time by 33% when repacking linux.git with with 40 threads. Reported-by: Elijah Newren <newren@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16	ci: exercise the whole test suite with uncommon code in pack-objects	Nguyễn Thái Ngọc Duy	1	-1/+4
	Some recent optimizations have been added to pack-objects to reduce memory usage and some code paths are split into two: one for common use cases and one for rare ones. Make sure the rare cases are tested with Travis since it requires manual test configuration that is unlikely to be done by developers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16	read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean	Nguyễn Thái Ngọc Duy	1	-1/+1
	While at there, document about this special mode when running the test suite. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-08	Merge branch 'sg/travis-build-during-script-phase'	Junio C Hamano	1	-0/+19
	Build the executable in 'script' phase in Travis CI integration, to follow the established practice, rather than during 'before_script' phase. This allows the CI categorize the failures better ('failed' is project's fault, 'errored' is build environment's). * sg/travis-build-during-script-phase: travis-ci: build Git during the 'script' phase
2018-01-08	travis-ci: build Git during the 'script' phase	SZEDER Gábor	1	-0/+15
	Ever since we started building and testing Git on Travis CI (522354d70 (Add Travis CI support, 2015-11-27)), we build Git in the 'before_script' phase and run the test suite in the 'script' phase (except in the later introduced 32 bit Linux and Windows build jobs, where we build in the 'script' phase'). Contrarily, the Travis CI practice is to build and test in the 'script' phase; indeed Travis CI's default build command for the 'script' phase of C/C++ projects is: ./configure && make && make test The reason why Travis CI does it this way and why it's a better approach than ours lies in how unsuccessful build jobs are categorized. After something went wrong in a build job, its state can be: - 'failed', if a command in the 'script' phase returned an error. This is indicated by a red 'X' on the Travis CI web interface. - 'errored', if a command in the 'before_install', 'install', or 'before_script' phase returned an error, or the build job exceeded the time limit. This is shown as a red '!' on the web interface. This makes it easier, both for humans looking at the Travis CI web interface and for automated tools querying the Travis CI API, to decide when an unsuccessful build is our responsibility requiring human attention, i.e. when a build job 'failed' because of a compiler error or a test failure, and when it's caused by something beyond our control and might be fixed by restarting the build job, e.g. when a build job 'errored' because a dependency couldn't be installed due to a temporary network error or because the OSX build job exceeded its time limit. The drawback of building Git in the 'before_script' phase is that one has to check the trace log of all 'errored' build jobs, too, to see what caused the error, as it might have been caused by a compiler error. This requires additional clicks and page loads on the web interface and additional complexity and API requests in automated tools. Therefore, move building Git from the 'before_script' phase to the 'script' phase, updating the script's name accordingly as well. 'ci/run-builds.sh' now becomes basically empty, remove it. Several of our build job configurations override our default 'before_script' to do nothing; with this change our default 'before_script' won't do anything, either, so remove those overriding directives as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>