summaryrefslogtreecommitdiff
path: root/t/perf
AgeCommit message (Collapse)AuthorFilesLines
2016-09-21Merge branch 'ks/pack-objects-bitmap'Libravatar Junio C Hamano1-1/+13
Some codepaths in "git pack-objects" were not ready to use an existing pack bitmap; now they are and as the result they have become faster. * ks/pack-objects-bitmap: pack-objects: use reachability bitmap index when generating non-stdout pack pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
2016-09-12pack-objects: use reachability bitmap index when generating non-stdout packLibravatar Kirill Smelkov1-1/+13
Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-object --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08Merge branch 'jk/delta-base-cache'Libravatar Junio C Hamano1-0/+31
The delta-base-cache mechanism has been a key to the performance in a repository with a tightly packed packfile, but it did not scale well even with a larger value of core.deltaBaseCacheLimit. * jk/delta-base-cache: t/perf: add basic perf tests for delta base cache delta_base_cache: use hashmap.h delta_base_cache: drop special treatment of blobs delta_base_cache: use list.h for LRU release_delta_base_cache: reuse existing detach function clear_delta_base_cache_entry: use a more descriptive name cache_or_unpack_entry: drop keep_cache parameter
2016-08-31Merge branch 'kw/patch-ids-optim'Libravatar Junio C Hamano1-0/+0
* kw/patch-ids-optim: p3400: make test script executable
2016-08-29p3400: make test script executableLibravatar René Scharfe1-0/+0
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23t/perf: add basic perf tests for delta base cacheLibravatar Jeff King1-0/+31
This just shows off the improvements done by the last few patches, and gives us a baseline for noticing regressions in the future. Here are the results with linux.git as the perf "large repo": Test origin HEAD ------------------------------------------------------------------- 0003.1: log --raw 43.41(40.36+2.69) 33.86(30.96+2.41) -22.0% 0003.2: log -S 313.61(309.74+3.78) 298.75(295.58+3.00) -4.7% (for a large repo, the "log -S" improvements are greater if you bump the delta base cache limit, but I think it makes sense to test the "stock" behavior, since that is what most people will see). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12Merge branch 'kw/patch-ids-optim'Libravatar Junio C Hamano1-0/+36
When "git rebase" tries to compare set of changes on the updated upstream and our own branch, it computes patch-id for all of these changes and attempts to find matches. This has been optimized by lazily computing the full patch-id (which is expensive) to be compared only for changes that touch the same set of paths. * kw/patch-ids-optim: rebase: avoid computing unnecessary patch IDs patch-ids: add flag to create the diff patch id using header only data patch-ids: replace the seen indicator with a commit pointer patch-ids: stop using a hand-rolled hashmap implementation
2016-08-11rebase: avoid computing unnecessary patch IDsLibravatar Kevin Willford1-0/+36
The `rebase` family of Git commands avoid applying patches that were already integrated upstream. They do that by using the revision walking option that computes the patch IDs of the two sides of the rebase (local-only patches vs upstream-only ones) and skipping those local patches whose patch ID matches one of the upstream ones. In many cases, this causes unnecessary churn, as already the set of paths touched by a given commit would suffice to determine that an upstream patch has no local equivalent. This hurts performance in particular when there are a lot of upstream patches, and/or large ones. Therefore, let's introduce the concept of a "diff-header-only" patch ID, compare those first, and only evaluate the "full" patch ID lazily. Please note that in contrast to the "full" patch IDs, those "diff-header-only" patch IDs are prone to collide with one another, as adjacent commits frequently touch the very same files. Hence we now have to be careful to allow multiple hash entries with the same hash. We accomplish that by using the hashmap_add() function that does not even test for hash collisions. This also allows us to evaluate the full patch ID lazily, i.e. only when we found commits with matching diff-header-only patch IDs. We add a performance test that demonstrates ~1-6% improvement. In practice this will depend on various factors such as how many upstream changes and how big those changes are along with whether file system caches are cold or warm. As Git's test suite has no way of catching performance regressions, we also add a regression test that verifies that the full patch ID computation is skipped when the diff-header-only computation suffices. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29t/perf: add tests for many-pack scenariosLibravatar Jeff King1-0/+87
Git's pack storage does efficient (log n) lookups in a single packfile's index, but if we have multiple packfiles, we have to linearly search each for a given object. This patch introduces some timing tests for cases where we have a large number of packs, so that we can measure any improvements we make in the following patches. The main thing we want to time is object lookup. To do this, we measure "git rev-list --objects --all", which does a fairly large number of object lookups (essentially one per object in the repository). However, we also measure the time to do a full repack, which is interesting for two reasons. One is that in addition to the usual pack lookup, it has its own linear iteration over the list of packs. And two is that because it it is the tool one uses to go from an inefficient many-pack situation back to a single pack, we care about its performance not only at marginal numbers of packs, but at the extreme cases (e.g., if you somehow end up with 5,000 packs, it is the only way to get back to 1 pack, so we need to make sure it performs well). We measure the performance of each command in three scenarios: 1 pack, 50 packs, and 1,000 packs. The 1-pack case is a baseline; any optimizations we do to handle multiple packs cannot possibly perform better than this. The 50-pack case is as far as Git should generally allow your repository to go, if you have auto-gc enabled with the default settings. So this represents the maximum performance improvement we would expect under normal circumstances. The 1,000-pack case is hopefully rare, though I have seen it in the wild where automatic maintenance was broken for some time (and the repository continued to receive pushes). This represents cases where we care less about general performance, but want to make sure that a full repack command does not take excessively long. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-11Merge branch 'jk/perf-any-version'Libravatar Junio C Hamano3-6/+17
Allow t/perf framework to use the features from the most recent version of Git even when testing an older installed version. * jk/perf-any-version: p4211: explicitly disable renames in no-rename test t/perf: fix regression in testing older versions of git
2016-06-22p4211: explicitly disable renames in no-rename testLibravatar Jeff King1-3/+3
p4211 tests line-log performance both with and without "-M". In v2.9.0, the case without "-M" appears to have regressed badly, but that is only because we flipped on renames by default. Let's have the test explicitly disable renames to get consistent timings (and to match the presumed intent of the test, which is to see the effects with and without renames). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-22t/perf: fix regression in testing older versions of gitLibravatar Jeff King2-3/+14
Commit 7501b59 (perf: make the tests work in worktrees, 2016-05-13) introduced the use of "git rev-parse --git-path" in the perf-lib setup code. Because the to-be-tested version of git is at the front of the $PATH when this code runs, this means we cannot use modern versions of t/perf to test versions of git older than v2.5.0 (when that option was introduced). This is a symptom of a more general problem. The t/perf suite is essentially independent of git versions, and ideally we would be able to run the most modern and complete set of tests across many historical versions (to see how they compare). But any setup code they run is therefore required to use the lowest common denominator we expect to test. So let's introduce a new variable, $MODERN_GIT, that we can use both in perf-lib and in the test setup to get a reliable set of git features (we might change git and break some tests, of course, but $MODERN_GIT is tied to the same version of git as the t/perf scripts, so they can be fixed or adjusted together). This commit fixes the "--git-path" case, but does not mass-convert existing setup code to use $MODERN_GIT. Most setup code is fairly vanilla and will work with effectively all versions. But now the tool is there to fix any other issues we find going forward. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-21perf: accommodate for MacOSXLibravatar Johannes Schindelin1-1/+5
As this developer has no access to MacOSX developer setups anymore, Travis becomes the best bet to run performance tests on that OS. However, on MacOSX /usr/bin/time is that good old BSD executable that no Linux user cares about, as demonstrated by the perf-lib.sh's use of GNU-ish extensions. And by the hard-coded path. Let's just work around this issue by using gtime on MacOSX, the Homebrew-provided GNU implementation onto which pretty much every MacOSX power user falls back anyway. To help other developers use Travis to run performance tests on MacOSX, the .travis.yml file now sports a commented-out line that installs GNU time via Homebrew. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-31perf: make the tests work without a worktreeLibravatar René Scharfe1-1/+4
In regular repositories $source_git and $objects_dir contain relative paths based on $source. Go there to allow cp to resolve them. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-13perf: run "rebase -i" under perfLibravatar Johannes Schindelin1-0/+36
This developer spent a lot of time trying to speed up the interactive rebase, in particular on Windows. And will continue to do so. To make it easier to demonstrate the performance improvement, let's have a reproducible performance test. The topic branch we use to test performance was found using these shell commands (essentially searching for a long-enough topic branch in Git's own history that touched the same file multiple times): git rev-list --parents origin/master | grep ' .* ' | while read commit rest do patch_count=$(git rev-list --count $commit^..$commit^2) test $patch_count -gt 20 || continue merges="$(git rev-list --parents $commit^..$commit^2 | grep ' .* ')" test -z "$merges" || continue patches_per_file="$(git log --pretty=%H --name-only \ $commit^..$commit^2 | grep -v '^$' | sort | uniq -c -d | sort -n -r)" test -n "$patches_per_file" && test 20 -lt $(echo "$patches_per_file" | sed -n '1s/^ *\([0-9]*\).*/\1/p') || continue printf 'commit %s\n%s\n' "$commit" "$patches_per_file" done Note that we can get away with *not* having to reset to the original branch tip before rebasing: we switch the first two "pick" lines every time, so we end up with the same patch order after two rebases, and the complexity of both rebases is equivalent. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-13perf: make the tests work in worktreesLibravatar Johannes Schindelin1-7/+7
This patch makes perf-lib.sh more robust so that it can run correctly even inside a worktree. For example, it assumed that $GIT_DIR/objects is the objects directory (which is not the case for worktrees) and it used the commondir file verbatim, even if it contained a relative path. Furthermore, the setup code expected `git rev-parse --git-dir` to spit out a relative path, which is also not true for worktrees. Let's just change the code to accept both relative and absolute paths, by avoiding the `cd` into the copied working directory. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-13perf: let's disable symlinks when they are not availableLibravatar Johannes Schindelin1-1/+4
We already have a perfectly fine prereq to tell us whether it is safe to use symlinks. So let's use it. This fixes the performance tests in Git for Windows' SDK, where symlinks are not really available ([*1*]). This is not an issue with Git for Windows itself because it configures core.symlinks=false in its system config. However, the system config is disabled for the performance tests, for obvious reasons: we want them to be independent of the vagaries of any local configuration. Footnote *1*: Windows has symbolic links. Git for Windows disables them by default, though (for example: in standard setups, non-admins lack the privilege to create symbolic links). For details, see https://github.com/git-for-windows/git/wiki/Symbolic-Links Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-25resolve_gitlink_ref: ignore non-repository pathsLibravatar Jeff King1-0/+4
When we want to look up a submodule ref, we use get_ref_cache(path) to find or auto-create its ref cache. But if we feed a path that isn't actually a git repository, we blindly create the ref cache, and then may die deeper in the code when we try to access it. This is a problem because many callers speculatively feed us a path that looks vaguely like a repository, and expect us to tell them when it is not. This patch teaches resolve_gitlink_ref to reject non-repository paths without creating a ref_cache. This avoids the die(), and also performs better if you have a large number of these faux-submodule directories (because the ref_cache lookup is linear, under the assumption that there won't be a large number of submodules). To accomplish this, we also break get_ref_cache into two pieces: the lookup and auto-creation (the latter is lumped into create_ref_cache). This lets us first cheaply ask our cache "is it a submodule we know about?" If so, we can avoid repeating our filesystem lookup. So lookups of real submodules are not penalized; they examine the submodule's .git directory only once. The test in t3000 demonstrates a case where this improves correctness (we used to just die). The new perf case in p7300 shows off the speed improvement in an admittedly pathological repository: Test HEAD^ HEAD ---------------------------------------------------------------- 7300.4: ls-files -o 66.97(66.15+0.87) 0.33(0.08+0.24) -99.5% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-11-06filter-branch: skip index read/write when possibleLibravatar Jeff King1-0/+19
If the user specifies an index filter but not a tree filter, filter-branch cleverly avoids checking out the tree entirely. But we don't do the next level of optimization: if you have no index or tree filter, we do not need to read the index at all. This can greatly speed up cases where we are only changing the commit objects (e.g., cementing a graft into place). Here are numbers from the newly-added perf test: Test HEAD^ HEAD --------------------------------------------------------------- 7000.2: noop filter 13.81(4.95+0.83) 5.43(0.42+0.43) -60.7% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-15Merge branch 'sb/perf-without-installed-git'Libravatar Junio C Hamano1-0/+1
Performance-measurement tests did not work without an installed Git. * sb/perf-without-installed-git: t/perf: make runner work even if Git is not installed
2015-09-25t/perf: make runner work even if Git is not installedLibravatar Stephan Beyer1-0/+1
aggregate.perl did not work when Git.pm is not installed to a directory contained in the default Perl library path list or PERLLIB. This commit prepends the Perl library path of the current Git source tree to enable this. Note that this commit adds a hard-coded relative path use lib '../../perl/blib/lib'; instead of the flexible environment-based variant use lib (split(/:/, $ENV{GITPERLLIB})); which is used in tests written in Perl. The hard-coded variant is used because the whole performance test framework does it that way (and GITPERLLIB is not set there). Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-03Merge branch 'ee/clean-remove-dirs'Libravatar Junio C Hamano1-0/+31
Replace "is this subdirectory a separate repository that should not be touched?" check "git clean" does by checking if it has .git/HEAD using the submodule-related code with a more optimized check. * ee/clean-remove-dirs: read_gitfile_gently: fix use-after-free clean: improve performance when removing lots of directories p7300: add performance tests for clean t7300: add tests to document behavior of clean and nested git setup: sanity check file size in read_gitfile_gently setup: add gentle version of read_gitfile
2015-06-26p5310: Fix broken && chain in performance testLibravatar Stefan Beller1-3/+3
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-15p7300: add performance tests for cleanLibravatar Erik Elfström1-0/+31
The tests are run in dry-run mode to avoid having to restore the test directories for each timed iteration. Using dry-run is an acceptable compromise since we are mostly interested in the initial computation of what to clean and not so much in the cleaning it self. Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-25perf-lib: fix ignored exit code inside loopLibravatar Jeff King1-1/+1
When copying the test repository, we try to detect whether the copy succeeded. However, most of the heavy lifting is done inside a for loop, where our "break" will lose the exit code of the failing "cp". We can take advantage of the fact that we are in a subshell, and just "exit 1" to break out with a code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-25Merge branch 'jk/repack-pack-writebitmaps-config'Libravatar Junio C Hamano1-0/+3
* jk/repack-pack-writebitmaps-config: t7700: drop explicit --no-pack-kept-objects from .keep test repack: introduce repack.writeBitmaps config option repack: simplify handling of --write-bitmap-index pack-objects: stop respecting pack.writebitmaps
2014-06-10repack: introduce repack.writeBitmaps config optionLibravatar Jeff King1-0/+3
We currently have pack.writeBitmaps, which originally operated at the pack-objects level. This should really have been a repack.* option from day one. Let's give it the more sensible name, but keep the old version as a deprecated synonym. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-23p5302-pack-index.sh: use the $( ... ) construct for command substitutionLibravatar Elia Pinto1-1/+1
The Git CodingGuidelines prefer the $(...) construct for command substitution instead of using the backquotes `...`. The backquoted form is the traditional method for command substitution, and is supported by POSIX. However, all but the simplest uses become complicated quickly. In particular, embedded command substitutions and/or the use of double quotes require careful escaping with the backslash character. The patch was generated by: for _f in $(find . -name "*.sh") do sed -i 's@`\(.*\)`@$(\1)@g' ${_f} done and then carefully proof-read. Signed-off-by: Elia Pinto <gitter.spiros@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27Merge branch 'jk/pack-bitmap'Libravatar Junio C Hamano1-0/+57
Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...
2014-01-27Merge branch 'jk/mark-edges-uninteresting'Libravatar Junio C Hamano1-0/+12
Fix performance regression in v1.8.4.x and later. * jk/mark-edges-uninteresting: list-objects: only look at cmdline trees with edge_hint t/perf: time rev-list with UNINTERESTING commits
2014-01-21t/perf: time rev-list with UNINTERESTING commitsLibravatar Jeff King1-0/+12
We time a straight "rev-list --all" and its "--object" counterpart, both going all the way to the root. However, we do not time a partial history walk. This patch adds an extreme case: a walk over a very small slice of history, but with a very large set of UNINTERESTING tips. This is similar to the connectivity check run by git on a small fetch, or the walk done by any pre-receive hooks that want to check incoming commits. This test reveals a performance regression in git v1.8.4.2, caused by fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting, 2013-08-16): Test fbd4a703^ fbd4a703 ------------------------------------------------------------------------------------------ 0001.1: rev-list --all 0.69(0.67+0.02) 0.69(0.68+0.01) +0.0% 0001.2: rev-list --all --objects 3.47(3.44+0.02) 3.48(3.44+0.03) +0.3% 0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0% 0001.5: rev-list --objects $commit --not --all 0.04(0.03+0.00) 0.27(0.24+0.02) +575.0% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30pack-bitmap: implement optional name_hash cacheLibravatar Vicent Marti1-1/+2
When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30t/perf: add tests for pack bitmapsLibravatar Jeff King1-0/+56
This adds a few basic perf tests for the pack bitmap code to show off its improvements. The tests are: 1. How long does it take to do a repack (it gets slower with bitmaps, since we have to do extra work)? 2. How long does it take to do a clone (it gets faster with bitmaps)? 3. How does a small fetch perform when we've just repacked? 4. How does a clone perform when we haven't repacked since a week of pushes? Here are results against linux.git: Test origin/master this tree ----------------------------------------------------------------------- 5310.2: repack to disk 33.64(32.64+2.04) 67.67(66.75+1.84) +101.2% 5310.3: simulated clone 30.49(29.47+2.05) 1.20(1.10+0.10) -96.1% 5310.4: simulated fetch 3.49(6.79+0.06) 5.57(22.35+0.07) +59.6% 5310.6: partial bitmap 36.70(43.87+1.81) 8.18(21.92+0.73) -77.7% You can see that we do take longer to repack, but we do way better for further clones. A small fetch performs a bit worse, as we spend way more time on delta compression (note the heavy user CPU time, as we have 8 threads) due to the lack of name hashes for the bitmapped objects. The final test shows how the bitmaps degrade over time between packs. There's still a significant speedup over the non-bitmap case, but we don't do quite as well (we have to spend time accessing the "new" objects the old fashioned way, including delta compression). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-27Merge branch 'tg/diff-no-index-refactor'Libravatar Junio C Hamano1-0/+22
"git diff ../else/where/A ../else/where/B" when ../else/where is clearly outside the repository, and "git diff --no-index A B", do not have to look at the index at all, but we used to read the index unconditionally. * tg/diff-no-index-refactor: diff: avoid some nesting diff: add test for --no-index executed outside repo diff: don't read index when --no-index is given diff: move no-index detection to builtin/diff.c
2013-12-12diff: don't read index when --no-index is givenLibravatar Thomas Gummerer1-0/+22
git diff --no-index ... currently reads the index, during setup, when calling gitmodules_config(). This results in worse performance when the index is not actually needed. This patch avoids calling gitmodules_config() when the --no-index option is given. The times for executing "git diff --no-index" in the WebKit repository are improved as follows: Test HEAD~3 HEAD ------------------------------------------------------------------ 4001.1: diff --no-index 0.24(0.15+0.09) 0.01(0.00+0.00) -95.8% An additional improvement of this patch is that "git diff --no-index" no longer breaks when the index file is corrupt, which makes it possible to use it for investigating the broken repository. To improve the possible usage as investigation tool for broken repositories, setup_git_directory_gently() is also not called when the --no-index option is given. Also add a test to guard against future breakages, and a performance test to show the improvements. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-26test: replace shebangs with descriptions in shell librariesLibravatar Jonathan Nieder1-1/+3
A #! line in these files is misleading, since these scriptlets are meant to be sourced with '.' (using whatever shell sources them) instead of run directly using the interpreter named on the #! line. Removing the #! line shouldn't hurt syntax highlighting since these files have filenames ending with '.sh'. For documentation, add a brief description of how the files are meant to be used in place of the shebang line. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01Merge branch 'lf/echo-n-is-not-portable'Libravatar Junio C Hamano1-2/+2
* lf/echo-n-is-not-portable: Avoid using `echo -n` anywhere
2013-07-29Avoid using `echo -n` anywhereLibravatar Lukas Fleischer1-2/+2
`echo -n` is non-portable. The POSIX specification says: Conforming applications that wish to do prompting without <newline> characters or that could possibly be expecting to echo a -n, should use the printf utility derived from the Ninth Edition system. Since all of the affected shell scripts use a POSIX shell shebang, replace `echo -n` invocations with printf. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-05Merge branch 'tr/test-v-and-v-subtest-only'Libravatar Junio C Hamano1-1/+2
Allows N instances of tests run in parallel, each running 1/N parts of the test suite under Valgrind, to speed things up. * tr/test-v-and-v-subtest-only: perf-lib: fix start/stop of perf tests test-lib: support running tests under valgrind in parallel test-lib: allow prefixing a custom string before "ok N" etc. test-lib: valgrind for only tests matching a pattern test-lib: verbose mode for only tests matching a pattern test-lib: self-test that --verbose works test-lib: rearrange start/end of test_expect_* and test_skip test-lib: refactor $GIT_SKIP_TESTS matching test-lib: enable MALLOC_* for the actual tests
2013-06-29perf-lib: fix start/stop of perf testsLibravatar Thomas Gummerer1-1/+2
ae75342 test-lib: rearrange start/end of test_expect_* and test_skip changed the way tests are started/stopped, but did not update the perf tests. They were therefore giving the wrong output, because of the wrong test count. Fix this by starting and stopping the tests correctly. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Acked-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-22Documentation: Update 'linux-2.6.git' -> 'linux.git'Libravatar W. Trevor King1-1/+1
The 3.x tree has been out for a while now. The -2.6 repository name survived the initial release [1], but kernel.org now only lists 'linux.git' (for aegl as well as torvalds) [2]. [1]: http://article.gmane.org/gmane.linux.kernel/1147422 On 2011-05-30 01:47:57 GMT, Linus Torvalds wrote: > ... yes, that means that my git tree is still called > "linux-2.6.git" on kernel.org. [2]: http://git.kernel.org/cgit/ Signed-off-by: W. Trevor King <wking@tremily.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20Merge branch 'rs/discard-index-discard-array'Libravatar Junio C Hamano1-0/+14
* rs/discard-index-discard-array: read-cache: free cache in discard_index read-cache: add simple performance test
2013-06-09read-cache: add simple performance testLibravatar René Scharfe1-0/+14
Add the helper test-read-cache, which can be used to call read_cache and discard_cache in a loop as well as a performance check based on it. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02Merge branch 'tr/line-log'Libravatar Junio C Hamano1-0/+34
* tr/line-log: git-log(1): remove --full-line-diff description line-log: fix documentation formatting log -L: improve comments in process_all_files() log -L: store the path instead of a diff_filespec log -L: test merge of parallel modify/rename t4211: pass -M to 'git log -M -L...' test log -L: fix overlapping input ranges log -L: check range set invariants when we look it up Speed up log -L... -M log -L: :pattern:file syntax to find by funcname Implement line-history search (git log -L) Export rewrite_parents() for 'log -L' Refactor parse_loc
2013-03-28Implement line-history search (git log -L)Libravatar Thomas Rast1-0/+34
This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-09perf: update documentation of GIT_PERF_REPEAT_COUNTLibravatar Antoine Pelisse1-1/+1
Currently the documentation of GIT_PERF_REPEAT_COUNT says the default is five while "perf-lib.sh" uses a value of three as a default. Update the documentation so that it is consistent with the code. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-01Merge branch 'ep/malloc-check-perturb'Libravatar Junio C Hamano1-1/+1
Fixes a brown-paper bag bug. * ep/malloc-check-perturb: MALLOC_CHECK: enable it, unless disabled explicitly
2012-09-26MALLOC_CHECK: enable it, unless disabled explicitlyLibravatar René Scharfe1-1/+1
The malloc checks in tests are currently disabled. Actually evaluate the variable for turning them off and enable them if it's unset. Also use this opportunity to give it the more descriptive and consistent name TEST_NO_MALLOC_CHECK. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-25Merge branch 'ep/malloc-check-perturb'Libravatar Junio C Hamano1-0/+1
Run our test scripts with MALLOC_CHECK_ and MALLOC_PERTURB_, the built-in memory access checking facility GNU libc has. * ep/malloc-check-perturb: MALLOC_CHECK: various clean-ups Add MALLOC_CHECK_ and MALLOC_PERTURB_ libc env to the test suite for detecting heap corruption
2012-09-17MALLOC_CHECK: various clean-upsLibravatar Junio C Hamano1-0/+1
The most important in this change is to avoid affecting anything when test-lib is used from perf-lib. It also limits the effect of the MALLOC_CHECK only to what is run inside the actual test, and uses a fixed MALLOC_PERTURB_ in order to avoid hurting repeatability of the tests. Signed-off-by: Junio C Hamano <gitster@pobox.com>