summaryrefslogtreecommitdiff
path: root/t/t4216-log-bloom.sh
diff options
context:
space:
mode:
authorLibravatar SZEDER Gábor <szeder.dev@gmail.com>2020-07-01 13:27:30 +0000
committerLibravatar Junio C Hamano <gitster@pobox.com>2020-07-01 14:17:43 -0700
commitc525ce95b46b34f344c360dbef036cec3ea08e53 (patch)
treeb0b01a43d35892b89fd866ac8d0e3943c7aa15a9 /t/t4216-log-bloom.sh
parentrevision: empty pathspecs should not use Bloom filters (diff)
downloadtgif-c525ce95b46b34f344c360dbef036cec3ea08e53.tar.xz
commit-graph: check all leading directories in changed path Bloom filters
The file 'dir/subdir/file' can only be modified if its leading directories 'dir' and 'dir/subdir' are modified as well. So when checking modified path Bloom filters looking for commits modifying a path with multiple path components, then check not only the full path in the Bloom filters, but all its leading directories as well. Take care to check these paths in "deepest first" order, because it's the full path that is least likely to be modified, and the Bloom filter queries can short circuit sooner. This can significantly reduce the average false positive rate, by about an order of magnitude or three(!), and can further speed up pathspec-limited revision walks. The table below compares the average false positive rate and runtime of git rev-list HEAD -- "$path" before and after this change for 5000+ randomly* selected paths from each repository: Average false Average Average positive rate runtime runtime before after before after difference ------------------------------------------------------------------ git 3.220% 0.7853% 0.0558s 0.0387s -30.6% linux 2.453% 0.0296% 0.1046s 0.0766s -26.8% tensorflow 2.536% 0.6977% 0.0594s 0.0420s -29.2% *Path selection was done with the following pipeline: git ls-tree -r --name-only HEAD | sort -R | head -n 5000 The improvements in runtime are much smaller than the improvements in average false positive rate, as we are clearly reaching diminishing returns here. However, all these timings depend on that accessing tree objects is reasonably fast (warm caches). If we had a partial clone and the tree objects had to be fetched from a promisor remote, e.g.: $ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git $ git -C webkit.git -c core.modifiedPathBloomFilters=1 \ commit-graph write --reachable $ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/ $ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \ rev-list HEAD -- "$path" then checking all leading path component can reduce the runtime from over an hour to a few seconds (and this is with the clone and the promisor on the same machine). This adjusts the tracing values in t4216-log-bloom.sh, which provides a concrete way to notice the improvement. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t4216-log-bloom.sh')
-rwxr-xr-xt/t4216-log-bloom.sh2
1 files changed, 1 insertions, 1 deletions
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 29338f36bf..4892364e74 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -146,7 +146,7 @@ test_expect_success 'setup - add commit-graph to the chain with Bloom filters' '
test_bloom_filters_used_when_some_filters_are_missing () {
log_args=$1
- bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":8,\"definitely_not\":6"
+ bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":8"
setup "$log_args" &&
grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" &&
test_cmp log_wo_bloom log_w_bloom