diff options
author | SZEDER Gábor <szeder.dev@gmail.com> | 2020-05-11 11:56:18 +0000 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2020-05-11 09:33:56 -0700 |
commit | 002933f3fe2b016022ebbbbb359f6aeba58309a4 (patch) | |
tree | 57dd6eabb38e91914325e4d45aef79406ad54246 /bloom.h | |
parent | line-log: more responsive, incremental 'git log -L' (diff) | |
download | tgif-002933f3fe2b016022ebbbbb359f6aeba58309a4.tar.xz |
line-log: try to use generation number-based topo-ordering
The previous patch made it possible to perform line-level filtering
during history traversal instead of in an expensive preprocessing
step, but it still requires some simpler preprocessing steps, notably
topo-ordering. However, nowadays we have commit-graphs storing
generation numbers, which make it possible to incrementally traverse
the history in topological order, without the preparatory limit_list()
and sort_in_topological_order() steps; see b45424181e (revision.c:
generation-based topo-order algorithm, 2018-11-01).
This patch combines the two, so we can do both the topo-ordering and
the line-level filtering during history traversal, eliminating even
those simpler preprocessing steps, and thus further reducing the delay
before showing the first commit modifying the given line range.
The 'revs->limited' flag plays the central role in this, because, due
to limitations of the current implementation, the generation
number-based topo-ordering is only enabled when this flag remains
unset. Line-level log, however, always sets this flag in
setup_revisions() ever since the feature was introduced in 12da1d1f6f
(Implement line-history search (git log -L), 2013-03-28). The reason
for setting 'limited' is unclear, though, because the line-level log
itself doesn't directly depend on it, and it doesn't affect how the
limit_list() function limits the revision range. However, there is an
indirect dependency: the line-level log requires topo-ordering, and
the "traditional" sort_in_topological_order() requires an already
limited commit list since e6c3505b44 (Make sure we generate the whole
commit list before trying to sort it topologically, 2005-07-06). The
new, generation numbers-based topo-ordering doesn't require a limited
commit list anymore.
So don't set 'revs->limited' for line-level log, unless it is really
necessary, namely:
- The user explicitly requested parent rewriting, because that is
still done in the line_log_filter() preprocessing step (see
previous patch), which requires sort_in_topological_order() and in
turn limit_list() as well.
- A commit-graph file is not available or it doesn't yet contain
generation numbers. In these cases we had to fall back on
sort_in_topological_order() and in turn limit_list(). The
existing condition with generation_numbers_enabled() has already
ensured that the 'limited' flag is set in these cases; this patch
just makes sure that the line-level log sets 'revs->topo_order'
before that condition.
While the reduced delay before showing the first commit is measurable
in git.git, it takes a bigger repository to make it clearly noticable.
In both cases below the line ranges were chosen so that they were
modified rather close to the starting revisions, so the effect of this
change is most noticable.
# git.git
$ time git --no-pager log -L:read_alternate_refs:sha1-file.c -1 v2.23.0
Before:
real 0m0.107s
user 0m0.091s
sys 0m0.013s
After:
real 0m0.058s
user 0m0.050s
sys 0m0.005s
# linux.git
$ time git --no-pager log \
-L:build_restore_work_registers:arch/mips/mm/tlbex.c -1 v5.2
Before:
real 0m1.129s
user 0m1.061s
sys 0m0.069s
After:
real 0m0.096s
user 0m0.087s
sys 0m0.009s
Additional testing by Derrick Stolee: Since this patch improves
the performance for the first result, I repeated the experiment
from the previous patch on the Linux kernel repository, reporting
real time here:
Command: git log -L 100,200:MAINTAINERS -n 1 >/dev/null
Before: 0.71 s
After: 0.05 s
Now, we have dropped the full topo-order of all ~910,000 commits
before reporting the first result. The remaining performance
improvements then are:
1. Update the parent-rewriting logic to be incremental similar to
how "git log --graph" behaves.
2. Use changed-path Bloom filters to reduce the time spend in the
tree-diff to see if the path(s) changed.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'bloom.h')
0 files changed, 0 insertions, 0 deletions