diff options
author | Elijah Newren <newren@gmail.com> | 2021-07-16 05:22:36 +0000 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-07-20 14:47:40 -0700 |
commit | 7bee6c100431e5c24cf27b5a7cfc9f67c8c66751 (patch) | |
tree | 39e62b5f70aa517d03b851bb5795bbee65ac1f6f /fsck.c | |
parent | merge-ort: defer recursing into directories when merge base is matched (diff) | |
download | tgif-7bee6c100431e5c24cf27b5a7cfc9f67c8c66751.tar.xz |
merge-ort: avoid recursing into directories when we don't need to
This combines the work of the last several patches, and implements the
conditions when we don't need to recurse into directories. It's perhaps
easiest to see the logic by separating the fact that a directory might
have both rename sources and rename destinations:
* rename sources: only files present in the merge base can serve as
rename sources, and only when one side deletes that file. When the
tree on one side matches the merge base, that means every file
within the subtree matches the merge base. This means that the
skip-irrelevant-rename-detection optimization from before kicks in
and we don't need any of these files as rename sources.
* rename destinations: the tree that does not match the merge base
might have newly added and hence unmatched destination files.
This is what usually prevents us from doing trivial directory
resolutions in the merge machinery. However, the fact that we have
deferred recursing into this directory until the end means we know
whether there are any unmatched relevant potential rename sources
elsewhere in this merge. If there are no unmatched such relevant
sources anywhere, then there is no need to look for unmatched
potential rename destinations to match them with.
This informs our algorithm:
* Search through relevant_sources; if we have entries, they better all
be reflected in cached_pairs or cached_irrelevant, otherwise they
represent an unmatched potential rename source (causing the
optimization to be disallowed).
* For any relevant_source represented in cached_pairs, we do need to
to make sure to get the destination for each source, meaning we need
to recurse into any ancestor directories of those destinations.
* Once we've recursed into all the rename destinations for any
relevant_sources in cached_pairs, we can then do the trivial
directory resolution for the remaining directories.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.235 s ± 0.042 s 205.1 ms ± 3.8 ms
mega-renames: 9.419 s ± 0.107 s 1.564 s ± 0.010 s
just-one-mega: 480.1 ms ± 3.9 ms 479.5 ms ± 3.9 ms
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'fsck.c')
0 files changed, 0 insertions, 0 deletions