diff options
author | Elijah Newren <newren@gmail.com> | 2021-06-22 08:04:40 +0000 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-06-28 07:58:25 -0700 |
commit | 1aedd03afb1592be9e4fbacdc614f45a99a865ea (patch) | |
tree | 9a07d7c99cb0e8881b2e883fc5365faa24bc6681 /t/t5100-mailinfo.sh | |
parent | diffcore-rename: allow different missing_object_cb functions (diff) | |
download | tgif-1aedd03afb1592be9e4fbacdc614f45a99a865ea.tar.xz |
diffcore-rename: use a different prefetch for basename comparisons
merge-ort was designed to minimize the amount of data needed and used,
and several changes were made to diffcore-rename to take advantage of
extra metadata to enable this data minimization (particularly the
relevant_sources variable for skipping "irrelevant" renames). This
effort obviously succeeded in drastically reducing computation times,
but should also theoretically allow partial clones to download much less
information. Previously, though, the "prefetch" command used in
diffcore-rename had never been modified and downloaded many blobs that
were unnecessary for merge-ort. This commit corrects that.
When doing basename comparisons, we want to fetch only the objects that
will be used for basename comparisons. If after basename fetching this
leaves us with no more relevant sources (or no more destinations), then
we won't need to do the full inexact rename detection and can skip
downloading additional source and destination files. Even if we have to
do that later full inexact rename detection, irrelevant sources are
culled after basename matching and before the full inexact rename
detection, so we can still avoid downloading the blobs for irrelevant
sources. Rename prefetch() to inexact_prefetch(), and introduce a
new basename_prefetch() to take advantage of this.
If we modify the testcase from commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2021-01-23)
to pass
--sparse --filter=blob:none
to the clone command, and use the new trace2 "fetch_count" output from
a few commits ago to track both the number of fetch subcommands invoked
and the number of objects fetched across all those fetches, then for
the mega-renames testcase we observe the following:
BEFORE this commit, rebasing 35 patches:
strategy # of fetches total # of objects fetched
--------- ------------ --------------------------
recursive 62 11423
ort 30 11391
AFTER this commit, rebasing the same 35 patches:
ort 32 63
This means that the new code only needs to download less than 2 blobs
per patch being rebased. That is especially interesting given that the
repository at the start only had approximately half a dozen TOTAL blobs
downloaded to start with (because the default sparse-checkout of just
the toplevel directory was in use).
So, for this particular linux kernel testcase that involved ~26,000
renames on the upstream side (drivers/ -> pilots/) across which 35
patches were being rebased, this change reduces the number of blobs that
need to be downloaded by a factor of ~180.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t5100-mailinfo.sh')
0 files changed, 0 insertions, 0 deletions