tgif.git - Terin's Improved Git Fork

diff options

author	Elijah Newren <newren@gmail.com>	2021-06-22 08:04:40 +0000
committer	Junio C Hamano <gitster@pobox.com>	2021-06-28 07:58:25 -0700
commit	1aedd03afb1592be9e4fbacdc614f45a99a865ea (patch)
tree	9a07d7c99cb0e8881b2e883fc5365faa24bc6681 /t/t5100-mailinfo.sh
parent	diffcore-rename: allow different missing_object_cb functions (diff)
download	tgif-1aedd03afb1592be9e4fbacdc614f45a99a865ea.tar.xz

diffcore-rename: use a different prefetch for basename comparisons

merge-ort was designed to minimize the amount of data needed and used, and several changes were made to diffcore-rename to take advantage of extra metadata to enable this data minimization (particularly the relevant_sources variable for skipping "irrelevant" renames). This effort obviously succeeded in drastically reducing computation times, but should also theoretically allow partial clones to download much less information. Previously, though, the "prefetch" command used in diffcore-rename had never been modified and downloaded many blobs that were unnecessary for merge-ort. This commit corrects that. When doing basename comparisons, we want to fetch only the objects that will be used for basename comparisons. If after basename fetching this leaves us with no more relevant sources (or no more destinations), then we won't need to do the full inexact rename detection and can skip downloading additional source and destination files. Even if we have to do that later full inexact rename detection, irrelevant sources are culled after basename matching and before the full inexact rename detection, so we can still avoid downloading the blobs for irrelevant sources. Rename prefetch() to inexact_prefetch(), and introduce a new basename_prefetch() to take advantage of this. If we modify the testcase from commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2021-01-23) to pass --sparse --filter=blob:none to the clone command, and use the new trace2 "fetch_count" output from a few commits ago to track both the number of fetch subcommands invoked and the number of objects fetched across all those fetches, then for the mega-renames testcase we observe the following: BEFORE this commit, rebasing 35 patches: strategy # of fetches total # of objects fetched --------- ------------ -------------------------- recursive 62 11423 ort 30 11391 AFTER this commit, rebasing the same 35 patches: ort 32 63 This means that the new code only needs to download less than 2 blobs per patch being rebased. That is especially interesting given that the repository at the start only had approximately half a dozen TOTAL blobs downloaded to start with (because the default sparse-checkout of just the toplevel directory was in use). So, for this particular linux kernel testcase that involved ~26,000 renames on the upstream side (drivers/ -> pilots/) across which 35 patches were being rebased, this change reduces the number of blobs that need to be downloaded by a factor of ~180. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Diffstat (limited to 't/t5100-mailinfo.sh')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: