tgif.git - Terin's Improved Git Fork

diff options

author	Derrick Stolee <dstolee@microsoft.com>	2020-10-09 20:53:51 +0000
committer	Junio C Hamano <gitster@pobox.com>	2020-10-09 14:16:23 -0700
commit	150f11574bcb53a6880a03593ff46d6b25fa03a1 (patch)
tree	b884e1a57ca98dfa0365fed31e5cebf3a2e804ff /patch-ids.h
parent	Git 2.28 (diff)
download	tgif-150f11574bcb53a6880a03593ff46d6b25fa03a1.tar.xz

commit-graph: ignore duplicates when merging layers

Thomas reported [1] that a "git fetch" command was failing with an error saying "unexpected duplicate commit id". The root cause is that they had fetch.writeCommitGraph enabled which generates commit-graph chains, and this instance was merging two layers that both contained the same commit ID. [1] https://lore.kernel.org/git/55f8f00c-a61c-67d4-889e-a9501c596c39@virtuell-zuhause.de/ The initial assumption is that Git would not write a commit ID into a commit-graph layer if it already exists in a lower commit-graph layer. Somehow, this specific case did get into that situation, leading to this error. While unexpected, this isn't actually invalid (as long as the two layers agree on the metadata for the commit). When we parse a commit that does not have a graph_pos in the commit_graph_data_slab, we use binary search in the commit-graph layers to find the commit and set graph_pos. That position is never used again in this case. However, when we parse a commit from the commit-graph file, we load its parents from the commit-graph and assign graph_pos at that point. If those parents were already parsed from the commit-graph, then nothing needs to be done. Otherwise, this graph_pos is a valid position in the commit-graph so we can parse the parents, when necessary. Thus, this die() is too aggressive. The easiest thing to do would be to ignore the duplicates. If we only ignore the duplicates, then we will produce a commit-graph that has identical commit IDs listed in adjacent positions. This excess data will never be removed from the commit-graph, which could cascade into significantly bloated file sizes. Thankfully, we can collapse the list to erase the duplicate commit pointers. This allows us to get the end result we want without extra memory costs and minimal CPU time. The root cause is due to disabling core.commitGraph, which prevents parsing commits from the lower layers during a 'git commit-graph write --split' command. Since we use the 'graph_pos' value to determine whether a commit is in a lower layer, we never discover that those commits are already in the commit-graph chain and add them to the top layer. This layer is then merged down, creating duplicates. The test added in t5324-split-commit-graph.sh fails without this change. However, we still have not completely removed the need for this duplicate check. That will come in a follow-up change. Reported-by: Thomas Braun <thomas.braun@virtuell-zuhause.de> Helped-by: Taylor Blau <me@ttaylorr.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Diffstat (limited to 'patch-ids.h')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: