From 15f07e061e272079229d1ab2799d8e7a4f65213f Mon Sep 17 00:00:00 2001
From: Jeff King
Date: Thu, 12 Jan 2012 17:32:34 -0500
Subject: thin-pack: try harder to use preferred base objects as base

When creating a pack using objects that reside in existing packs, we try
to avoid recomputing futile delta between an object (trg) and a candidate
for its base object (src) if they are stored in the same packfile, and
trg is not recorded as a delta already. This heuristics makes sense
because it is likely that we tried to express trg as a delta based on src
but it did not produce a good delta when we created the existing pack.

As the pack heuristics prefer producing delta to remove data, and Linus's
law dictates that the size of a file grows over time, we tend to record
the newest version of the file as inflated, and older ones as delta
against it.

When creating a thin-pack to transfer recent history, it is likely that
we will try to send an object that is recorded in full, as it is newer.
But the heuristics to avoid recomputing futile delta effectively forbids
us from attempting to express such an object as a delta based on another
object. Sending an object in full is often more expensive than sending a
suboptimal delta based on other objects, and it is even more so if we
could use an object we know the receiving end already has (i.e. preferred
base object) as the delta base.

Tweak the recomputation avoidance logic, so that we do not punt on
computing delta against a preferred base object.

The effect of this change can be seen on two simulated upload-pack
workloads. The first is based on 44 reflog entries from my git.git
origin/master reflog, and represents the packs that kernel.org sent me
git updates for the past month or two. The second workload represents
much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0
to v1.2.0, and so on.

The table below shows the average generated pack size and the average CPU
time consumed for each dataset, both before and after the patch:

       dataset | reflog | tags
---------------------------------
        before |  53358 | 2750977
  size   after |  32398 | 2668479
        change |   -39% |     -3%
---------------------------------
        before |   0.18 |    1.12
  CPU    after |   0.18 |    1.15
        change |    +0% |     +3%

This patch makes a much bigger difference for packs with a shorter slice
of history (since its effect is seen at the boundaries of the pack)
though it has some benefit even for larger packs.

Signed-off-by: Jeff King
Acked-by: Nicolas Pitre
Signed-off-by: Junio C Hamano
---
 builtin/pack-objects.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c6e2d8766b..8bfe3a6ffb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1248,11 +1248,16 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 		return -1;

 	/*
-	 * We do not bother to try a delta that we discarded
-	 * on an earlier try, but only when reusing delta data.
+	 * We do not bother to try a delta that we discarded on an
+	 * earlier try, but only when reusing delta data.  Note that
+	 * src_entry that is marked as the preferred_base should always
+	 * be considered, as even if we produce a suboptimal delta against
+	 * it, we will still save the transfer cost, as we already know
+	 * the other side has it and we won't send src_entry at all.
 	 */
 	if (reuse_delta && trg_entry->in_pack &&
 	    trg_entry->in_pack == src_entry->in_pack &&
+	    !src_entry->preferred_base &&
 	    trg_entry->in_pack_type != OBJ_REF_DELTA &&
 	    trg_entry->in_pack_type != OBJ_OFS_DELTA)
 		return 0;
-- 
cgit v1.2.3

From 04f6785a089e552585ba022f9d9054eca385ca67 Mon Sep 17 00:00:00 2001
From: Junio C Hamano
Date: Thu, 12 Jan 2012 23:30:53 -0800
Subject: Update draft release notes to 1.7.6.6

Signed-off-by: Junio C Hamano
---
 Documentation/RelNotes/1.7.6.6.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/RelNotes/1.7.6.6.txt b/Documentation/RelNotes/1.7.6.6.txt
index 13ce2dc2d7..5343e00400 100644
--- a/Documentation/RelNotes/1.7.6.6.txt
+++ b/Documentation/RelNotes/1.7.6.6.txt
@@ -8,4 +8,9 @@ Fixes since v1.7.6.5
    directory when two paths in question are in adjacent directories and
    the name of the one directory is a prefix of the other.

+ * When producing a "thin pack" (primarily used in bundles and smart
+   HTTP transfers) out of a fully packed repository, we unnecessarily
+   avoided sending recent objects as a delta against objects we know
+   the other side has.
+
 Also contains minor fixes and documentation updates.
-- 
cgit v1.2.3

From 8f83acf77cd14567dfdeff0e15f2da086109df70 Mon Sep 17 00:00:00 2001
From: Junio C Hamano
Date: Thu, 12 Jan 2012 23:31:41 -0800
Subject: Update draft release notes to 1.7.7.6

Signed-off-by: Junio C Hamano
---
 Documentation/RelNotes/1.7.7.6.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/RelNotes/1.7.7.6.txt b/Documentation/RelNotes/1.7.7.6.txt
index 065ed2ad6c..b8b86ebc61 100644
--- a/Documentation/RelNotes/1.7.7.6.txt
+++ b/Documentation/RelNotes/1.7.7.6.txt
@@ -8,4 +8,9 @@ Fixes since v1.7.7.5
    directory when two paths in question are in adjacent directories and
    the name of the one directory is a prefix of the other.

+ * When producing a "thin pack" (primarily used in bundles and smart
+   HTTP transfers) out of a fully packed repository, we unnecessarily
+   avoided sending recent objects as a delta against objects we know
+   the other side has.
+
 Also contains minor fixes and documentation updates.
-- 
cgit v1.2.3