diff options
author | Jeff King <peff@peff.net> | 2017-02-08 15:53:10 -0500 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2017-02-08 15:39:55 -0800 |
commit | ab6eea6f7b9a5289d72c05476da19ab2bb457fd3 (patch) | |
tree | 70640ab6aea899442df3e67d3d5cf32e24ca7758 | |
parent | add oidset API (diff) | |
download | tgif-ab6eea6f7b9a5289d72c05476da19ab2bb457fd3.tar.xz |
receive-pack: use oidset to de-duplicate .have lines
If you have an alternate object store with a very large
number of refs, the peak memory usage of the sha1_array can
grow high, even if most of them are duplicates that end up
not being printed at all.
The similar for_each_alternate_ref() code-paths in
fetch-pack solve this by using flags in "struct object" to
de-duplicate (and so are relying on obj_hash at the core).
But we don't have a "struct object" at all in this case. We
could call lookup_unknown_object() to get one, but if our
goal is reducing memory footprint, it's not great:
- an unknown object is as large as the largest object type
(a commit), which is bigger than an oidset entry
- we can free the memory after our ref advertisement, but
"struct object" entries persist forever (and the
receive-pack may hang around for a long time, as the
bottleneck is often client upload bandwidth).
So let's use an oidset. Note that unlike a sha1-array it
doesn't sort the output as a side effect. However, our
output is at least stable, because for_each_alternate_ref()
will give us the sha1s in ref-sorted order.
In one particularly pathological case with an alternate that
has 60,000 unique refs out of 80 million total, this reduced
the peak heap usage of "git receive-pack . </dev/null" from
13GB to 14MB.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-rw-r--r-- | builtin/receive-pack.c | 26 |
1 files changed, 12 insertions, 14 deletions
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index d21332d9e7..a4926fcfb4 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -21,6 +21,7 @@ #include "sigchain.h" #include "fsck.h" #include "tmp-objdir.h" +#include "oidset.h" static const char * const receive_pack_usage[] = { N_("git receive-pack <git-dir>"), @@ -271,27 +272,24 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid, return 0; } -static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused) +static void show_one_alternate_ref(const char *refname, + const struct object_id *oid, + void *data) { - show_ref(".have", sha1); - return 0; -} + struct oidset *seen = data; -static void collect_one_alternate_ref(const char *refname, - const struct object_id *oid, - void *data) -{ - struct sha1_array *sa = data; - sha1_array_append(sa, oid->hash); + if (oidset_insert(seen, oid)) + return; + + show_ref(".have", oid->hash); } static void write_head_info(void) { - struct sha1_array sa = SHA1_ARRAY_INIT; + static struct oidset seen = OIDSET_INIT; - for_each_alternate_ref(collect_one_alternate_ref, &sa); - sha1_array_for_each_unique(&sa, show_one_alternate_sha1, NULL); - sha1_array_clear(&sa); + for_each_alternate_ref(show_one_alternate_ref, &seen); + oidset_clear(&seen); for_each_ref(show_ref_cb, NULL); if (!sent_capabilities) show_ref("capabilities^{}", null_sha1); |