diff options
-rw-r--r-- | Documentation/config.txt | 19 | ||||
-rw-r--r-- | Documentation/git-pack-objects.txt | 97 | ||||
-rw-r--r-- | Documentation/git-repack.txt | 5 | ||||
-rw-r--r-- | Makefile | 1 | ||||
-rw-r--r-- | builtin/pack-objects.c | 137 | ||||
-rw-r--r-- | builtin/repack.c | 9 | ||||
-rw-r--r-- | delta-islands.c | 502 | ||||
-rw-r--r-- | delta-islands.h | 11 | ||||
-rw-r--r-- | pack-objects.c | 12 | ||||
-rw-r--r-- | pack-objects.h | 39 | ||||
-rwxr-xr-x | t/t5320-delta-islands.sh | 143 |
11 files changed, 932 insertions, 43 deletions
diff --git a/Documentation/config.txt b/Documentation/config.txt index 6ecd70df0a..112041f407 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2684,6 +2684,21 @@ Note that changing the compression level will not automatically recompress all existing objects. You can force recompression by passing the -F option to linkgit:git-repack[1]. +pack.island:: + An extended regular expression configuring a set of delta + islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1] + for details. + +pack.islandCore:: + Specify an island name which gets to have its objects be + packed first. This creates a kind of pseudo-pack at the front + of one pack, so that the objects from the specified island are + hopefully faster to copy into any pack that should be served + to a user requesting these objects. In practice this means + that the island specified should likely correspond to what is + the most commonly cloned in the repo. See also "DELTA ISLANDS" + in linkgit:git-pack-objects[1]. + pack.deltaCacheSize:: The maximum memory in bytes used for caching deltas in linkgit:git-pack-objects[1] before writing them out to a pack. @@ -3218,6 +3233,10 @@ repack.packKeptObjects:: index is being written (either via `--write-bitmap-index` or `repack.writeBitmaps`). +repack.useDeltaIslands:: + If set to true, makes `git repack` act as if `--delta-islands` + was passed. Defaults to `false`. + repack.writeBitmaps:: When true, git will write a bitmap index when packing all objects to disk (e.g., when `git repack -a` is run). This diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index d95b472d16..40c825c381 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -289,6 +289,103 @@ Unexpected missing object will raise an error. --unpack-unreachable:: Keep unreachable objects in loose form. This implies `--revs`. +--delta-islands:: + Restrict delta matches based on "islands". See DELTA ISLANDS + below. + + +DELTA ISLANDS +------------- + +When possible, `pack-objects` tries to reuse existing on-disk deltas to +avoid having to search for new ones on the fly. This is an important +optimization for serving fetches, because it means the server can avoid +inflating most objects at all and just send the bytes directly from +disk. This optimization can't work when an object is stored as a delta +against a base which the receiver does not have (and which we are not +already sending). In that case the server "breaks" the delta and has to +find a new one, which has a high CPU cost. Therefore it's important for +performance that the set of objects in on-disk delta relationships match +what a client would fetch. + +In a normal repository, this tends to work automatically. The objects +are mostly reachable from the branches and tags, and that's what clients +fetch. Any deltas we find on the server are likely to be between objects +the client has or will have. + +But in some repository setups, you may have several related but separate +groups of ref tips, with clients tending to fetch those groups +independently. For example, imagine that you are hosting several "forks" +of a repository in a single shared object store, and letting clients +view them as separate repositories through `GIT_NAMESPACE` or separate +repos using the alternates mechanism. A naive repack may find that the +optimal delta for an object is against a base that is only found in +another fork. But when a client fetches, they will not have the base +object, and we'll have to find a new delta on the fly. + +A similar situation may exist if you have many refs outside of +`refs/heads/` and `refs/tags/` that point to related objects (e.g., +`refs/pull` or `refs/changes` used by some hosting providers). By +default, clients fetch only heads and tags, and deltas against objects +found only in those other groups cannot be sent as-is. + +Delta islands solve this problem by allowing you to group your refs into +distinct "islands". Pack-objects computes which objects are reachable +from which islands, and refuses to make a delta from an object `A` +against a base which is not present in all of `A`'s islands. This +results in slightly larger packs (because we miss some delta +opportunities), but guarantees that a fetch of one island will not have +to recompute deltas on the fly due to crossing island boundaries. + +When repacking with delta islands the delta window tends to get +clogged with candidates that are forbidden by the config. Repacking +with a big --window helps (and doesn't take as long as it otherwise +might because we can reject some object pairs based on islands before +doing any computation on the content). + +Islands are configured via the `pack.island` option, which can be +specified multiple times. Each value is a left-anchored regular +expressions matching refnames. For example: + +------------------------------------------- +[pack] +island = refs/heads/ +island = refs/tags/ +------------------------------------------- + +puts heads and tags into an island (whose name is the empty string; see +below for more on naming). Any refs which do not match those regular +expressions (e.g., `refs/pull/123`) is not in any island. Any object +which is reachable only from `refs/pull/` (but not heads or tags) is +therefore not a candidate to be used as a base for `refs/heads/`. + +Refs are grouped into islands based on their "names", and two regexes +that produce the same name are considered to be in the same +island. The names are computed from the regexes by concatenating any +capture groups from the regex, with a '-' dash in between. (And if +there are no capture groups, then the name is the empty string, as in +the above example.) This allows you to create arbitrary numbers of +islands. Only up to 14 such capture groups are supported though. + +For example, imagine you store the refs for each fork in +`refs/virtual/ID`, where `ID` is a numeric identifier. You might then +configure: + +------------------------------------------- +[pack] +island = refs/virtual/([0-9]+)/heads/ +island = refs/virtual/([0-9]+)/tags/ +island = refs/virtual/([0-9]+)/(pull)/ +------------------------------------------- + +That puts the heads and tags for each fork in their own island (named +"1234" or similar), and the pull refs for each go into their own +"1234-pull". + +Note that we pick a single island for each regex to go into, using "last +one wins" ordering (which allows repo-specific config to take precedence +over user-wide config, and so forth). + SEE ALSO -------- linkgit:git-rev-list[1] diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index d056250968..aa0cc8bd44 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -160,6 +160,11 @@ depth is 4095. being removed. In addition, any unreachable loose objects will be packed (and their loose counterparts removed). +-i:: +--delta-islands:: + Pass the `--delta-islands` option to `git-pack-objects`, see + linkgit:git-pack-objects[1]. + Configuration ------------- @@ -850,6 +850,7 @@ LIB_OBJS += csum-file.o LIB_OBJS += ctype.o LIB_OBJS += date.o LIB_OBJS += decorate.o +LIB_OBJS += delta-islands.o LIB_OBJS += diffcore-break.o LIB_OBJS += diffcore-delta.o LIB_OBJS += diffcore-order.o diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f434e1fb0c..425bdc8ac5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -24,6 +24,7 @@ #include "streaming.h" #include "thread-utils.h" #include "pack-bitmap.h" +#include "delta-islands.h" #include "reachable.h" #include "sha1-array.h" #include "argv-array.h" @@ -62,6 +63,7 @@ static struct packing_data to_pack; static struct pack_idx_entry **written_list; static uint32_t nr_result, nr_written, nr_seen; static struct bitmap_index *bitmap_git; +static uint32_t write_layer; static int non_empty; static int reuse_delta = 1, reuse_object = 1; @@ -97,6 +99,8 @@ static uint16_t write_bitmap_options; static int exclude_promisor_objects; +static int use_delta_islands; + static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE; static unsigned long cache_max_small_delta_size = 1000; @@ -616,7 +620,7 @@ static inline void add_to_write_order(struct object_entry **wo, unsigned int *endp, struct object_entry *e) { - if (e->filled) + if (e->filled || oe_layer(&to_pack, e) != write_layer) return; wo[(*endp)++] = e; e->filled = 1; @@ -676,48 +680,15 @@ static void add_family_to_write_order(struct object_entry **wo, add_descendants_to_write_order(wo, endp, root); } -static struct object_entry **compute_write_order(void) +static void compute_layer_order(struct object_entry **wo, unsigned int *wo_end) { - unsigned int i, wo_end, last_untagged; - - struct object_entry **wo; + unsigned int i, last_untagged; struct object_entry *objects = to_pack.objects; for (i = 0; i < to_pack.nr_objects; i++) { - objects[i].tagged = 0; - objects[i].filled = 0; - SET_DELTA_CHILD(&objects[i], NULL); - SET_DELTA_SIBLING(&objects[i], NULL); - } - - /* - * Fully connect delta_child/delta_sibling network. - * Make sure delta_sibling is sorted in the original - * recency order. - */ - for (i = to_pack.nr_objects; i > 0;) { - struct object_entry *e = &objects[--i]; - if (!DELTA(e)) - continue; - /* Mark me as the first child */ - e->delta_sibling_idx = DELTA(e)->delta_child_idx; - SET_DELTA_CHILD(DELTA(e), e); - } - - /* - * Mark objects that are at the tip of tags. - */ - for_each_tag_ref(mark_tagged, NULL); - - /* - * Give the objects in the original recency order until - * we see a tagged tip. - */ - ALLOC_ARRAY(wo, to_pack.nr_objects); - for (i = wo_end = 0; i < to_pack.nr_objects; i++) { if (objects[i].tagged) break; - add_to_write_order(wo, &wo_end, &objects[i]); + add_to_write_order(wo, wo_end, &objects[i]); } last_untagged = i; @@ -726,7 +697,7 @@ static struct object_entry **compute_write_order(void) */ for (; i < to_pack.nr_objects; i++) { if (objects[i].tagged) - add_to_write_order(wo, &wo_end, &objects[i]); + add_to_write_order(wo, wo_end, &objects[i]); } /* @@ -736,7 +707,7 @@ static struct object_entry **compute_write_order(void) if (oe_type(&objects[i]) != OBJ_COMMIT && oe_type(&objects[i]) != OBJ_TAG) continue; - add_to_write_order(wo, &wo_end, &objects[i]); + add_to_write_order(wo, wo_end, &objects[i]); } /* @@ -745,17 +716,61 @@ static struct object_entry **compute_write_order(void) for (i = last_untagged; i < to_pack.nr_objects; i++) { if (oe_type(&objects[i]) != OBJ_TREE) continue; - add_to_write_order(wo, &wo_end, &objects[i]); + add_to_write_order(wo, wo_end, &objects[i]); } /* * Finally all the rest in really tight order */ for (i = last_untagged; i < to_pack.nr_objects; i++) { - if (!objects[i].filled) - add_family_to_write_order(wo, &wo_end, &objects[i]); + if (!objects[i].filled && oe_layer(&to_pack, &objects[i]) == write_layer) + add_family_to_write_order(wo, wo_end, &objects[i]); + } +} + +static struct object_entry **compute_write_order(void) +{ + uint32_t max_layers = 1; + unsigned int i, wo_end; + + struct object_entry **wo; + struct object_entry *objects = to_pack.objects; + + for (i = 0; i < to_pack.nr_objects; i++) { + objects[i].tagged = 0; + objects[i].filled = 0; + SET_DELTA_CHILD(&objects[i], NULL); + SET_DELTA_SIBLING(&objects[i], NULL); + } + + /* + * Fully connect delta_child/delta_sibling network. + * Make sure delta_sibling is sorted in the original + * recency order. + */ + for (i = to_pack.nr_objects; i > 0;) { + struct object_entry *e = &objects[--i]; + if (!DELTA(e)) + continue; + /* Mark me as the first child */ + e->delta_sibling_idx = DELTA(e)->delta_child_idx; + SET_DELTA_CHILD(DELTA(e), e); } + /* + * Mark objects that are at the tip of tags. + */ + for_each_tag_ref(mark_tagged, NULL); + + if (use_delta_islands) + max_layers = compute_pack_layers(&to_pack); + + ALLOC_ARRAY(wo, to_pack.nr_objects); + wo_end = 0; + + for (; write_layer < max_layers; ++write_layer) + compute_layer_order(wo, &wo_end); + if (wo_end != to_pack.nr_objects) die(_("ordered %u objects, expected %"PRIu32), wo_end, to_pack.nr_objects); @@ -1544,7 +1559,8 @@ static void check_object(struct object_entry *entry) if (base_ref && ( (base_entry = packlist_find(&to_pack, base_ref, NULL)) || (thin && - bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref)))) { + bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref))) && + in_same_island(&entry->idx.oid, &base_entry->idx.oid)) { /* * If base_ref was set above that means we wish to * reuse delta data, and either we found that object in @@ -1867,6 +1883,11 @@ static int type_size_sort(const void *_a, const void *_b) return -1; if (a->preferred_base < b->preferred_base) return 1; + if (use_delta_islands) { + int island_cmp = island_delta_cmp(&a->idx.oid, &b->idx.oid); + if (island_cmp) + return island_cmp; + } if (a_size > b_size) return -1; if (a_size < b_size) @@ -2027,6 +2048,9 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, if (trg_size < src_size / 32) return 0; + if (!in_same_island(&trg->entry->idx.oid, &src->entry->idx.oid)) + return 0; + /* Load data if not already done */ if (!trg->data) { read_lock(); @@ -2569,6 +2593,9 @@ static void prepare_pack(int window, int depth) uint32_t i, nr_deltas; unsigned n; + if (use_delta_islands) + resolve_tree_islands(progress, &to_pack); + get_object_details(); /* @@ -2732,6 +2759,9 @@ static void show_commit(struct commit *commit, void *data) if (write_bitmap_index) index_commit_for_bitmap(commit); + + if (use_delta_islands) + propagate_island_marks(commit); } static void show_object(struct object *obj, const char *name, void *data) @@ -2739,6 +2769,19 @@ static void show_object(struct object *obj, const char *name, void *data) add_preferred_base_object(name); add_object_entry(&obj->oid, obj->type, name, 0); obj->flags |= OBJECT_ADDED; + + if (use_delta_islands) { + const char *p; + unsigned depth = 0; + struct object_entry *ent; + + for (p = strchr(name, '/'); p; p = strchr(p + 1, '/')) + depth++; + + ent = packlist_find(&to_pack, obj->oid.hash, NULL); + if (ent && depth > oe_tree_depth(&to_pack, ent)) + oe_set_tree_depth(&to_pack, ent, depth); + } } static void show_object__ma_allow_any(struct object *obj, const char *name, void *data) @@ -3064,6 +3107,9 @@ static void get_object_list(int ac, const char **av) if (use_bitmap_index && !get_object_list_from_bitmap(&revs)) return; + if (use_delta_islands) + load_delta_islands(); + if (prepare_revision_walk(&revs)) die(_("revision walk setup failed")); mark_edges_uninteresting(&revs, show_edge); @@ -3242,6 +3288,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) option_parse_missing_action }, OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects, N_("do not pack objects in promisor packfiles")), + OPT_BOOL(0, "delta-islands", &use_delta_islands, + N_("respect islands during delta compression")), OPT_END(), }; @@ -3368,6 +3416,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (pack_to_stdout || !rev_list_all) write_bitmap_index = 0; + if (use_delta_islands) + argv_array_push(&rp, "--topo-order"); + if (progress && all_progress_implied) progress = 2; diff --git a/builtin/repack.c b/builtin/repack.c index 42be88e86c..c6a7943d5c 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -15,6 +15,7 @@ static int delta_base_offset = 1; static int pack_kept_objects = -1; static int write_bitmaps; +static int use_delta_islands; static char *packdir, *packtmp; static const char *const git_repack_usage[] = { @@ -43,6 +44,10 @@ static int repack_config(const char *var, const char *value, void *cb) write_bitmaps = git_config_bool(var, value); return 0; } + if (!strcmp(var, "repack.usedeltaislands")) { + use_delta_islands = git_config_bool(var, value); + return 0; + } return git_default_config(var, value, cb); } @@ -303,6 +308,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) N_("pass --local to git-pack-objects")), OPT_BOOL('b', "write-bitmap-index", &write_bitmaps, N_("write bitmap index")), + OPT_BOOL('i', "delta-islands", &use_delta_islands, + N_("pass --delta-islands to git-pack-objects")), OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"), N_("with -A, do not loosen objects older than this")), OPT_BOOL('k', "keep-unreachable", &keep_unreachable, @@ -363,6 +370,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_push(&cmd.args, "--exclude-promisor-objects"); if (write_bitmaps) argv_array_push(&cmd.args, "--write-bitmap-index"); + if (use_delta_islands) + argv_array_push(&cmd.args, "--delta-islands"); if (pack_everything & ALL_INTO_ONE) { get_non_kept_pack_filenames(&existing_packs, &keep_pack_list); diff --git a/delta-islands.c b/delta-islands.c new file mode 100644 index 0000000000..8e5018e406 --- /dev/null +++ b/delta-islands.c @@ -0,0 +1,502 @@ +#include "cache.h" +#include "attr.h" +#include "object.h" +#include "blob.h" +#include "commit.h" +#include "tag.h" +#include "tree.h" +#include "delta.h" +#include "pack.h" +#include "tree-walk.h" +#include "diff.h" +#include "revision.h" +#include "list-objects.h" +#include "progress.h" +#include "refs.h" +#include "khash.h" +#include "pack-bitmap.h" +#include "pack-objects.h" +#include "delta-islands.h" +#include "sha1-array.h" +#include "config.h" + +KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal) + +static khash_sha1 *island_marks; +static unsigned island_counter; +static unsigned island_counter_core; + +static kh_str_t *remote_islands; + +struct remote_island { + uint64_t hash; + struct oid_array oids; +}; + +struct island_bitmap { + uint32_t refcount; + uint32_t bits[FLEX_ARRAY]; +}; + +static uint32_t island_bitmap_size; + +/* + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy + * of "old". Otherwise, the new bitmap is empty. + */ +static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old) +{ + size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4); + struct island_bitmap *b = xcalloc(1, size); + + if (old) + memcpy(b, old, size); + + b->refcount = 1; + return b; +} + +static void island_bitmap_or(struct island_bitmap *a, const struct island_bitmap *b) +{ + uint32_t i; + + for (i = 0; i < island_bitmap_size; ++i) + a->bits[i] |= b->bits[i]; +} + +static int island_bitmap_is_subset(struct island_bitmap *self, + struct island_bitmap *super) +{ + uint32_t i; + + if (self == super) + return 1; + + for (i = 0; i < island_bitmap_size; ++i) { + if ((self->bits[i] & super->bits[i]) != self->bits[i]) + return 0; + } + + return 1; +} + +#define ISLAND_BITMAP_BLOCK(x) (x / 32) +#define ISLAND_BITMAP_MASK(x) (1 << (x % 32)) + +static void island_bitmap_set(struct island_bitmap *self, uint32_t i) +{ + self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i); +} + +static int island_bitmap_get(struct island_bitmap *self, uint32_t i) +{ + return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 0; +} + +int in_same_island(const struct object_id *trg_oid, const struct object_id *src_oid) +{ + khiter_t trg_pos, src_pos; + + /* If we aren't using islands, assume everything goes together. */ + if (!island_marks) + return 1; + + /* + * If we don't have a bitmap for the target, we can delta it + * against anything -- it's not an important object + */ + trg_pos = kh_get_sha1(island_marks, trg_oid->hash); + if (trg_pos >= kh_end(island_marks)) + return 1; + + /* + * if the source (our delta base) doesn't have a bitmap, + * we don't want to base any deltas on it! + */ + src_pos = kh_get_sha1(island_marks, src_oid->hash); + if (src_pos >= kh_end(island_marks)) + return 0; + + return island_bitmap_is_subset(kh_value(island_marks, trg_pos), + kh_value(island_marks, src_pos)); +} + +int island_delta_cmp(const struct object_id *a, const struct object_id *b) +{ + khiter_t a_pos, b_pos; + struct island_bitmap *a_bitmap = NULL, *b_bitmap = NULL; + + if (!island_marks) + return 0; + + a_pos = kh_get_sha1(island_marks, a->hash); + if (a_pos < kh_end(island_marks)) + a_bitmap = kh_value(island_marks, a_pos); + + b_pos = kh_get_sha1(island_marks, b->hash); + if (b_pos < kh_end(island_marks)) + b_bitmap = kh_value(island_marks, b_pos); + + if (a_bitmap) { + if (!b_bitmap || !island_bitmap_is_subset(a_bitmap, b_bitmap)) + return -1; + } + if (b_bitmap) { + if (!a_bitmap || !island_bitmap_is_subset(b_bitmap, a_bitmap)) + return 1; + } + + return 0; +} + +static struct island_bitmap *create_or_get_island_marks(struct object *obj) +{ + khiter_t pos; + int hash_ret; + + pos = kh_put_sha1(island_marks, obj->oid.hash, &hash_ret); + if (hash_ret) + kh_value(island_marks, pos) = island_bitmap_new(NULL); + + return kh_value(island_marks, pos); +} + +static void set_island_marks(struct object *obj, struct island_bitmap *marks) +{ + struct island_bitmap *b; + khiter_t pos; + int hash_ret; + + pos = kh_put_sha1(island_marks, obj->oid.hash, &hash_ret); + if (hash_ret) { + /* + * We don't have one yet; make a copy-on-write of the + * parent. + */ + marks->refcount++; + kh_value(island_marks, pos) = marks; + return; + } + + /* + * We do have it. Make sure we split any copy-on-write before + * updating. + */ + b = kh_value(island_marks, pos); + if (b->refcount > 1) { + b->refcount--; + b = kh_value(island_marks, pos) = island_bitmap_new(b); + } + island_bitmap_or(b, marks); +} + +static void mark_remote_island_1(struct remote_island *rl, int is_core_island) +{ + uint32_t i; + + for (i = 0; i < rl->oids.nr; ++i) { + struct island_bitmap *marks; + struct object *obj = parse_object(the_repository, &rl->oids.oid[i]); + + if (!obj) + continue; + + marks = create_or_get_island_marks(obj); + island_bitmap_set(marks, island_counter); + + if (is_core_island && obj->type == OBJ_COMMIT) + obj->flags |= NEEDS_BITMAP; + + /* If it was a tag, also make sure we hit the underlying object. */ + while (obj && obj->type == OBJ_TAG) { + obj = ((struct tag *)obj)->tagged; + if (obj) { + parse_object(the_repository, &obj->oid); + marks = create_or_get_island_marks(obj); + island_bitmap_set(marks, island_counter); + } + } + } + + if (is_core_island) + island_counter_core = island_counter; + + island_counter++; +} + +struct tree_islands_todo { + struct object_entry *entry; + unsigned int depth; +}; + +static int tree_depth_compare(const void *a, const void *b) +{ + const struct tree_islands_todo *todo_a = a; + const struct tree_islands_todo *todo_b = b; + + return todo_a->depth - todo_b->depth; +} + +void resolve_tree_islands(int progress, struct packing_data *to_pack) +{ + struct progress *progress_state = NULL; + struct tree_islands_todo *todo; + int nr = 0; + int i; + + if (!island_marks) + return; + + /* + * We process only trees, as commits and tags have already been handled + * (and passed their marks on to root trees, as well. We must make sure + * to process them in descending tree-depth order so that marks + * propagate down the tree properly, even if a sub-tree is found in + * multiple parent trees. + */ + ALLOC_ARRAY(todo, to_pack->nr_objects); + for (i = 0; i < to_pack->nr_objects; i++) { + if (oe_type(&to_pack->objects[i]) == OBJ_TREE) { + todo[nr].entry = &to_pack->objects[i]; + todo[nr].depth = oe_tree_depth(to_pack, &to_pack->objects[i]); + nr++; + } + } + QSORT(todo, nr, tree_depth_compare); + + if (progress) + progress_state = start_progress(_("Propagating island marks"), nr); + + for (i = 0; i < nr; i++) { + struct object_entry *ent = todo[i].entry; + struct island_bitmap *root_marks; + struct tree *tree; + struct tree_desc desc; + struct name_entry entry; + khiter_t pos; + + pos = kh_get_sha1(island_marks, ent->idx.oid.hash); + if (pos >= kh_end(island_marks)) + continue; + + root_marks = kh_value(island_marks, pos); + + tree = lookup_tree(the_repository, &ent->idx.oid); + if (!tree || parse_tree(tree) < 0) + die(_("bad tree object %s"), oid_to_hex(&ent->idx.oid)); + + init_tree_desc(&desc, tree->buffer, tree->size); + while (tree_entry(&desc, &entry)) { + struct object *obj; + + if (S_ISGITLINK(entry.mode)) + continue; + + obj = lookup_object(the_repository, entry.oid->hash); + if (!obj) + continue; + + set_island_marks(obj, root_marks); + } + + free_tree_buffer(tree); + + display_progress(progress_state, i+1); + } + + stop_progress(&progress_state); + free(todo); +} + +static regex_t *island_regexes; +static unsigned int island_regexes_alloc, island_regexes_nr; +static const char *core_island_name; + +static int island_config_callback(const char *k, const char *v, void *cb) +{ + if (!strcmp(k, "pack.island")) { + struct strbuf re = STRBUF_INIT; + + if (!v) + return config_error_nonbool(k); + + ALLOC_GROW(island_regexes, island_regexes_nr + 1, island_regexes_alloc); + + if (*v != '^') + strbuf_addch(&re, '^'); + strbuf_addstr(&re, v); + + if (regcomp(&island_regexes[island_regexes_nr], re.buf, REG_EXTENDED)) + die(_("failed to load island regex for '%s': %s"), k, re.buf); + + strbuf_release(&re); + island_regexes_nr++; + return 0; + } + + if (!strcmp(k, "pack.islandcore")) + return git_config_string(&core_island_name, k, v); + + return 0; +} + +static void add_ref_to_island(const char *island_name, const struct object_id *oid) +{ + uint64_t sha_core; + struct remote_island *rl = NULL; + + int hash_ret; + khiter_t pos = kh_put_str(remote_islands, island_name, &hash_ret); + + if (hash_ret) { + kh_key(remote_islands, pos) = xstrdup(island_name); + kh_value(remote_islands, pos) = xcalloc(1, sizeof(struct remote_island)); + } + + rl = kh_value(remote_islands, pos); + oid_array_append(&rl->oids, oid); + + memcpy(&sha_core, oid->hash, sizeof(uint64_t)); + rl->hash += sha_core; +} + +static int find_island_for_ref(const char *refname, const struct object_id *oid, + int flags, void *data) +{ + /* + * We should advertise 'ARRAY_SIZE(matches) - 2' as the max, + * so we can diagnose below a config with more capture groups + * than we support. + */ + regmatch_t matches[16]; + int i, m; + struct strbuf island_name = STRBUF_INIT; + + /* walk backwards to get last-one-wins ordering */ + for (i = island_regexes_nr - 1; i >= 0; i--) { + if (!regexec(&island_regexes[i], refname, + ARRAY_SIZE(matches), matches, 0)) + break; + } + + if (i < 0) + return 0; + + if (matches[ARRAY_SIZE(matches) - 1].rm_so != -1) + warning(_("island regex from config has " + "too many capture groups (max=%d)"), + (int)ARRAY_SIZE(matches) - 2); + + for (m = 1; m < ARRAY_SIZE(matches); m++) { + regmatch_t *match = &matches[m]; + + if (match->rm_so == -1) + continue; + + if (island_name.len) + strbuf_addch(&island_name, '-'); + + strbuf_add(&island_name, refname + match->rm_so, match->rm_eo - match->rm_so); + } + + add_ref_to_island(island_name.buf, oid); + strbuf_release(&island_name); + return 0; +} + +static struct remote_island *get_core_island(void) +{ + if (core_island_name) { + khiter_t pos = kh_get_str(remote_islands, core_island_name); + if (pos < kh_end(remote_islands)) + return kh_value(remote_islands, pos); + } + + return NULL; +} + +static void deduplicate_islands(void) +{ + struct remote_island *island, *core = NULL, **list; + unsigned int island_count, dst, src, ref, i = 0; + + island_count = kh_size(remote_islands); + ALLOC_ARRAY(list, island_count); + + kh_foreach_value(remote_islands, island, { + list[i++] = island; + }); + + for (ref = 0; ref + 1 < island_count; ref++) { + for (src = ref + 1, dst = src; src < island_count; src++) { + if (list[ref]->hash == list[src]->hash) + continue; + + if (src != dst) + list[dst] = list[src]; + + dst++; + } + island_count = dst; + } + + island_bitmap_size = (island_count / 32) + 1; + core = get_core_island(); + + for (i = 0; i < island_count; ++i) { + mark_remote_island_1(list[i], core && list[i]->hash == core->hash); + } + + free(list); +} + +void load_delta_islands(void) +{ + island_marks = kh_init_sha1(); + remote_islands = kh_init_str(); + + git_config(island_config_callback, NULL); + for_each_ref(find_island_for_ref, NULL); + deduplicate_islands(); + + fprintf(stderr, _("Marked %d islands, done.\n"), island_counter); +} + +void propagate_island_marks(struct commit *commit) +{ + khiter_t pos = kh_get_sha1(island_marks, commit->object.oid.hash); + + if (pos < kh_end(island_marks)) { + struct commit_list *p; + struct island_bitmap *root_marks = kh_value(island_marks, pos); + + parse_commit(commit); + set_island_marks(&get_commit_tree(commit)->object, root_marks); + for (p = commit->parents; p; p = p->next) + set_island_marks(&p->item->object, root_marks); + } +} + +int compute_pack_layers(struct packing_data *to_pack) +{ + uint32_t i; + + if (!core_island_name || !island_marks) + return 1; + + for (i = 0; i < to_pack->nr_objects; ++i) { + struct object_entry *entry = &to_pack->objects[i]; + khiter_t pos = kh_get_sha1(island_marks, entry->idx.oid.hash); + + oe_set_layer(to_pack, entry, 1); + + if (pos < kh_end(island_marks)) { + struct island_bitmap *bitmap = kh_value(island_marks, pos); + + if (island_bitmap_get(bitmap, island_counter_core)) + oe_set_layer(to_pack, entry, 0); + } + } + + return 2; +} diff --git a/delta-islands.h b/delta-islands.h new file mode 100644 index 0000000000..f9725730f4 --- /dev/null +++ b/delta-islands.h @@ -0,0 +1,11 @@ +#ifndef DELTA_ISLANDS_H +#define DELTA_ISLANDS_H + +int island_delta_cmp(const struct object_id *a, const struct object_id *b); +int in_same_island(const struct object_id *, const struct object_id *); +void resolve_tree_islands(int progress, struct packing_data *to_pack); +void load_delta_islands(void); +void propagate_island_marks(struct commit *commit); +int compute_pack_layers(struct packing_data *to_pack); + +#endif /* DELTA_ISLANDS_H */ diff --git a/pack-objects.c b/pack-objects.c index 3462858772..7f7b7dddf6 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -164,6 +164,12 @@ struct object_entry *packlist_alloc(struct packing_data *pdata, REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc); if (pdata->delta_size) REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc); + + if (pdata->tree_depth) + REALLOC_ARRAY(pdata->tree_depth, pdata->nr_alloc); + + if (pdata->layer) + REALLOC_ARRAY(pdata->layer, pdata->nr_alloc); } new_entry = pdata->objects + pdata->nr_objects++; @@ -179,6 +185,12 @@ struct object_entry *packlist_alloc(struct packing_data *pdata, if (pdata->in_pack) pdata->in_pack[pdata->nr_objects - 1] = NULL; + if (pdata->tree_depth) + pdata->tree_depth[pdata->nr_objects - 1] = 0; + + if (pdata->layer) + pdata->layer[pdata->nr_objects - 1] = 0; + return new_entry; } diff --git a/pack-objects.h b/pack-objects.h index b0c1f137c6..2ca39cfcfe 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -103,6 +103,7 @@ struct object_entry { unsigned no_try_delta:1; unsigned type_:TYPE_BITS; unsigned in_pack_type:TYPE_BITS; /* could be delta */ + unsigned preferred_base:1; /* * we do not pack this, but is available * to be used as the base object to delta @@ -158,6 +159,10 @@ struct packing_data { uintmax_t oe_size_limit; uintmax_t oe_delta_size_limit; + + /* delta islands */ + unsigned int *tree_depth; + unsigned char *layer; }; void prepare_packing_data(struct packing_data *pdata); @@ -400,4 +405,38 @@ static inline void oe_set_delta_size(struct packing_data *pack, } } +static inline unsigned int oe_tree_depth(struct packing_data *pack, + struct object_entry *e) +{ + if (!pack->tree_depth) + return 0; + return pack->tree_depth[e - pack->objects]; +} + +static inline void oe_set_tree_depth(struct packing_data *pack, + struct object_entry *e, + unsigned int tree_depth) +{ + if (!pack->tree_depth) + ALLOC_ARRAY(pack->tree_depth, pack->nr_objects); + pack->tree_depth[e - pack->objects] = tree_depth; +} + +static inline unsigned char oe_layer(struct packing_data *pack, + struct object_entry *e) +{ + if (!pack->layer) + return 0; + return pack->layer[e - pack->objects]; +} + +static inline void oe_set_layer(struct packing_data *pack, + struct object_entry *e, + unsigned char layer) +{ + if (!pack->layer) + ALLOC_ARRAY(pack->layer, pack->nr_objects); + pack->layer[e - pack->objects] = layer; +} + #endif diff --git a/t/t5320-delta-islands.sh b/t/t5320-delta-islands.sh new file mode 100755 index 0000000000..fea92a5777 --- /dev/null +++ b/t/t5320-delta-islands.sh @@ -0,0 +1,143 @@ +#!/bin/sh + +test_description='exercise delta islands' +. ./test-lib.sh + +# returns true iff $1 is a delta based on $2 +is_delta_base () { + delta_base=$(echo "$1" | git cat-file --batch-check='%(deltabase)') && + echo >&2 "$1 has base $delta_base" && + test "$delta_base" = "$2" +} + +# generate a commit on branch $1 with a single file, "file", whose +# content is mostly based on the seed $2, but with a unique bit +# of content $3 appended. This should allow us to see whether +# blobs of different refs delta against each other. +commit() { + blob=$({ test-tool genrandom "$2" 10240 && echo "$3"; } | + git hash-object -w --stdin) && + tree=$(printf '100644 blob %s\tfile\n' "$blob" | git mktree) && + commit=$(echo "$2-$3" | git commit-tree "$tree" ${4:+-p "$4"}) && + git update-ref "refs/heads/$1" "$commit" && + eval "$1"'=$(git rev-parse $1:file)' && + eval "echo >&2 $1=\$$1" +} + +test_expect_success 'setup commits' ' + commit one seed 1 && + commit two seed 12 +' + +# Note: This is heavily dependent on the "prefer larger objects as base" +# heuristic. +test_expect_success 'vanilla repack deltas one against two' ' + git repack -adf && + is_delta_base $one $two +' + +test_expect_success 'island repack with no island definition is vanilla' ' + git repack -adfi && + is_delta_base $one $two +' + +test_expect_success 'island repack with no matches is vanilla' ' + git -c "pack.island=refs/foo" repack -adfi && + is_delta_base $one $two +' + +test_expect_success 'separate islands disallows delta' ' + git -c "pack.island=refs/heads/(.*)" repack -adfi && + ! is_delta_base $one $two && + ! is_delta_base $two $one +' + +test_expect_success 'same island allows delta' ' + git -c "pack.island=refs/heads" repack -adfi && + is_delta_base $one $two +' + +test_expect_success 'coalesce same-named islands' ' + git \ + -c "pack.island=refs/(.*)/one" \ + -c "pack.island=refs/(.*)/two" \ + repack -adfi && + is_delta_base $one $two +' + +test_expect_success 'island restrictions drop reused deltas' ' + git repack -adfi && + is_delta_base $one $two && + git -c "pack.island=refs/heads/(.*)" repack -adi && + ! is_delta_base $one $two && + ! is_delta_base $two $one +' + +test_expect_success 'island regexes are left-anchored' ' + git -c "pack.island=heads/(.*)" repack -adfi && + is_delta_base $one $two +' + +test_expect_success 'island regexes follow last-one-wins scheme' ' + git \ + -c "pack.island=refs/heads/(.*)" \ + -c "pack.island=refs/heads/" \ + repack -adfi && + is_delta_base $one $two +' + +test_expect_success 'setup shared history' ' + commit root shared root && + commit one shared 1 root && + commit two shared 12-long root +' + +# We know that $two will be preferred as a base from $one, +# because we can transform it with a pure deletion. +# +# We also expect $root as a delta against $two by the "longest is base" rule. +test_expect_success 'vanilla delta goes between branches' ' + git repack -adf && + is_delta_base $one $two && + is_delta_base $root $two +' + +# Here we should allow $one to base itself on $root; even though +# they are in different islands, the objects in $root are in a superset +# of islands compared to those in $one. +# +# Similarly, $two can delta against $root by our rules. And unlike $one, +# in which we are just allowing it, the island rules actually put $root +# as a possible base for $two, which it would not otherwise be (due to the size +# sorting). +test_expect_success 'deltas allowed against superset islands' ' + git -c "pack.island=refs/heads/(.*)" repack -adfi && + is_delta_base $one $root && + is_delta_base $two $root +' + +# We are going to test the packfile order here, so we again have to make some +# assumptions. We assume that "$root", as part of our core "one", must come +# before "$two". This should be guaranteed by the island code. However, for +# this test to fail without islands, we are also assuming that it would not +# otherwise do so. This is true by the current write order, which will put +# commits (and their contents) before their parents. +test_expect_success 'island core places core objects first' ' + cat >expect <<-EOF && + $root + $two + EOF + git -c "pack.island=refs/heads/(.*)" \ + -c "pack.islandcore=one" \ + repack -adfi && + git verify-pack -v .git/objects/pack/*.pack | + cut -d" " -f1 | + egrep "$root|$two" >actual && + test_cmp expect actual +' + +test_expect_success 'unmatched island core is not fatal' ' + git -c "pack.islandcore=one" repack -adfi +' + +test_done |