summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorLibravatar Junio C Hamano <gitster@pobox.com>2021-04-08 13:23:25 -0700
committerLibravatar Junio C Hamano <gitster@pobox.com>2021-04-08 13:23:25 -0700
commite6b971fcf5d85db821636f2d887cfaf204b32bda (patch)
tree8d94e5501218bd7614729a363637e3746568a814
parentThe seventh batch (diff)
parentmidx.c: improve cache locality in midx_pack_order_cmp() (diff)
downloadtgif-e6b971fcf5d85db821636f2d887cfaf204b32bda.tar.xz
Merge branch 'tb/reverse-midx'
An on-disk reverse-index to map the in-pack location of an object back to its object name across multiple packfiles is introduced. * tb/reverse-midx: midx.c: improve cache locality in midx_pack_order_cmp() pack-revindex: write multi-pack reverse indexes pack-write.c: extract 'write_rev_file_order' pack-revindex: read multi-pack reverse indexes Documentation/technical: describe multi-pack reverse indexes midx: make some functions non-static midx: keep track of the checksum midx: don't free midx_name early midx: allow marking a pack as preferred t/helper/test-read-midx.c: add '--show-objects' builtin/multi-pack-index.c: display usage on unrecognized command builtin/multi-pack-index.c: don't enter bogus cmd_mode builtin/multi-pack-index.c: split sub-commands builtin/multi-pack-index.c: define common usage with a macro builtin/multi-pack-index.c: don't handle 'progress' separately builtin/multi-pack-index.c: inline 'flags' with options
-rw-r--r--Documentation/git-multi-pack-index.txt14
-rw-r--r--Documentation/technical/multi-pack-index.txt5
-rw-r--r--Documentation/technical/pack-format.txt83
-rw-r--r--builtin/multi-pack-index.c182
-rw-r--r--builtin/repack.c2
-rw-r--r--midx.c219
-rw-r--r--midx.h11
-rw-r--r--pack-revindex.c126
-rw-r--r--pack-revindex.h53
-rw-r--r--pack-write.c36
-rw-r--r--pack.h1
-rw-r--r--packfile.c3
-rw-r--r--t/helper/test-read-midx.c24
-rwxr-xr-xt/t5319-multi-pack-index.sh42
14 files changed, 733 insertions, 68 deletions
diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index eb0caa0439..ffd601bc17 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes
SYNOPSIS
--------
[verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
+'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
+ [--preferred-pack=<pack>] <subcommand>
DESCRIPTION
-----------
@@ -30,7 +31,16 @@ OPTIONS
The following subcommands are available:
write::
- Write a new MIDX file.
+ Write a new MIDX file. The following options are available for
+ the `write` sub-command:
++
+--
+ --preferred-pack=<pack>::
+ Optionally specify the tie-breaking pack used when
+ multiple packs contain the same object. If not given,
+ ties are broken in favor of the pack with the lowest
+ mtime.
+--
verify::
Verify the contents of the MIDX file.
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index e8e377a59f..fb688976c4 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -43,8 +43,9 @@ Design Details
a change in format.
- The MIDX keeps only one record per object ID. If an object appears
- in multiple packfiles, then the MIDX selects the copy in the most-
- recently modified packfile.
+ in multiple packfiles, then the MIDX selects the copy in the
+ preferred packfile, otherwise selecting from the most-recently
+ modified packfile.
- If there exist packfiles in the pack directory not registered in
the MIDX, then those packfiles are loaded into the `packed_git`
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1faa949bf6..8d2f42f29e 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -379,3 +379,86 @@ CHUNK DATA:
TRAILER:
Index checksum of the above contents.
+
+== multi-pack-index reverse indexes
+
+Similar to the pack-based reverse index, the multi-pack index can also
+be used to generate a reverse index.
+
+Instead of mapping between offset, pack-, and index position, this
+reverse index maps between an object's position within the MIDX, and
+that object's position within a pseudo-pack that the MIDX describes
+(i.e., the ith entry of the multi-pack reverse index holds the MIDX
+position of ith object in pseudo-pack order).
+
+To clarify the difference between these orderings, consider a multi-pack
+reachability bitmap (which does not yet exist, but is what we are
+building towards here). Each bit needs to correspond to an object in the
+MIDX, and so we need an efficient mapping from bit position to MIDX
+position.
+
+One solution is to let bits occupy the same position in the oid-sorted
+index stored by the MIDX. But because oids are effectively random, their
+resulting reachability bitmaps would have no locality, and thus compress
+poorly. (This is the reason that single-pack bitmaps use the pack
+ordering, and not the .idx ordering, for the same purpose.)
+
+So we'd like to define an ordering for the whole MIDX based around
+pack ordering, which has far better locality (and thus compresses more
+efficiently). We can think of a pseudo-pack created by the concatenation
+of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
+(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
+ordering of the objects like:
+
+ |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
+
+where the ordering of the packs is defined by the MIDX's pack list,
+and then the ordering of objects within each pack is the same as the
+order in the actual packfile.
+
+Given the list of packs and their counts of objects, you can
+naïvely reconstruct that pseudo-pack ordering (e.g., the object at
+position 27 must be (c,1) because packs "a" and "b" consumed 25 of the
+slots). But there's a catch. Objects may be duplicated between packs, in
+which case the MIDX only stores one pointer to the object (and thus we'd
+want only one slot in the bitmap).
+
+Callers could handle duplicates themselves by reading objects in order
+of their bit-position, but that's linear in the number of objects, and
+much too expensive for ordinary bitmap lookups. Building a reverse index
+solves this, since it is the logical inverse of the index, and that
+index has already removed duplicates. But, building a reverse index on
+the fly can be expensive. Since we already have an on-disk format for
+pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
+too.
+
+Objects from the MIDX are ordered as follows to string together the
+pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
+by the MIDX, and define an ordering of packs based on their numeric ID
+(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
+within `pack(o)`. Then, compare `o1` and `o2` as follows:
+
+ - If one of `pack(o1)` and `pack(o2)` is preferred and the other
+ is not, then the preferred one sorts first.
++
+(This is a detail that allows the MIDX bitmap to determine which
+pack should be used by the pack-reuse mechanism, since it can ask
+the MIDX for the pack containing the object at bit position 0).
+
+ - If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending
+ order based on the pack ID.
+
+ - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
+ pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
+ < offset(o2)`).
+
+In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
+objects in packs stored by the MIDX, laid out in pack order, and the
+packs arranged in MIDX order (with the preferred pack coming first).
+
+Finally, note that the MIDX's reverse index is not stored as a chunk in
+the multi-pack-index itself. This is done because the reverse index
+includes the checksum of the pack or MIDX to which it belongs, which
+makes it impossible to write in the MIDX. To avoid races when rewriting
+the MIDX, a MIDX reverse index includes the MIDX's checksum in its
+filename (e.g., `multi-pack-index-xyz.rev`).
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5bf88cd2a8..5d3ea445fd 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -4,67 +4,181 @@
#include "parse-options.h"
#include "midx.h"
#include "trace2.h"
+#include "object-store.h"
+#define BUILTIN_MIDX_WRITE_USAGE \
+ N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
+
+#define BUILTIN_MIDX_VERIFY_USAGE \
+ N_("git multi-pack-index [<options>] verify")
+
+#define BUILTIN_MIDX_EXPIRE_USAGE \
+ N_("git multi-pack-index [<options>] expire")
+
+#define BUILTIN_MIDX_REPACK_USAGE \
+ N_("git multi-pack-index [<options>] repack [--batch-size=<size>]")
+
+static char const * const builtin_multi_pack_index_write_usage[] = {
+ BUILTIN_MIDX_WRITE_USAGE,
+ NULL
+};
+static char const * const builtin_multi_pack_index_verify_usage[] = {
+ BUILTIN_MIDX_VERIFY_USAGE,
+ NULL
+};
+static char const * const builtin_multi_pack_index_expire_usage[] = {
+ BUILTIN_MIDX_EXPIRE_USAGE,
+ NULL
+};
+static char const * const builtin_multi_pack_index_repack_usage[] = {
+ BUILTIN_MIDX_REPACK_USAGE,
+ NULL
+};
static char const * const builtin_multi_pack_index_usage[] = {
- N_("git multi-pack-index [<options>] (write|verify|expire|repack --batch-size=<size>)"),
+ BUILTIN_MIDX_WRITE_USAGE,
+ BUILTIN_MIDX_VERIFY_USAGE,
+ BUILTIN_MIDX_EXPIRE_USAGE,
+ BUILTIN_MIDX_REPACK_USAGE,
NULL
};
static struct opts_multi_pack_index {
const char *object_dir;
+ const char *preferred_pack;
unsigned long batch_size;
- int progress;
+ unsigned flags;
} opts;
-int cmd_multi_pack_index(int argc, const char **argv,
- const char *prefix)
+static struct option common_opts[] = {
+ OPT_FILENAME(0, "object-dir", &opts.object_dir,
+ N_("object directory containing set of packfile and pack-index pairs")),
+ OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+ OPT_END(),
+};
+
+static struct option *add_common_options(struct option *prev)
{
- unsigned flags = 0;
+ return parse_options_concat(common_opts, prev);
+}
+
+static int cmd_multi_pack_index_write(int argc, const char **argv)
+{
+ struct option *options;
+ static struct option builtin_multi_pack_index_write_options[] = {
+ OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
+ N_("preferred-pack"),
+ N_("pack for reuse when computing a multi-pack bitmap")),
+ OPT_END(),
+ };
+
+ options = add_common_options(builtin_multi_pack_index_write_options);
+
+ trace2_cmd_mode(argv[0]);
- static struct option builtin_multi_pack_index_options[] = {
- OPT_FILENAME(0, "object-dir", &opts.object_dir,
- N_("object directory containing set of packfile and pack-index pairs")),
- OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
+ argc = parse_options(argc, argv, NULL,
+ options, builtin_multi_pack_index_write_usage,
+ PARSE_OPT_KEEP_UNKNOWN);
+ if (argc)
+ usage_with_options(builtin_multi_pack_index_write_usage,
+ options);
+
+ FREE_AND_NULL(options);
+
+ return write_midx_file(opts.object_dir, opts.preferred_pack,
+ opts.flags);
+}
+
+static int cmd_multi_pack_index_verify(int argc, const char **argv)
+{
+ struct option *options = common_opts;
+
+ trace2_cmd_mode(argv[0]);
+
+ argc = parse_options(argc, argv, NULL,
+ options, builtin_multi_pack_index_verify_usage,
+ PARSE_OPT_KEEP_UNKNOWN);
+ if (argc)
+ usage_with_options(builtin_multi_pack_index_verify_usage,
+ options);
+
+ return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_expire(int argc, const char **argv)
+{
+ struct option *options = common_opts;
+
+ trace2_cmd_mode(argv[0]);
+
+ argc = parse_options(argc, argv, NULL,
+ options, builtin_multi_pack_index_expire_usage,
+ PARSE_OPT_KEEP_UNKNOWN);
+ if (argc)
+ usage_with_options(builtin_multi_pack_index_expire_usage,
+ options);
+
+ return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+}
+
+static int cmd_multi_pack_index_repack(int argc, const char **argv)
+{
+ struct option *options;
+ static struct option builtin_multi_pack_index_repack_options[] = {
OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
OPT_END(),
};
+ options = add_common_options(builtin_multi_pack_index_repack_options);
+
+ trace2_cmd_mode(argv[0]);
+
+ argc = parse_options(argc, argv, NULL,
+ options,
+ builtin_multi_pack_index_repack_usage,
+ PARSE_OPT_KEEP_UNKNOWN);
+ if (argc)
+ usage_with_options(builtin_multi_pack_index_repack_usage,
+ options);
+
+ FREE_AND_NULL(options);
+
+ return midx_repack(the_repository, opts.object_dir,
+ (size_t)opts.batch_size, opts.flags);
+}
+
+int cmd_multi_pack_index(int argc, const char **argv,
+ const char *prefix)
+{
+ struct option *builtin_multi_pack_index_options = common_opts;
+
git_config(git_default_config, NULL);
- opts.progress = isatty(2);
+ if (isatty(2))
+ opts.flags |= MIDX_PROGRESS;
argc = parse_options(argc, argv, prefix,
builtin_multi_pack_index_options,
- builtin_multi_pack_index_usage, 0);
+ builtin_multi_pack_index_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
if (!opts.object_dir)
opts.object_dir = get_object_directory();
- if (opts.progress)
- flags |= MIDX_PROGRESS;
if (argc == 0)
+ goto usage;
+
+ if (!strcmp(argv[0], "repack"))
+ return cmd_multi_pack_index_repack(argc, argv);
+ else if (!strcmp(argv[0], "write"))
+ return cmd_multi_pack_index_write(argc, argv);
+ else if (!strcmp(argv[0], "verify"))
+ return cmd_multi_pack_index_verify(argc, argv);
+ else if (!strcmp(argv[0], "expire"))
+ return cmd_multi_pack_index_expire(argc, argv);
+ else {
+usage:
+ error(_("unrecognized subcommand: %s"), argv[0]);
usage_with_options(builtin_multi_pack_index_usage,
builtin_multi_pack_index_options);
-
- if (argc > 1) {
- die(_("too many arguments"));
- return 1;
}
-
- trace2_cmd_mode(argv[0]);
-
- if (!strcmp(argv[0], "repack"))
- return midx_repack(the_repository, opts.object_dir,
- (size_t)opts.batch_size, flags);
- if (opts.batch_size)
- die(_("--batch-size option is only for 'repack' subcommand"));
-
- if (!strcmp(argv[0], "write"))
- return write_midx_file(opts.object_dir, flags);
- if (!strcmp(argv[0], "verify"))
- return verify_midx_file(the_repository, opts.object_dir, flags);
- if (!strcmp(argv[0], "expire"))
- return expire_midx_packs(the_repository, opts.object_dir, flags);
-
- die(_("unrecognized subcommand: %s"), argv[0]);
}
diff --git a/builtin/repack.c b/builtin/repack.c
index 6ce2556c9e..2847fdfbab 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -721,7 +721,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
remove_temporary_files();
if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
- write_midx_file(get_object_directory(), 0);
+ write_midx_file(get_object_directory(), NULL, 0);
string_list_clear(&names, 0);
string_list_clear(&rollback, 0);
diff --git a/midx.c b/midx.c
index becfafe65e..9e86583172 100644
--- a/midx.c
+++ b/midx.c
@@ -12,6 +12,7 @@
#include "run-command.h"
#include "repository.h"
#include "chunk-format.h"
+#include "pack.h"
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION 1
@@ -47,11 +48,22 @@ static uint8_t oid_version(void)
}
}
+static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+{
+ return m->data + m->data_len - the_hash_algo->rawsz;
+}
+
static char *get_midx_filename(const char *object_dir)
{
return xstrfmt("%s/pack/multi-pack-index", object_dir);
}
+char *get_midx_rev_filename(struct multi_pack_index *m)
+{
+ return xstrfmt("%s/pack/multi-pack-index-%s.rev",
+ m->object_dir, hash_to_hex(get_midx_checksum(m)));
+}
+
static int midx_read_oid_fanout(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
@@ -239,7 +251,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid,
return oid;
}
-static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
{
const unsigned char *offset_data;
uint32_t offset32;
@@ -258,7 +270,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
return offset32;
}
-static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
{
return get_be32(m->chunk_object_offsets +
(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
@@ -431,6 +443,14 @@ static int pack_info_compare(const void *_a, const void *_b)
return strcmp(a->pack_name, b->pack_name);
}
+static int idx_or_pack_name_cmp(const void *_va, const void *_vb)
+{
+ const char *pack_name = _va;
+ const struct pack_info *compar = _vb;
+
+ return cmp_idx_or_pack_name(pack_name, compar->pack_name);
+}
+
struct write_midx_context {
struct pack_info *info;
uint32_t nr;
@@ -443,8 +463,11 @@ struct write_midx_context {
uint32_t entries_nr;
uint32_t *pack_perm;
+ uint32_t *pack_order;
unsigned large_offsets_needed:1;
uint32_t num_large_offsets;
+
+ int preferred_pack_idx;
};
static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -489,6 +512,7 @@ struct pack_midx_entry {
uint32_t pack_int_id;
time_t pack_mtime;
uint64_t offset;
+ unsigned preferred : 1;
};
static int midx_oid_compare(const void *_a, const void *_b)
@@ -500,6 +524,12 @@ static int midx_oid_compare(const void *_a, const void *_b)
if (cmp)
return cmp;
+ /* Sort objects in a preferred pack first when multiple copies exist. */
+ if (a->preferred > b->preferred)
+ return -1;
+ if (a->preferred < b->preferred)
+ return 1;
+
if (a->pack_mtime > b->pack_mtime)
return -1;
else if (a->pack_mtime < b->pack_mtime)
@@ -527,7 +557,8 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
static void fill_pack_entry(uint32_t pack_int_id,
struct packed_git *p,
uint32_t cur_object,
- struct pack_midx_entry *entry)
+ struct pack_midx_entry *entry,
+ int preferred)
{
if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
die(_("failed to locate object %d in packfile"), cur_object);
@@ -536,6 +567,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
entry->pack_mtime = p->mtime;
entry->offset = nth_packed_object_offset(p, cur_object);
+ entry->preferred = !!preferred;
}
/*
@@ -552,7 +584,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
struct pack_info *info,
uint32_t nr_packs,
- uint32_t *nr_objects)
+ uint32_t *nr_objects,
+ int preferred_pack)
{
uint32_t cur_fanout, cur_pack, cur_object;
uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -589,12 +622,17 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
nth_midxed_pack_midx_entry(m,
&entries_by_fanout[nr_fanout],
cur_object);
+ if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+ entries_by_fanout[nr_fanout].preferred = 1;
+ else
+ entries_by_fanout[nr_fanout].preferred = 0;
nr_fanout++;
}
}
for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
uint32_t start = 0, end;
+ int preferred = cur_pack == preferred_pack;
if (cur_fanout)
start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@@ -602,7 +640,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
for (cur_object = start; cur_object < end; cur_object++) {
ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
- fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
+ fill_pack_entry(cur_pack,
+ info[cur_pack].p,
+ cur_object,
+ &entries_by_fanout[nr_fanout],
+ preferred);
nr_fanout++;
}
}
@@ -776,10 +818,80 @@ static int write_midx_large_offsets(struct hashfile *f,
return 0;
}
+struct midx_pack_order_data {
+ uint32_t nr;
+ uint32_t pack;
+ off_t offset;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+ const struct midx_pack_order_data *a = va, *b = vb;
+ if (a->pack < b->pack)
+ return -1;
+ else if (a->pack > b->pack)
+ return 1;
+ else if (a->offset < b->offset)
+ return -1;
+ else if (a->offset > b->offset)
+ return 1;
+ else
+ return 0;
+}
+
+static uint32_t *midx_pack_order(struct write_midx_context *ctx)
+{
+ struct midx_pack_order_data *data;
+ uint32_t *pack_order;
+ uint32_t i;
+
+ ALLOC_ARRAY(data, ctx->entries_nr);
+ for (i = 0; i < ctx->entries_nr; i++) {
+ struct pack_midx_entry *e = &ctx->entries[i];
+ data[i].nr = i;
+ data[i].pack = ctx->pack_perm[e->pack_int_id];
+ if (!e->preferred)
+ data[i].pack |= (1U << 31);
+ data[i].offset = e->offset;
+ }
+
+ QSORT(data, ctx->entries_nr, midx_pack_order_cmp);
+
+ ALLOC_ARRAY(pack_order, ctx->entries_nr);
+ for (i = 0; i < ctx->entries_nr; i++)
+ pack_order[i] = data[i].nr;
+ free(data);
+
+ return pack_order;
+}
+
+static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
+ struct write_midx_context *ctx)
+{
+ struct strbuf buf = STRBUF_INIT;
+ const char *tmp_file;
+
+ strbuf_addf(&buf, "%s-%s.rev", midx_name, hash_to_hex(midx_hash));
+
+ tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
+ midx_hash, WRITE_REV);
+
+ if (finalize_object_file(tmp_file, buf.buf))
+ die(_("cannot store reverse index file"));
+
+ strbuf_release(&buf);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+ unsigned char *keep_hash);
+
static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
- struct string_list *packs_to_drop, unsigned flags)
+ struct string_list *packs_to_drop,
+ const char *preferred_pack_name,
+ unsigned flags)
{
char *midx_name;
+ unsigned char midx_hash[GIT_MAX_RAWSZ];
uint32_t i;
struct hashfile *f = NULL;
struct lock_file lk;
@@ -828,7 +940,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
goto cleanup;
- ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
+ ctx.preferred_pack_idx = -1;
+ if (preferred_pack_name) {
+ for (i = 0; i < ctx.nr; i++) {
+ if (!cmp_idx_or_pack_name(preferred_pack_name,
+ ctx.info[i].pack_name)) {
+ ctx.preferred_pack_idx = i;
+ break;
+ }
+ }
+ }
+
+ ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+ ctx.preferred_pack_idx);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
@@ -889,13 +1013,30 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
}
+ /* Check that the preferred pack wasn't expired (if given). */
+ if (preferred_pack_name) {
+ struct pack_info *preferred = bsearch(preferred_pack_name,
+ ctx.info, ctx.nr,
+ sizeof(*ctx.info),
+ idx_or_pack_name_cmp);
+
+ if (!preferred)
+ warning(_("unknown preferred pack: '%s'"),
+ preferred_pack_name);
+ else {
+ uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
+ if (perm == PACK_EXPIRED)
+ warning(_("preferred pack '%s' is expired"),
+ preferred_pack_name);
+ }
+ }
+
if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
- FREE_AND_NULL(midx_name);
if (ctx.m)
close_midx(ctx.m);
@@ -927,8 +1068,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
write_chunkfile(cf, &ctx);
- finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+ finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
free_chunkfile(cf);
+
+ if (flags & MIDX_WRITE_REV_INDEX)
+ ctx.pack_order = midx_pack_order(&ctx);
+
+ if (flags & MIDX_WRITE_REV_INDEX)
+ write_midx_reverse_index(midx_name, midx_hash, &ctx);
+ clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
commit_lock_file(&lk);
cleanup:
@@ -943,13 +1092,55 @@ cleanup:
free(ctx.info);
free(ctx.entries);
free(ctx.pack_perm);
+ free(ctx.pack_order);
free(midx_name);
return result;
}
-int write_midx_file(const char *object_dir, unsigned flags)
+int write_midx_file(const char *object_dir,
+ const char *preferred_pack_name,
+ unsigned flags)
{
- return write_midx_internal(object_dir, NULL, NULL, flags);
+ return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+ flags);
+}
+
+struct clear_midx_data {
+ char *keep;
+ const char *ext;
+};
+
+static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
+ const char *file_name, void *_data)
+{
+ struct clear_midx_data *data = _data;
+
+ if (!(starts_with(file_name, "multi-pack-index-") &&
+ ends_with(file_name, data->ext)))
+ return;
+ if (data->keep && !strcmp(data->keep, file_name))
+ return;
+
+ if (unlink(full_path))
+ die_errno(_("failed to remove %s"), full_path);
+}
+
+static void clear_midx_files_ext(struct repository *r, const char *ext,
+ unsigned char *keep_hash)
+{
+ struct clear_midx_data data;
+ memset(&data, 0, sizeof(struct clear_midx_data));
+
+ if (keep_hash)
+ data.keep = xstrfmt("multi-pack-index-%s%s",
+ hash_to_hex(keep_hash), ext);
+ data.ext = ext;
+
+ for_each_file_in_pack_dir(r->objects->odb->path,
+ clear_midx_file_ext,
+ &data);
+
+ free(data.keep);
}
void clear_midx_file(struct repository *r)
@@ -964,6 +1155,8 @@ void clear_midx_file(struct repository *r)
if (remove_path(midx))
die(_("failed to clear multi-pack-index at %s"), midx);
+ clear_midx_files_ext(r, ".rev", NULL);
+
free(midx);
}
@@ -1184,7 +1377,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
free(count);
if (packs_to_drop.nr)
- result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
+ result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
string_list_clear(&packs_to_drop, 0);
return result;
@@ -1373,7 +1566,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
goto cleanup;
}
- result = write_midx_internal(object_dir, m, NULL, flags);
+ result = write_midx_internal(object_dir, m, NULL, NULL, flags);
m = NULL;
cleanup:
diff --git a/midx.h b/midx.h
index b18cf53bc4..8684cf0fef 100644
--- a/midx.h
+++ b/midx.h
@@ -15,6 +15,10 @@ struct multi_pack_index {
const unsigned char *data;
size_t data_len;
+ const uint32_t *revindex_data;
+ const uint32_t *revindex_map;
+ size_t revindex_len;
+
uint32_t signature;
unsigned char version;
unsigned char hash_len;
@@ -36,10 +40,15 @@ struct multi_pack_index {
};
#define MIDX_PROGRESS (1 << 0)
+#define MIDX_WRITE_REV_INDEX (1 << 1)
+
+char *get_midx_rev_filename(struct multi_pack_index *m);
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
+off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
+uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
struct object_id *nth_midxed_object_oid(struct object_id *oid,
struct multi_pack_index *m,
uint32_t n);
@@ -47,7 +56,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
-int write_midx_file(const char *object_dir, unsigned flags);
+int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
void clear_midx_file(struct repository *r);
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/pack-revindex.c b/pack-revindex.c
index 4262530449..0e4a31d9db 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -3,6 +3,7 @@
#include "object-store.h"
#include "packfile.h"
#include "config.h"
+#include "midx.h"
struct revindex_entry {
off_t offset;
@@ -293,6 +294,43 @@ int load_pack_revindex(struct packed_git *p)
return -1;
}
+int load_midx_revindex(struct multi_pack_index *m)
+{
+ char *revindex_name;
+ int ret;
+ if (m->revindex_data)
+ return 0;
+
+ revindex_name = get_midx_rev_filename(m);
+
+ ret = load_revindex_from_disk(revindex_name,
+ m->num_objects,
+ &m->revindex_map,
+ &m->revindex_len);
+ if (ret)
+ goto cleanup;
+
+ m->revindex_data = (const uint32_t *)((const char *)m->revindex_map + RIDX_HEADER_SIZE);
+
+cleanup:
+ free(revindex_name);
+ return ret;
+}
+
+int close_midx_revindex(struct multi_pack_index *m)
+{
+ if (!m || !m->revindex_map)
+ return 0;
+
+ munmap((void*)m->revindex_map, m->revindex_len);
+
+ m->revindex_map = NULL;
+ m->revindex_data = NULL;
+ m->revindex_len = 0;
+
+ return 0;
+}
+
int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
{
unsigned lo, hi;
@@ -347,3 +385,91 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
else
return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
}
+
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
+{
+ if (!m->revindex_data)
+ BUG("pack_pos_to_midx: reverse index not yet loaded");
+ if (m->num_objects <= pos)
+ BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
+ return get_be32(m->revindex_data + pos);
+}
+
+struct midx_pack_key {
+ uint32_t pack;
+ off_t offset;
+
+ uint32_t preferred_pack;
+ struct multi_pack_index *midx;
+};
+
+static int midx_pack_order_cmp(const void *va, const void *vb)
+{
+ const struct midx_pack_key *key = va;
+ struct multi_pack_index *midx = key->midx;
+
+ uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
+ uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
+ off_t versus_offset;
+
+ uint32_t key_preferred = key->pack == key->preferred_pack;
+ uint32_t versus_preferred = versus_pack == key->preferred_pack;
+
+ /*
+ * First, compare the preferred-ness, noting that the preferred pack
+ * comes first.
+ */
+ if (key_preferred && !versus_preferred)
+ return -1;
+ else if (!key_preferred && versus_preferred)
+ return 1;
+
+ /* Then, break ties first by comparing the pack IDs. */
+ if (key->pack < versus_pack)
+ return -1;
+ else if (key->pack > versus_pack)
+ return 1;
+
+ /* Finally, break ties by comparing offsets within a pack. */
+ versus_offset = nth_midxed_offset(midx, versus);
+ if (key->offset < versus_offset)
+ return -1;
+ else if (key->offset > versus_offset)
+ return 1;
+
+ return 0;
+}
+
+int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos)
+{
+ struct midx_pack_key key;
+ uint32_t *found;
+
+ if (!m->revindex_data)
+ BUG("midx_to_pack_pos: reverse index not yet loaded");
+ if (m->num_objects <= at)
+ BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at);
+
+ key.pack = nth_midxed_pack_int_id(m, at);
+ key.offset = nth_midxed_offset(m, at);
+ key.midx = m;
+ /*
+ * The preferred pack sorts first, so determine its identifier by
+ * looking at the first object in pseudo-pack order.
+ *
+ * Note that if no --preferred-pack is explicitly given when writing a
+ * multi-pack index, then whichever pack has the lowest identifier
+ * implicitly is preferred (and includes all its objects, since ties are
+ * broken first by pack identifier).
+ */
+ key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+
+ found = bsearch(&key, m->revindex_data, m->num_objects,
+ sizeof(*m->revindex_data), midx_pack_order_cmp);
+
+ if (!found)
+ return error("bad offset for revindex");
+
+ *pos = found - m->revindex_data;
+ return 0;
+}
diff --git a/pack-revindex.h b/pack-revindex.h
index ba7c82c125..479b8f2f9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -14,6 +14,20 @@
*
* - offset: the byte offset within the .pack file at which the object contents
* can be found
+ *
+ * The revindex can also be used with a multi-pack index (MIDX). In this
+ * setting:
+ *
+ * - index position refers to an object's numeric position within the MIDX
+ *
+ * - pack position refers to an object's position within a non-existent pack
+ * described by the MIDX. The pack structure is described in
+ * Documentation/technical/pack-format.txt.
+ *
+ * It is effectively a concatanation of all packs in the MIDX (ordered by
+ * their numeric ID within the MIDX) in their original order within each
+ * pack), removing duplicates, and placing the preferred pack (if any)
+ * first.
*/
@@ -24,6 +38,7 @@
#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
struct packed_git;
+struct multi_pack_index;
/*
* load_pack_revindex populates the revindex's internal data-structures for the
@@ -35,6 +50,22 @@ struct packed_git;
int load_pack_revindex(struct packed_git *p);
/*
+ * load_midx_revindex loads the '.rev' file corresponding to the given
+ * multi-pack index by mmap-ing it and assigning pointers in the
+ * multi_pack_index to point at it.
+ *
+ * A negative number is returned on error.
+ */
+int load_midx_revindex(struct multi_pack_index *m);
+
+/*
+ * Frees resources associated with a multi-pack reverse index.
+ *
+ * A negative number is returned on error.
+ */
+int close_midx_revindex(struct multi_pack_index *m);
+
+/*
* offset_to_pack_pos converts an object offset to a pack position. This
* function returns zero on success, and a negative number otherwise. The
* parameter 'pos' is usable only on success.
@@ -71,4 +102,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
*/
off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
+/*
+ * pack_pos_to_midx converts the object at position "pos" within the MIDX
+ * pseudo-pack into a MIDX position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in time O(log N) with the number of objects in the MIDX.
+ */
+uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
+
+/*
+ * midx_to_pack_pos converts from the MIDX-relative position at "at" to the
+ * corresponding pack position.
+ *
+ * If the reverse index has not yet been loaded, or the position is out of
+ * bounds, this function aborts.
+ *
+ * This function runs in constant time.
+ */
+int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
+
#endif
diff --git a/pack-write.c b/pack-write.c
index 2ca85a9d16..f1fc3ecafa 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -201,21 +201,12 @@ static void write_rev_header(struct hashfile *f)
}
static void write_rev_index_positions(struct hashfile *f,
- struct pack_idx_entry **objects,
+ uint32_t *pack_order,
uint32_t nr_objects)
{
- uint32_t *pack_order;
uint32_t i;
-
- ALLOC_ARRAY(pack_order, nr_objects);
- for (i = 0; i < nr_objects; i++)
- pack_order[i] = i;
- QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
-
for (i = 0; i < nr_objects; i++)
hashwrite_be32(f, pack_order[i]);
-
- free(pack_order);
}
static void write_rev_trailer(struct hashfile *f, const unsigned char *hash)
@@ -229,6 +220,29 @@ const char *write_rev_file(const char *rev_name,
const unsigned char *hash,
unsigned flags)
{
+ uint32_t *pack_order;
+ uint32_t i;
+ const char *ret;
+
+ ALLOC_ARRAY(pack_order, nr_objects);
+ for (i = 0; i < nr_objects; i++)
+ pack_order[i] = i;
+ QSORT_S(pack_order, nr_objects, pack_order_cmp, objects);
+
+ ret = write_rev_file_order(rev_name, pack_order, nr_objects, hash,
+ flags);
+
+ free(pack_order);
+
+ return ret;
+}
+
+const char *write_rev_file_order(const char *rev_name,
+ uint32_t *pack_order,
+ uint32_t nr_objects,
+ const unsigned char *hash,
+ unsigned flags)
+{
struct hashfile *f;
int fd;
@@ -262,7 +276,7 @@ const char *write_rev_file(const char *rev_name,
write_rev_header(f);
- write_rev_index_positions(f, objects, nr_objects);
+ write_rev_index_positions(f, pack_order, nr_objects);
write_rev_trailer(f, hash);
if (rev_name && adjust_shared_perm(rev_name) < 0)
diff --git a/pack.h b/pack.h
index 857cbd5bd4..fa13954526 100644
--- a/pack.h
+++ b/pack.h
@@ -94,6 +94,7 @@ struct ref;
void write_promisor_file(const char *promisor_name, struct ref **sought, int nr_sought);
const char *write_rev_file(const char *rev_name, struct pack_idx_entry **objects, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
+const char *write_rev_file_order(const char *rev_name, uint32_t *pack_order, uint32_t nr_objects, const unsigned char *hash, unsigned flags);
/*
* The "hdr" output buffer should be at least this big, which will handle sizes
diff --git a/packfile.c b/packfile.c
index 6661f3325a..8668345d93 100644
--- a/packfile.c
+++ b/packfile.c
@@ -862,6 +862,9 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
if (!strcmp(file_name, "multi-pack-index"))
return;
+ if (starts_with(file_name, "multi-pack-index") &&
+ ends_with(file_name, ".rev"))
+ return;
if (ends_with(file_name, ".idx") ||
ends_with(file_name, ".rev") ||
ends_with(file_name, ".pack") ||
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 2430880f78..7c2eb11a8e 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -4,7 +4,7 @@
#include "repository.h"
#include "object-store.h"
-static int read_midx_file(const char *object_dir)
+static int read_midx_file(const char *object_dir, int show_objects)
{
uint32_t i;
struct multi_pack_index *m;
@@ -43,13 +43,29 @@ static int read_midx_file(const char *object_dir)
printf("object-dir: %s\n", m->object_dir);
+ if (show_objects) {
+ struct object_id oid;
+ struct pack_entry e;
+
+ for (i = 0; i < m->num_objects; i++) {
+ nth_midxed_object_oid(&oid, m, i);
+ fill_midx_entry(the_repository, &oid, &e, m);
+
+ printf("%s %"PRIu64"\t%s\n",
+ oid_to_hex(&oid), e.offset, e.p->pack_name);
+ }
+ return 0;
+ }
+
return 0;
}
int cmd__read_midx(int argc, const char **argv)
{
- if (argc != 2)
- usage("read-midx <object-dir>");
+ if (!(argc == 2 || argc == 3))
+ usage("read-midx [--show-objects] <object-dir>");
- return read_midx_file(argv[1]);
+ if (!strcmp(argv[1], "--show-objects"))
+ return read_midx_file(argv[2], 1);
+ return read_midx_file(argv[1], 0);
}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b4afab1dfc..5641d158df 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -234,6 +234,48 @@ test_expect_success 'warn on improper hash version' '
)
'
+test_expect_success 'midx picks objects from preferred pack' '
+ test_when_finished rm -rf preferred.git &&
+ git init --bare preferred.git &&
+ (
+ cd preferred.git &&
+
+ a=$(echo "a" | git hash-object -w --stdin) &&
+ b=$(echo "b" | git hash-object -w --stdin) &&
+ c=$(echo "c" | git hash-object -w --stdin) &&
+
+ # Set up two packs, duplicating the object "B" at different
+ # offsets.
+ #
+ # Note that the "BC" pack (the one we choose as preferred) sorts
+ # lexically after the "AB" pack, meaning that omitting the
+ # --preferred-pack argument would cause this test to fail (since
+ # the MIDX code would select the copy of "b" in the "AB" pack).
+ git pack-objects objects/pack/test-AB <<-EOF &&
+ $a
+ $b
+ EOF
+ bc=$(git pack-objects objects/pack/test-BC <<-EOF
+ $b
+ $c
+ EOF
+ ) &&
+
+ git multi-pack-index --object-dir=objects \
+ write --preferred-pack=test-BC-$bc.idx 2>err &&
+ test_must_be_empty err &&
+
+ test-tool read-midx --show-objects objects >out &&
+
+ ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
+ cut -d" " -f1) &&
+ printf "%s %s\tobjects/pack/test-BC-%s.pack\n" \
+ "$b" "$ofs" "$bc" >expect &&
+ grep ^$b out >actual &&
+
+ test_cmp expect actual
+ )
+'
test_expect_success 'verify multi-pack-index success' '
git multi-pack-index verify --object-dir=$objdir