From 6fb22ca463077a07f42675be52e68891f319b5c2 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 28 Sep 2021 21:55:04 -0400 Subject: builtin/multi-pack-index.c: support `--stdin-packs` mode To power a new `--write-midx` mode, `git repack` will want to write a multi-pack index containing a certain set of packs in the repository. This new option will be used by `git repack` to write a MIDX which contains only the packs which will survive after the repack (that is, it will exclude any packs which are about to be deleted). This patch effectively exposes the function implemented in the previous commit via the `git multi-pack-index` builtin. An alternative approach would have been to call that function from the `git repack` builtin directly, but this introduces awkward problems around closing and reopening the object store, so the MIDX will be written out-of-process. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-multi-pack-index.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index a9df3dbd32..009c989ef8 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -45,6 +45,10 @@ write:: --[no-]bitmap:: Control whether or not a multi-pack bitmap is written. + + --stdin-packs:: + Write a multi-pack index containing only the set of + line-delimited pack index basenames provided over stdin. -- verify:: -- cgit v1.2.3 From 08944d1c221a7f4fe42a50c0f11f129769edc9b1 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 28 Sep 2021 21:55:07 -0400 Subject: midx: preliminary support for `--refs-snapshot` To figure out which commits we can write a bitmap for, the multi-pack index/bitmap code does a reachability traversal, marking any commit which can be found in the MIDX as eligible to receive a bitmap. This approach will cause a problem when multi-pack bitmaps are able to be generated from `git repack`, since the reference tips can change during the repack. Even though we ignore commits that don't exist in the MIDX (when doing a scan of the ref tips), it's possible that a commit in the MIDX reaches something that isn't. This can happen when a multi-pack index contains some pack which refers to loose objects (e.g., if a pack was pushed after starting the repack but before generating the MIDX which depends on an object which is stored as loose in the repository, and by definition isn't included in the multi-pack index). By taking a snapshot of the references before we start repacking, we can close that race window. In the above scenario (where we have a packed object pointing at a loose one), we'll either (a) take a snapshot of the references before seeing the packed one, or (b) take it after, at which point we can guarantee that the loose object will be packed and included in the MIDX. This patch does just that. It writes a temporary "reference snapshot", which is a list of OIDs that are at the ref tips before writing a multi-pack bitmap. References that are "preferred" (i.e,. are a suffix of at least one value of the 'pack.preferBitmapTips' configuration) are marked with a special '+'. The format is simple: one line per commit at each tip, with an optional '+' at the beginning (for preferred references, as described above). When provided, the reference snapshot is used to drive bitmap selection instead of the MIDX code doing its own traversal. When it isn't provided, the usual traversal takes place instead. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-multi-pack-index.txt | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'Documentation') diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 009c989ef8..27f83932e4 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -49,6 +49,21 @@ write:: --stdin-packs:: Write a multi-pack index containing only the set of line-delimited pack index basenames provided over stdin. + + --refs-snapshot=:: + With `--bitmap`, optionally specify a file which + contains a "refs snapshot" taken prior to repacking. ++ +A reference snapshot is composed of line-delimited OIDs corresponding to +the reference tips, usually taken by `git repack` prior to generating a +new pack. A line may optionally start with a `+` character to indicate +that the reference which corresponds to that OID is "preferred" (see +linkgit:git-config[1]'s `pack.preferBitmapTips`.) ++ +The file given at `` is expected to be readable, and can contain +duplicates. (If a given OID is given more than once, it is marked as +preferred if at least one instance of it begins with the special `+` +marker). -- verify:: -- cgit v1.2.3 From 1d89d88d37d18cd3af37ef4742aa39282441ec14 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 28 Sep 2021 21:55:18 -0400 Subject: builtin/repack.c: support writing a MIDX while repacking Teach `git repack` a new `--write-midx` option for callers that wish to persist a multi-pack index in their repository while repacking. There are two existing alternatives to this new flag, but they don't cover our particular use-case. These alternatives are: - Call 'git multi-pack-index write' after running 'git repack', or - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running 'git repack'. The former works, but introduces a gap in bitmap coverage between repacking and writing a new MIDX (since the repack may have deleted a pack included in the existing MIDX, invalidating it altogether). Setting the 'GIT_TEST_' environment variable is obviously unsupported. In fact, even if it were supported officially, it still wouldn't work, because it generates the MIDX *after* redundant packs have been dropped, leading to the same issue as above. Introduce a new option which eliminates this race by teaching `git repack` to generate the MIDX at the critical point: after the new packs have been written and moved into place, but before the redundant packs have been removed. This option is compatible with `git repack`'s '--bitmap' option (it changes the interpretation to be: "write a bitmap corresponding to the MIDX after one has been generated"). There is a little bit of additional noise in the patch below to avoid repeating ourselves when selecting which packs to delete. Instead of a single loop as before (where we iterate over 'existing_packs', decide if a pack is worth deleting, and if so, delete it), we have two loops (the first where we decide which ones are worth deleting, and the second where we actually do the deleting). This makes it so we have a single check we can make consistently when (1) telling the MIDX which packs we want to exclude, and (2) actually unlinking the redundant packs. There is also a tiny change to short-circuit the body of write_midx_included_packs() when no packs remain in the case of an empty repository. The MIDX code does not handle this, so avoid trying to generate a MIDX covering zero packs in the first place. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-repack.txt | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 24c00c9384..0f2d235ca5 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository SYNOPSIS -------- [verse] -'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=] [--depth=] [--threads=] [--keep-pack=] +'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=] [--depth=] [--threads=] [--keep-pack=] [--write-midx] DESCRIPTION ----------- @@ -128,10 +128,11 @@ depth is 4095. -b:: --write-bitmap-index:: Write a reachability bitmap index as part of the repack. This - only makes sense when used with `-a` or `-A`, as the bitmaps + only makes sense when used with `-a`, `-A` or `-m`, as the bitmaps must be able to refer to all reachable objects. This option - overrides the setting of `repack.writeBitmaps`. This option - has no effect if multiple packfiles are created. + overrides the setting of `repack.writeBitmaps`. This option + has no effect if multiple packfiles are created, unless writing a + MIDX (in which case a multi-pack bitmap is created). --pack-kept-objects:: Include objects in `.keep` files when repacking. Note that we @@ -190,6 +191,11 @@ to change in the future. This option (implying a drastically different repack mode) is not guaranteed to work with all other combinations of option to `git repack`. +-m:: +--write-midx:: + Write a multi-pack index (see linkgit:git-multi-pack-index[1]) + containing the non-redundant packs. + CONFIGURATION ------------- -- cgit v1.2.3 From 6d08b9d4caa230441b7d9e2b4f23deaf9ff74c13 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 28 Sep 2021 21:55:20 -0400 Subject: builtin/repack.c: make largest pack preferred When repacking into a geometric series and writing a multi-pack bitmap, it is beneficial to have the largest resulting pack be the preferred object source in the bitmap's MIDX, since selecting the large packs can lead to fewer broken delta chains and better compression. Teach 'git repack' to identify this pack and pass it to the MIDX write machinery in order to mark it as preferred. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-repack.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 0f2d235ca5..7183fb498f 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -190,6 +190,10 @@ this "roll-up", without respect to their reachability. This is subject to change in the future. This option (implying a drastically different repack mode) is not guaranteed to work with all other combinations of option to `git repack`. ++ +When writing a multi-pack bitmap, `git repack` selects the largest resulting +pack as the preferred pack for object selection by the MIDX (see +linkgit:git-multi-pack-index[1]). -m:: --write-midx:: -- cgit v1.2.3