summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-14midx.c: respect 'pack.writeBitmapHashcache' when writing bitmapsLibravatar Taylor Blau4-1/+31
In the previous commit, the bitmap writing code learned to propagate values from an existing hash-cache extension into the bitmap that it is writing. Now that that functionality exists, let's expose it by teaching the 'git multi-pack-index' builtin to respect the `pack.writeBitmapHashCache` option so that the hash-cache may be written at all. Two minor points worth noting here: - The 'git multi-pack-index write' sub-command didn't previously read any configuration (instead this is handled in the base command). A separate handler is added here to respect this write-specific config option. - I briefly considered adding a 'bitmap_flags' field to the static options struct, but decided against it since it would require plumbing through a new parameter to the write_midx_file() function. Instead, a new MIDX-specific flag is added, which is translated to the corresponding bitmap one. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-14pack-bitmap.c: propagate namehash values from existing bitmapsLibravatar Taylor Blau1-6/+8
When an old bitmap exists while writing a new one, we load it and build a "reposition" table which maps bit positions of objects from the old bitmap to their respective positions in the new bitmap. This can help when we encounter a commit which was selected in both the old and new bitmap, since we only need to permute its bit (not recompute it from scratch). We do not, however, repurpose existing namehash values in the case of the hash-cache extension. There has been thus far no good reason to do so, since all of the namehash values for objects in the new bitmap would be populated during the traversal that was just performed by pack-objects when generating single-pack reachability bitmaps. But this isn't the case for multi-pack bitmaps, which are written via `git multi-pack-index write --bitmap` and do not perform any traversal. In this case all namehash values are set to zero, but we don't even bother to check the `pack.writeBitmapHashcache` option anyway, so it fails to matter. There are two approaches we could take to fill in non-zero hash-cache values: - have either the multi-pack-index builtin run its own traversal to attempt to fill in some values, or let a hypothetical caller (like `pack-objects` when `repack` eventually drives the `multi-pack-index` builtin) fill in the values they found during their traversal - or copy any existing namehash values that were stored in an existing bitmap to their corresponding positions in the new bitmap In a system where a repository is generally repacked with `git repack --geometric=<d>` and occasionally repacked with `git repack -a`, the hash-cache coverage will tend towards all objects. Since populating the hash-cache is additive (i.e., doing so only helps our delta search), any intermediate lack of full coverage is just fine. So let's start by just propagating any values from the existing hash-cache if we see one. The next patch will respect the `pack.writeBitmapHashcache` option while writing MIDX bitmaps, and then test this new behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-14t/helper/test-bitmap.c: add 'dump-hashes' modeLibravatar Taylor Blau3-1/+37
The pack-bitmap writer code is about to learn how to propagate values from an existing hash-cache. To prepare, teach the test-bitmap helper to dump the values from a bitmap's hash-cache extension in order to test those changes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09pack-bitmap: drop bitmap_index argument from try_partial_reuse()Libravatar Jeff King1-3/+2
Starting in commit 0f533c7284 (pack-bitmap: read multi-pack bitmaps, 2021-08-31), we no longer look at the "struct bitmap_index" passed to try_partial_reuse(). This is because we only handle verbatim reuse from a single pack: either the pack whose bitmap we're looking at, or the "preferred" pack of a midx bitmap. And thus the primary item we look at is the "pack" parameter added by that same commit, and not the bitmap_git->pack parameter (which would be NULL for a midx bitmap). It's our caller, reuse_partial_packfile_from_bitmap(), which decides which pack to use and passes it in to us. Drop the unused parameter to prevent confusion. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09pack-bitmap: drop repository argument from prepare_midx_bitmap_git()Libravatar Jeff King3-5/+3
We never look at the repository argument which is passed. This makes sense, since the multi_pack_index struct already tells us everything we need to access the files in its associated object directory. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01p5326: perf tests for MIDX bitmapsLibravatar Taylor Blau1-0/+43
These new performance tests demonstrate effectively the same behavior as p5310, but use a multi-pack bitmap instead of a single-pack one. Notably, p5326 does not create a MIDX bitmap with multiple packs. This is so we can measure a direct comparison between it and p5310. Any difference between the two is measuring just the overhead of using MIDX bitmaps. Here are the results of p5310 and p5326 together, measured at the same time and on the same machine (using a Xenon W-2255 CPU): Test HEAD ------------------------------------------------------------------------ 5310.2: repack to disk 96.78(93.39+11.33) 5310.3: simulated clone 9.98(9.79+0.19) 5310.4: simulated fetch 1.75(4.26+0.19) 5310.5: pack to file (bitmap) 28.20(27.87+8.70) 5310.6: rev-list (commits) 0.41(0.36+0.05) 5310.7: rev-list (objects) 1.61(1.54+0.07) 5310.8: rev-list count with blob:none 0.25(0.21+0.04) 5310.9: rev-list count with blob:limit=1k 2.65(2.54+0.10) 5310.10: rev-list count with tree:0 0.23(0.19+0.04) 5310.11: simulated partial clone 4.34(4.21+0.12) 5310.13: clone (partial bitmap) 11.05(12.21+0.48) 5310.14: pack to file (partial bitmap) 31.25(34.22+3.70) 5310.15: rev-list with tree filter (partial bitmap) 0.26(0.22+0.04) versus the same tests (this time using a multi-pack index): Test HEAD ------------------------------------------------------------------------ 5326.2: setup multi-pack index 78.99(75.29+11.58) 5326.3: simulated clone 11.78(11.56+0.22) 5326.4: simulated fetch 1.70(4.49+0.13) 5326.5: pack to file (bitmap) 28.02(27.72+8.76) 5326.6: rev-list (commits) 0.42(0.36+0.06) 5326.7: rev-list (objects) 1.65(1.58+0.06) 5326.8: rev-list count with blob:none 0.26(0.21+0.05) 5326.9: rev-list count with blob:limit=1k 2.97(2.86+0.10) 5326.10: rev-list count with tree:0 0.25(0.20+0.04) 5326.11: simulated partial clone 5.65(5.49+0.16) 5326.13: clone (partial bitmap) 12.22(13.43+0.38) 5326.14: pack to file (partial bitmap) 30.05(31.57+7.25) 5326.15: rev-list with tree filter (partial bitmap) 0.24(0.20+0.04) There is slight overhead in "simulated clone", "simulated partial clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is due to using the MIDX's reverse index to map between bit positions and MIDX positions. This can be reproduced by running "git repack -adb" along with "git multi-pack-index write --bitmap" in a large-ish repository. Then run: $ perf record -o pack.perf git -c core.multiPackIndex=false \ pack-objects --all --stdout >/dev/null </dev/null $ perf record -o midx.perf git -c core.multiPackIndex=true \ pack-objects --all --stdout >/dev/null </dev/null and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf". The most notable results are below (the next largest positive delta is +0.14%): # Event 'cycles' # # Baseline Delta Shared Object Symbol # ........ ....... .................. .......................... # +5.86% git [.] nth_midxed_offset +5.24% git [.] nth_midxed_pack_int_id 3.45% +0.97% git [.] offset_to_pack_pos 3.30% +0.57% git [.] pack_pos_to_offset +0.30% git [.] pack_pos_to_midx Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01p5310: extract full and partial bitmap testsLibravatar Taylor Blau2-62/+72
A new p5326 introduced by the next patch will want these same tests, interjecting its own setup in between. Move them out so that both perf tests can reuse them. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'Libravatar Taylor Blau4-2/+17
Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable to also write a multi-pack bitmap when 'GIT_TEST_MULTI_PACK_INDEX' is set. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t7700: update to work with MIDX bitmap test knobLibravatar Taylor Blau1-6/+12
A number of these tests are focused only on pack-based bitmaps and need to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where necessary. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t5319: don't write MIDX bitmaps in t5319Libravatar Taylor Blau1-1/+2
This test is specifically about generating a midx still respecting a pack-based bitmap file. Generating a MIDX bitmap would confuse the test. Let's override the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to make sure we don't do so. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAPLibravatar Jeff King1-0/+4
Generating a MIDX bitmap confuses many of the tests in t5310, which expect to control whether and how bitmaps are written. Since the relevant MIDX-bitmap tests here are covered already in t5326, let's just disable the flag for the whole t5310 script. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAPLibravatar Jeff King1-0/+3
Generating a MIDX bitmap causes tests which repack in a partial clone to fail because they are missing objects. Missing objects is an expected component of tests in t0410, so disable this knob altogether. Graceful degradation when writing a bitmap with missing objects is tested in t5326. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t5326: test multi-pack bitmap behaviorLibravatar Taylor Blau1-0/+286
This patch introduces a new test, t5326, which tests the basic functionality of multi-pack bitmaps. Some trivial behavior is tested, such as: - Whether bitmaps can be generated with more than one pack. - Whether clones can be served with all objects in the bitmap. - Whether follow-up fetches can be served with some objects outside of the server's bitmap These use lib-bitmap's tests (which in turn were pulled from t5310), and we cover cases where the MIDX represents both a single pack and multiple packs. In addition, some non-trivial and MIDX-specific behavior is tested, too, including: - Whether multi-pack bitmaps behave correctly with respect to the pack-reuse machinery when the base for some object is selected from a different pack than the delta. - Whether multi-pack bitmaps correctly respect the pack.preferBitmapTips configuration. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t/helper/test-read-midx.c: add --checksum modeLibravatar Taylor Blau2-1/+19
Subsequent tests will want to check for the existence of a multi-pack bitmap which matches the multi-pack-index stored in the pack directory. The multi-pack bitmap includes the hex checksum of the MIDX it corresponds to in its filename (for example, '$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests want a way to learn what '<checksum>' is. This helper addresses that need by printing the checksum of the repository's multi-pack-index. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01t5310: move some tests to lib-bitmap.shLibravatar Taylor Blau2-224/+241
We'll soon be adding a test script that will cover many of the same bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many of the applicable tests as we can so we don't have to rewrite them. There should be no functional change to t5310; we still run the same operations in the same order. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap: write multi-pack bitmapsLibravatar Taylor Blau4-9/+215
Write multi-pack bitmaps in the format described by Documentation/technical/bitmap-format.txt, inferring their presence with the absence of '--bitmap'. To write a multi-pack bitmap, this patch attempts to reuse as much of the existing machinery from pack-objects as possible. Specifically, the MIDX code prepares a packing_data struct that pretends as if a single packfile has been generated containing all of the objects contained within the MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap: read multi-pack bitmapsLibravatar Taylor Blau7-42/+336
This prepares the code in pack-bitmap to interpret the new multi-pack bitmaps described in Documentation/technical/bitmap-format.txt, which mostly involves converting bit positions to accommodate looking them up in a MIDX. Note that there are currently no writers who write multi-pack bitmaps, and that this will be implemented in the subsequent commit. Note also that get_midx_checksum() and get_midx_filename() are made non-static so they can be called from pack-bitmap.c. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap.c: avoid redundant calls to try_partial_reuseLibravatar Taylor Blau1-11/+29
try_partial_reuse() is used to mark any bits in the beginning of a bitmap whose objects can be reused verbatim from the pack they came from. Currently this function returns void, and signals nothing to the caller when bits could not be reused. But multi-pack bitmaps would benefit from having such a signal, because they may try to pass objects which are in bounds, but from a pack other than the preferred one. Any extra calls are noops because of a conditional in reuse_partial_packfile_from_bitmap(), but those loop iterations can be avoided by letting try_partial_reuse() indicate when it can't accept any more bits for reuse, and then listening to that signal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'Libravatar Taylor Blau2-0/+17
In a recent commit, pack-objects learned support for the 'pack.preferBitmapTips' configuration. This patch prepares the multi-pack bitmap code to respect this configuration, too. The yet-to-be implemented code will find that it is more efficient to check whether each reference contains a prefix found in the configured set of values rather than doing an additional traversal. Implement a function 'bitmap_is_preferred_refname()' which will perform that check. Its caller will be added in a subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap.c: introduce 'nth_bitmap_object_oid()'Libravatar Taylor Blau1-3/+10
A subsequent patch to support reading MIDX bitmaps will be less noisy after extracting a generic function to fetch the nth OID contained in the bitmap. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01pack-bitmap.c: introduce 'bitmap_num_objects()'Libravatar Taylor Blau1-16/+21
A subsequent patch to support reading MIDX bitmaps will be less noisy after extracting a generic function to return how many objects are contained in a bitmap. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: avoid opening multiple MIDXs when writingLibravatar Taylor Blau6-34/+51
Opening multiple instance of the same MIDX can lead to problems like two separate packed_git structures which represent the same pack being added to the repository's object store. The above scenario can happen because prepare_midx_pack() checks if `m->packs[pack_int_id]` is NULL in order to determine if a pack has been opened and installed in the repository before. But a caller can construct two copies of the same MIDX by calling get_multi_pack_index() and load_multi_pack_index() since the former manipulates the object store directly but the latter is a lower-level routine which allocates a new MIDX for each call. So if prepare_midx_pack() is called on multiple MIDXs with the same pack_int_id, then that pack will be installed twice in the object store's packed_git pointer. This can lead to problems in, for e.g., the pack-bitmap code, which does something like the following (in pack-bitmap.c:open_pack_bitmap()): struct bitmap_index *bitmap_git = ...; for (p = get_all_packs(r); p; p = p->next) { if (open_pack_bitmap_1(bitmap_git, p) == 0) ret = 0; } which is a problem if two copies of the same pack exist in the packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a conditional like the following: if (bitmap_git->pack || bitmap_git->midx) { /* ignore extra bitmap file; we can only handle one */ warning("ignoring extra bitmap file: %s", packfile->pack_name); close(fd); return -1; } Avoid this scenario by not letting write_midx_internal() open a MIDX that isn't also pointed at by the object store. So long as this is the case, other routines should prefer to open MIDXs with get_multi_pack_index() or reprepare_packed_git() instead of creating instances on their own. Because get_multi_pack_index() returns `r->object_store->multi_pack_index` if it is non-NULL, we'll only have one instance of a MIDX open at one time, avoiding these problems. To encourage this, drop the `struct multi_pack_index *` parameter from `write_midx_internal()`, and rely instead on the `object_dir` to find (or initialize) the correct MIDX instance. Likewise, replace the call to `close_midx()` with `close_object_store()`, since we're about to replace the MIDX with a new one and should invalidate the object store's memory of any MIDX that might have existed beforehand. Note that this now forbids passing object directories that don't belong to alternate repositories over `--object-dir`, since before we would have happily opened a MIDX in any directory, but now restrict ourselves to only those reachable by `r->objects->multi_pack_index` (and alternate MIDXs that we can see by walking the `next` pointer). As far as I can tell, supporting arbitrary directories with `--object-dir` was a historical accident, since even the documentation says `<alt>` when referring to the value passed to this option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: close linked MIDXs, avoid leaking memoryLibravatar Taylor Blau1-0/+3
When a repository has at least one alternate, the MIDX belonging to each alternate is accessed through the `next` pointer on the main object store's copy of the MIDX. close_midx() didn't bother to close any of the linked MIDXs. It likewise didn't free the memory pointed to by `m`, leaving uninitialized bytes with live pointers to them left around in the heap. Clean this up by closing linked MIDXs, and freeing up the memory pointed to by each of them. When callers call close_midx(), then they can discard the entire linked list of MIDXs and set their pointer to the head of that list to NULL. This isn't strictly required for the upcoming patches, but it makes it much more difficult (though still possible, for e.g., by calling `close_midx(m->next)` which leaves `m->next` pointing at uninitialized bytes) to have pointers to uninitialized memory. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: infer preferred pack when not given oneLibravatar Taylor Blau1-6/+44
In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the multi-pack index code learned how to select a pack which all duplicate objects are selected from. That is, if an object appears in multiple packs, select the copy in the preferred pack before breaking ties according to the other rules like pack mtime and readdir() order. Not specifying a preferred pack can cause serious problems with multi-pack reachability bitmaps, because these bitmaps rely on having at least one pack from which all duplicates are selected. Not having such a pack causes problems with the code in pack-objects to reuse packs verbatim (e.g., that code assumes that a delta object in a chunk of pack sent verbatim will have its base object sent from the same pack). So why does not marking a pack preferred cause problems here? The reason is roughly as follows: - Ties are broken (when handling duplicate objects) by sorting according to midx_oid_compare(), which sorts objects by OID, preferred-ness, pack mtime, and finally pack ID (more on that later). - The psuedo pack-order (described in Documentation/technical/pack-format.txt under the section "multi-pack-index reverse indexes") is computed by midx_pack_order(), and sorts by pack ID and pack offset, with preferred packs sorting first. - But! Pack IDs come from incrementing the pack count in add_pack_to_midx(), which is a callback to for_each_file_in_pack_dir(), meaning that pack IDs are assigned in readdir() order. When specifying a preferred pack, all of that works fine, because duplicate objects are correctly resolved in favor of the copy in the preferred pack, and the preferred pack sorts first in the object order. "Sorting first" is critical, because the bitmap code relies on finding out which pack holds the first object in the MIDX's pseudo pack-order to determine which pack is preferred. But if we didn't specify a preferred pack, and the pack which comes first in readdir() order does not also have the lowest timestamp, then it's possible that that pack (the one that sorts first in pseudo-pack order, which the bitmap code will treat as the preferred one) did *not* have all duplicate objects resolved in its favor, resulting in breakage. The fix is simple: pick a (semi-arbitrary, non-empty) preferred pack when none was specified. This forces that pack to have duplicates resolved in its favor, and (critically) to sort first in pseudo-pack order. Unfortunately, testing this behavior portably isn't possible, since it depends on readdir() order which isn't guaranteed by POSIX. (Note that multi-pack reachability bitmaps have yet to be implemented; so in that sense this patch is fixing a bug which does not yet exist. But by having this patch beforehand, we can prevent the bug from ever materializing.) Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: reject empty `--preferred-pack`'sLibravatar Taylor Blau3-3/+49
The soon-to-be-implemented multi-pack bitmap treats object in the first bit position specially by assuming that all objects in the pack it was selected from are also represented from that pack in the MIDX. In other words, the pack from which the first object was selected must also have all of its other objects selected from that same pack in the MIDX in case of any duplicates. But this assumption relies on the fact that there is at least one object in that pack to begin with; otherwise the object in the first bit position isn't from a preferred pack, in which case we can no longer assume that all objects in that pack were also selected from the same pack. Guard this assumption by checking the number of objects in the given preferred pack, and failing if the given pack is empty. To make sure we can safely perform this check, open any packs which are contained in an existing MIDX via prepare_midx_pack(). The same is done for new packs via the add_pack_to_midx() callback, but packs picked up from a previous MIDX will not yet have these opened. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: clear auxiliary .rev after replacing the MIDXLibravatar Taylor Blau1-1/+2
When writing a new multi-pack index, write_midx_internal() attempts to clean up any auxiliary files (currently just the MIDX's `.rev` file, but soon to include a `.bitmap`, too) corresponding to the MIDX it's replacing. This step should happen after the new MIDX is written into place, since doing so beforehand means that the old MIDX could be read without its corresponding .rev file. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: fix `*.rev` cleanups with `--object-dir`Libravatar Taylor Blau2-5/+33
If using --object-dir to point into an object directory which belongs to a different repository than the one in the current working directory, such as: git init repo git -C repo ... # add some objects cd alternate git multi-pack-index --object-dir ../repo/.git/objects write the binary will segfault trying to access the object-dir via the repo it found, but that's not fully initialized. Worse, if we later call clear_midx_files_ext(), we will use `the_repository` and remove files out of the wrong object directory. Fix this by using the given object_dir (or the object directory of `the_repository` if `--object-dir` wasn't given) to properly to clean up the *.rev files, avoiding the crash. Original-patch-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01midx: disallow running outside of a repositoryLibravatar Taylor Blau2-1/+6
The multi-pack-index command supports working with arbitrary object directories via the `--object-dir` flag. Though this has historically worked in arbitrary repositories (including when the command itself was run outside of a Git repository), this has been somewhat of an accident. For example, running: git multi-pack-index write --object-dir=/path/to/repo/objects outside of a Git repository causes a BUG(). This is because the top-level `cmd_multi_pack_index()` function stops parsing when it sees "write", and then fills in the default object directory (the result of calling `get_object_directory()`) before handing off to `cmd_multi_pack_index_write()`. But there is no repository to initialize, and so calling `get_object_directory()` results in a BUG() (indicating that the current repository is not initialized). Another case where this doesn't quite work as expected is when operating in a SHA-256 repository. To see the failure, try this in your shell: git init --object-format=sha256 repo git -C repo commit --allow-empty base git -C repo repack -d git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write and observe that we cannot open the `.idx` file in "repo", because the outermost process assumes that any repository that it works in also uses the default value of `the_hash_algo` (at the time of writing, SHA-1). There may be compelling reasons for trying to work around these bugs, but working in arbitrary `--object-dir`'s is non-standard enough (and likewise, these bugs prevalent enough) that I don't think any workflows would be broken by abandoning this behavior. Accordingly, restrict the `multi-pack-index` builtin to only work when inside of a Git repository (i.e., its main utility becomes selecting which alternate to operate in), which avoids both of the bugs above. (Note that you can still trigger a bug when writing a MIDX in an alternate which does not use the same object format as the repository which it is an alternate of, but that is an unrelated bug to this one). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24Documentation: describe MIDX-based bitmapsLibravatar Taylor Blau2-21/+60
Update the technical documentation to describe the multi-pack bitmap format. This patch merely introduces the new format, and describes its high-level ideas. Git does not yet know how to read nor write these multi-pack variants, and so the subsequent patches will: - Introduce code to interpret multi-pack bitmaps, according to this document. - Then, introduce code to write multi-pack bitmaps from the 'git multi-pack-index write' sub-command. Finally, the implementation will gain tests in subsequent patches (as opposed to inline with the patch teaching Git how to write multi-pack bitmaps) to avoid a cyclic dependency. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24pack-bitmap-write.c: free existing bitmapsLibravatar Taylor Blau1-0/+1
When writing a new bitmap, the bitmap writer code attempts to read the existing bitmap (if one is present). This is done in order to quickly permute the bits of any bitmaps for commits which appear in the existing bitmap, and were also selected for the new bitmap. But since this code was added in 341fa34887 (pack-bitmap-write: use existing bitmaps, 2020-12-08), the resources associated with opening an existing bitmap were never released. It's fine to ignore this, but it's bad hygiene. It will also cause a problem for the multi-pack-index builtin, which will be responsible not only for writing bitmaps, but also for expiring any old multi-pack bitmaps. If an existing bitmap was reused here, it will also be expired. That will cause a problem on platforms which require file resources to be closed before unlinking them, like Windows. Avoid this by ensuring we close reused bitmaps with free_bitmap_index() before removing them. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24pack-bitmap-write.c: gracefully fail to write non-closed bitmapsLibravatar Taylor Blau4-26/+64
The set of objects covered by a bitmap must be closed under reachability, since it must be the case that there is a valid bit position assigned for every possible reachable object (otherwise the bitmaps would be incomplete). Pack bitmaps are never written from 'git repack' unless repacking all-into-one, and so we never write non-closed bitmaps (except in the case of partial clones where we aren't guaranteed to have all objects). But multi-pack bitmaps change this, since it isn't known whether the set of objects in the MIDX is closed under reachability until walking them. Plumb through a bit that is set when a reachable object isn't found. As soon as a reachable object isn't found in the set of objects to include in the bitmap, bitmap_writer_build() knows that the set is not closed, and so it now fails gracefully. A test is added in t0410 to trigger a bitmap write without full reachability closure by removing local copies of some reachable objects from a promisor remote. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmapsLibravatar Taylor Blau1-0/+48
The special `--test-bitmap` mode of `git rev-list` is used to compare the result of an object traversal with a bitmap to check its integrity. This mode does not, however, assert that the types of reachable objects are stored correctly. Harden this mode by teaching it to also check that each time an object's bit is marked, the corresponding bit should be set in exactly one of the type bitmaps (whose type matches the object's true type). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-16Git 2.33Libravatar Junio C Hamano2-5/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-16Merge branch 'rs/oidtree-alignment-fix'Libravatar Junio C Hamano3-7/+17
Codepath to access recently added oidtree data structure had to make unaligned accesses to oids, which has been corrected. * rs/oidtree-alignment-fix: oidtree: avoid unaligned access to crit-bit tree
2021-08-16Merge tag 'l10n-2.33.0-rnd2' of git://github.com/git-l10n/git-poLibravatar Junio C Hamano17-35596/+37473
l10n-2.33.0-rnd2 * tag 'l10n-2.33.0-rnd2' of git://github.com/git-l10n/git-po: (46 commits) l10n: sv.po: Update Swedish translation (5230t0f0u) l10n: TEAMS: change Simplified Chinese team leader l10n: tr: v2.33 (round 2) l10n: es: 2.33.0 round 2 l10n: zh_CN: for git v2.33.0 l10n round 2 l10n: zh_CN: Revision for git v2.32.0 l10n round 1 l10n: README: refactor to use GFM syntax l10n: update German translation for Git v2.33.0 (rnd2) l10n: pt_PT: v2.33.0 round 2 l10n: pt_PT: git-po-helper update l10n: pt_PT: update translation table l10n: zh_TW.po: remove the obsolete glossary l10n: vi.po(5230t): Updated translation for v2.32.0 round 2 l10n: fr.po v2.33 rnd 2 l10n: id: po-id for 2.33.0 round 2 l10n: zh_TW.po: update for v2.33.0 rnd 2 l10n: git.pot: v2.33.0 round 2 (11 new, 8 removed) l10n: de.po: fix typos l10n: update German translation for Git v2.33.0 l10n: fr.po fix typos in commands and variables ...
2021-08-16l10n: sv.po: Update Swedish translation (5230t0f0u)Libravatar Peter Krefting1-508/+530
Also fixed some typos reported by "git-po-helper". Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2021-08-16l10n: TEAMS: change Simplified Chinese team leaderLibravatar Jiang Xin1-3/+3
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2021-08-15oidtree: avoid unaligned access to crit-bit treeLibravatar René Scharfe3-7/+17
The flexible array member "k" of struct cb_node is used to store the key of the crit-bit tree node. It offers no alignment guarantees -- in fact the current struct layout puts it one byte after a 4-byte aligned address, i.e. guaranteed to be misaligned. oidtree uses a struct object_id as cb_node key. Since cf0983213c (hash: add an algo member to struct object_id, 2021-04-26) it requires 4-byte alignment. The mismatch is reported by UndefinedBehaviorSanitizer at runtime like this: hash.h:277:2: runtime error: member access within misaligned address 0x00015000802d for type 'struct object_id', which requires 4 byte alignment 0x00015000802d: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hash.h:277:2 in We can fix that by: 1. eliminating the alignment requirement of struct object_id, 2. providing the alignment in struct cb_node, or 3. avoiding the issue by only using memcpy to access "k". Currently we only store one of two values in "algo" in struct object_id. We could use a uint8_t for that instead and widen it only once we add support for our twohundredth algorithm or so. That would not only avoid alignment issues, but also reduce the memory requirements for each instance of struct object_id by ca. 9%. Supporting keys with alignment requirements might be useful to spread the use of crit-bit trees. It can be achieved by using a wider type for "k" (e.g. uintmax_t), using different types for the members "byte" and "otherbits" (e.g. uint16_t or uint32_t for each), or by avoiding the use of flexible arrays like khash.h does. This patch implements the third option, though, because it has the least potential for causing side-effects and we're close to the next release. If one of the other options is implemented later as well to get their additional benefits we can get rid of the extra copies introduced here. Reported-by: Andrzej Hunt <andrzej@ahunt.org> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-15Merge branch 'next' of github.com:ChrisADR/git-poLibravatar Jiang Xin1-4302/+3504
* 'next' of github.com:ChrisADR/git-po: l10n: es: 2.33.0 round 2
2021-08-15l10n: tr: v2.33 (round 2)Libravatar Emir Sarı1-514/+538
Signed-off-by: Emir Sarı <bitigchi@me.com>
2021-08-14l10n: es: 2.33.0 round 2Libravatar Christopher Diaz Riveros1-4302/+3504
Signed-off-by: Christopher Diaz Riveros <christopher.diaz.riv@gmail.com> Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Javier Spagnoletti phansys@gmail.com Signed-off-by: Cleydyr Albuquerque <cleydyr@gmail.com> Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Guillermo Ramos <gramosg>
2021-08-15l10n: zh_CN: for git v2.33.0 l10n round 2Libravatar Jiang Xin1-2202/+2345
Translate 48 new messages (5230t0f0u) for git 2.33.0, and also fixed typos found by "git-po-helper". Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Fangyi Zhou <me@fangyi.io>
2021-08-15l10n: zh_CN: Revision for git v2.32.0 l10n round 1Libravatar Fangyi Zhou1-4/+4
Signed-off-by: Fangyi Zhou <me@fangyi.io>
2021-08-15l10n: README: refactor to use GFM syntaxLibravatar Jiang Xin1-172/+223
Format README.md using GFM (GitHub Flavored Markdown) syntax. - In order to use more than 3 level headings, use ATX style headings instead of setext style headings. - In order to add highlights for code blocks, use fenced code blocks instead of indented code blocks. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2021-08-15Merge branch 'l10n-2.33-rnd2' of github.com:ralfth/gitLibravatar Jiang Xin1-501/+522
* 'l10n-2.33-rnd2' of github.com:ralfth/git: l10n: update German translation for Git v2.33.0 (rnd2)
2021-08-15Merge branch 'pt-PT' of github.com:git-l10n-pt-PT/git-poLibravatar Jiang Xin1-752/+679
* 'pt-PT' of github.com:git-l10n-pt-PT/git-po: l10n: pt_PT: v2.33.0 round 2 l10n: pt_PT: git-po-helper update l10n: pt_PT: update translation table
2021-08-14l10n: update German translation for Git v2.33.0 (rnd2)Libravatar Ralf Thielow1-501/+522
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
2021-08-14l10n: pt_PT: v2.33.0 round 2Libravatar Daniel Santos1-26/+25
* translation of new entries Signed-off-by: Daniel Santos <hello@brighterdan.com>
2021-08-14l10n: pt_PT: git-po-helper updateLibravatar Daniel Santos1-741/+617
* run git-po-helper update pt_PT.po Signed-off-by: Daniel Santos <hello@brighterdan.com>
2021-08-14l10n: pt_PT: update translation tableLibravatar Daniel Santos1-11/+63
* updated translation table Signed-off-by: Daniel Santos <hello@brighterdan.com>