summaryrefslogtreecommitdiff
path: root/Documentation/technical/multi-pack-index.txt
diff options
context:
space:
mode:
authorLibravatar Taylor Blau <me@ttaylorr.com>2022-01-25 17:41:03 -0500
committerLibravatar Junio C Hamano <gitster@pobox.com>2022-01-27 12:07:52 -0800
commit95e8383bac13490c6980340c3dc2538a789d7833 (patch)
tree37bc203d98ccd0f2ae8926efe3a34b7bc26f1651 /Documentation/technical/multi-pack-index.txt
parentt5326: demonstrate bitmap corruption after permutation (diff)
downloadtgif-95e8383bac13490c6980340c3dc2538a789d7833.tar.xz
midx.c: make changing the preferred pack safe
The previous patch demonstrates a bug where a MIDX's auxiliary object order can become out of sync with a MIDX bitmap. This is because of two confounding factors: - First, the object order is stored in a file which is named according to the multi-pack index's checksum, and the MIDX does not store the object order. This means that the object order can change without altering the checksum. - But the .rev file is moved into place with finalize_object_file(), which link(2)'s the file into place instead of renaming it. For us, that means that a modified .rev file will not be moved into place if MIDX's checksum was unchanged. This fix is to force the MIDX's checksum to change when the preferred pack changes but the set of packs contained in the MIDX does not. In other words, when the object order changes, the MIDX's checksum needs to change with it (regardless of whether the MIDX is tracking the same or different packs). This prevents a race whereby changing the object order (but not the packs themselves) enables a reader to see the new .rev file with the old MIDX, or similarly seeing the new bitmap with the old object order. But why can't we just stop hardlinking the .rev into place instead adding additional data to the MIDX? Suppose that's what we did. Then when we go to generate the new bitmap, we'll load the old MIDX bitmap, along with the MIDX that it references. That's fine, since the new MIDX isn't moved into place until after the new bitmap is generated. But the new object order *has* been moved into place. So we'll read the old bitmaps in the new order when generating the new bitmap file, meaning that without this secondary change, bitmap generation itself would become a victim of the race described here. This can all be prevented by forcing the MIDX's checksum to change when the object order does. By embedding the entire object order into the MIDX, we do just that. That is, the MIDX's checksum will change in response to any perturbation of the underlying object order. In t5326, this will cause the MIDX's checksum to update (even without changing the set of packs in the MIDX), preventing the stale read problem. Note that this makes it safe to continue to link(2) the MIDX .rev file into place, since it is now impossible to have a .rev file that is out-of-sync with the MIDX whose checksum it references. (But we will do away with MIDX .rev files later in this series anyway, so this is somewhat of a moot point). In theory, it is possible to store a "fingerprint" of the full object order here, so long as that fingerprint changes at least as often as the full object order does. Some possibilities here include storing the identity of the preferred pack, along with the mtimes of the non-preferred packs in a consistent order. But storing a limited part of the information makes it difficult to reason about whether or not there are gaps between the two that would cause us to get bitten by this bug again. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/technical/multi-pack-index.txt')
-rw-r--r--Documentation/technical/multi-pack-index.txt1
1 files changed, 1 insertions, 0 deletions
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index b39c69da8c..f2221d2b44 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -24,6 +24,7 @@ and their offsets into multiple packfiles. It contains:
** An offset within the jth packfile for the object.
* If large offsets are required, we use another list of large
offsets similar to version 2 pack-indexes.
+- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
Thus, we can provide O(log N) lookup time for any number
of packfiles.