From 0e8189e2708bc1da08c77c7e1d960f420b6890a5 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Fri, 31 Oct 2008 11:31:08 -0400 Subject: close another possibility for propagating pack corruption Abstract -------- With index v2 we have a per object CRC to allow quick and safe reuse of pack data when repacking. This, however, doesn't currently prevent a stealth corruption from being propagated into a new pack when _not_ reusing pack data as demonstrated by the modification to t5302 included here. The Context ----------- The Git database is all checksummed with SHA1 hashes. Any kind of corruption can be confirmed by verifying this per object hash against corresponding data. However this can be costly to perform systematically and therefore this check is often not performed at run time when accessing the object database. First, the loose object format is entirely compressed with zlib which already provide a CRC verification of its own when inflating data. Any disk corruption would be caught already in this case. Then, packed objects are also compressed with zlib but only for their actual payload. The object headers and delta base references are not deflated for obvious performance reasons, however this leave them vulnerable to potentially undetected disk corruptions. Object types are often validated against the expected type when they're requested, and deflated size must always match the size recorded in the object header, so those cases are pretty much covered as well. Where corruptions could go unnoticed is in the delta base reference. Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the reference is a pack offset which would have to match the start boundary of a different base object but still with the same size, and although this is relatively much more "probable" than in the OBJ_REF_DELTA case, the probability is also about zero in absolute terms. Still, the possibility exists as demonstrated in t5302 and is certainly greater than a SHA1 collision, especially in the OBJ_OFS_DELTA case which is now the default when repacking. Again, repacking by reusing existing pack data is OK since the per object CRC provided by index v2 guards against any such corruptions. What t5302 failed to test is a full repack in such case. The Solution ------------ As unlikely as this kind of stealth corruption can be in practice, it certainly isn't acceptable to propagate it into a freshly created pack. But, because this is so unlikely, we don't want to pay the run time cost associated with extra validation checks all the time either. Furthermore, consequences of such corruption in anything but repacking should be rather visible, and even if it could be quite unpleasant, it still has far less severe consequences than actively creating bad packs. So the best compromize is to check packed object CRC when unpacking objects, and only during the compression/writing phase of a repack, and only when not streaming the result. The cost of this is minimal (less than 1% CPU time), and visible only with a full repack. Someone with a stats background could provide an objective evaluation of this, but I suspect that it's bad RAM that has more potential for data corruptions at this point, even in those cases where this extra check is not performed. Still, it is best to prevent a known hole for corruption when recreating object data into a new pack. What about the streamed pack case? Well, any client receiving a pack must always consider that pack as untrusty and perform full validation anyway, hence no such stealth corruption could be propagated to remote repositoryes already. It is therefore worthless doing local validation in that case. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- t/t5302-pack-index.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 't') diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh index 344ab25b8b..29896141b9 100755 --- a/t/t5302-pack-index.sh +++ b/t/t5302-pack-index.sh @@ -160,7 +160,8 @@ test_expect_success \ test_expect_success \ '[index v2] 5) pack-objects refuses to reuse corrupted data' \ - 'test_must_fail git pack-objects test-5 Date: Wed, 29 Oct 2008 19:02:51 -0400 Subject: extend test coverage for latest pack corruption resilience improvements Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- t/t5303-pack-corruption-resilience.sh | 96 ++++++++++++++++++++++++++++++++--- 1 file changed, 89 insertions(+), 7 deletions(-) (limited to 't') diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh index 31b20b21d2..ac181ea38a 100755 --- a/t/t5303-pack-corruption-resilience.sh +++ b/t/t5303-pack-corruption-resilience.sh @@ -41,11 +41,17 @@ create_new_pack() { git verify-pack -v ${pack}.pack } +do_repack() { + pack=`printf "$blob_1\n$blob_2\n$blob_3\n" | + git pack-objects $@ .git/objects/pack/pack` && + pack=".git/objects/pack/pack-${pack}" +} + do_corrupt_object() { ofs=`git show-index < ${pack}.idx | grep $1 | cut -f1 -d" "` && ofs=$(($ofs + $2)) && chmod +w ${pack}.pack && - dd if=/dev/zero of=${pack}.pack count=1 bs=1 conv=notrunc seek=$ofs && + dd of=${pack}.pack count=1 bs=1 conv=notrunc seek=$ofs && test_must_fail git verify-pack ${pack}.pack } @@ -60,7 +66,7 @@ test_expect_success \ test_expect_success \ 'create corruption in header of first object' \ - 'do_corrupt_object $blob_1 0 && + 'do_corrupt_object $blob_1 0 < /dev/zero && test_must_fail git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' @@ -119,7 +125,7 @@ test_expect_success \ 'create corruption in header of first delta' \ 'create_new_pack && git prune-packed && - do_corrupt_object $blob_2 0 && + do_corrupt_object $blob_2 0 < /dev/zero && git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' @@ -133,6 +139,15 @@ test_expect_success \ git cat-file blob $blob_2 > /dev/null && git cat-file blob $blob_3 > /dev/null' +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + test_expect_success \ 'create corruption in data of first delta' \ 'create_new_pack && @@ -152,11 +167,20 @@ test_expect_success \ git cat-file blob $blob_2 > /dev/null && git cat-file blob $blob_3 > /dev/null' +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + test_expect_success \ 'corruption in delta base reference of first delta (OBJ_REF_DELTA)' \ 'create_new_pack && git prune-packed && - do_corrupt_object $blob_2 2 && + do_corrupt_object $blob_2 2 < /dev/zero && git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' @@ -171,17 +195,75 @@ test_expect_success \ git cat-file blob $blob_3 > /dev/null' test_expect_success \ - 'corruption in delta base reference of first delta (OBJ_OFS_DELTA)' \ + '... and then a repack "clears" the corruption' \ + 'do_repack && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + 'corruption #0 in delta base reference of first delta (OBJ_OFS_DELTA)' \ 'create_new_pack --delta-base-offset && git prune-packed && - do_corrupt_object $blob_2 2 && + do_corrupt_object $blob_2 2 < /dev/zero && git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' test_expect_success \ - '... and a redundant pack allows for full recovery too' \ + '... but having a loose copy allows for full recovery' \ 'mv ${pack}.idx tmp && + git hash-object -t blob -w file_2 && + mv tmp ${pack}.idx && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack --delta-base-offset && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + 'corruption #1 in delta base reference of first delta (OBJ_OFS_DELTA)' \ + 'create_new_pack --delta-base-offset && + git prune-packed && + printf "\x01" | do_corrupt_object $blob_2 2 && + git cat-file blob $blob_1 > /dev/null && + test_must_fail git cat-file blob $blob_2 > /dev/null && + test_must_fail git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... but having a loose copy allows for full recovery' \ + 'mv ${pack}.idx tmp && + git hash-object -t blob -w file_2 && + mv tmp ${pack}.idx && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack --delta-base-offset && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... and a redundant pack allows for full recovery too' \ + 'do_corrupt_object $blob_2 2 < /dev/zero && + git cat-file blob $blob_1 > /dev/null && + test_must_fail git cat-file blob $blob_2 > /dev/null && + test_must_fail git cat-file blob $blob_3 > /dev/null && + mv ${pack}.idx tmp && git hash-object -t blob -w file_1 && git hash-object -t blob -w file_2 && printf "$blob_1\n$blob_2\n" | git pack-objects .git/objects/pack/pack && -- cgit v1.2.3 From 8c1f6f6c5797d1a3fa9ec9d38ae30b4f19e70a3c Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sun, 9 Nov 2008 13:08:38 -0800 Subject: t5303: work around printf breakage in dash Signed-off-by: Junio C Hamano --- t/t5303-pack-corruption-resilience.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 't') diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh index ac181ea38a..f471eadb96 100755 --- a/t/t5303-pack-corruption-resilience.sh +++ b/t/t5303-pack-corruption-resilience.sh @@ -234,7 +234,7 @@ test_expect_success \ 'corruption #1 in delta base reference of first delta (OBJ_OFS_DELTA)' \ 'create_new_pack --delta-base-offset && git prune-packed && - printf "\x01" | do_corrupt_object $blob_2 2 && + tr "\000" "\001" /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' -- cgit v1.2.3 From a2d2c478f325690e1575e14e7d310fd7435d83be Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sun, 9 Nov 2008 13:11:06 -0800 Subject: t5303: fix printf format string for portability printf "\x01" is bad; write printf "\001" for portability. Testing with dash is a good way to find this kind of POSIX.1 violation breakages. Signed-off-by: Junio C Hamano --- t/t5303-pack-corruption-resilience.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 't') diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh index f471eadb96..d4e30fc43c 100755 --- a/t/t5303-pack-corruption-resilience.sh +++ b/t/t5303-pack-corruption-resilience.sh @@ -234,7 +234,7 @@ test_expect_success \ 'corruption #1 in delta base reference of first delta (OBJ_OFS_DELTA)' \ 'create_new_pack --delta-base-offset && git prune-packed && - tr "\000" "\001" /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' -- cgit v1.2.3