Age | Commit message (Collapse) | Author | Files | Lines |
|
One of the most important use cases for the strict tag object checking
is when transfer.fsckobjects is set to true to catch invalid objects
early on. This new regression test essentially tests the same code path
by directly calling 'index-pack --strict' on a pack containing an
tag object without a 'tagger' line.
Technically, this test is not enough: it only exercises a code path that
*warns*, not one that *fails*. The reason is that hash-object and
pack-objects both insist on parsing the tag objects and would fail on
invalid tag objects at this time.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Our current scheme for naming packfiles is to calculate the
sha1 hash of the sorted list of objects contained in the
packfile. This gives us a unique name, so we are reasonably
sure that two packs with the same name will contain the same
objects.
It does not, however, tell us that two such packs have the
exact same bytes. This makes things awkward if we repack the
same set of objects. Due to run-to-run variations, the bytes
may not be identical (e.g., changed zlib or git versions,
different source object reuse due to new packs in the
repository, or even different deltas due to races during a
multi-threaded delta search).
In theory, this could be helpful to a program that cares
that the packfile contains a certain set of objects, but
does not care about the particular representation. In
practice, no part of git makes use of that, and in many
cases it is potentially harmful. For example, if a dumb http
client fetches the .idx file, it must be sure to get the
exact .pack that matches it. Similarly, a partial transfer
of a .pack file cannot be safely resumed, as the actual
bytes may have changed. This could also affect a local
client which opened the .idx and .pack files, closes the
.pack file (due to memory or file descriptor limits), and
then re-opens a changed packfile.
In all of these cases, git can detect the problem, as we
have the sha1 of the bytes themselves in the pack trailer
(which we verify on transfer), and the .idx file references
the trailer from the matching packfile. But it would be
simpler and more efficient to actually get the correct
bytes, rather than noticing the problem and having to
restart the operation.
This patch simply uses the pack trailer sha1 as the pack
name. It should be similarly unique, but covers the exact
representation of the objects. Other parts of git should not
care, as the pack name is returned by pack-objects and is
essentially opaque.
One test needs to be updated, because it actually corrupts a
pack and expects that re-packing the corrupted bytes will
use the same name. It won't anymore, but we can easily just
use the name that pack-objects hands back.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This finally gets rid of the inefficient verify-pack implementation that
walks objects in the packfile in their object name order and replaces it
with a call to index-pack --verify. As a side effect, it also removes
packed_object_info_detail() API which is rather expensive.
As this changes the way errors are reported (verify-pack used to rely on
the usual runtime error detection routine unpack_entry() to diagnose the
CRC errors in an entry in the *.idx file; index-pack --verify checks the
whole *.idx file in one go), update a test that expected the string "CRC"
to appear in the error message.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A pack v2 .idx file usually records offset using 64-bit representation
only when the offset does not fit within 31-bit, but you can handcraft
your .idx file to record smaller offset using 64-bit, storing all zero
in the upper 4-byte. By inspecting the original idx file when running
index-pack --verify, encode such low offsets that do not need to be in
64-bit but are encoded using 64-bit just like the original idx file so
that we can still validate the pack/idx pair by comparing the idx file
recomputed with the original.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Given an existing .pack file and the .idx file that describes it,
this new mode of operation reads and re-index the packfile and makes
sure the existing .idx file matches the result byte-for-byte.
All the objects in the .pack file are validated during this operation as
well. Unlike verify-pack, which visits each object described in the .idx
file in the SHA-1 order, index-pack efficiently exploits the delta-chain
to avoid rebuilding the objects that are used as the base of deltified
objects over and over again while validating the objects, resulting in
much quicker verification of the .pack file and its .idx file.
This version however cannot verify a .pack/.idx pair with a handcrafted v2
index that uses 64-bit offset representation for offsets that would fit
within 31-bit. You can create such an .idx file by giving a custom offset
to --index-version option to the command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Breaks in a test assertion's && chain can potentially hide
failures from earlier commands in the chain.
Commands intended to fail should be marked with !, test_must_fail, or
test_might_fail. The examples in this patch do not require that.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
SKIP messages are now part of the TAP plan. A TAP harness now knows
why a particular test was skipped and can report that information. The
non-TAP harness built into Git's test-lib did nothing special with
these messages, and is unaffected by these changes.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The effects of this patch can be tested on Linux by commenting out
#define _FILE_OFFSET_BITS 64
in git-compat-util.h.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
|
|
We do not have /dev/zero on Windows. This replaces it by data generated
with printf, perl, or echo. Most of the cases do not depend on that the
data is a stream of zero bytes, so we use something printable; nor is an
unlimited stream of data needed, so we produce only as many bytes as the
test cases need.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
|
|
If the packs are made using multiple threads, they are no longer identical
on the 4-core Xeon I tested on.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/pack-safer:
t5303: fix printf format string for portability
t5303: work around printf breakage in dash
pack-objects: don't leak pack window reference when splitting packs
extend test coverage for latest pack corruption resilience improvements
pack-objects: allow "fixing" a corrupted pack without a full repack
make find_pack_revindex() aware of the nasty world
make check_object() resilient to pack corruptions
make packed_object_info() resilient to pack corruptions
make unpack_object_header() non fatal
better validation on delta base object offsets
close another possibility for propagating pack corruption
|
|
* np/index-pack:
index-pack: don't leak leaf delta result
improve index-pack tests
fix multiple issues in index-pack
index-pack: smarter memory usage during delta resolution
index-pack: rationalize delta resolution code
|
|
Abstract
--------
With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.
The Context
-----------
The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.
First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.
Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.
Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.
Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.
The Solution
------------
As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either. Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.
So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result. The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.
Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed. Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.
What about the streamed pack case? Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already. It is therefore worthless doing local validation
in that case.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 9441b61dc5 introduced serious bugs in index-pack which are
described and fixed by commit ce3f6dc655. However, despite the
boldness of those bugs, the test suite still passed.
This improves t5302-pack-index.sh so to ensure a much better code
path coverage. With commit ce3f6dc655 reverted, 17 of the 26 tests
do fail now.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before commit d0b92a3f6e it was possible to run 'git index-pack'
directly in the .git/objects/pack/ directory. Restore that ability.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Converts tests between t3600-t6300.
Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This patch changes every occurrence of "! git" -- with the meaning
that a git call has to gracefully fail -- into "test_must_fail git".
This is useful to
- make sure the test does not fail because of a signal,
e.g. SIGSEGV, and
- advertise the use of "test_must_fail" for new tests.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
On some shells (notably /bin/sh on FreeBSD 6.1), the
construct
foo && ! bar | baz
is true if
foo && baz
whereas for most other shells (such as bash) is true if
foo && ! baz
We can work around this by specifying
foo && ! (bar | baz)
which works everywhere.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "-n" syntax is not supported by System V versions of
tail (which prefer "tail -1"). Unfortunately "tail -1" is
not actually POSIX. We had some of both forms in our
scripts.
Since neither form works everywhere, this patch replaces
both with the equivalent sed invocation:
sed -ne '$p'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Originally, test_expect_failure was designed to be the opposite
of test_expect_success, but this was a bad decision. Most tests
run a series of commands that leads to the single command that
needs to be tested, like this:
test_expect_{success,failure} 'test title' '
setup1 &&
setup2 &&
setup3 &&
what is to be tested
'
And expecting a failure exit from the whole sequence misses the
point of writing tests. Your setup$N that are supposed to
succeed may have failed without even reaching what you are
trying to test. The only valid use of test_expect_failure is to
check a trivial single command that is expected to fail, which
is a minority in tests of Porcelain-ish commands.
This large-ish patch rewrites all uses of test_expect_failure to
use test_expect_success and rewrites the condition of what is
tested, like this:
test_expect_success 'test title' '
setup1 &&
setup2 &&
setup3 &&
! this command should fail
'
test_expect_failure is redefined to serve as a reminder that
that test *should* succeed but due to a known breakage in git it
currently does not pass. So if git-foo command should create a
file 'bar' but you discovered a bug that it doesn't, you can
write a test like this:
test_expect_failure 'git-foo should create bar' '
rm -f bar &&
git foo &&
test -f bar
'
This construct acts similar to test_expect_success, but instead
of reporting "ok/FAIL" like test_expect_success does, the
outcome is reported as "FIXED/still broken".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 8ed2fca458d085f12c3c6808ef4ddab6aa40ef14 was a bit draconian in
skipping certain tests which should be perfectly valid even on platform
with a 32-bit off_t.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are platforms where off_t is not 64 bits wide. In this case many tests
are doomed to fail. Let's skip them.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This uses the remove-dashes target to replace "git-frotz" to "git frotz".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For example Mac OS X lacks the seq command. So we cannot use it
there. A good old while loop works just as good.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
A Large Angry SCM (gitzilla) noticed that on an unnamed platform, tail -c
wants its byte count as part of the option, not as a separate argument.
Signed-off-by: Junio C Hamano <junkio@cox.net>
|
|
This is a fairly complete list of tests for various aspects of pack
index versions 1 and 2.
Tests on index v2 include 32-bit and 64-bit offsets, as well as a nice
demonstration of the flawed repacking integrity checks that index
version 2 intend to solve over index version 1 with the per object CRC.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
|