summaryrefslogtreecommitdiff
path: root/t/t5004-archive-corner-cases.sh
AgeCommit message (Collapse)AuthorFilesLines
2021-01-23archive tests: use a cheaper "zipinfo -h" invocation to get headerLibravatar Ævar Arnfjörð Bjarmason1-1/+2
Change an invocation of zipinfo added in 19ee29401d (t5004: test ZIP archives with many entries, 2015-08-22) to simply ask zipinfo for the header info, rather than spewing out info about the entire archive and race to kill it with SIGPIPE due to the downstream "head -2". I ran across this because I'm adding a "set -o pipefail" test mode. This won't be needed for the version of the mode that I'm introducing (which currently relies on a patch to GNU bash), but I think this is a good idea anyway. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19archive-tar: turn length miscalculation warning into BUGLibravatar René Scharfe1-2/+1
Now that we're confident our pax extended header calculation is correct, turn the criticality of the assertion up to the maximum, from warning right up to BUG. Simplify the test, as the stderr comparison step would not be reached in case the BUG message is triggered. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19archive-tar: fix pax extended header length calculationLibravatar René Scharfe1-1/+1
A pax extended header record starts with a decimal number. Its value is the length of the whole record, including its own length. The calculation of that number in strbuf_append_ext_header() is off by one in case the length of the rest is close to a higher order of magnitude. This affects paths and link targets a bit shorter than 1000, 10000, 100000 etc. characters -- paths with a length of up to 100 fit into the tar header and don't need a pax extended header. The mistake has been present since the function was added by ae64bbc18c ("tar-tree: Introduce write_entry()", 2006-03-25). Account for digits added to len during the loop and keep incrementing until we have enough space for len and the rest. The crucial change is to check against the current value of len before each iteration, instead of against its value before the loop. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19archive-tar: report wrong pax extended header lengthLibravatar René Scharfe1-0/+20
Extended header entries contain a length value that is a bit tricky to calculate because it includes its own length (number of decimal digits) as well. We get it wrong in corner cases. Add a check, report wrong results as a warning and add a test for exercising it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-03t5004: avoid using tar for empty packagesLibravatar Carlo Marcelo Arenas Belón1-9/+8
ea2d20d4c2 ("t5004: avoid using tar for checking emptiness of archive", 2013-05-09), introduced a fake empty tar archive to allow for portable tests of emptiness without having to invoke tar 4318094047 ("archive: don't add empty directories to archives", 2017-09-13) changed the expected result for its tests from one containing an empty directory to a plain empty archive but the portable test wasn't updated resulting on them failing again in (at least) NetBSD and OpenBSD Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14archive: don't add empty directories to archivesLibravatar René Scharfe1-2/+2
While git doesn't track empty directories, git archive can be tricked into putting some into archives. One way is to construct an empty tree object, as t5004 does. While that is supported by the object database, it can't be represented in the index and thus it's unlikely to occur in the wild. Another way is using the literal name of a directory in an exclude pathspec -- its contents are are excluded, but the directory stub is included. That's inconsistent: exclude pathspecs containing wildcards don't leave empty directories in the archive. Yet another way is have a few levels of nested subdirectories (e.g. d1/d2/d3/file1) and ignoring the entries at the leaves (e.g. file1). The directories with the ignored content are ignored as well (e.g. d3), but their empty parents are included (e.g. d2). As empty directories are not supported by git, they should also not be written into archives. If an empty directory is really needed then it can be tracked and archived by placing an empty .gitignore file in it. There already is a mechanism in place for suppressing empty directories. When read_tree_recursive() encounters a directory excluded by a pathspec then it enters it anyway because it might contain included entries. It calls the callback function before it is able to decide if the directory is actually needed. For that reason git archive adds directories to a queue and writes entries for them only when it encounters the first child item -- but currently only if pathspecs with wildcards are used. Queue *all* directories, no matter if there even are pathspecs present. This prevents git archive from writing entries for empty directories in all cases. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-01t5004: require 64-bit support for big ZIP testsLibravatar René Scharfe1-2/+7
Check if unzip supports the ZIP64 format and skip the tests that create big archives otherwise. Also skip the test that archives a big file on 32-bit platforms because the git object systems can't unpack files bigger than 4GB there. Reported-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-24archive-zip: support files bigger than 4GBLibravatar René Scharfe1-1/+1
Write a zip64 extended information extra field for big files as part of their local headers and as part of their central directory headers. Also write a zip64 version of the data descriptor in that case. If we're streaming then we don't know the compressed size at the time we write the header. Deflate can end up making a file bigger instead of smaller if we're unlucky. Write a local zip64 header already for files with a size of 2GB or more in this case to be on the safe side. Both sizes need to be included in the local zip64 header, but the extra field for the directory must only contain 64-bit equivalents for 32-bit values of 0xffffffff. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-24archive-zip: support archives bigger than 4GBLibravatar René Scharfe1-1/+1
Add a zip64 extended information extra field to the central directory and emit the zip64 end of central directory records as well as locator if the offset of an entry within the archive exceeds 4GB. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-24archive-zip: add tests for big ZIP archivesLibravatar René Scharfe1-0/+45
Test the creation of ZIP archives bigger than 4GB and containing files bigger than 4GB. They are marked as EXPENSIVE because they take quite a while and because the first one needs a bit more than 4GB of disk space to store the resulting archive. The big archive in the first test is made up of a tree containing thousands of copies of a small file. Yet the test has to write out the full archive because unzip doesn't offer a way to read from stdin. The big file in the second test is provided as a zipped pack file to avoid writing another 4GB file to disk and then adding it. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-28archive-zip: support more than 65535 entriesLibravatar René Scharfe1-1/+1
Support more than 65535 entries cleanly by writing a "zip64 end of central directory record" (with a 64-bit field for the number of entries) before the usual "end of central directory record" (which contains only a 16-bit field). InfoZIP's zip does the same. Archives with 65535 or less entries are not affected. Programs that extract all files like InfoZIP's zip and 7-Zip ignored the field and could extract all files already. Software that relies on the ZIP file directory to show a list of contained files quickly to simulate to normal directory like Windows' built-in ZIP functionality only saw a subset of the included files. Windows supports ZIP64 since Vista according to https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64. Suggested-by: Johannes Schauer <josch@debian.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-28t5004: test ZIP archives with many entriesLibravatar René Scharfe1-0/+40
A ZIP file directory has a 16-bit field for the number of entries it contains. There are 64-bit extensions to deal with that. Demonstrate that git archive --format=zip currently doesn't use them and instead overflows the field. InfoZIP's unzip doesn't care about this field and extracts all files anyway. Software that uses the directory for presenting a filesystem like view quickly -- notably Windows -- depends on it, but doesn't lend itself to an automatic test case easily. Use InfoZIP's zipinfo, which probably isn't available everywhere but at least can provides *some* way to check this field. To speed things up a bit create and commit only a subset of the files and build a fake tree out of duplicates and pass that to git archive. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20t: wrap complicated expect_code users in a blockLibravatar Jeff King1-2/+4
If we are expecting a command to produce a particular exit code, we can use test_expect_code. However, some cases are more complicated, and want to accept one of a range of exit codes. For these, we end up with something like: cmd; case "$?" in ... That unfortunately breaks the &&-chain and fools --chain-lint. Since these special cases are so few, we can wrap them in a block, like this: { cmd; ret=$?; } && case "$ret" in ... This accomplishes the same thing, and retains the &&-chain (the exit status fed to the && is that of the assignment, which should always be true). It's technically longer, but it is probably a good thing for unusual code like this to stand out. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-20Revert "archive: honor tar.umask even for pax headers"Libravatar Junio C Hamano1-5/+0
This reverts commit 10f343ea814f5c18a0913997904ee11cd9b7da24, whose output is no longer bit-for-bit equivalent from the older versions of Git, which the infrastructure to (pretend to) upload tarballs kernel.org uses depends on.
2014-09-02Merge branch 'bc/archive-pax-header-mode'Libravatar Junio C Hamano1-0/+5
Implementations of "tar" that do not understand an extended pax header would extract the contents of it in a regular file; make sure the permission bits of this file follows the same tar.umask configuration setting. * bc/archive-pax-header-mode: archive: honor tar.umask even for pax headers
2014-08-04archive: honor tar.umask even for pax headersLibravatar brian m. carlson1-0/+5
git archive's tar format uses extended pax headers to encode metadata into the archive. Most tar implementations correctly treat these as metadata, but some that do not understand the pax format extract these as files instead. Apply the tar.umask setting to these entries to prevent tampering by other users. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-04t5000, t5003: do not use test_cmp to compare binary filesLibravatar Stepan Kasal1-1/+1
test_cmp() is primarily meant to compare text files (and display the difference for debug purposes). Raw "cmp" is better suited to compare binary files (tar, zip, etc.). On MinGW, test_cmp is a shell function mingw_test_cmp that tries to read both files into environment, stripping CR characters (introduced in commit 4d715ac0). This function usually speeds things up, as fork is extremly slow on Windows. But no wonder that this function is extremely slow and sometimes even crashes when comparing large tar or zip files. Signed-off-by: Stepan Kasal <kasal@ucw.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02Merge branch 'rs/empty-archive'Libravatar Junio C Hamano1-0/+15
Fixes tests added in 1.8.2 era that are broken on BSDs. * rs/empty-archive: t5004: resurrect original empty tar archive test t5004: avoid using tar for checking emptiness of archive
2013-05-09t5004: avoid using tar for checking emptiness of archiveLibravatar René Scharfe1-3/+2
Test 2 of t5004 checks if a supposedly empty tar archive really contains no files. 24676f02 (t5004: fix issue with empty archive test and bsdtar) removed our commit hash to make it work with bsdtar, but the test still fails on NetBSD and OpenBSD, which use their own tar that considers a tar file containing only NULs as broken. Here's what the different archivers do when asked to create a tar file without entries: $ uname -v NetBSD 6.0.1 (GENERIC) $ gtar --version | head -1 tar (GNU tar) 1.26 $ bsdtar --version bsdtar 2.8.4 - libarchive 2.8.4 $ : >zero.tar $ perl -e 'print "\0" x 10240' >tenk.tar $ sha1 zero.tar tenk.tar SHA1 (zero.tar) = da39a3ee5e6b4b0d3255bfef95601890afd80709 SHA1 (tenk.tar) = 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c $ : | tar cf - -T - | sha1 da39a3ee5e6b4b0d3255bfef95601890afd80709 $ : | gtar cf - -T - | sha1 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c $ : | bsdtar cf - -T - | sha1 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c So NetBSD's native tar creates an empty file, while GNU tar and bsdtar both give us 10KB of NULs -- just like git archive with an empty tree. Now let's see how the archivers handle these two kinds of empty tar files: $ tar tf zero.tar; echo $? tar: Unexpected EOF on archive file 1 $ gtar tf zero.tar; echo $? gtar: This does not look like a tar archive gtar: Exiting with failure status due to previous errors 2 $ bsdtar tf zero.tar; echo $? 0 $ tar tf tenk.tar; echo $? tar: Cannot identify format. Searching... tar: End of archive volume 1 reached tar: Sorry, unable to determine archive format. 1 $ gtar tf tenk.tar; echo $? 0 $ bsdtar tf tenk.tar; echo $? 0 NetBSD's tar complains about both, bsdtar happily accepts any of them and GNU tar doesn't like zero-length archive files. So the safest course of action is to stay with our block-of-NULs format which is compatible with GNU tar and bsdtar, as we can't make NetBSD's native tar happy anyway. We can simplify our test, however, by taking tar out of the picture. Instead of extracting the archive and checking for the non-presence of files, check if the file has a size of 10KB and contains only NULs. This makes t5004 pass on NetBSD and OpenBSD. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09t5004: resurrect original empty tar archive testLibravatar René Scharfe1-0/+14
Add a test to verify the emptiness of an archive by extracting its contents. Don't run this test if the version of tar doesn't support archives containing only a comment header, though. The existing check 'tar archive of empty tree is empty' used to work like that (minus the tar capability check) but was changed to depend on the exact representation of empty tar files created by git archive instead of on the behaviour of tar in order to avoid issues with different tar versions. The different approaches test different things: The existing one is for empty trees, for which we know the exact expected output and thus we can simply check it without extracting; the new one is for commits with empty trees, whose archives include stamps and so the more "natural" check by extraction is a better fit because it focuses on the interesting aspect, namely the absence of any archive entries. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09t5004: avoid using tar for checking emptiness of archiveLibravatar René Scharfe1-3/+2
Test 2 of t5004 checks if a supposedly empty tar archive really contains no files. 24676f02 (t5004: fix issue with empty archive test and bsdtar) removed our commit hash to make it work with bsdtar, but the test still fails on NetBSD and OpenBSD, which use their own tar that considers a tar file containing only NULs as broken. Here's what the different archivers do when asked to create a tar file without entries: $ uname -v NetBSD 6.0.1 (GENERIC) $ gtar --version | head -1 tar (GNU tar) 1.26 $ bsdtar --version bsdtar 2.8.4 - libarchive 2.8.4 $ : >zero.tar $ perl -e 'print "\0" x 10240' >tenk.tar $ sha1 zero.tar tenk.tar SHA1 (zero.tar) = da39a3ee5e6b4b0d3255bfef95601890afd80709 SHA1 (tenk.tar) = 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c $ : | tar cf - -T - | sha1 da39a3ee5e6b4b0d3255bfef95601890afd80709 $ : | gtar cf - -T - | sha1 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c $ : | bsdtar cf - -T - | sha1 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c So NetBSD's native tar creates an empty file, while GNU tar and bsdtar both give us 10KB of NULs -- just like git archive with an empty tree. Now let's see how the archivers handle these two kinds of empty tar files: $ tar tf zero.tar; echo $? tar: Unexpected EOF on archive file 1 $ gtar tf zero.tar; echo $? gtar: This does not look like a tar archive gtar: Exiting with failure status due to previous errors 2 $ bsdtar tf zero.tar; echo $? 0 $ tar tf tenk.tar; echo $? tar: Cannot identify format. Searching... tar: End of archive volume 1 reached tar: Sorry, unable to determine archive format. $ gtar tf tenk.tar; echo $? 0 $ bsdtar tf tenk.tar; echo $? 0 NetBSD's tar complains about both, bsdtar happily accepts any of them and GNU tar doesn't like zero-length archive files. So the safest course of action is to stay with our block-of-NULs format which is compatible with GNU tar and bsdtar, as we can't make NetBSD's native tar happy anyway. We can simplify our test, however, by taking tar out of the picture. Instead of extracting the archive and checking for the non-presence of files, check if the file has a size of 10KB and contains only NULs. This makes t5004 pass on NetBSD and OpenBSD. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-09t5004: ignore pax global header fileLibravatar René Scharfe1-1/+1
Versions of tar that don't know pax headers -- like the ones in NetBSD 6 and OpenBSD 5.2 -- extract them as regular files. Explicitly ignore the file created for our global header when checking the list of extracted files, as this is normal and harmless fall-back behaviour. This fixes test 3 of t5004 on these platforms. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-10t5004: fix issue with empty archive test and bsdtarLibravatar René Scharfe1-1/+1
bsdtar, which is the default tar on Mac OS X, handles empty archives just fine but reports archives containing only a pax extended header comment as damaged. Work around the issue by explicitly generating the archive for the tree and not the commit, which causes git archive to omit the commit hash comment record from the tar file. Reported-by: BJ Hargrave <bj@bjhargrave.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-10archive: handle commits with an empty treeLibravatar Jeff King1-0/+102
git-archive relies on get_pathspec to convert its argv into a list of pathspecs. When get_pathspec is given an empty argv list, it returns a single pathspec, the empty string, to indicate that everything matches. When we feed this to our path_exists function, we typically see that the pathspec turns up at least one item in the tree, and we are happy. But when our tree is empty, we erroneously think it is because the pathspec is too limited, when in fact it is simply that there is nothing to be found in the tree. This is a weird corner case, but the correct behavior is almost certainly to produce an empty archive, not to exit with an error. This patch teaches git-archive to create empty archives when there is no pathspec given (we continue to complain if a pathspec is given, since it obviously is not matched). It also confirms that the tar and zip writers produce sane output in this instance. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>