From e54501004abbd20fa8d813c1e5b82c3b50bb9361 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 28 Jul 2012 11:03:01 -0400 Subject: diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t4054-diff-bogus-tree.sh | 83 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) create mode 100755 t/t4054-diff-bogus-tree.sh (limited to 't') diff --git a/t/t4054-diff-bogus-tree.sh b/t/t4054-diff-bogus-tree.sh new file mode 100755 index 0000000000..0843c87890 --- /dev/null +++ b/t/t4054-diff-bogus-tree.sh @@ -0,0 +1,83 @@ +#!/bin/sh + +test_description='test diff with a bogus tree containing the null sha1' +. ./test-lib.sh + +empty_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904 + +test_expect_success 'create bogus tree' ' + bogus_tree=$( + printf "100644 fooQQQQQQQQQQQQQQQQQQQQQ" | + q_to_nul | + git hash-object -w --stdin -t tree + ) +' + +test_expect_success 'create tree with matching file' ' + echo bar >foo && + git add foo && + good_tree=$(git write-tree) + blob=$(git rev-parse :foo) +' + +test_expect_success 'raw diff shows null sha1 (addition)' ' + echo ":000000 100644 $_z40 $_z40 A foo" >expect && + git diff-tree $empty_tree $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (removal)' ' + echo ":100644 000000 $_z40 $_z40 D foo" >expect && + git diff-tree $bogus_tree $empty_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (modification)' ' + echo ":100644 100644 $blob $_z40 M foo" >expect && + git diff-tree $good_tree $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (other direction)' ' + echo ":100644 100644 $_z40 $blob M foo" >expect && + git diff-tree $bogus_tree $good_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (reverse)' ' + echo ":100644 100644 $_z40 $blob M foo" >expect && + git diff-tree -R $good_tree $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'raw diff shows null sha1 (index)' ' + echo ":100644 100644 $_z40 $blob M foo" >expect && + git diff-index $bogus_tree >actual && + test_cmp expect actual +' + +test_expect_success 'patch fails due to bogus sha1 (addition)' ' + test_must_fail git diff-tree -p $empty_tree $bogus_tree +' + +test_expect_success 'patch fails due to bogus sha1 (removal)' ' + test_must_fail git diff-tree -p $bogus_tree $empty_tree +' + +test_expect_success 'patch fails due to bogus sha1 (modification)' ' + test_must_fail git diff-tree -p $good_tree $bogus_tree +' + +test_expect_success 'patch fails due to bogus sha1 (other direction)' ' + test_must_fail git diff-tree -p $bogus_tree $good_tree +' + +test_expect_success 'patch fails due to bogus sha1 (reverse)' ' + test_must_fail git diff-tree -R -p $good_tree $bogus_tree +' + +test_expect_success 'patch fails due to bogus sha1 (index)' ' + test_must_fail git diff-index -p $bogus_tree +' + +test_done -- cgit v1.2.3 From 4337b5856f88f18da47c176e3cbc95a35627044c Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 28 Jul 2012 11:05:24 -0400 Subject: do not write null sha1s to on-disk index We should never need to write the null sha1 into an index entry (short of the 1 in 2^160 chance that somebody actually has content that hashes to it). If we attempt to do so, it is much more likely that it is a bug, since we use the null sha1 as a sentinel value to mean "not valid". The presence of null sha1s in the index (which can come from, among other things, "update-index --cacheinfo", or by reading a corrupted tree) can cause problems for later readers, because they cannot distinguish the literal null sha1 from its use a sentinel value. For example, "git diff-files" on such an entry would make it appear as if it is stat-dirty, and until recently, the diff code assumed such an entry meant that we should be diffing a working tree file rather than a blob. Ideally, we would stop such entries from entering even our in-core index. However, we do sometimes legitimately add entries with null sha1s in order to represent these sentinel situations; simply forbidding them in add_index_entry breaks a lot of the existing code. However, we can at least make sure that our in-core sentinel representation never makes it to disk. To be thorough, we will test an attempt to add both a blob and a submodule entry. In the former case, we might run into problems anyway because we will be missing the blob object. But in the latter case, we do not enforce connectivity across gitlink entries, making this our only point of enforcement. The current implementation does not care which type of entry we are seeing, but testing both cases helps future-proof the test suite in case that changes. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t2107-update-index-basic.sh | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 't') diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh index 809fafe208..0dbbb00d74 100755 --- a/t/t2107-update-index-basic.sh +++ b/t/t2107-update-index-basic.sh @@ -29,4 +29,23 @@ test_expect_success 'update-index -h with corrupt index' ' grep "[Uu]sage: git update-index" broken/usage ' +test_expect_success '--cacheinfo does not accept blob null sha1' ' + echo content >file && + git add file && + git rev-parse :file >expect && + test_must_fail git update-index --cacheinfo 100644 $_z40 file && + git rev-parse :file >actual && + test_cmp expect actual +' + +test_expect_success '--cacheinfo does not accept gitlink null sha1' ' + git init submodule && + (cd submodule && test_commit foo) && + git add submodule && + git rev-parse :submodule >expect && + test_must_fail git update-index --cacheinfo 160000 $_z40 submodule && + git rev-parse :submodule >actual && + test_cmp expect actual +' + test_done -- cgit v1.2.3 From c479d14a80743b1cb86d77695607f4c81f7d8797 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Sat, 28 Jul 2012 11:06:29 -0400 Subject: fsck: detect null sha1 in tree entries Short of somebody happening to beat the 1 in 2^160 odds of actually generating content that hashes to the null sha1, we should never see this value in a tree entry. So let's have fsck warn if it it seen. As in the previous commit, we test both blob and submodule entries to future-proof the test suite against the implementation depending on connectivity to notice the error. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t1450-fsck.sh | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 't') diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 5b8ebd8053..5e36cc71b4 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -217,4 +217,30 @@ test_expect_success 'rev-list --verify-objects with bad sha1' ' grep -q "error: sha1 mismatch 63ffffffffffffffffffffffffffffffffffffff" out ' +_bz='\0' +_bz5="$_bz$_bz$_bz$_bz$_bz" +_bz20="$_bz5$_bz5$_bz5$_bz5" + +test_expect_success 'fsck notices blob entry pointing to null sha1' ' + (git init null-blob && + cd null-blob && + sha=$(printf "100644 file$_bz$_bz20" | + git hash-object -w --stdin -t tree) && + git fsck 2>out && + cat out && + grep "warning.*null sha1" out + ) +' + +test_expect_success 'fsck notices submodule entry pointing to null sha1' ' + (git init null-commit && + cd null-commit && + sha=$(printf "160000 submodule$_bz$_bz20" | + git hash-object -w --stdin -t tree) && + git fsck 2>out && + cat out && + grep "warning.*null sha1" out + ) +' + test_done -- cgit v1.2.3 From b622d4d11d27fd290f7732c6a65f40c054796c1f Mon Sep 17 00:00:00 2001 From: Thomas Rast Date: Mon, 30 Jul 2012 21:25:40 +0200 Subject: send-email: improve RFC2047 quote parsing The RFC2047 unquoting, used to parse email addresses in From and Cc headers, is broken in several ways: * It erroneously substitutes ' ' for '_' in *the whole* header, even outside the quoted field. [Noticed by Christoph.] * It is too liberal in its matching, and happily matches the start of one quoted chunk against the end of another, or even just something that looks like such an end. [Noticed by Junio.] * It fundamentally cannot cope with encodings that are not a superset of ASCII, nor several (incompatible) encodings in the same header. This patch fixes the first two by doing a more careful decoding of the outer quoting (e.g. "=AB" to represent an octet whose value is 0xAB). Fixing the fundamental issues is left for a future, more intrusive, patch. Noticed-by: Christoph Miebach Helped-by: Junio C Hamano Signed-off-by: Thomas Rast Signed-off-by: Junio C Hamano --- t/t9001-send-email.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 't') diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh index 8c12c65c72..035122808b 100755 --- a/t/t9001-send-email.sh +++ b/t/t9001-send-email.sh @@ -841,6 +841,19 @@ test_expect_success $PREREQ '--compose adds MIME for utf8 subject' ' grep "^Subject: =?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?=" msgtxt1 ' +test_expect_success $PREREQ 'utf8 author is correctly passed on' ' + clean_fake_sendmail && + test_commit weird_author && + test_when_finished "git reset --hard HEAD^" && + git commit --amend --author "Füñný Nâmé " && + git format-patch --stdout -1 >funny_name.patch && + git send-email --from="Example " \ + --to=nobody@example.com \ + --smtp-server="$(pwd)/fake.sendmail" \ + funny_name.patch && + grep "^From: Füñný Nâmé " msgtxt1 +' + test_expect_success $PREREQ 'detects ambiguous reference/file conflict' ' echo master > master && git add master && -- cgit v1.2.3 From 2c3fd4bbb42b586b016319d2009a05fe5b878533 Mon Sep 17 00:00:00 2001 From: Brandon Casey Date: Mon, 6 Aug 2012 22:01:48 -0700 Subject: t/t5400: demonstrate breakage caused by informational message from prune When receive-pack triggers 'git gc --auto' and 'git prune' is called to remove a stale temporary object, 'git prune' prints an informational message to stdout about the file that it will remove. Since this message is written to stdout, it is sent back over the transport channel to the git client which tries to interpret it as part of the pack protocol and then promptly terminates with a complaint about a protocol error. Introduce a test which exercises the auto-gc functionality of receive-pack and demonstrates this breakage. Signed-off-by: Brandon Casey Signed-off-by: Junio C Hamano --- t/t5400-send-pack.sh | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) (limited to 't') diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh index 0eace37a03..04a87913ed 100755 --- a/t/t5400-send-pack.sh +++ b/t/t5400-send-pack.sh @@ -145,6 +145,41 @@ test_expect_success 'push --all excludes remote-tracking hierarchy' ' ) ' +test_expect_failure 'receive-pack runs auto-gc in remote repo' ' + rm -rf parent child && + git init parent && + ( + # Setup a repo with 2 packs + cd parent && + echo "Some text" >file.txt && + git add . && + git commit -m "Initial commit" && + git repack -adl && + echo "Some more text" >>file.txt && + git commit -a -m "Second commit" && + git repack + ) && + cp -a parent child && + ( + # Set the child to auto-pack if more than one pack exists + cd child && + git config gc.autopacklimit 1 && + git branch test_auto_gc && + # And create a file that follows the temporary object naming + # convention for the auto-gc to remove + : >.git/objects/tmp_test_object && + test-chmtime =-1209601 .git/objects/tmp_test_object + ) && + ( + cd parent && + echo "Even more text" >>file.txt && + git commit -a -m "Third commit" && + git send-pack ../child HEAD:refs/heads/test_auto_gc >output 2>&1 && + grep "Auto packing the repository for optimum performance." output + ) && + test ! -e child/.git/objects/tmp_test_object +' + rewound_push_setup() { rm -rf parent child && mkdir parent && -- cgit v1.2.3 From 4b7f2fa4c6c2c4675ab00474d419fa356afdfa71 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Mon, 6 Aug 2012 22:31:10 -0700 Subject: receive-pack: do not leak output from auto-gc to standard output The standard output channel of receive-pack is a structured protocol channel, and subprocesses must never be allowed to leak anything into it by writing to their standard output. Use RUN_COMMAND_STDOUT_TO_STDERR option to run_command_v_opt() just like we do when running hooks to prevent output from "gc" leaking to the standard output. Signed-off-by: Junio C Hamano --- t/t5400-send-pack.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 't') diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh index 04a87913ed..250c720c14 100755 --- a/t/t5400-send-pack.sh +++ b/t/t5400-send-pack.sh @@ -145,7 +145,7 @@ test_expect_success 'push --all excludes remote-tracking hierarchy' ' ) ' -test_expect_failure 'receive-pack runs auto-gc in remote repo' ' +test_expect_success 'receive-pack runs auto-gc in remote repo' ' rm -rf parent child && git init parent && ( -- cgit v1.2.3