summaryrefslogtreecommitdiff
path: root/t/t5312-prune-corruption.sh
AgeCommit message (Collapse)AuthorFilesLines
2021-09-27refs: turn on GIT_REF_PARANOIA by defaultLibravatar Jeff King1-2/+8
The original point of the GIT_REF_PARANOIA flag was to include broken refs in iterations, so that possibly-destructive operations would not silently ignore them (and would generally instead try to operate on the oids and fail when the objects could not be accessed). We already turned this on by default for some dangerous operations, like "repack -ad" (where missing a reachability tip would mean dropping the associated history). But it was not on for general use, even though it could easily result in the spreading of corruption (e.g., imagine cloning a repository which simply omits some of its refs because their objects are missing; the result quietly succeeds even though you did not clone everything!). This patch turns on GIT_REF_PARANOIA by default. So a clone as mentioned above would actually fail (upload-pack tells us about the broken ref, and when we ask for the objects, pack-objects fails to deliver them). This may be inconvenient when working with a corrupted repository, but: - we are better off to err on the side of complaining about corruption, and then provide mechanisms for explicitly loosening safety. - this is only one type of corruption anyway. If we are missing any other objects in the history that _aren't_ ref tips, then we'd behave similarly (happily show the ref, but then barf when we started traversing). We retain the GIT_REF_PARANOIA variable, but simply default it to "1" instead of "0". That gives the user an escape hatch for loosening this when working with a corrupt repository. It won't work across a remote connection to upload-pack (because we can't necessarily set environment variables on the remote), but there the client has other options (e.g., choosing which refs to fetch). As a bonus, this also makes ref iteration faster in general (because we don't have to call has_object_file() for each ref), though probably not noticeably so in the general case. In a repo with a million refs, it shaved a few hundred milliseconds off of upload-pack's advertisement; that's noticeable, but most repos are not nearly that large. The possible downside here is that any operation which iterates refs but doesn't ever open their objects may now quietly claim to have X when the object is corrupted (e.g., "git rev-list new-branch --not --all" will treat a broken ref as uninteresting). But again, that's not really any different than corruption below the ref level. We might have refs/heads/old-branch as non-corrupt, but we are not actively checking that we have the entire reachable history. Or the pointed-to object could even be corrupted on-disk (but our "do we have it" check would still succeed). In that sense, this is merely bringing ref-corruption in line with general object corruption. One alternative implementation would be to actually check for broken refs, and then _immediately die_ if we see any. That would cause the "rev-list --not --all" case above to abort immediately. But in many ways that's the worst of all worlds: - it still spends time looking up the objects an extra time - it still doesn't catch corruption below the ref level - it's even more inconvenient; with the current implementation of GIT_REF_PARANOIA for something like upload-pack, we can make the advertisement and let the client choose a non-broken piece of history. If we bail as soon as we see a broken ref, they cannot even see the advertisement. The test changes here show some of the fallout. A non-destructive "git repack -adk" now fails by default (but we can override it). Deleting a broken ref now actually tells the hooks the correct "before" state, rather than a confusing null oid. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27refs: omit dangling symrefs when using GIT_REF_PARANOIALibravatar Jeff King1-0/+7
Dangling symrefs aren't actually a corruption problem. It's perfectly fine for refs/remotes/origin/HEAD to point to an unborn branch. And in particular, if you are trying to establish reachability, a symref that points nowhere doesn't matter either way. Any ref it could point to will be examined during the rest of the traversal. It's possible that a symref pointing nowhere _could_ be a sign that the ref it was meant to point to was deleted accidentally (e.g., via corruption). But there is no particular reason to think that is true for any given case, and in the meantime, GIT_REF_PARANOIA kicking in automatically for some operations means they'll fail unnecessarily. So let's loosen it just a bit. The new test in t5312 shows off an example that is safe, but currently fails (and no longer does after this patch). Note that we don't do anything if the caller explicitly asked for DO_FOR_EACH_INCLUDE_BROKEN. In that case they may be looking for dangling symrefs themselves, and setting GIT_REF_PARANOIA should not _loosen_ things from what the caller asked for. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27t5312: be more assertive about command failureLibravatar Jeff King1-4/+7
When repacking or pruning in a corrupted repository, our tests in t5312 argue that it is OK to complete the operation or bail, as long as we don't actually delete the objects pointed to by the corruption. This isn't a wrong line of reasoning, but the tests are a bit permissive by using test_might_fail. The fact is that we _do_ bail currently, and if we ever stopped doing so, that would be worthy of a human investigating. So let's switch these to test_must_fail. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27t5312: test non-destructive repackLibravatar Jeff King1-0/+5
In t5312, we create a state with a broken ref, and then make sure that destructive repacks don't silently ignore the breakage (where a destructive repack is one that might drop objects). But we don't check the behavior of non-destructive repacks at all (i.e., ones where we'd keep unreachable objects). So let's add a test to confirm the current behavior, which is that they are allowed (i.e., ignoring the breakage and considering any objects it points to as unreachable). This may change in the future, but we'd like for the test suite to alert us to that fact. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27t5312: create bogus ref as necessaryLibravatar Jeff King1-6/+7
Some tests in t5312 create an illegally-named ref, and then see how various operations handle it. But between those operations, we also do some more setup (e.g., repacking), and we are subtly depending on how those setup steps react to the illegal ref. To future-proof us against those behaviors changing, let's instead create and clean up our bogus ref on demand in the tests that need it. This has two small extra advantages: - the tests are more stand-alone; we do not need an extra test to clean up the ref before moving on to other parts of the script - the creation and cleanup is together in one helper function. Because these depend on touching the refs in the filesystem directly, they may need to be tweaked for a world with alternate backends (they have not been noticed so far in the reftable work because with a non-file backend the tests don't fail; they simply become uninteresting noops because the broken ref isn't read at all). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27t5312: drop "verbose" helperLibravatar Jeff King1-5/+5
t5312 has several uses of the "verbose" helper, as described in 8ad1652418 (t5304: use helper to report failure of "test foo = bar", 2014-10-10). Back then the "-x" trace option for tests was new, and was not as pleasant to use (e.g., some tests failed under "-x", we did not support BASH_XTRACEFD, etc). These days it is clear that "-x" is the preferred way to get extra output, and we don't need to mark up individual tests. Let's get rid of the uses of "verbose" here, as one step toward eradicating it totally. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19t5[0-4]*: adjust the references to the default branch name "main"Libravatar Johannes Schindelin1-9/+9
Carefully excluding t5310, which is developed independently of the current patch series at the time of writing, we now use `main` as default branch in t5[0-4]*. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t5[0-4]*.sh && git checkout HEAD -- t5310\*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19tests: mark tests relying on the current default for `init.defaultBranch`Libravatar Johannes Schindelin1-0/+3
In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-28tests: remove some direct access to .git/logsLibravatar David Turner1-1/+1
Alternate refs backends might store reflogs somewhere other than .git/logs. Change most test code that directly accesses .git/logs to instead use git reflog commands. There are still a few tests which need direct access to reflogs: to check reflog permissions, to manually create reflogs from scratch, to save/restore reflogs, to check the format of raw reflog data, and to remove not just reflog contents, but the reflogs themselves. All cases which don't need direct access have been modified. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20refs.c: drop curate_packed_refsLibravatar Jeff King1-1/+1
When we delete a ref, we have to rewrite the entire packed-refs file. We take this opportunity to "curate" the packed-refs file and drop any entries that are crufty or broken. Dropping broken entries (e.g., with bogus names, or ones that point to missing objects) is actively a bad idea, as it means that we lose any notion that the data was there in the first place. Aside from the general hackiness that we might lose any information about ref "foo" while deleting an unrelated ref "bar", this may seriously hamper any attempts by the user at recovering from the corruption in "foo". They will lose the sha1 and name of "foo"; the exact pointer may still be useful even if they recover missing objects from a different copy of the repository. But worse, once the ref is gone, there is no trace of the corruption. A follow-up "git prune" may delete objects, even though it would otherwise bail when seeing corruption. We could just drop the "broken" bits from curate_packed_refs, and continue to drop the "crufty" bits: refs whose loose counterpart exists in the filesystem. This is not wrong to do, and it does have the advantage that we may write out a slightly smaller packed-refs file. But it has two disadvantages: 1. It is a potential source of races or mistakes with respect to these refs that are otherwise unrelated to the operation. To my knowledge, there aren't any active problems in this area, but it seems like an unnecessary risk. 2. We have to spend time looking up the matching loose refs for every item in the packed-refs file. If you have a large number of packed refs that do not change, that outweighs the benefit from writing out a smaller packed-refs file (it doesn't get smaller, and you do a bunch of directory traversal to find that out). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20repack: turn on "ref paranoia" when doing a destructive repackLibravatar Jeff King1-1/+1
If we are repacking with "-ad", we will drop any unreachable objects. Likewise, using "-Ad --unpack-unreachable=<time>" will drop any old, unreachable objects. In these cases, we want to make sure the reachability we compute with "--all" is complete. We can do this by passing GIT_REF_PARANOIA=1 in the environment to pack-objects. Note that "-Ad" is safe already, because it only loosens unreachable objects. It is up to "git prune" to avoid deleting them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20prune: turn on ref_paranoia flagLibravatar Jeff King1-2/+2
Prune should know about broken objects at the tips of refs, so that we can feed them to our traversal rather than ignoring them. It's better for us to abort the operation on the broken object than it is to start deleting objects with an incomplete view of the reachability namespace. Note that for missing objects, aborting is the best we can do. For a badly-named ref, we technically could use its sha1 as a reachability tip. However, the iteration code just feeds us a null sha1, so there would be a reasonable amount of code involved to pass down our wishes. It's not really worth trying to do better, because this is a case that should happen extremely rarely, and the message we provide: fatal: unable to parse object: refs/heads/bogus:name is probably enough to point the user in the right direction. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20t5312: test object deletion code paths in a corrupted repositoryLibravatar Jeff King1-0/+114
When we are doing a destructive operation like "git prune", we want to be extra careful that the set of reachable tips we compute is valid. If there is any corruption or oddity, we are better off aborting the operation and letting the user figure things out rather than plowing ahead and possibly deleting some data that cannot be recovered. The tests here include: 1. Pruning objects mentioned only be refs with invalid names. This used to abort prior to d0f810f (refs.c: allow listing and deleting badly named refs, 2014-09-03), but since then we silently ignore the tip. Likewise, we test repacking that can drop objects (either "-ad", which drops anything unreachable, or "-Ad --unpack-unreachable=<time>", which tries to optimize out a loose object write that would be directly pruned). 2. Pruning objects when some refs point to missing objects. We don't know whether any dangling objects would have been reachable from the missing objects. We are better to keep them around, as they are better than nothing for helping the user recover history. 3. Packed refs that point to missing objects can sometimes be dropped. By itself, this is more of an annoyance (you do not have the object anyway; even if you can recover it from elsewhere, all you are losing is a placeholder for your state at the time of corruption). But coupled with (2), if we drop the ref and then go on to prune, we may lose unrecoverable objects. Note that we use test_might_fail for some of the operations. In some cases, it would be appropriate to abort the operation, and in others, it might be acceptable to continue but taking the information into account. The tests don't care either way, and check only for data loss. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>