summaryrefslogtreecommitdiff
path: root/fsck.c
AgeCommit message (Collapse)AuthorFilesLines
2018-09-27Sync with 2.17.2Libravatar Junio C Hamano1-0/+14
* maint-2.17: Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options
2018-09-27fsck: detect submodule paths starting with dashLibravatar Jeff King1-0/+7
As with urls, submodule paths with dashes are ignored by git, but may end up confusing older versions. Detecting them via fsck lets us prevent modern versions of git from being a vector to spread broken .gitmodules to older versions. Compared to blocking leading-dash urls, though, this detection may be less of a good idea: 1. While such paths provide confusing and broken results, they don't seem to actually work as option injections against anything except "cd". In particular, the submodule code seems to canonicalize to an absolute path before running "git clone" (so it passes /your/clone/-sub). 2. It's more likely that we may one day make such names actually work correctly. Even after we revert this fsck check, it will continue to be a hassle until hosting servers are all updated. On the other hand, it's not entirely clear that the behavior in older versions is safe. And if we do want to eventually allow this, we may end up doing so with a special syntax anyway (e.g., writing "./-sub" in the .gitmodules file, and teaching the submodule code to canonicalize it when comparing). So on balance, this is probably a good protection. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27fsck: detect submodule urls starting with dashLibravatar Jeff King1-0/+7
Urls with leading dashes can cause mischief on older versions of Git. We should detect them so that they can be rejected by receive.fsckObjects, preventing modern versions of git from being a vector by which attacks can spread. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-13Merge branch 'jk/submodule-fsck-loose-fixup'Libravatar Junio C Hamano1-1/+2
Finishing touches to a topic that already is in 'maint'. * jk/submodule-fsck-loose-fixup: fsck: avoid looking at NULL blob->object t7415: don't bother creating commit for symlink test
2018-06-11fsck: avoid looking at NULL blob->objectLibravatar Jeff King1-1/+2
Commit 159e7b080b (fsck: detect gitmodules files, 2018-05-02) taught fsck to look at the content of .gitmodules files. If the object turns out not to be a blob at all, we just complain and punt on checking the content. And since this was such an obvious and trivial code path, I didn't even bother to add a test. Except it _does_ do one non-trivial thing, which is call the report() function, which wants us to pass a pointer to a "struct object". Which we don't have (we have only a "struct object_id"). So we erroneously pass a NULL object to report(), which gets dereferenced and causes a segfault. It seems like we could refactor report() to just take the object_id itself. But we pass the object pointer along to a callback function, and indeed this ends up in builtin/fsck.c's objreport() which does want to look at other parts of the object (like the type). So instead, let's just use lookup_unknown_object() to get the real "struct object", and pass that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-30Merge branch 'bc/object-id'Libravatar Junio C Hamano1-9/+11
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...
2018-05-29Sync with Git 2.17.1Libravatar Junio C Hamano1-4/+134
* maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...
2018-05-23Merge branch 'ds/lazy-load-trees'Libravatar Junio C Hamano1-3/+5
The code has been taught to use the duplicated information stored in the commit-graph file to learn the tree object name for a commit to avoid opening and parsing the commit object when it makes sense to do so. * ds/lazy-load-trees: coccinelle: avoid wrong transformation suggestions from commit.cocci commit-graph: lazy-load trees for commits treewide: replace maybe_tree with accessor methods commit: create get_commit_tree() method treewide: rename tree to maybe_tree
2018-05-21fsck: complain when .gitmodules is a symlinkLibravatar Jeff King1-2/+9
We've recently forbidden .gitmodules to be a symlink in verify_path(). And it's an easy way to circumvent our fsck checks for .gitmodules content. So let's complain when we see it. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: check .gitmodules contentLibravatar Jeff King1-1/+59
This patch detects and blocks submodule names which do not match the policy set forth in submodule-config. These should already be caught by the submodule code itself, but putting the check here means that newer versions of Git can protect older ones from malicious entries (e.g., a server with receive.fsckObjects will block the objects, protecting clients which fetch from it). As a side effect, this means fsck will also complain about .gitmodules files that cannot be parsed (or were larger than core.bigFileThreshold). Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: handle promisor objects in .gitmodules checkLibravatar Jeff King1-0/+3
If we have a tree that points to a .gitmodules blob but don't have that blob, we can't check its contents. This produces an fsck error when we encounter it. But in the case of a promisor object, this absence is expected, and we must not complain. Note that this can technically circumvent our transfer.fsckObjects check. Imagine a client fetches a tree, but not the matching .gitmodules blob. An fsck of the incoming objects will show that we don't have enough information. Later, we do fetch the actual blob. But we have no idea that it's a .gitmodules file. The only ways to get around this would be to re-scan all of the existing trees whenever new ones enter (which is expensive), or to somehow persist the gitmodules_found set between fsck runs (which is complicated). In practice, it's probably OK to ignore the problem. Any repository which has all of the objects (including the one serving the promisor packs) can perform the checks. Since promisor packs are inherently about a hierarchical topology in which clients rely on upstream repositories, those upstream repositories can protect all of their downstream clients from broken objects. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: detect gitmodules filesLibravatar Jeff King1-0/+58
In preparation for performing fsck checks on .gitmodules files, this commit plumbs in the actual detection of the files. Note that unlike most other fsck checks, this cannot be a property of a single object: we must know that the object is found at a ".gitmodules" path at the root tree of a commit. Since the fsck code only sees one object at a time, we have to mark the related objects to fit the puzzle together. When we see a commit we mark its tree as a root tree, and when we see a root tree with a .gitmodules file, we mark the corresponding blob to be checked. In an ideal world, we'd check the objects in topological order: commits followed by trees followed by blobs. In that case we can avoid ever loading an object twice, since all markings would be complete by the time we get to the marked objects. And indeed, if we are checking a single packfile, this is the order in which Git will generally write the objects. But we can't count on that: 1. git-fsck may show us the objects in arbitrary order (loose objects are fed in sha1 order, but we may also have multiple packs, and we process each pack fully in sequence). 2. The type ordering is just what git-pack-objects happens to write now. The pack format does not require a specific order, and it's possible that future versions of Git (or a custom version trying to fool official Git's fsck checks!) may order it differently. 3. We may not even be fscking all of the relevant objects at once. Consider pushing with transfer.fsckObjects, where one push adds a blob at path "foo", and then a second push adds the same blob at path ".gitmodules". The blob is not part of the second push at all, but we need to mark and check it. So in the general case, we need to make up to three passes over the objects: once to make sure we've seen all commits, then once to cover any trees we might have missed, and then a final pass to cover any .gitmodules blobs we found in the second pass. We can simplify things a bit by loosening the requirement that we find .gitmodules only at root trees. Technically a file like "subdir/.gitmodules" is not parsed by Git, but it's not unreasonable for us to declare that Git is aware of all ".gitmodules" files and make them eligible for checking. That lets us drop the root-tree requirement, which eliminates one pass entirely. And it makes our worst case much better: instead of potentially queueing every root tree to be re-examined, the worst case is that we queue each unique .gitmodules blob for a second look. This patch just adds the boilerplate to find .gitmodules files. The actual content checks will come in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: actually fsck blob dataLibravatar Jeff King1-1/+7
Because fscking a blob has always been a noop, we didn't bother passing around the blob data. In preparation for content-level checks, let's fix up a few things: 1. The fsck_object() function just returns success for any blob. Let's a noop fsck_blob(), which we can fill in with actual logic later. 2. The fsck_loose() function in builtin/fsck.c just threw away blob content after loading it. Let's hold onto it until after we've called fsck_object(). The easiest way to do this is to just drop the parse_loose_object() helper entirely. Incidentally, this also fixes a memory leak: if we successfully loaded the object data but did not parse it, we would have left the function without freeing it. 3. When fsck_loose() loads the object data, it does so with a custom read_loose_object() helper. This function streams any blobs, regardless of size, under the assumption that we're only checking the sha1. Instead, let's actually load blobs smaller than big_file_threshold, as the normal object-reading code-paths would do. This lets us fsck small files, and a NULL return is an indication that the blob was so big that it needed to be streamed, and we can pass that information along to fsck_blob(). Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: simplify ".git" checkLibravatar Jeff King1-3/+1
There's no need for us to manually check for ".git"; it's a subset of the other filesystem-specific tests. Dropping it makes our code slightly shorter. More importantly, the existing code may make a reader wonder why ".GIT" is not covered here, and whether that is a bug (it isn't, as it's also covered in the filesystem-specific tests). Signed-off-by: Jeff King <peff@peff.net>
2018-05-02fsck: convert static functions to struct object_idLibravatar brian m. carlson1-9/+11
Convert two static functions to use struct object_id and parse_oid_hex, instead of relying on harcoded 20 and 40-based constants. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11treewide: replace maybe_tree with accessor methodsLibravatar Derrick Stolee1-3/+5
In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11treewide: rename tree to maybe_treeLibravatar Derrick Stolee1-3/+3
Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14sha1_file: convert read_sha1_file to struct object_idLibravatar brian m. carlson1-1/+1
Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14object: rename function 'typename' to 'type_name'Libravatar Brandon Williams1-1/+1
Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-11Merge branch 'rs/fsck-null-return-from-lookup'Libravatar Junio C Hamano1-4/+4
Improve behaviour of "git fsck" upon finding a missing object. * rs/fsck-null-return-from-lookup: fsck: handle NULL return of lookup_blob() and lookup_tree()
2017-10-06fsck: handle NULL return of lookup_blob() and lookup_tree()Libravatar René Scharfe1-4/+4
lookup_blob() and lookup_tree() can return NULL if they find an object of an unexpected type. Accessing the object member is undefined in that case. Cast the result to a struct object pointer instead; we can do that because object is the first member of all object types. This trick is already used in other places in the code. An error message is already shown by object_as_type(), which is called by the lookup functions. The walk callback functions are expected to handle NULL object pointers passed to them, but put_object_name() needs a valid object, so avoid calling it without one. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22consistently use "fallthrough" comments in switchesLibravatar Jeff King1-0/+1
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-13commit: convert lookup_commit_graft to struct object_idLibravatar Stefan Beller1-1/+1
With this patch, commit.h doesn't contain the string 'sha1' any more. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-29Merge branch 'bc/object-id'Libravatar Junio C Hamano1-3/+3
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (53 commits) object: convert parse_object* to take struct object_id tree: convert parse_tree_indirect to struct object_id sequencer: convert do_recursive_merge to struct object_id diff-lib: convert do_diff_cache to struct object_id builtin/ls-tree: convert to struct object_id merge: convert checkout_fast_forward to struct object_id sequencer: convert fast_forward_to to struct object_id builtin/ls-files: convert overlay_tree_on_cache to object_id builtin/read-tree: convert to struct object_id sha1_name: convert internals of peel_onion to object_id upload-pack: convert remaining parse_object callers to object_id revision: convert remaining parse_object callers to object_id revision: rename add_pending_sha1 to add_pending_oid http-push: convert process_ls_object and descendants to object_id refs/files-backend: convert many internals to struct object_id refs: convert struct ref_update to use struct object_id ref-filter: convert some static functions to struct object_id Convert struct ref_array_item to struct object_id Convert the verify_pack callback to struct object_id Convert lookup_tag to struct object_id ...
2017-05-08object: convert parse_object* to take struct object_idLibravatar brian m. carlson1-1/+1
Make parse_object, parse_object_or_die, and parse_object_buffer take a pointer to struct object_id. Remove the temporary variables inserted earlier, since they are no longer necessary. Transform all of the callers using the following semantic patch: @@ expression E1; @@ - parse_object(E1.hash) + parse_object(&E1) @@ expression E1; @@ - parse_object(E1->hash) + parse_object(E1) @@ expression E1, E2; @@ - parse_object_or_die(E1.hash, E2) + parse_object_or_die(&E1, E2) @@ expression E1, E2; @@ - parse_object_or_die(E1->hash, E2) + parse_object_or_die(E1, E2) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1.hash, E2, E3, E4, E5) + parse_object_buffer(&E1, E2, E3, E4, E5) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1->hash, E2, E3, E4, E5) + parse_object_buffer(E1, E2, E3, E4, E5) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_tree to struct object_idLibravatar brian m. carlson1-1/+1
Convert the lookup_tree function to take a pointer to struct object_id. The commit was created with manual changes to tree.c, tree.h, and object.c, plus the following semantic patch: @@ @@ - lookup_tree(EMPTY_TREE_SHA1_BIN) + lookup_tree(&empty_tree_oid) @@ expression E1; @@ - lookup_tree(E1.hash) + lookup_tree(&E1) @@ expression E1; @@ - lookup_tree(E1->hash) + lookup_tree(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_blob to struct object_idLibravatar brian m. carlson1-1/+1
Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23parse_timestamp(): specify explicitly where we parse timestampsLibravatar Johannes Schindelin1-1/+1
Currently, Git's source code represents all timestamps as `unsigned long`. In preparation for using a more appropriate data type, let's introduce a symbol `parse_timestamp` (currently being defined to `strtoul`) where appropriate, so that we can later easily switch to, say, use `strtoull()` instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31Rename sha1_array to oid_arrayLibravatar brian m. carlson1-3/+3
Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31Convert sha1_array_lookup to take struct object_idLibravatar brian m. carlson1-1/+1
Convert this function by changing the declaration and definition and applying the following semantic patch to update the callers: @@ expression E1, E2; @@ - sha1_array_lookup(E1, E2.hash) + sha1_array_lookup(E1, &E2) @@ expression E1, E2; @@ - sha1_array_lookup(E1, E2->hash) + sha1_array_lookup(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31Make sha1_array_append take a struct object_id *Libravatar brian m. carlson1-1/+1
Convert the callers to pass struct object_id by changing the function declaration and definition and applying the following semantic patch: @@ expression E1, E2; @@ - sha1_array_append(E1, E2.hash) + sha1_array_append(E1, &E2) @@ expression E1, E2; @@ - sha1_array_append(E1, E2->hash) + sha1_array_append(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28sha1-array: convert internal storage for struct sha1_array to object_idLibravatar brian m. carlson1-2/+2
Make the internal storage for struct sha1_array use an array of struct object_id internally. Update the users of this struct which inspect its internals. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28fsck: convert init_skiplist to struct object_idLibravatar brian m. carlson1-5/+6
Convert a hardcoded constant buffer size to a use of GIT_MAX_HEXSZ, and use parse_oid_hex to reduce the dependency on the size of the hash. This function is a caller of sha1_array_append, which will be converted later. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-26fsck: lazily load types under --connectivity-onlyLibravatar Jeff King1-0/+4
The recent fixes to "fsck --connectivity-only" load all of the objects with their correct types. This keeps the connectivity-only code path close to the regular one, but it also introduces some unnecessary inefficiency. While getting the type of an object is cheap compared to actually opening and parsing the object (as the non-connectivity-only case would do), it's still not free. For reachable non-blob objects, we end up having to parse them later anyway (to see what they point to), making our type lookup here redundant. For unreachable objects, we might never hit them at all in the reachability traversal, making the lookup completely wasted. And in some cases, we might have quite a few unreachable objects (e.g., when alternates are used for shared object storage between repositories, it's normal for there to be objects reachable from other repositories but not the one running fsck). The comment in mark_object_for_connectivity() claims two benefits to getting the type up front: 1. We need to know the types during fsck_walk(). (And not explicitly mentioned, but we also need them when printing the types of broken or dangling commits). We can address this by lazy-loading the types as necessary. Most objects never need this lazy-load at all, because they fall into one of these categories: a. Reachable from our tips, and are coerced into the correct type as we traverse (e.g., a parent link will call lookup_commit(), which converts OBJ_NONE to OBJ_COMMIT). b. Unreachable, but not at the tip of a chunk of unreachable history. We only mention the tips as "dangling", so an unreachable commit which links to hundreds of other objects needs only report the type of the tip commit. 2. It serves as a cross-check that the coercion in (1a) is correct (i.e., we'll complain about a parent link that points to a blob). But we get most of this for free already, because right after coercing, we'll parse any non-blob objects. So we'd notice then if we expected a commit and got a blob. The one exception is when we expect a blob, in which case we never actually read the object contents. So this is a slight weakening, but given that the whole point of --connectivity-only is to sacrifice some data integrity checks for speed, this seems like an acceptable tradeoff. Here are before and after timings for an extreme case with ~5M reachable objects and another ~12M unreachable (it's the torvalds/linux repository on GitHub, connected to shared storage for all of the other kernel forks): [before] $ time git fsck --no-dangling --connectivity-only real 3m4.323s user 1m25.121s sys 1m38.710s [after] $ time git fsck --no-dangling --connectivity-only real 0m51.497s user 0m49.575s sys 0m1.776s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27fsck: handle bad trees like other errorsLibravatar David Turner1-6/+12
Instead of dying when fsck hits a malformed tree object, log the error like any other and continue. Now fsck can tell the user which tree is bad, too. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18fsck: optionally show more helpful info for broken linksLibravatar Johannes Schindelin1-4/+17
When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18fsck: give the error function a chance to see the fsck_optionsLibravatar Johannes Schindelin1-2/+3
We will need this in the next commit, where fsck will be taught to optionally name the objects when reporting issues about them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18fsck_walk(): optionally name objects on the goLibravatar Johannes Schindelin1-4/+83
If fsck_options->name_objects is initialized, and if it already has name(s) for the object(s) that are to be the starting point(s) for fsck_walk(), then that function will now add names for the objects that were walked. This will be highly useful for teaching git-fsck to identify root causes for broken links, which is the task for the next patch in this series. Note that this patch opts for decorating the objects with plain strings instead of full-blown structs (à la `struct rev_name` in the code of the `git name-rev` command), for several reasons: - the code is much simpler than if it had to work with structs that describe arbitrarily long names such as "master~14^2~5:builtin/am.c", - the string processing is actually quite light-weight compared to the rest of fsck's operation, - the caller of fsck_walk() is expected to provide names for the starting points, and using plain and simple strings is just the easiest way to do that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-17Merge branch 'jc/fsck-nul-in-commit'Libravatar Junio C Hamano1-3/+13
"git fsck" learned to catch NUL byte in a commit object as potential error and warn. * jc/fsck-nul-in-commit: fsck: detect and warn a commit with embedded NUL fsck_commit_buffer(): do not special case the last validation
2016-05-10fsck: detect and warn a commit with embedded NULLibravatar Junio C Hamano1-0/+8
Even though a Git commit object is designed to be capable of storing any binary data as its payload, in practice people use it to describe the changes in textual form, and tools like "git log" are designed to treat the payload as text. Detect and warn when we see any commit object with a NUL byte in it. Note that a NUL byte in the header part is already detected as a grave error. This change is purely about the message part. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25tree-walk: convert tree_entry_extract() to use struct object_idLibravatar brian m. carlson1-3/+3
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25struct name_entry: use struct object_id instead of unsigned char sha1[20]Libravatar brian m. carlson1-2/+2
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-14fsck_commit_buffer(): do not special case the last validationLibravatar Junio C Hamano1-3/+5
The pattern taken by all the validations in this function is: if (notice a violation exists) { err = report(... VIOLATION_KIND ...); if (err) return err; } where report() returns zero if specified kind of violation is set to be ignored, and otherwise shows an error message and returns non-zero. The last validation in the function immediately before the function returns 0 to declare "all good" can cheat and directly return the return value from report(), and the current code does so, i.e. if (notice a violation exists) return report(... VIOLATION_KIND ...); return 0; But that is a selfish code that declares it is the ultimate and final form of the function, never to be enhanced later. To allow and invite future enhancements, make the last test follow the same pattern. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22convert trivial cases to ALLOC_ARRAYLibravatar Jeff King1-1/+2
Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-10Merge branch 'bc/object-id'Libravatar Junio C Hamano1-7/+7
More transition from "unsigned char[40]" to "struct object_id". This needed a few merge fixups, but is mostly disentangled from other topics. * bc/object-id: remote: convert functions to struct object_id Remove get_object_hash. Convert struct object to object_id Add several uses of get_object_hash. object: introduce get_object_hash macro. ref_newer: convert to use struct object_id push_refs_with_export: convert to struct object_id get_remote_heads: convert to struct object_id parse_fetch: convert to use struct object_id add_sought_entry_mem: convert to struct object_id Convert struct ref to use object_id. sha1_file: introduce has_object_file helper.
2015-11-20fsck: treat a NUL in a tag header as an errorLibravatar René Scharfe1-1/+2
We check the return value of verify_header() for commits already, so do the same for tags as well. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20Remove get_object_hash.Libravatar brian m. carlson1-3/+3
Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20Convert struct object to object_idLibravatar brian m. carlson1-4/+4
struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20Add several uses of get_object_hash.Libravatar brian m. carlson1-3/+3
Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-08-03Merge branch 'js/fsck-opt'Libravatar Junio C Hamano1-119/+436
Allow ignoring fsck errors on specific set of known-to-be-bad objects, and also tweaking warning level of various kinds of non critical breakages reported. * js/fsck-opt: fsck: support ignoring objects in `git fsck` via fsck.skiplist fsck: git receive-pack: support excluding objects from fsck'ing fsck: introduce `git fsck --connectivity-only` fsck: support demoting errors to warnings fsck: document the new receive.fsck.<msg-id> options fsck: allow upgrading fsck warnings to errors fsck: optionally ignore specific fsck issues completely fsck: disallow demoting grave fsck errors to warnings fsck: add a simple test for receive.fsck.<msg-id> fsck: make fsck_tag() warn-friendly fsck: handle multiple authors in commits specially fsck: make fsck_commit() warn-friendly fsck: make fsck_ident() warn-friendly fsck: report the ID of the error/warning fsck (receive-pack): allow demoting errors to warnings fsck: offer a function to demote fsck errors to warnings fsck: provide a function to parse fsck message IDs fsck: introduce identifiers for fsck messages fsck: introduce fsck options