summaryrefslogtreecommitdiff
path: root/fsck.c
AgeCommit message (Collapse)AuthorFilesLines
2019-12-09Sync with Git 2.24.1Libravatar Junio C Hamano1-1/+24
2019-12-06Sync with 2.22.2Libravatar Johannes Schindelin1-1/+24
* maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...
2019-12-06Sync with 2.21.1Libravatar Johannes Schindelin1-1/+24
* maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...
2019-12-06Sync with 2.20.2Libravatar Johannes Schindelin1-1/+24
* maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...
2019-12-06Sync with 2.19.3Libravatar Johannes Schindelin1-1/+24
* maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...
2019-12-06Sync with 2.18.2Libravatar Johannes Schindelin1-1/+24
* maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...
2019-12-06Sync with 2.17.3Libravatar Johannes Schindelin1-1/+24
* maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...
2019-12-06fsck: reject submodule.update = !command in .gitmodulesLibravatar Jonathan Nieder1-0/+7
This allows hosting providers to detect whether they are being used to attack users using malicious 'update = !command' settings in .gitmodules. Since ac1fbbda2013 (submodule: do not copy unknown update mode from .gitmodules, 2013-12-02), in normal cases such settings have been treated as 'update = none', so forbidding them should not produce any collateral damage to legitimate uses. A quick search does not reveal any repositories making use of this construct, either. Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-06Sync with 2.16.6Libravatar Johannes Schindelin1-1/+17
* maint-2.16: (31 commits) Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses ...
2019-12-06Sync with 2.14.6Libravatar Johannes Schindelin1-1/+10
* maint-2.14: (28 commits) Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses clone --recurse-submodules: prevent name squatting on Windows is_ntfs_dotgit(): only verify the leading segment test-path-utils: offer to run a protectNTFS/protectHFS benchmark ...
2019-12-05is_ntfs_dotgit(): only verify the leading segmentLibravatar Johannes Schindelin1-1/+10
The config setting `core.protectNTFS` is specifically designed to work not only on Windows, but anywhere, to allow for repositories hosted on, say, Linux servers to be protected against NTFS-specific attack vectors. As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated paths (but does not do the same for paths separated by forward slashes), under the assumption that the backslash might not be a valid directory separator on the _current_ Operating System. However, the two callers, `verify_path()` and `fsck_tree()`, are supposed to feed only individual path segments to the `is_ntfs_dotgit()` function. This causes a lot of duplicate scanning (and very inefficient scanning, too, as the inner loop of `is_ntfs_dotgit()` was optimized for readability rather than for speed. Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of splitting the paths by backslashes as directory separators on the callers of said function. Consequently, the `verify_path()` function, which already splits the path by directory separators, now treats backslashes as directory separators _explicitly_ when `core.protectNTFS` is turned on, even on platforms where the backslash is _not_ a directory separator. Note that we have to repeat some code in `verify_path()`: if the backslash is not a directory separator on the current Operating System, we want to allow file names like `\`, but we _do_ want to disallow paths that are clearly intended to cause harm when the repository is cloned on Windows. The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now needs to look for backslashes in tree entries' names specifically when `core.protectNTFS` is turned on. While it would be tempting to completely disallow backslashes in that case (much like `fsck` reports names containing forward slashes as "full paths"), this would be overzealous: when `core.protectNTFS` is turned on in a non-Windows setup, backslashes are perfectly valid characters in file names while we _still_ want to disallow tree entries that are clearly designed to exploit NTFS-specific behavior. This simplification will make subsequent changes easier to implement, such as turning `core.protectNTFS` on by default (not only on Windows) or protecting against attack vectors involving NTFS Alternate Data Streams. Incidentally, this change allows for catching malicious repositories that contain tree entries of the form `dir\.gitmodules` already on the server side rather than only on the client side (and previously only on Windows): in contrast to `is_ntfs_dotgit()`, the `is_ntfs_dotgitmodules()` function already expects the caller to split the paths by directory separators. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-28fsck: accept an oid instead of a "struct tree" for fsck_tree()Libravatar Jeff King1-15/+15
We don't actually look at the tree struct in fsck_tree() beyond its oid and type (which is obviously OBJ_TREE). Just taking an oid gives our callers more flexibility to avoid creating a struct, and makes it clear that we are fscking just what is in the buffer, not any pre-parsed bits from the struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: accept an oid instead of a "struct commit" for fsck_commit()Libravatar Jeff King1-15/+14
We don't actually look at the commit struct in fsck_commit() beyond its oid and type (which is obviously OBJ_COMMIT). Just taking an oid gives our callers more flexibility to avoid creating or parsing a struct, and makes it clear that we are fscking just what is in the buffer, not any pre-parsed bits from the struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: accept an oid instead of a "struct tag" for fsck_tag()Libravatar Jeff King1-14/+13
We don't actually look at the tag struct in fsck_tag() beyond its oid and type (which is obviously OBJ_TAG). Just taking an oid gives our callers more flexibility to avoid creating or parsing a struct, and makes it clear that we are fscking just what is in the buffer, not any pre-parsed bits from the struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: rename vague "oid" local variablesLibravatar Jeff King1-4/+4
In fsck_commit() and fsck_tag(), we have local "oid" variables used for parsing parent and tagged-object oids. Let's give these more specific names in preparation for the functions taking an "oid" parameter for the object we're checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: don't require an object struct in verify_headers()Libravatar Jeff King1-5/+7
We only need the oid and type to pass on to report(). Let's accept the broken-out parameters to give our callers more flexibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: don't require an object struct for fsck_ident()Libravatar Jeff King1-14/+16
The only thing we do with the struct is pass its oid and type to report(). We can just take those explicitly, which gives our callers more flexibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: drop blob struct from fsck_finish()Libravatar Jeff King1-14/+4
Since fsck_blob() no longer requires us to have a "struct blob", we don't need to create one. Which also means we don't need to worry about handling the case that lookup_blob() returns NULL (we'll still catch wrongly-identified blobs when we read the actual object contents and type from disk). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: accept an oid instead of a "struct blob" for fsck_blob()Libravatar Jeff King1-13/+13
We don't actually need any information from the object struct except its oid (and the type, of course, but that's implicitly OBJ_BLOB). This gives our callers more flexibility to drop the object structs, too. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: don't require an object struct for report()Libravatar Jeff King1-59/+69
The report() function really only cares about the oid and type of the object, not the full object struct. Let's convert it to take those two items separately, which gives our callers more flexibility. This makes some already-long lines even longer. I've mostly left them, as our eventual goal is to shrink these down as we continue refactoring (e.g., "&item->object" becomes "&item->object.oid, item->object.type", but will eventually shrink down to "oid, type"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: only require an oid for skiplist functionsLibravatar Jeff King1-4/+5
The skiplist is inherently an oidset, so we don't need a full object struct. Let's take just the oid to give our callers more flexibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: only provide oid/type in fsck_error callbackLibravatar Jeff King1-4/+7
None of the callbacks actually care about having a "struct object"; they're happy with just the oid and type information. So let's give ourselves more flexibility to avoid having a "struct object" by just passing the broken-down fields. Note that the callback already takes a "type" field for the fsck message type. We'll rename that to "msg_type" (and use "object_type" for the object type) to make the distinction explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: use oids rather than objects for object_name APILibravatar Jeff King1-27/+34
We don't actually care about having object structs; we only need to look up decorations by oid. Let's accept this more limited form, which will give our callers more flexibility. Note that the decoration API we rely on uses object structs itself (even though it only looks at their oids). We can solve this by switching to a kh_oid_map (we could also use the hashmap oidmap, but it's more awkward for the simple case of just storing a void pointer). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck_describe_object(): build on our get_object_name() primitiveLibravatar Jeff King1-4/+1
This isolates the implementation detail of using the decoration code to our put/get functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: unify object-name codeLibravatar Jeff King1-31/+51
Commit 90cf590f53 (fsck: optionally show more helpful info for broken links, 2016-07-17) added a system for decorating objects with names. The code is split across builtin/fsck.c (which gives the initial names) and fsck.c (which adds to the names as it traverses the object graph). This leads to some duplication, where both sites have near-identical describe_object() functions (the difference being that the one in builtin/fsck.c uses a circular array of buffers to allow multiple calls in a single printf). Let's provide a unified object_name API for fsck. That lets us drop the duplication, as well as making the interface boundaries more clear (which will let us refactor the implementation more in a future patch). We'll leave describe_object() in builtin/fsck.c as a thin wrapper around the new API, as it relies on a static global to make its many callers a bit shorter. We'll also convert the bare add_decoration() calls in builtin/fsck.c to put_object_name(). This fixes two minor bugs: 1. We leak many small strings. add_decoration() has a last-one-wins approach: it updates the decoration to the new string and returns the old one. But we ignore the return value, leaking the old string. This is quite common to trigger, since we look at reflogs: the tip of any ref will be described both by looking at the actual ref, as well as the latest reflog entry. So we'd always end up leaking one of those strings. 2. The last-one-wins approach gives us lousy names. For instance, we first look at all of the refs, and then all of the reflogs. So rather than seeing "refs/heads/master", we're likely to overwrite it with "HEAD@{12345678}". We're generally better off using the first name we find. And indeed, the test in t1450 expects this ugly HEAD@{} name. After this patch, we've switched to using fsck_put_object_name()'s first-one-wins semantics, and we output the more human-friendly "refs/tags/julius" (and the test is updated accordingly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: require an actual buffer for non-blobsLibravatar Jeff King1-42/+9
The fsck_object() function takes in a buffer, but also a "struct object". The rules for using these vary between types: - for a commit, we'll use the provided buffer; if it's NULL, we'll fall back to get_commit_buffer(), which loads from either an in-memory cache or from disk. If the latter fails, we'd die(), which is non-ideal for fsck. - for a tag, a NULL buffer will fall back to loading the object from disk (and failure would lead to an fsck error) - for a tree, we _never_ look at the provided buffer, and always use tree->buffer - for a blob, we usually don't look at the buffer at all, unless it has been marked as a .gitmodule file. In that case we check the buffer given to us, or assume a NULL buffer is a very large blob (and complain about it) This is much more complex than it needs to be. It turns out that nobody ever feeds a NULL buffer that isn't a blob: - git-fsck calls fsck_object() only from fsck_obj(). That in turn is called by one of: - fsck_obj_buffer(), which is a callback to verify_pack(), which unpacks everything except large blobs into a buffer (see pack-check.c, lines 131-141). - fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer (builtin/fsck.c, lines 639-640) And in either case, we'll have just called parse_object_buffer() anyway, which would segfault on a NULL buffer for commits or tags (not for trees, but it would install a NULL tree->buffer which would later cause a segfault) - git-index-pack asserts that the buffer is non-NULL unless the object is a blob (see builtin/index-pack.c, line 832) - git-unpack-objects always writes a non-NULL buffer into its obj_buffer hash, which is then fed to fsck_object(). (There is actually a funny thing here where it does not store blob buffers at all, nor does it call fsck on them; it does check any needed blobs via fsck_finish() though). Let's make the rules simpler, which reduces the amount of code and gives us more flexibility in refactoring the fsck code. The new rules are: - only blobs are allowed to pass a NULL buffer - we always use the provided buffer, never pulling information from the object struct We don't have to adjust any callers, because they were already adhering to these. Note that we do drop a few fsck identifiers for missing tags, but that was all dead code (because nobody passed a NULL tag buffer). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: stop checking tag->taggedLibravatar Jeff King1-13/+2
Way back in 92d4c85d24 (fsck-cache: fix SIGSEGV on bad tag object, 2005-05-03), we added an fsck check that the "tagged" field of a tag struct isn't NULL. But that was mainly protecting the printing code for "--tags", and that code wasn't moved along with the check as part of ba002f3b28 (builtin-fsck: move common object checking code to fsck.c, 2008-02-25). It could also serve to detect type mismatch problems (where a tag points to object X as a commit, but really X is a blob), but it couldn't do so reliably (we'd call lookup_commit(X), but it will only notice the problem if we happen to have previously called lookup_blob(X) in the same process). And as of a commit earlier in this series, we'd consider that a parse error and complain about the object even before getting to this point anyway. So let's drop this "tag->tagged" check. It's not helping anything, and getting rid of it makes the function conceptually cleaner, as it really is just checking the buffer we feed it. In fact, we can get rid of our one-line wrapper and just unify fsck_tag() and fsck_tag_buffer(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: stop checking commit->parent countsLibravatar Jeff King1-22/+1
In 4516338243 (builtin-fsck: reports missing parent commits, 2008-02-25), we added code to check that fsck found the same number of parents from parsing the commit itself as we see in the commit struct we got from parse_commit_buffer(). Back then the rationale was that the normal commit parser might skip some bad parents. But earlier in this series, we started treating that reliably as a parsing error, meaning that we'd complain about it before we even hit the code in fsck.c. Let's drop this code, which now makes fsck_commit_buffer() completely independent of any parsed values in the commit struct (that's conceptually cleaner, and also opens up more refactoring options). Note that we can also drop the MISSING_PARENT and MISSING_GRAFT fsck identifiers. This is no loss, as these would not trigger reliably anyway. We'd hit them only when lookup_commit() failed, which occurs only if we happen to have seen the object with another type already in the same process. In most cases, we'd actually run into the problem during the connectivity walk, not here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28fsck: stop checking commit->tree valueLibravatar Jeff King1-5/+0
We check in fsck_commit_buffer() that commit->tree isn't NULL, which in turn generally comes from a previous parse by parse_commit(). But this isn't really accomplishing anything. The two things we might care about are: - was there a syntactically valid "tree <oid>" line in the object? But we've just done our own parse in fsck_commit_buffer() to check this. - does it point to a valid tree object? But checking the "tree" pointer here doesn't actually accomplish that; it just shows that lookup_tree() didn't return NULL, which only means that we haven't yet seen that oid as a non-tree in this process. A real connectivity check would exhaustively walk all graph links, and we do that already in a separate function. So this code isn't helping anything. And it makes the fsck code slightly more confusing and rigid (e.g., it requires that any commit structs have already been parsed). Let's drop it. As a bit of history, the presence of this code looks like a leftover from early fsck code (which did rely on parse_commit() to do most of the parsing). The check comes from ff5ebe39b0 (Port fsck-cache to use parsing functions, 2005-04-18), but we later added an explicit walk in 355885d531 (add generic, type aware object chain walker, 2008-02-25). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-19Merge branch 'br/blame-ignore'Libravatar Junio C Hamano1-36/+1
"git blame" learned to "ignore" commits in the history, whose effects (as well as their presence) get ignored. * br/blame-ignore: t8014: remove unnecessary braces blame: drop some unused function parameters blame: add a test to cover blame_coalesce() blame: use the fingerprint heuristic to match ignored lines blame: add a fingerprint heuristic to match ignored lines blame: optionally track line fingerprints during fill_blame_origin() blame: add config options for the output of ignored or unblamable lines blame: add the ability to ignore commits and their changes blame: use a helper function in blame_chunk() Move oidset_parse_file() to oidset.c fsck: rename and touch up init_skiplist()
2019-06-20object: convert lookup_unknown_object() to use object_idLibravatar Jeff King1-1/+1
There are no callers left of lookup_unknown_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-16Move oidset_parse_file() to oidset.cLibravatar Barret Rhoden1-35/+0
Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-16fsck: rename and touch up init_skiplist()Libravatar Barret Rhoden1-9/+9
init_skiplist() took a file consisting of SHA-1s and comments and added the objects to an oidset. This functionality is useful for other commands and will be moved to oidset.c in a future commit. In preparation for that move, this commit renames it to oidset_parse_file() to reflect its more generic usage and cleans up a few of the names. Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08Use 'unsigned short' for mode, like diff_filespec doesLibravatar Elijah Newren1-1/+1
struct diff_filespec defines mode to be an 'unsigned short'. Several other places in the API which we'd like to interact with using a diff_filespec used a plain unsigned (or unsigned int). This caused problems when taking addresses, so switch to unsigned short. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15tree-walk: store object_id in a separate memberLibravatar brian m. carlson1-2/+2
When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30Merge branch 'jc/cocci-preincr'Libravatar Junio C Hamano1-1/+1
Code cleanup. * jc/cocci-preincr: fsck: s/++i > 1/i++/ cocci: simplify "if (++u > 1)" to "if (u++)"
2018-10-24fsck: s/++i > 1/i++/Libravatar Junio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-10Merge branch 'ab/fsck-skiplist'Libravatar Junio C Hamano1-33/+27
Update fsck.skipList implementation and documentation. * ab/fsck-skiplist: fsck: support comments & empty lines in skipList fsck: use oidset instead of oid_array for skipList fsck: use strbuf_getline() to read skiplist file fsck: add a performance test for skipList fsck: add a performance test fsck: document that skipList input must be unabbreviated fsck: document and test commented & empty line skipList input fsck: document and test sorted skipList input fsck tests: add a test for no skipList input fsck tests: setup of bogus commit object
2018-09-27Sync with 2.18.1Libravatar Junio C Hamano1-0/+14
* maint-2.18: Git 2.18.1 Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options
2018-09-27Sync with 2.17.2Libravatar Junio C Hamano1-0/+14
* maint-2.17: Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options
2018-09-27fsck: detect submodule paths starting with dashLibravatar Jeff King1-0/+7
As with urls, submodule paths with dashes are ignored by git, but may end up confusing older versions. Detecting them via fsck lets us prevent modern versions of git from being a vector to spread broken .gitmodules to older versions. Compared to blocking leading-dash urls, though, this detection may be less of a good idea: 1. While such paths provide confusing and broken results, they don't seem to actually work as option injections against anything except "cd". In particular, the submodule code seems to canonicalize to an absolute path before running "git clone" (so it passes /your/clone/-sub). 2. It's more likely that we may one day make such names actually work correctly. Even after we revert this fsck check, it will continue to be a hassle until hosting servers are all updated. On the other hand, it's not entirely clear that the behavior in older versions is safe. And if we do want to eventually allow this, we may end up doing so with a special syntax anyway (e.g., writing "./-sub" in the .gitmodules file, and teaching the submodule code to canonicalize it when comparing). So on balance, this is probably a good protection. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27fsck: detect submodule urls starting with dashLibravatar Jeff King1-0/+7
Urls with leading dashes can cause mischief on older versions of Git. We should detect them so that they can be rejected by receive.fsckObjects, preventing modern versions of git from being a vector by which attacks can spread. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12fsck: support comments & empty lines in skipListLibravatar Ævar Arnfjörð Bjarmason1-0/+14
It's annoying not to be able to put comments and empty lines in the skipList, when e.g. keeping a big central list of commits to skip in /etc/gitconfig, which was my motivation for 1362df0d41 ("fetch: implement fetch.fsck.*", 2018-07-27). Implement that, and document what version of Git this was changed in, since this on-disk format can be expected to be used by multiple versions of git. There is no notable performance impact from this change, using the test setup described a couple of commits back: Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.69(7.27+0.42) 7.86(7.48+0.37) +2.2% 1450.5: fsck with 1 skipped bad commits 7.69(7.30+0.38) 7.83(7.47+0.36) +1.8% 1450.7: fsck with 10 skipped bad commits 7.76(7.38+0.38) 7.79(7.38+0.41) +0.4% 1450.9: fsck with 100 skipped bad commits 7.76(7.38+0.38) 7.74(7.36+0.38) -0.3% 1450.11: fsck with 1000 skipped bad commits 7.71(7.30+0.41) 7.72(7.34+0.38) +0.1% 1450.13: fsck with 10000 skipped bad commits 7.74(7.34+0.40) 7.72(7.34+0.38) -0.3% 1450.15: fsck with 100000 skipped bad commits 7.75(7.40+0.35) 7.70(7.29+0.40) -0.6% 1450.17: fsck with 1000000 skipped bad commits 7.12(6.86+0.26) 7.13(6.87+0.26) +0.1% Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12fsck: use oidset instead of oid_array for skipListLibravatar René Scharfe1-21/+2
Change the implementation of the skipList feature to use oidset instead of oid_array to store SHA-1s for later lookup. This list is parsed once on startup by fsck, fetch-pack or receive-pack depending on the *.skipList config in use. I.e. only once per invocation, but note that for "clone --recurse-submodules" each submodule will re-parse the list, in addition to the main project, and it will be re-parsed when checking .gitmodules blobs, see fb16287719 ("fsck: check skiplist for object in fsck_blob()", 2018-06-27). Memory usage is a bit higher, but we don't need to keep track of the sort order anymore. Embed the oidset into struct fsck_options to make its ownership clear (no hidden sharing) and avoid unnecessary pointer indirection. The cumulative impact on performance of this & the preceding change, using the test setup described in the previous commit: Test HEAD~2 HEAD~ HEAD ---------------------------------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0% 1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7% 1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4% 1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3% 1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6% 1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0% 1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6% 1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6% Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12fsck: use strbuf_getline() to read skiplist fileLibravatar René Scharfe1-13/+12
The buffer is unlikely to contain a NUL character, so printing its contents using %s in a die() format is unsafe (detected with ASan). Use an idiomatic strbuf_getline() loop instead, which ensures the buffer is always NUL-terminated, supports CRLF files as well, accepts files without a newline after the last line, supports any hash length automatically, and is shorter. This fixes a bug where emitting an error about an invalid line on say line 1 would continue printing subsequent lines, and usually continue into uninitialized memory. The performance impact of this, on a CentOS 7 box with RedHat GCC 4.8.5-28: $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.75(7.39+0.35) 7.68(7.29+0.39) -0.9% 1450.5: fsck with 1 skipped bad commits 7.70(7.30+0.40) 7.80(7.42+0.37) +1.3% 1450.7: fsck with 10 skipped bad commits 7.77(7.37+0.40) 7.87(7.47+0.40) +1.3% 1450.9: fsck with 100 skipped bad commits 7.82(7.41+0.40) 7.88(7.43+0.44) +0.8% 1450.11: fsck with 1000 skipped bad commits 7.88(7.49+0.39) 7.84(7.43+0.40) -0.5% 1450.13: fsck with 10000 skipped bad commits 8.02(7.63+0.39) 8.07(7.67+0.39) +0.6% 1450.15: fsck with 100000 skipped bad commits 8.01(7.60+0.41) 8.08(7.70+0.38) +0.9% 1450.17: fsck with 1000000 skipped bad commits 7.60(7.10+0.50) 7.37(7.18+0.19) -3.0% Helped-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'sb/object-store-lookup'Libravatar Junio C Hamano1-4/+5
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-08-02Merge branch 'jk/fsck-gitmodules-gently'Libravatar Junio C Hamano1-3/+6
Recent "security fix" to pay attention to contents of ".gitmodules" while accepting "git push" was a bit overly strict than necessary, which has been adjusted. * jk/fsck-gitmodules-gently: fsck: downgrade gitmodulesParse default to "info" fsck: split ".gitmodules too large" error from parse failure fsck: silence stderr when parsing .gitmodules config: add options parameter to git_config_from_mem config: add CONFIG_ERROR_SILENT handler config: turn die_on_error into caller-facing enum
2018-07-24Merge branch 'rj/submodule-fsck-skip'Libravatar Junio C Hamano1-2/+11
"fsck.skipList" did not prevent a blob object listed there from being inspected for is contents (e.g. we recently started to inspect the contents of ".gitmodules" for certain malicious patterns), which has been corrected. * rj/submodule-fsck-skip: fsck: check skiplist for object in fsck_blob()
2018-07-18Merge branch 'sb/object-store-grafts'Libravatar Junio C Hamano1-1/+2
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-16fsck: downgrade gitmodulesParse default to "info"Libravatar Jeff King1-1/+1
We added an fsck check in ed8b10f631 (fsck: check .gitmodules content, 2018-05-02) as a defense against the vulnerability from 0383bbb901 (submodule-config: verify submodule names as paths, 2018-04-30). With the idea that up-to-date hosting sites could protect downstream unpatched clients that fetch from them. As part of that defense, we reject any ".gitmodules" entry that is not syntactically valid. The theory is that if we cannot even parse the file, we cannot accurately check it for vulnerabilities. And anybody with a broken .gitmodules file would eventually want to know anyway. But there are a few reasons this is a bad tradeoff in practice: - for this particular vulnerability, the client has to be able to parse the file. So you cannot sneak an attack through using a broken file, assuming the config parsers for the process running fsck and the eventual victim are functionally equivalent. - a broken .gitmodules file is not necessarily a problem. Our fsck check detects .gitmodules in _any_ tree, not just at the root. And the presence of a .gitmodules file does not necessarily mean it will be used; you'd have to also have gitlinks in the tree. The cgit repository, for example, has a file named .gitmodules from a pre-submodule attempt at sharing code, but does not actually have any gitlinks. - when the fsck check is used to reject a push, it's often hard to work around. The pusher may not have full control over the destination repository (e.g., if it's on a hosting server, they may need to contact the hosting site's support). And the broken .gitmodules may be too far back in history for rewriting to be feasible (again, this is an issue for cgit). So we're being unnecessarily restrictive without actually improving the security in a meaningful way. It would be more convenient to downgrade this check to "info", which means we'd still comment on it, but not reject a push. Site admins can already do this via config, but we should ship sensible defaults. There are a few counterpoints to consider in favor of keeping the check as an error: - the first point above assumes that the config parsers for the victim and the fsck process are equivalent. This is pretty true now, but as time goes on will become less so. Hosting sites are likely to upgrade their version of Git, whereas vulnerable clients will be stagnant (if they did upgrade, they'd cease to be vulnerable!). So in theory we may see drift over time between what two config parsers will accept. In practice, this is probably OK. The config format is pretty established at this point and shouldn't change a lot. And the farther we get from the announcement of the vulnerability, the less interesting this extra layer of protection becomes. I.e., it was _most_ valuable on day 0, when everybody's client was still vulnerable and hosting sites could protect people. But as time goes on and people upgrade, the population of vulnerable clients becomes smaller and smaller. - In theory this could protect us from other vulnerabilities in the future. E.g., .gitmodules are the only way for a malicious repository to feed data to the config parser, so this check could similarly protect clients from a future (to-be-found) bug there. But that's trading a hypothetical case for real-world pain today. If we do find such a bug, the hosting site would need to be updated to fix it, too. At which point we could figure out whether it's possible to detect _just_ the malicious case without hurting existing broken-but-not-evil cases. - Until recently, we hadn't made any restrictions on .gitmodules content. So now in tightening that we're hitting cases where certain things used to work, but don't anymore. There's some moderate pain now. But as time goes on, we'll see more (and more varied) cases that will make tightening harder in the future. So there's some argument for putting rules in place _now_, before users grow more cases that violate them. Again, this is trading pain now for hypothetical benefit in the future. And if we try hard in the future to keep our tightening to a minimum (i.e., rejecting true maliciousness without hurting broken-but-not-evil repos), then that reduces even the hypothetical benefit. Considering both sets of arguments, it makes sense to loosen this check for now. Note that we have to tweak the test in t7415 since fsck will no longer consider this a fatal error. But we still check that it reports the warning, and that we don't get the spurious error from the config code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>