summaryrefslogtreecommitdiff
path: root/t/t7415-submodule-names.sh
AgeCommit message (Collapse)AuthorFilesLines
2020-01-10mingw: safeguard better against backslashes in file namesLibravatar Johannes Schindelin via GitGitGadget1-1/+1
In 224c7d70fa1 (mingw: only test index entries for backslashes, not tree entries, 2019-12-31), we relaxed the check for backslashes in tree entries to check only index entries. However, the code change was incorrect: it was added to `add_index_entry_with_check()`, not to `add_index_entry()`, so under certain circumstances it was possible to side-step the protection. Besides, the description of that commit purported that all index entries would be checked when in fact they were only checked when being added to the index (there are code paths that do not do that, constructing "transient" index entries). In any case, it was pointed out in one insightful review at https://github.com/git-for-windows/git/pull/2437#issuecomment-566771835 that it would be a much better idea to teach `verify_path()` to perform the check for a backslash. This is safer, even if it comes with two notable drawbacks: - `verify_path()` cannot say _what_ is wrong with the path, therefore the user will no longer be told that there was a backslash in the path, only that the path was invalid. - The `git apply` command also calls the `verify_path()` function, and might have been able to handle Windows-style paths (i.e. with backslashes instead of forward slashes). This will no longer be possible unless the user (temporarily) sets `core.protectNTFS=false`. Note that `git add <windows-path>` will _still_ work because `normalize_path_copy_len()` will convert the backslashes to forward slashes before hitting the code path that creates an index entry. The clear advantage is that `verify_path()`'s purpose is to check the validity of the file name, therefore we naturally tap into all the code paths that need safeguarding, also implicitly into future code paths. The benefits of that approach outweigh the downsides, so let's move the check from `add_index_entry_with_check()` to `verify_path()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-02mingw: only test index entries for backslashes, not tree entriesLibravatar Johannes Schindelin1-3/+4
During a clone of a repository that contained a file with a backslash in its name in the past, as of v2.24.1(2), Git for Windows prints errors like this: error: filename in tree entry contains backslash: '\' The idea is to prevent Git from even trying to write files with backslashes in their file names: while these characters are valid in file names on other platforms, on Windows it is interpreted as directory separator (which would obviously lead to ambiguities, e.g. when there is a file `a\b` and there is also a file `a/b`). Arguably, this is the wrong layer for that error: As long as the user never checks out the files whose names contain backslashes, there should not be any problem in the first place. So let's loosen the requirements: we now leave tree entries with backslashes in their file names alone, but we do require any entries that are added to the Git index to contain no backslashes on Windows. Note: just as before, the check is guarded by `core.protectNTFS` (to allow overriding the check by toggling that config setting), and it is _only_ performed on Windows, as the backslash is not a directory separator elsewhere, even when writing to NTFS-formatted volumes. An alternative approach would be to try to prevent creating files with backslashes in their file names. However, that comes with its own set of problems. For example, `git config -f C:\ProgramData\Git\config ...` is a very valid way to specify a custom config location, and we obviously do _not_ want to prevent that. Therefore, the approach chosen in this patch would appear to be better. This addresses https://github.com/git-for-windows/git/issues/2435 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06t7415: drop v2.20.x-specific work-aroundLibravatar Johannes Schindelin1-1/+1
This reverts the work-around that was introduced just for the v2.20.x release train in "t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x"; It is not necessary for v2.21.x. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-06Sync with 2.20.2Libravatar Johannes Schindelin1-0/+56
* maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...
2019-12-06t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.xLibravatar Johannes Schindelin1-1/+1
In v2.20.x, Git clones submodules recursively by first creating the submodules' gitdirs and _then_ "updating" the submodules. This can lead to the situation where the clone path is taken because the directory (while it exists already) is not a git directory, but then the clone fails because that gitdir is unexpectedly already a directory. This _also_ works around the vulnerability that was fixed in "Disallow dubiously-nested submodule git directories", but it produces a different error message than the one expected by the test case, therefore we adjust the test case accordingly. Note: as the two submodules "race each other", there are actually two possible error messages, therefore we have to teach the test case to expect _two_ possible (and good) outcomes in addition to the one it expected before. Note: this workaround is only necessary for the v2.20.x release train; The behavior changed again in v2.21.x so that the original test case's expectations are met again. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-06Sync with 2.18.2Libravatar Johannes Schindelin1-0/+56
* maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...
2019-12-06Sync with 2.17.3Libravatar Johannes Schindelin1-0/+56
* maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...
2019-12-06Sync with 2.16.6Libravatar Johannes Schindelin1-0/+56
* maint-2.16: (31 commits) Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names path: safeguard `.git` against NTFS Alternate Streams Accesses ...
2019-12-05Merge branch 'win32-filenames-cannot-have-trailing-spaces-or-periods'Libravatar Johannes Schindelin1-1/+1
On Windows, filenames cannot have trailing spaces or periods, when opening such paths, they are stripped automatically. Read: you can open the file `README` via the file name `README . . .`. This ambiguity can be used in combination with other security bugs to cause e.g. remote code execution during recursive clones. This patch series fixes that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05mingw: refuse to access paths with trailing spaces or periodsLibravatar Johannes Schindelin1-1/+1
When creating a directory on Windows whose path ends in a space or a period (or chains thereof), the Win32 API "helpfully" trims those. For example, `mkdir("abc ");` will return success, but actually create a directory called `abc` instead. This stems back to the DOS days, when all file names had exactly 8 characters plus exactly 3 characters for the file extension, and the only way to have shorter names was by padding with spaces. Sadly, this "helpful" behavior is a bit inconsistent: after a successful `mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because the directory `abc ` does not actually exist). Even if it would work, we now have a serious problem because a Git repository could contain directories `abc` and `abc `, and on Windows, they would be "merged" unintentionally. As these paths are illegal on Windows, anyway, let's disallow any accesses to such paths on that Operating System. For practical reasons, this behavior is still guarded by the config setting `core.protectNTFS`: it is possible (and at least two regression tests make use of it) to create commits without involving the worktree. In such a scenario, it is of course possible -- even on Windows -- to create such file names. Among other consequences, this patch disallows submodules' paths to end in spaces on Windows (which would formerly have confused Git enough to try to write into incorrect paths, anyway). While this patch does not fix a vulnerability on its own, it prevents an attack vector that was exploited in demonstrations of a number of recently-fixed security bugs. The regression test added to `t/t7417-submodule-path-url.sh` reflects that attack vector. Note that we have to adjust the test case "prevent git~1 squatting on Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue. It tries to clone two submodules whose names differ only in a trailing period character, and as a consequence their git directories differ in the same way. Previously, when Git tried to clone the second submodule, it thought that the git directory already existed (because on Windows, when you create a directory with the name `b.` it actually creates `b`), but with this patch, the first submodule's clone will fail because of the illegal name of the git directory. Therefore, when cloning the second submodule, Git will take a different code path: a fresh clone (without an existing git directory). Both code paths fail to clone the second submodule, both because the the corresponding worktree directory exists and is not empty, but the error messages are worded differently. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05Disallow dubiously-nested submodule git directoriesLibravatar Johannes Schindelin1-0/+23
Currently it is technically possible to let a submodule's git directory point right into the git dir of a sibling submodule. Example: the git directories of two submodules with the names `hippo` and `hippo/hooks` would be `.git/modules/hippo/` and `.git/modules/hippo/hooks/`, respectively, but the latter is already intended to house the former's hooks. In most cases, this is just confusing, but there is also a (quite contrived) attack vector where Git can be fooled into mistaking remote content for file contents it wrote itself during a recursive clone. Let's plug this bug. To do so, we introduce the new function `validate_submodule_git_dir()` which simply verifies that no git dir exists for any leading directories of the submodule name (if there are any). Note: this patch specifically continues to allow sibling modules names of the form `core/lib`, `core/doc`, etc, as long as `core` is not a submodule name. This fixes CVE-2019-1387. Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-04mingw: disallow backslash characters in tree objects' file namesLibravatar Johannes Schindelin1-3/+5
The backslash character is not a valid part of a file name on Windows. Hence it is dangerous to allow writing files that were unpacked from tree objects, when the stored file name contains a backslash character: it will be misinterpreted as directory separator. This not only causes ambiguity when a tree contains a blob `a\b` and a tree `a` that contains a blob `b`, but it also can be used as part of an attack vector to side-step the careful protections against writing into the `.git/` directory during a clone of a maliciously-crafted repository. Let's prevent that, addressing CVE-2019-1354. Note: we guard against backslash characters in tree objects' file names _only_ on Windows (because on other platforms, even on those where NTFS volumes can be mounted, the backslash character is _not_ a directory separator), and _only_ when `core.protectNTFS = true` (because users might need to generate tree objects for other platforms, of course without touching the worktree, e.g. using `git update-index --cacheinfo`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-04clone --recurse-submodules: prevent name squatting on WindowsLibravatar Johannes Schindelin1-0/+31
In addition to preventing `.git` from being tracked by Git, on Windows we also have to prevent `git~1` from being tracked, as the default NTFS short name (also known as the "8.3 filename") for the file name `.git` is `git~1`, otherwise it would be possible for malicious repositories to write directly into the `.git/` directory, e.g. a `post-checkout` hook that would then be executed _during_ a recursive clone. When we implemented appropriate protections in 2b4c6efc821 (read-cache: optionally disallow NTFS .git variants, 2014-12-16), we had analyzed carefully that the `.git` directory or file would be guaranteed to be the first directory entry to be written. Otherwise it would be possible e.g. for a file named `..git` to be assigned the short name `git~1` and subsequently, the short name generated for `.git` would be `git~2`. Or `git~3`. Or even `~9999999` (for a detailed explanation of the lengths we have to go to protect `.gitmodules`, see the commit message of e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)). However, by exploiting two issues (that will be addressed in a related patch series close by), it is currently possible to clone a submodule into a non-empty directory: - On Windows, file names cannot end in a space or a period (for historical reasons: the period separating the base name from the file extension was not actually written to disk, and the base name/file extension was space-padded to the full 8/3 characters, respectively). Helpfully, when creating a directory under the name, say, `sub.`, that trailing period is trimmed automatically and the actual name on disk is `sub`. This means that while Git thinks that the submodule names `sub` and `sub.` are different, they both access `.git/modules/sub/`. - While the backslash character is a valid file name character on Linux, it is not so on Windows. As Git tries to be cross-platform, it therefore allows backslash characters in the file names stored in tree objects. Which means that it is totally possible that a submodule `c` sits next to a file `c\..git`, and on Windows, during recursive clone a file called `..git` will be written into `c/`, of course _before_ the submodule is cloned. Note that the actual exploit is not quite as simple as having a submodule `c` next to a file `c\..git`, as we have to make sure that the directory `.git/modules/b` already exists when the submodule is checked out, otherwise a different code path is taken in `module_clone()` that does _not_ allow a non-empty submodule directory to exist already. Even if we will address both issues nearby (the next commit will disallow backslash characters in tree entries' file names on Windows, and another patch will disallow creating directories/files with trailing spaces or periods), it is a wise idea to defend in depth against this sort of attack vector: when submodules are cloned recursively, we now _require_ the directory to be empty, addressing CVE-2019-1349. Note: the code path we patch is shared with the code path of `git submodule update --init`, which must not expect, in general, that the directory is empty. Hence we have to introduce the new option `--force-init` and hand it all the way down from `git submodule` to the actual `git submodule--helper` process that performs the initial clone. Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2018-11-12fsck: mark strings for translationLibravatar Nguyễn Thái Ngọc Duy1-3/+3
Two die() are updated to start with lowercase to be consistent with the rest. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16fsck: downgrade gitmodulesParse default to "info"Libravatar Jeff King1-1/+1
We added an fsck check in ed8b10f631 (fsck: check .gitmodules content, 2018-05-02) as a defense against the vulnerability from 0383bbb901 (submodule-config: verify submodule names as paths, 2018-04-30). With the idea that up-to-date hosting sites could protect downstream unpatched clients that fetch from them. As part of that defense, we reject any ".gitmodules" entry that is not syntactically valid. The theory is that if we cannot even parse the file, we cannot accurately check it for vulnerabilities. And anybody with a broken .gitmodules file would eventually want to know anyway. But there are a few reasons this is a bad tradeoff in practice: - for this particular vulnerability, the client has to be able to parse the file. So you cannot sneak an attack through using a broken file, assuming the config parsers for the process running fsck and the eventual victim are functionally equivalent. - a broken .gitmodules file is not necessarily a problem. Our fsck check detects .gitmodules in _any_ tree, not just at the root. And the presence of a .gitmodules file does not necessarily mean it will be used; you'd have to also have gitlinks in the tree. The cgit repository, for example, has a file named .gitmodules from a pre-submodule attempt at sharing code, but does not actually have any gitlinks. - when the fsck check is used to reject a push, it's often hard to work around. The pusher may not have full control over the destination repository (e.g., if it's on a hosting server, they may need to contact the hosting site's support). And the broken .gitmodules may be too far back in history for rewriting to be feasible (again, this is an issue for cgit). So we're being unnecessarily restrictive without actually improving the security in a meaningful way. It would be more convenient to downgrade this check to "info", which means we'd still comment on it, but not reject a push. Site admins can already do this via config, but we should ship sensible defaults. There are a few counterpoints to consider in favor of keeping the check as an error: - the first point above assumes that the config parsers for the victim and the fsck process are equivalent. This is pretty true now, but as time goes on will become less so. Hosting sites are likely to upgrade their version of Git, whereas vulnerable clients will be stagnant (if they did upgrade, they'd cease to be vulnerable!). So in theory we may see drift over time between what two config parsers will accept. In practice, this is probably OK. The config format is pretty established at this point and shouldn't change a lot. And the farther we get from the announcement of the vulnerability, the less interesting this extra layer of protection becomes. I.e., it was _most_ valuable on day 0, when everybody's client was still vulnerable and hosting sites could protect people. But as time goes on and people upgrade, the population of vulnerable clients becomes smaller and smaller. - In theory this could protect us from other vulnerabilities in the future. E.g., .gitmodules are the only way for a malicious repository to feed data to the config parser, so this check could similarly protect clients from a future (to-be-found) bug there. But that's trading a hypothetical case for real-world pain today. If we do find such a bug, the hosting site would need to be updated to fix it, too. At which point we could figure out whether it's possible to detect _just_ the malicious case without hurting existing broken-but-not-evil cases. - Until recently, we hadn't made any restrictions on .gitmodules content. So now in tightening that we're hitting cases where certain things used to work, but don't anymore. There's some moderate pain now. But as time goes on, we'll see more (and more varied) cases that will make tightening harder in the future. So there's some argument for putting rules in place _now_, before users grow more cases that violate them. Again, this is trading pain now for hypothetical benefit in the future. And if we try hard in the future to keep our tightening to a minimum (i.e., rejecting true maliciousness without hurting broken-but-not-evil repos), then that reduces even the hypothetical benefit. Considering both sets of arguments, it makes sense to loosen this check for now. Note that we have to tweak the test in t7415 since fsck will no longer consider this a fatal error. But we still check that it reports the warning, and that we don't get the spurious error from the config code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03fsck: silence stderr when parsing .gitmodulesLibravatar Jeff King1-0/+15
If there's a parsing error we'll already report it via the usual fsck report() function (or not, if the user has asked to skip this object or warning type). The error message from the config parser just adds confusion. Let's suppress it. Note that we didn't test this case at all, so I've added coverage in t7415. We may end up toning down or removing this fsck check in the future. So take this test as checking what happens now with a focus on stderr, and not any ironclad guarantee that we must detect and report parse failures in the future. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-13Merge branch 'jk/index-pack-maint'Libravatar Junio C Hamano1-0/+10
"index-pack --strict" has been taught to make sure that it runs the final object integrity checks after making the freshly indexed packfile available to itself. * jk/index-pack-maint: index-pack: correct install_packed_git() args index-pack: handle --strict checks of non-repo packs prepare_commit_graft: treat non-repository as a noop
2018-06-11fsck: avoid looking at NULL blob->objectLibravatar Jeff King1-0/+18
Commit 159e7b080b (fsck: detect gitmodules files, 2018-05-02) taught fsck to look at the content of .gitmodules files. If the object turns out not to be a blob at all, we just complain and punt on checking the content. And since this was such an obvious and trivial code path, I didn't even bother to add a test. Except it _does_ do one non-trivial thing, which is call the report() function, which wants us to pass a pointer to a "struct object". Which we don't have (we have only a "struct object_id"). So we erroneously pass a NULL object to report(), which gets dereferenced and causes a segfault. It seems like we could refactor report() to just take the object_id itself. But we pass the object pointer along to a callback function, and indeed this ends up in builtin/fsck.c's objreport() which does want to look at other parts of the object (like the type). So instead, let's just use lookup_unknown_object() to get the real "struct object", and pass that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-11t7415: don't bother creating commit for symlink testLibravatar Jeff King1-7/+4
Early versions of the fsck .gitmodules detection code actually required a tree to be at the root of a commit for it to be checked for .gitmodules. What we ended up with in 159e7b080b (fsck: detect gitmodules files, 2018-05-02), though, finds a .gitmodules file in _any_ tree (see that commit for more discussion). As a result, there's no need to create a commit in our tests. Let's drop it in the name of simplicity. And since that was the only thing referencing $tree, we can pull our tree creation out of a command substitution. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-01index-pack: handle --strict checks of non-repo packsLibravatar Jeff King1-0/+10
Commit 73c3f0f704 (index-pack: check .gitmodules files with --strict, 2018-05-04) added a call to add_packed_git(), with the intent that the newly-indexed objects would be available to the process when we run fsck_finish(). But that's not what add_packed_git() does. It only allocates the struct, and you must install_packed_git() on the result. So that call was effectively doing nothing (except leaking a struct). But wait, we passed all of the tests! Does that mean we don't need the call at all? For normal cases, no. When we run "index-pack --stdin" inside a repository, we write the new pack into the object directory. If fsck_finish() needs to access one of the new objects, then our initial lookup will fail to find it, but we'll follow up by running reprepare_packed_git() and looking again. That logic was meant to handle somebody else repacking simultaneously, but it ends up working for us here. But there is a case that does need this, that we were not testing. You can run "git index-pack foo.pack" on any file, even when it is not inside the object directory. Or you may not even be in a repository at all! This case fails without doing the proper install_packed_git() call. We can make this work by adding the install call. Note that we should be prepared to handle add_packed_git() failing. We can just silently ignore this case, though. If fsck_finish() later needs the objects and they're not available, it will complain itself. And if it doesn't (because we were able to resolve the whole fsck in the first pass), then it actually isn't an interesting error at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21fsck: complain when .gitmodules is a symlinkLibravatar Jeff King1-0/+29
We've recently forbidden .gitmodules to be a symlink in verify_path(). And it's an easy way to circumvent our fsck checks for .gitmodules content. So let's complain when we see it. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21index-pack: check .gitmodules files with --strictLibravatar Jeff King1-0/+38
Now that the internal fsck code has all of the plumbing we need, we can start checking incoming .gitmodules files. Naively, it seems like we would just need to add a call to fsck_finish() after we've processed all of the objects. And that would be enough to cover the initial test included here. But there are two extra bits: 1. We currently don't bother calling fsck_object() at all for blobs, since it has traditionally been a noop. We'd actually catch these blobs in fsck_finish() at the end, but it's more efficient to check them when we already have the object loaded in memory. 2. The second pass done by fsck_finish() needs to access the objects, but we're actually indexing the pack in this process. In theory we could give the fsck code a special callback for accessing the in-pack data, but it's actually quite tricky: a. We don't have an internal efficient index mapping oids to packfile offsets. We only generate it on the fly as part of writing out the .idx file. b. We'd still have to reconstruct deltas, which means we'd basically have to replicate all of the reading logic in packfile.c. Instead, let's avoid running fsck_finish() until after we've written out the .idx file, and then just add it to our internal packed_git list. This does mean that the objects are "in the repository" before we finish our fsck checks. But unpack-objects already exhibits this same behavior, and it's an acceptable tradeoff here for the same reason: the quarantine mechanism means that pushes will be fully protected. In addition to a basic push test in t7415, we add a sneaky pack that reverses the usual object order in the pack, requiring that index-pack access the tree and blob during the "finish" step. This already works for unpack-objects (since it will have written out loose objects), but we'll check it with this sneaky pack for good measure. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21unpack-objects: call fsck_finish() after fscking objectsLibravatar Jeff King1-0/+7
As with the previous commit, we must call fsck's "finish" function in order to catch any queued objects for .gitmodules checks. This second pass will be able to access any incoming objects, because we will have exploded them to loose objects by now. This isn't quite ideal, because it means that bad objects may have been written to the object database (and a subsequent operation could then reference them, even if the other side doesn't send the objects again). However, this is sufficient when used with receive.fsckObjects, since those loose objects will all be placed in a temporary quarantine area that will get wiped if we find any problems. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: call fsck_finish() after fscking objectsLibravatar Jeff King1-0/+4
Now that the internal fsck code is capable of checking .gitmodules files, we just need to teach its callers to use the "finish" function to check any queued objects. With this, we can now catch the malicious case in t7415 with git-fsck. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21submodule-config: verify submodule names as pathsLibravatar Jeff King1-0/+76
Submodule "names" come from the untrusted .gitmodules file, but we blindly append them to $GIT_DIR/modules to create our on-disk repo paths. This means you can do bad things by putting "../" into the name (among other things). Let's sanity-check these names to avoid building a path that can be exploited. There are two main decisions: 1. What should the allowed syntax be? It's tempting to reuse verify_path(), since submodule names typically come from in-repo paths. But there are two reasons not to: a. It's technically more strict than what we need, as we really care only about breaking out of the $GIT_DIR/modules/ hierarchy. E.g., having a submodule named "foo/.git" isn't actually dangerous, and it's possible that somebody has manually given such a funny name. b. Since we'll eventually use this checking logic in fsck to prevent downstream repositories, it should be consistent across platforms. Because verify_path() relies on is_dir_sep(), it wouldn't block "foo\..\bar" on a non-Windows machine. 2. Where should we enforce it? These days most of the .gitmodules reads go through submodule-config.c, so I've put it there in the reading step. That should cover all of the C code. We also construct the name for "git submodule add" inside the git-submodule.sh script. This is probably not a big deal for security since the name is coming from the user anyway, but it would be polite to remind them if the name they pick is invalid (and we need to expose the name-checker to the shell anyway for our test scripts). This patch issues a warning when reading .gitmodules and just ignores the related config entry completely. This will generally end up producing a sensible error, as it works the same as a .gitmodules file which is missing a submodule entry (so "submodule update" will barf, but "git clone --recurse-submodules" will print an error but not abort the clone. There is one minor oddity, which is that we print the warning once per malformed config key (since that's how the config subsystem gives us the entries). So in the new test, for example, the user would see three warnings. That's OK, since the intent is that this case should never come up outside of malicious repositories (and then it might even benefit the user to see the message multiple times). Credit for finding this vulnerability and the proof of concept from which the test script was adapted goes to Etienne Stalmans. Signed-off-by: Jeff King <peff@peff.net>