summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-07-11merge: fix misleading pre-merge check documentationLibravatar Elijah Newren1-3/+3
builtin/merge.c contains this important requirement for merge strategies: ...the index must be in sync with the head commit. The strategies are responsible to ensure this. However, Documentation/git-merge.txt says: ...[merge will] abort if there are any changes registered in the index relative to the `HEAD` commit. (One exception is when the changed index entries are in the state that would result from the merge already.) Interestingly, prior to commit c0be8aa06b85 ("Documentation/git-merge.txt: Partial rewrite of How Merge Works", 2008-07-19), Documentation/git-merge.txt said much more: ...the index file must match the tree of `HEAD` commit... [NOTE] This is a bit of a lie. In certain special cases [explained in detail]... Otherwise, merge will refuse to do any harm to your repository (that is...your working tree...and index are left intact). So, this suggests that the exceptions existed because there were special cases where it would case no harm, and potentially be slightly more convenient for the user. While the current text in git-merge.txt does list a condition under which it would be safe to proceed despite the index not matching HEAD, it does not match what is actually implemented, in three different ways: * The exception is written to describe what unpack-trees allows. Not all merge strategies allow such an exception, though, making this description misleading. 'ours' and 'octopus' merges have strictly enforced index==HEAD for a while, and the commit previous to this one made 'recursive' do so as well. * If someone did a three-way content merge on a specific file using versions from the relevant commits and staged it prior to running merge, then that path would technically satisfy the exception listed in git-merge.txt. unpack-trees.c would still error out on the path, though, because it defers the three-way content merge logic to other parts of the code (resolve, octopus, or recursive) and has no way of checking whether the index entry from before the merge will match the end result of the merge. * The exception as implemented in unpack-trees actually only checked that the index matched the MERGE_HEAD version of the file and that HEAD matched the merge base. Assuming no renames, that would indeed provide cases where the index matches the end result we'd get from a merge. But renames means unpack-trees is checking that it instead matches something other than what the final result will be, risking either erroring out when we shouldn't need to, or not erroring out when we should and overwriting the user's staged changes. In addition to the wording behind this exception being misleading, it is also somewhat surprising to see how many times the code for the special cases were wrong or the check to make sure the index matched head was forgotten altogether: * Prior to commit ee6566e8d70d ("[PATCH] Rewrite read-tree", 2005-09-05), there were many cases where an unclean index entry was allowed (look for merged_entry_allow_dirty()); it appears that in those cases, the merge would have simply overwritten staged changes with the result of the merge. Thus, the merge result would have been correct, but the user's uncommitted changes could be thrown away without warning. * Prior to commit 160252f81626 ("git-merge-ours: make sure our index matches HEAD", 2005-11-03), the 'ours' merge strategy did not check whether the index matched HEAD. If it didn't, the resulting merge would include all the staged changes, and thus wasn't really an 'ours' strategy. * Prior to commit 3ec62ad9ffba ("merge-octopus: abort if index does not match HEAD", 2016-04-09), 'octopus' merges did not check whether the index matched HEAD, also resulting in any staged changes from before the commit silently being folded into the resulting merge. commit a6ee883b8eb5 ("t6044: new merge testcases for when index doesn't match HEAD", 2016-04-09) was also added at the same time to try to test to make sure all strategies did the necessary checking for the requirement that the index match HEAD. Sadly, it didn't catch all the cases, as evidenced by the remainder of this list... * Prior to commit 65170c07d466 ("merge-recursive: avoid incorporating uncommitted changes in a merge", 2017-12-21), merge-recursive simply relied on unpack_trees() to do the necessary check, but in one special case it avoided calling unpack_trees() entirely and accidentally ended up silently including any staged changes from before the merge in the resulting merge commit. * The commit immediately before this one in this series noted that the exceptions were written in a way that assumed no renames, making it unsafe for merge-recursive to use. merge-recursive was modified to use its own check to enforce that index==HEAD. This history makes it very tempting to go into builtin/merge.c and replace the comment that strategies must enforce that index matches HEAD with code that just enforces it. At this point, that would only affect the 'resolve' strategy; all other strategies have each been modified to manually enforce it. (However, note that index==HEAD is not strictly enforced for fast-forward merges, as those are not considered a merge strategy and they trigger in builtin/merge.c before the section in the code where the relevant comment is found.) But, even if we don't take the step of just fixing these problems by enforcing index==HEAD for all strategies, we at least need to update this misleading documentation in git-merge.txt. For now, just modify the claim in Documentation/git-merge.txt to fix the error. The precise details around combination of merges strategies and special cases probably is not relevant to most users, so simply state that exceptions may exist but are narrow and vary depending upon which merge strategy is in use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-11merge-recursive: enforce rule that index matches head before mergingLibravatar Elijah Newren4-137/+18
builtin/merge.c says that when we are about to perform a merge: ...the index must be in sync with the head commit. The strategies are responsible to ensure this. merge-recursive has always relied on unpack_trees() to enforce this requirement, except in the case of an "Already up to date!" merge. unpack-trees.c does not actually enforce this requirement, though. It allows for a pair of exceptions, in cases which it refers to as #14(ALT) and #2ALT. Documentation/technical/trivial-merge.txt can be consulted for the precise meanings of the various case numbers and their meanings for unpack-trees.c, but we have a high-level description of the intent behind these two exceptions in a combined and summarized form in Documentation/git-merge.txt: ...[merge will] abort if there are any changes registered in the index relative to the `HEAD` commit. (One exception is when the changed index entries are in the state that would result from the merge already.) While this high-level description does describe conditions under which it would be safe to allow the index to diverge from HEAD, it does not match what is actually implemented. In particular, unpack-trees.c has no knowledge of renames, and these two exceptions were written assuming that no renames take place. Once renames get into the mix, it is no longer safe to allow the index to not match for #2ALT. We could modify unpack-trees to only allow #14(ALT) as an exception, but that would be more strict than required for the resolve strategy (since the resolve strategy doesn't handle renames at all). Therefore, unpack_trees.c seems like the wrong place to fix this. Further, if someone fixes the combination of break and rename detection and modifies merge-recursive to take advantage of the combination, then it will also no longer be safe to allow the index to not match for #14(ALT) when the recursive strategy is in use. Therefore, leaving one of the exceptions in place with the recursive merge strategy feels like we are just leaving a latent bug in the code for folks in the future to stumble across. It may be possible to fix both unpack-trees and merge-recursive in a way that implements the exception as stated in Documentation/git-merge.txt, but it would be somewhat complex, possibly also buggy at first, and ultimately, not all that valuable. Instead, just enforce the requirement stated in builtin/merge.c; error out if the index does not match the HEAD commit, just like the 'ours' and 'octopus' strategies do. Some testcase fixups were in order: t7611: had many tests designed to show that `git merge --abort` could not always restore the index and working tree to the state they were in before the merge started. The tests that were associated with having changes in the index before the merge started are no longer applicable, so they have been removed. t7504: had a few tests that had stray staged changes that were not actually part of the test under consideration t6044: We no longer expect stray staged changes to sometimes result in the merge continuing. Also, fix a case where a merge didn't abort but should have. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-11t6044: add more testcases with staged changes before a merge is invokedLibravatar Elijah Newren1-0/+29
According to Documentation/git-merge.txt, ...[merge will] abort if there are any changes registered in the index relative to the `HEAD` commit. (One exception is when the changed index entries are in the state that would result from the merge already.) Add some tests showing that this exception, while it does accurately state what would be a safe condition under which we could allow the merge to proceed, is not what is actually implemented. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-11merge-recursive: fix assumption that head tree being merged is HEADLibravatar Elijah Newren5-13/+21
`git merge-recursive` does a three-way merge between user-specified trees base, head, and remote. Since the user is allowed to specify head, we can not necesarily assume that head == HEAD. Modify index_has_changes() to take an extra argument specifying the tree to compare against. If NULL, it will compare to HEAD. We then use this from merge-recursive to make sure we compare to the user-specified head. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-11merge-recursive: make sure when we say we abort that we actually abortLibravatar Elijah Newren2-3/+3
In commit 65170c07d4 ("merge-recursive: avoid incorporating uncommitted changes in a merge", 2017-12-21), it was noted that there was a special case when merge-recursive didn't rely on unpack_trees() to enforce the index == HEAD requirement, and thus that it needed to do that enforcement itself. Unfortunately, it returned the wrong exit status, signalling that the merge completed but had conflicts, rather than that it was aborted. Fix the return code, and while we're at it, change the error message to match what unpack_trees() would have printed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-11t6044: add a testcase for index matching head, when head doesn't match HEADLibravatar Elijah Newren1-0/+11
The `git merge-recursive` command allows the user to directly specify three commits to merge -- base, head, and remote. (More than three can be specified in the case of multiple merge bases.) Note that since the user is allowed to specify head, it need not match HEAD. Virtually every test and script in the current git.git codebase calls `git merge-recursive` with head=HEAD, and likely external callers do as well, which is why this has gone unnoticed. There is one notable counter-example: git-stash.sh. However, git-stash called `git merge-recursive` with an index that matches the expected merge result, which happens to be a currently allowed exception to the "index must match head" rule, so this never triggered an error previously. Since we would like to tighten up the "index must match head" rule, we need to make sure we are comparing to the correct head. Add a testcase that demonstrates the failure when we check the wrong HEAD. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03t6044: verify that merges expected to abort actually abortLibravatar Elijah Newren1-11/+21
t6044 has lots of tests for verifying that merge will abort as expected when there are changes staged before the merge starts. However, it only checked for non-zero exit code, which could mean that the merge ran to completion with conflicts. Check that the merge was actually correctly aborted, i.e. that .git/MERGE_HEAD is not present. This changes one of the tests from expect_success to expect_failure. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03index_has_changes(): avoid assuming operating on the_indexLibravatar Elijah Newren4-13/+17
Modify index_has_changes() to take a struct istate* instead of just operating on the_index. This is only a partial conversion, though, because we call do_diff_cache() which implicitly assumes work is to be done on the_index. Ongoing work is being done elsewhere to do the remainder of the conversion, and thus is not duplicated here. Instead, a simple check is put in place until that work is complete. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03read-cache.c: move index_has_changes() from merge.cLibravatar Elijah Newren2-31/+33
Since index_has_change() is an index-related function, move it to read-cache.c, only modifying it to avoid uses of the active_cache and active_nr macros. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Git 2.17.1Libravatar Junio C Hamano3-2/+18
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Merge branch 'jk/submodule-fsck-loose' into maintLibravatar Junio C Hamano8-30/+271
* jk/submodule-fsck-loose: fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific
2018-05-22Sync with Git 2.16.4Libravatar Junio C Hamano20-42/+507
* maint-2.16: Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Git 2.16.4Libravatar Junio C Hamano3-2/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Sync with Git 2.15.2Libravatar Junio C Hamano18-41/+500
* maint-2.15: Git 2.15.2 Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Git 2.15.2Libravatar Junio C Hamano2-1/+4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Sync with Git 2.14.4Libravatar Junio C Hamano17-41/+497
* maint-2.14: Git 2.14.4 Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Git 2.14.4Libravatar Junio C Hamano3-2/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Sync with Git 2.13.7Libravatar Junio C Hamano16-41/+492
* maint-2.13: Git 2.13.7 verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-22Git 2.13.7Libravatar Junio C Hamano3-2/+22
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Merge branch 'jk/submodule-fix-loose' into maint-2.13Libravatar Junio C Hamano15-41/+472
* jk/submodule-fix-loose: verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests is_ntfs_dotgit: match other .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths
2018-05-21fsck: complain when .gitmodules is a symlinkLibravatar Jeff King2-2/+38
We've recently forbidden .gitmodules to be a symlink in verify_path(). And it's an easy way to circumvent our fsck checks for .gitmodules content. So let's complain when we see it. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21index-pack: check .gitmodules files with --strictLibravatar Jeff King3-0/+60
Now that the internal fsck code has all of the plumbing we need, we can start checking incoming .gitmodules files. Naively, it seems like we would just need to add a call to fsck_finish() after we've processed all of the objects. And that would be enough to cover the initial test included here. But there are two extra bits: 1. We currently don't bother calling fsck_object() at all for blobs, since it has traditionally been a noop. We'd actually catch these blobs in fsck_finish() at the end, but it's more efficient to check them when we already have the object loaded in memory. 2. The second pass done by fsck_finish() needs to access the objects, but we're actually indexing the pack in this process. In theory we could give the fsck code a special callback for accessing the in-pack data, but it's actually quite tricky: a. We don't have an internal efficient index mapping oids to packfile offsets. We only generate it on the fly as part of writing out the .idx file. b. We'd still have to reconstruct deltas, which means we'd basically have to replicate all of the reading logic in packfile.c. Instead, let's avoid running fsck_finish() until after we've written out the .idx file, and then just add it to our internal packed_git list. This does mean that the objects are "in the repository" before we finish our fsck checks. But unpack-objects already exhibits this same behavior, and it's an acceptable tradeoff here for the same reason: the quarantine mechanism means that pushes will be fully protected. In addition to a basic push test in t7415, we add a sneaky pack that reverses the usual object order in the pack, requiring that index-pack access the tree and blob during the "finish" step. This already works for unpack-objects (since it will have written out loose objects), but we'll check it with this sneaky pack for good measure. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21unpack-objects: call fsck_finish() after fscking objectsLibravatar Jeff King2-1/+11
As with the previous commit, we must call fsck's "finish" function in order to catch any queued objects for .gitmodules checks. This second pass will be able to access any incoming objects, because we will have exploded them to loose objects by now. This isn't quite ideal, because it means that bad objects may have been written to the object database (and a subsequent operation could then reference them, even if the other side doesn't send the objects again). However, this is sufficient when used with receive.fsckObjects, since those loose objects will all be placed in a temporary quarantine area that will get wiped if we find any problems. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: call fsck_finish() after fscking objectsLibravatar Jeff King2-0/+7
Now that the internal fsck code is capable of checking .gitmodules files, we just need to teach its callers to use the "finish" function to check any queued objects. With this, we can now catch the malicious case in t7415 with git-fsck. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: check .gitmodules contentLibravatar Jeff King1-1/+59
This patch detects and blocks submodule names which do not match the policy set forth in submodule-config. These should already be caught by the submodule code itself, but putting the check here means that newer versions of Git can protect older ones from malicious entries (e.g., a server with receive.fsckObjects will block the objects, protecting clients which fetch from it). As a side effect, this means fsck will also complain about .gitmodules files that cannot be parsed (or were larger than core.bigFileThreshold). Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: handle promisor objects in .gitmodules checkLibravatar Jeff King1-0/+3
If we have a tree that points to a .gitmodules blob but don't have that blob, we can't check its contents. This produces an fsck error when we encounter it. But in the case of a promisor object, this absence is expected, and we must not complain. Note that this can technically circumvent our transfer.fsckObjects check. Imagine a client fetches a tree, but not the matching .gitmodules blob. An fsck of the incoming objects will show that we don't have enough information. Later, we do fetch the actual blob. But we have no idea that it's a .gitmodules file. The only ways to get around this would be to re-scan all of the existing trees whenever new ones enter (which is expensive), or to somehow persist the gitmodules_found set between fsck runs (which is complicated). In practice, it's probably OK to ignore the problem. Any repository which has all of the objects (including the one serving the promisor packs) can perform the checks. Since promisor packs are inherently about a hierarchical topology in which clients rely on upstream repositories, those upstream repositories can protect all of their downstream clients from broken objects. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: detect gitmodules filesLibravatar Jeff King2-0/+65
In preparation for performing fsck checks on .gitmodules files, this commit plumbs in the actual detection of the files. Note that unlike most other fsck checks, this cannot be a property of a single object: we must know that the object is found at a ".gitmodules" path at the root tree of a commit. Since the fsck code only sees one object at a time, we have to mark the related objects to fit the puzzle together. When we see a commit we mark its tree as a root tree, and when we see a root tree with a .gitmodules file, we mark the corresponding blob to be checked. In an ideal world, we'd check the objects in topological order: commits followed by trees followed by blobs. In that case we can avoid ever loading an object twice, since all markings would be complete by the time we get to the marked objects. And indeed, if we are checking a single packfile, this is the order in which Git will generally write the objects. But we can't count on that: 1. git-fsck may show us the objects in arbitrary order (loose objects are fed in sha1 order, but we may also have multiple packs, and we process each pack fully in sequence). 2. The type ordering is just what git-pack-objects happens to write now. The pack format does not require a specific order, and it's possible that future versions of Git (or a custom version trying to fool official Git's fsck checks!) may order it differently. 3. We may not even be fscking all of the relevant objects at once. Consider pushing with transfer.fsckObjects, where one push adds a blob at path "foo", and then a second push adds the same blob at path ".gitmodules". The blob is not part of the second push at all, but we need to mark and check it. So in the general case, we need to make up to three passes over the objects: once to make sure we've seen all commits, then once to cover any trees we might have missed, and then a final pass to cover any .gitmodules blobs we found in the second pass. We can simplify things a bit by loosening the requirement that we find .gitmodules only at root trees. Technically a file like "subdir/.gitmodules" is not parsed by Git, but it's not unreasonable for us to declare that Git is aware of all ".gitmodules" files and make them eligible for checking. That lets us drop the root-tree requirement, which eliminates one pass entirely. And it makes our worst case much better: instead of potentially queueing every root tree to be re-examined, the worst case is that we queue each unique .gitmodules blob for a second look. This patch just adds the boilerplate to find .gitmodules files. The actual content checks will come in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: actually fsck blob dataLibravatar Jeff King3-24/+28
Because fscking a blob has always been a noop, we didn't bother passing around the blob data. In preparation for content-level checks, let's fix up a few things: 1. The fsck_object() function just returns success for any blob. Let's a noop fsck_blob(), which we can fill in with actual logic later. 2. The fsck_loose() function in builtin/fsck.c just threw away blob content after loading it. Let's hold onto it until after we've called fsck_object(). The easiest way to do this is to just drop the parse_loose_object() helper entirely. Incidentally, this also fixes a memory leak: if we successfully loaded the object data but did not parse it, we would have left the function without freeing it. 3. When fsck_loose() loads the object data, it does so with a custom read_loose_object() helper. This function streams any blobs, regardless of size, under the assumption that we're only checking the sha1. Instead, let's actually load blobs smaller than big_file_threshold, as the normal object-reading code-paths would do. This lets us fsck small files, and a NULL return is an indication that the blob was so big that it needed to be streamed, and we can pass that information along to fsck_blob(). Signed-off-by: Jeff King <peff@peff.net>
2018-05-21fsck: simplify ".git" checkLibravatar Jeff King1-3/+1
There's no need for us to manually check for ".git"; it's a subset of the other filesystem-specific tests. Dropping it makes our code slightly shorter. More importantly, the existing code may make a reader wonder why ".GIT" is not covered here, and whether that is a bug (it isn't, as it's also covered in the filesystem-specific tests). Signed-off-by: Jeff King <peff@peff.net>
2018-05-21index-pack: make fsck error message more specificLibravatar Jeff King2-2/+2
If fsck reports an error, we say only "Error in object". This isn't quite as bad as it might seem, since the fsck code would have dumped some errors to stderr already. But it might help to give a little more context. The earlier output would not have even mentioned "fsck", and that may be a clue that the "fsck.*" or "*.fsckObjects" config may be relevant. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21Merge branch 'jk/submodule-name-verify-fix' into jk/submodule-name-verify-fsckLibravatar Jeff King16-42/+474
* jk/submodule-name-verify-fix: verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_path: drop clever fallthrough skip_prefix: add icase-insensitive variant is_{hfs,ntfs}_dotgitmodules: add tests path: match NTFS short names for more .git files is_hfs_dotgit: match other .git files is_ntfs_dotgit: use a size_t for traversing string submodule-config: verify submodule names as paths Note that this includes two bits of evil-merge: - there's a new call to verify_path() that doesn't actually have a mode available. It should be OK to pass "0" here, since we're just manipulating the untracked cache, not an actual index entry. - the lstat() in builtin/update-index.c:update_one() needs to be updated to handle the fsmonitor case (without this it still behaves correctly, but does an unnecessary lstat).
2018-05-21verify_path: disallow symlinks in .gitmodulesLibravatar Jeff King4-15/+37
There are a few reasons it's not a good idea to make .gitmodules a symlink, including: 1. It won't be portable to systems without symlinks. 2. It may behave inconsistently, since Git may look at this file in the index or a tree without bothering to resolve any symbolic links. We don't do this _yet_, but the config infrastructure is there and it's planned for the future. With some clever code, we could make (2) work. And some people may not care about (1) if they only work on one platform. But there are a few security reasons to simply disallow it: a. A symlinked .gitmodules file may circumvent any fsck checks of the content. b. Git may read and write from the on-disk file without sanity checking the symlink target. So for example, if you link ".gitmodules" to "../oops" and run "git submodule add", we'll write to the file "oops" outside the repository. Again, both of those are problems that _could_ be solved with sufficient code, but given the complications in (1) and (2), we're better off just outlawing it explicitly. Note the slightly tricky call to verify_path() in update-index's update_one(). There we may not have a mode if we're not updating from the filesystem (e.g., we might just be removing the file). Passing "0" as the mode there works fine; since it's not a symlink, we'll just skip the extra checks. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21update-index: stat updated files earlierLibravatar Jeff King1-8/+17
In the update_one(), we check verify_path() on the proposed path before doing anything else. In preparation for having verify_path() look at the file mode, let's stat the file earlier, so we can check the mode accurately. This is made a bit trickier by the fact that this function only does an lstat in a few code paths (the ones that flow down through process_path()). So we can speculatively do the lstat() here and pass the results down, and just use a dummy mode for cases where we won't actually be updating the index from the filesystem. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21verify_dotfile: mention case-insensitivity in commentLibravatar Jeff King1-1/+4
We're more restrictive than we need to be in matching ".GIT" on case-sensitive filesystems; let's make a note that this is intentional. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21verify_path: drop clever fallthroughLibravatar Jeff King1-4/+4
We check ".git" and ".." in the same switch statement, and fall through the cases to share the end-of-component check. While this saves us a line or two, it makes modifying the function much harder. Let's just write it out. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21skip_prefix: add case-insensitive variantLibravatar Jeff King1-0/+17
We have the convenient skip_prefix() helper, but if you want to do case-insensitive matching, you're stuck doing it by hand. We could add an extra parameter to the function to let callers ask for this, but the function is small and somewhat performance-critical. Let's just re-implement it for the case-insensitive version. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21is_{hfs,ntfs}_dotgitmodules: add testsLibravatar Johannes Schindelin2-0/+106
This tests primarily for NTFS issues, but also adds one example of an HFS+ issue. Thanks go to Congyi Wu for coming up with the list of examples where NTFS would possibly equate the filename with `.gitmodules`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net>
2018-05-21is_ntfs_dotgit: match other .git filesLibravatar Johannes Schindelin2-1/+93
When we started to catch NTFS short names that clash with .git, we only looked for GIT~1. This is sufficient because we only ever clone into an empty directory, so .git is guaranteed to be the first subdirectory or file in that directory. However, even with a fresh clone, .gitmodules is *not* necessarily the first file to be written that would want the NTFS short name GITMOD~1: a malicious repository can add .gitmodul0000 and friends, which sorts before `.gitmodules` and is therefore checked out *first*. For that reason, we have to test not only for ~1 short names, but for others, too. It's hard to just adapt the existing checks in is_ntfs_dotgit(): since Windows 2000 (i.e., in all Windows versions still supported by Git), NTFS short names are only generated in the <prefix>~<number> form up to number 4. After that, a *different* prefix is used, calculated from the long file name using an undocumented, but stable algorithm. For example, the short name of .gitmodules would be GITMOD~1, but if it is taken, and all of ~2, ~3 and ~4 are taken, too, the short name GI7EBA~1 will be used. From there, collisions are handled by incrementing the number, shortening the prefix as needed (until ~9999999 is reached, in which case NTFS will not allow the file to be created). We'd also want to handle .gitignore and .gitattributes, which suffer from a similar problem, using the fall-back short names GI250A~1 and GI7D29~1, respectively. To accommodate for that, we could reimplement the hashing algorithm, but it is just safer and simpler to provide the known prefixes. This algorithm has been reverse-engineered and described at https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but still available via https://web.archive.org/. These can be recomputed by running the following Perl script: -- snip -- use warnings; use strict; sub compute_short_name_hash ($) { my $checksum = 0; foreach (split('', $_[0])) { $checksum = ($checksum * 0x25 + ord($_)) & 0xffff; } $checksum = ($checksum * 314159269) & 0xffffffff; $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000); $checksum -= (($checksum * 1152921497) >> 60) * 1000000007; return scalar reverse sprintf("%x", $checksum & 0xffff); } print compute_short_name_hash($ARGV[0]); -- snap -- E.g., running that with the argument ".gitignore" will result in "250a" (which then becomes "gi250a" in the code). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net>
2018-05-21is_hfs_dotgit: match other .git filesLibravatar Jeff King2-12/+51
Both verify_path() and fsck match ".git", ".GIT", and other variants specific to HFS+. Let's allow matching other special files like ".gitmodules", which we'll later use to enforce extra restrictions via verify_path() and fsck. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21is_ntfs_dotgit: use a size_t for traversing stringLibravatar Jeff King1-1/+1
We walk through the "name" string using an int, which can wrap to a negative value and cause us to read random memory before our array (e.g., by creating a tree with a name >2GB, since "int" is still 32 bits even on most 64-bit platforms). Worse, this is easy to trigger during the fsck_tree() check, which is supposed to be protecting us from malicious garbage. Note one bit of trickiness in the existing code: we sometimes assign -1 to "len" at the end of the loop, and then rely on the "len++" in the for-loop's increment to take it back to 0. This is still legal with a size_t, since assigning -1 will turn into SIZE_MAX, which then wraps around to 0 on increment. Signed-off-by: Jeff King <peff@peff.net>
2018-05-21submodule-config: verify submodule names as pathsLibravatar Jeff King5-0/+143
Submodule "names" come from the untrusted .gitmodules file, but we blindly append them to $GIT_DIR/modules to create our on-disk repo paths. This means you can do bad things by putting "../" into the name (among other things). Let's sanity-check these names to avoid building a path that can be exploited. There are two main decisions: 1. What should the allowed syntax be? It's tempting to reuse verify_path(), since submodule names typically come from in-repo paths. But there are two reasons not to: a. It's technically more strict than what we need, as we really care only about breaking out of the $GIT_DIR/modules/ hierarchy. E.g., having a submodule named "foo/.git" isn't actually dangerous, and it's possible that somebody has manually given such a funny name. b. Since we'll eventually use this checking logic in fsck to prevent downstream repositories, it should be consistent across platforms. Because verify_path() relies on is_dir_sep(), it wouldn't block "foo\..\bar" on a non-Windows machine. 2. Where should we enforce it? These days most of the .gitmodules reads go through submodule-config.c, so I've put it there in the reading step. That should cover all of the C code. We also construct the name for "git submodule add" inside the git-submodule.sh script. This is probably not a big deal for security since the name is coming from the user anyway, but it would be polite to remind them if the name they pick is invalid (and we need to expose the name-checker to the shell anyway for our test scripts). This patch issues a warning when reading .gitmodules and just ignores the related config entry completely. This will generally end up producing a sensible error, as it works the same as a .gitmodules file which is missing a submodule entry (so "submodule update" will barf, but "git clone --recurse-submodules" will print an error but not abort the clone. There is one minor oddity, which is that we print the warning once per malformed config key (since that's how the config subsystem gives us the entries). So in the new test, for example, the user would see three warnings. That's OK, since the intent is that this case should never come up outside of malicious repositories (and then it might even benefit the user to see the message multiple times). Credit for finding this vulnerability and the proof of concept from which the test script was adapted goes to Etienne Stalmans. Signed-off-by: Jeff King <peff@peff.net>
2018-04-02Git 2.17Libravatar Junio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02Merge tag 'l10n-2.17.0-rnd1' of git://github.com/git-l10n/git-poLibravatar Junio C Hamano10-22585/+26683
l10n for Git 2.17.0 round 1 * tag 'l10n-2.17.0-rnd1' of git://github.com/git-l10n/git-po: l10n: de.po: translate 132 new messages l10n: zh_CN: review for git v2.17.0 l10n round 1 l10n: zh_CN: for git v2.17.0 l10n round 1 l10n: ko.po: Update Korean translation l10n: fr.po: v2.17.0 no fuzzy l10n: sv.po: Update Swedish translation (3376t0f0u) l10n: Update Catalan translation l10n: fr.po v2.17.0 round 1 l10n: vi.po(3376t): Updated Vietnamese translation for v2.17 l10n: bg.po: Updated Bulgarian translation (3376t) l10n: es.po: Update Spanish translation 2.17.0 l10n: git.pot: v2.17.0 round 1 (132 new, 44 removed) l10n: es.po: fixes to Spanish translation
2018-04-02Merge branch 'pw/add-p-single'Libravatar Junio C Hamano1-1/+1
Hotfix. * pw/add-p-single: add -p: fix 2.17.0-rc* regression due to moved code
2018-03-31add -p: fix 2.17.0-rc* regression due to moved codeLibravatar Ævar Arnfjörð Bjarmason1-1/+1
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if there's more than one hunk", 2018-02-13) which is present in 2.17.0-rc*, but not 2.16.0. In Perl, regex variables like $1 always refer to the last regex match. When the aforementioned change added a new regex match between the old match and the corresponding code that was expecting $1, the $1 variable would always be undef, since the newly inserted regex match doesn't have any captures. As a result the "/" feature to search for a string in a hunk by regex completely broke, on git.git: $ perl -pi -e 's/Git/Tig/g' README.md $ ./git --exec-path=$PWD add -p [..] Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s Split into 4 hunks. [...] Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many Use of uninitialized value $1 in string eq at /home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1. search for regex? Many I.e. the initial "/regex" command wouldn't work, and would always emit a warning and ask again for a regex, now it works as intended again. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-31l10n: de.po: translate 132 new messagesLibravatar Ralf Thielow1-2093/+2527
Translate 132 new messages came from git.pot update in abc8de64d (l10n: git.pot: v2.17.0 round 1 (132 new, 44 removed)). Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
2018-03-29Merge branch 'jh/partial-clone'Libravatar Junio C Hamano3-5/+6
Hotfix. * jh/partial-clone: upload-pack: disable object filtering when disabled by config unpack-trees: release oid_array after use in check_updates()
2018-03-29upload-pack: disable object filtering when disabled by configLibravatar Jonathan Nieder2-5/+5
When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12, 2017-12-08), it was guarded by the uploadpack.allowFilter config item to allow server operators to control when they start supporting it. That config item didn't go far enough, though: it controls whether the 'filter' capability is advertised, but if a (custom) client ignores the capability advertisement and passes a filter specification anyway, the server would handle that despite allowFilter being false. This is particularly significant if a security bug is discovered in this new experimental partial clone code. Installations without uploadpack.allowFilter ought not to be affected since they don't intend to support partial clone, but they would be swept up into being vulnerable. Simplify and limit the attack surface by making uploadpack.allowFilter disable the feature, not just the advertisement of it. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-29l10n: zh_CN: review for git v2.17.0 l10n round 1Libravatar Ray Chen1-5/+5
Signed-off-by: Ray Chen <oldsharp@gmail.com>
2018-03-29l10n: zh_CN: for git v2.17.0 l10n round 1Libravatar Jiang Xin1-2066/+2495
Translate 132 new messages (3376t0f0u) for git 2.17.0-rc0. Reviewed-by: 依云 <lilydjwg@gmail.com> Reviewed-by: Fangyi Zhou <fangyi.zhou@yuriko.moe> Signed-off-by: Jiang Xin <worldhello.net@gmail.com>