Age | Commit message (Collapse) | Author | Files | Lines |
|
builtin/merge.c contains this important requirement for merge strategies:
...the index must be in sync with the head commit. The strategies are
responsible to ensure this.
However, Documentation/git-merge.txt says:
...[merge will] abort if there are any changes registered in the index
relative to the `HEAD` commit. (One exception is when the changed
index entries are in the state that would result from the merge
already.)
Interestingly, prior to commit c0be8aa06b85 ("Documentation/git-merge.txt:
Partial rewrite of How Merge Works", 2008-07-19),
Documentation/git-merge.txt said much more:
...the index file must match the tree of `HEAD` commit...
[NOTE]
This is a bit of a lie. In certain special cases [explained
in detail]...
Otherwise, merge will refuse to do any harm to your repository
(that is...your working tree...and index are left intact).
So, this suggests that the exceptions existed because there were special
cases where it would case no harm, and potentially be slightly more
convenient for the user. While the current text in git-merge.txt does
list a condition under which it would be safe to proceed despite the index
not matching HEAD, it does not match what is actually implemented, in
three different ways:
* The exception is written to describe what unpack-trees allows. Not
all merge strategies allow such an exception, though, making this
description misleading. 'ours' and 'octopus' merges have strictly
enforced index==HEAD for a while, and the commit previous to this
one made 'recursive' do so as well.
* If someone did a three-way content merge on a specific file using
versions from the relevant commits and staged it prior to running
merge, then that path would technically satisfy the exception listed
in git-merge.txt. unpack-trees.c would still error out on the path,
though, because it defers the three-way content merge logic to other
parts of the code (resolve, octopus, or recursive) and has no way of
checking whether the index entry from before the merge will match
the end result of the merge.
* The exception as implemented in unpack-trees actually only checked
that the index matched the MERGE_HEAD version of the file and that
HEAD matched the merge base. Assuming no renames, that would indeed
provide cases where the index matches the end result we'd get from a
merge. But renames means unpack-trees is checking that it instead
matches something other than what the final result will be, risking
either erroring out when we shouldn't need to, or not erroring out
when we should and overwriting the user's staged changes.
In addition to the wording behind this exception being misleading, it is
also somewhat surprising to see how many times the code for the special
cases were wrong or the check to make sure the index matched head was
forgotten altogether:
* Prior to commit ee6566e8d70d ("[PATCH] Rewrite read-tree", 2005-09-05),
there were many cases where an unclean index entry was allowed (look for
merged_entry_allow_dirty()); it appears that in those cases, the merge
would have simply overwritten staged changes with the result of the
merge. Thus, the merge result would have been correct, but the user's
uncommitted changes could be thrown away without warning.
* Prior to commit 160252f81626 ("git-merge-ours: make sure our index
matches HEAD", 2005-11-03), the 'ours' merge strategy did not check
whether the index matched HEAD. If it didn't, the resulting merge
would include all the staged changes, and thus wasn't really an 'ours'
strategy.
* Prior to commit 3ec62ad9ffba ("merge-octopus: abort if index does not
match HEAD", 2016-04-09), 'octopus' merges did not check whether the
index matched HEAD, also resulting in any staged changes from before
the commit silently being folded into the resulting merge. commit
a6ee883b8eb5 ("t6044: new merge testcases for when index doesn't match
HEAD", 2016-04-09) was also added at the same time to try to test to
make sure all strategies did the necessary checking for the requirement
that the index match HEAD. Sadly, it didn't catch all the cases, as
evidenced by the remainder of this list...
* Prior to commit 65170c07d466 ("merge-recursive: avoid incorporating
uncommitted changes in a merge", 2017-12-21), merge-recursive simply
relied on unpack_trees() to do the necessary check, but in one special
case it avoided calling unpack_trees() entirely and accidentally ended
up silently including any staged changes from before the merge in the
resulting merge commit.
* The commit immediately before this one in this series noted that the
exceptions were written in a way that assumed no renames, making it
unsafe for merge-recursive to use. merge-recursive was modified to
use its own check to enforce that index==HEAD.
This history makes it very tempting to go into builtin/merge.c and replace
the comment that strategies must enforce that index matches HEAD with code
that just enforces it. At this point, that would only affect the
'resolve' strategy; all other strategies have each been modified to
manually enforce it. (However, note that index==HEAD is not strictly
enforced for fast-forward merges, as those are not considered a merge
strategy and they trigger in builtin/merge.c before the section in the
code where the relevant comment is found.)
But, even if we don't take the step of just fixing these problems by
enforcing index==HEAD for all strategies, we at least need to update this
misleading documentation in git-merge.txt. For now, just modify the claim
in Documentation/git-merge.txt to fix the error. The precise details
around combination of merges strategies and special cases probably is not
relevant to most users, so simply state that exceptions may exist but are
narrow and vary depending upon which merge strategy is in use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
builtin/merge.c says that when we are about to perform a merge:
...the index must be in sync with the head commit. The strategies are
responsible to ensure this.
merge-recursive has always relied on unpack_trees() to enforce this
requirement, except in the case of an "Already up to date!" merge.
unpack-trees.c does not actually enforce this requirement, though. It
allows for a pair of exceptions, in cases which it refers to as #14(ALT)
and #2ALT. Documentation/technical/trivial-merge.txt can be consulted for
the precise meanings of the various case numbers and their meanings for
unpack-trees.c, but we have a high-level description of the intent behind
these two exceptions in a combined and summarized form in
Documentation/git-merge.txt:
...[merge will] abort if there are any changes registered in the index
relative to the `HEAD` commit. (One exception is when the changed index
entries are in the state that would result from the merge already.)
While this high-level description does describe conditions under which it
would be safe to allow the index to diverge from HEAD, it does not match
what is actually implemented. In particular, unpack-trees.c has no
knowledge of renames, and these two exceptions were written assuming that
no renames take place. Once renames get into the mix, it is no longer
safe to allow the index to not match for #2ALT. We could modify
unpack-trees to only allow #14(ALT) as an exception, but that would be
more strict than required for the resolve strategy (since the resolve
strategy doesn't handle renames at all). Therefore, unpack_trees.c seems
like the wrong place to fix this.
Further, if someone fixes the combination of break and rename detection
and modifies merge-recursive to take advantage of the combination, then it
will also no longer be safe to allow the index to not match for #14(ALT)
when the recursive strategy is in use. Therefore, leaving one of the
exceptions in place with the recursive merge strategy feels like we are
just leaving a latent bug in the code for folks in the future to stumble
across.
It may be possible to fix both unpack-trees and merge-recursive in a way
that implements the exception as stated in Documentation/git-merge.txt,
but it would be somewhat complex, possibly also buggy at first, and
ultimately, not all that valuable. Instead, just enforce the requirement
stated in builtin/merge.c; error out if the index does not match the HEAD
commit, just like the 'ours' and 'octopus' strategies do.
Some testcase fixups were in order:
t7611: had many tests designed to show that `git merge --abort` could
not always restore the index and working tree to the state they
were in before the merge started. The tests that were associated
with having changes in the index before the merge started are no
longer applicable, so they have been removed.
t7504: had a few tests that had stray staged changes that were not
actually part of the test under consideration
t6044: We no longer expect stray staged changes to sometimes result
in the merge continuing. Also, fix a case where a merge
didn't abort but should have.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
According to Documentation/git-merge.txt,
...[merge will] abort if there are any changes registered in the index
relative to the `HEAD` commit. (One exception is when the changed
index entries are in the state that would result from the merge
already.)
Add some tests showing that this exception, while it does accurately state
what would be a safe condition under which we could allow the merge to
proceed, is not what is actually implemented.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
`git merge-recursive` does a three-way merge between user-specified trees
base, head, and remote. Since the user is allowed to specify head, we can
not necesarily assume that head == HEAD.
Modify index_has_changes() to take an extra argument specifying the tree
to compare against. If NULL, it will compare to HEAD. We then use this
from merge-recursive to make sure we compare to the user-specified head.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In commit 65170c07d4 ("merge-recursive: avoid incorporating uncommitted
changes in a merge", 2017-12-21), it was noted that there was a special
case when merge-recursive didn't rely on unpack_trees() to enforce the
index == HEAD requirement, and thus that it needed to do that enforcement
itself. Unfortunately, it returned the wrong exit status, signalling that
the merge completed but had conflicts, rather than that it was aborted.
Fix the return code, and while we're at it, change the error message to
match what unpack_trees() would have printed.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `git merge-recursive` command allows the user to directly specify
three commits to merge -- base, head, and remote. (More than three can be
specified in the case of multiple merge bases.) Note that since the user
is allowed to specify head, it need not match HEAD.
Virtually every test and script in the current git.git codebase calls `git
merge-recursive` with head=HEAD, and likely external callers do as well,
which is why this has gone unnoticed. There is one notable
counter-example: git-stash.sh. However, git-stash called `git
merge-recursive` with an index that matches the expected merge result,
which happens to be a currently allowed exception to the "index must match
head" rule, so this never triggered an error previously.
Since we would like to tighten up the "index must match head" rule, we
need to make sure we are comparing to the correct head. Add a testcase
that demonstrates the failure when we check the wrong HEAD.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
t6044 has lots of tests for verifying that merge will abort as expected
when there are changes staged before the merge starts. However, it only
checked for non-zero exit code, which could mean that the merge ran to
completion with conflicts. Check that the merge was actually correctly
aborted, i.e. that .git/MERGE_HEAD is not present.
This changes one of the tests from expect_success to expect_failure.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Modify index_has_changes() to take a struct istate* instead of just
operating on the_index. This is only a partial conversion, though,
because we call do_diff_cache() which implicitly assumes work is to be
done on the_index. Ongoing work is being done elsewhere to do the
remainder of the conversion, and thus is not duplicated here. Instead,
a simple check is put in place until that work is complete.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since index_has_change() is an index-related function, move it to
read-cache.c, only modifying it to avoid uses of the active_cache and
active_nr macros.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jk/submodule-fsck-loose:
fsck: complain when .gitmodules is a symlink
index-pack: check .gitmodules files with --strict
unpack-objects: call fsck_finish() after fscking objects
fsck: call fsck_finish() after fscking objects
fsck: check .gitmodules content
fsck: handle promisor objects in .gitmodules check
fsck: detect gitmodules files
fsck: actually fsck blob data
fsck: simplify ".git" check
index-pack: make fsck error message more specific
|
|
* maint-2.16:
Git 2.16.4
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.15:
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.14:
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.13:
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jk/submodule-fix-loose:
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
|
|
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.
Signed-off-by: Jeff King <peff@peff.net>
|
|
Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:
1. We currently don't bother calling fsck_object() at all
for blobs, since it has traditionally been a noop. We'd
actually catch these blobs in fsck_finish() at the end,
but it's more efficient to check them when we already
have the object loaded in memory.
2. The second pass done by fsck_finish() needs to access
the objects, but we're actually indexing the pack in
this process. In theory we could give the fsck code a
special callback for accessing the in-pack data, but
it's actually quite tricky:
a. We don't have an internal efficient index mapping
oids to packfile offsets. We only generate it on
the fly as part of writing out the .idx file.
b. We'd still have to reconstruct deltas, which means
we'd basically have to replicate all of the
reading logic in packfile.c.
Instead, let's avoid running fsck_finish() until after
we've written out the .idx file, and then just add it
to our internal packed_git list.
This does mean that the objects are "in the repository"
before we finish our fsck checks. But unpack-objects
already exhibits this same behavior, and it's an
acceptable tradeoff here for the same reason: the
quarantine mechanism means that pushes will be
fully protected.
In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.
This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.
Signed-off-by: Jeff King <peff@peff.net>
|
|
As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.
This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.
This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.
Signed-off-by: Jeff King <peff@peff.net>
|
|
Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.
With this, we can now catch the malicious case in t7415 with
git-fsck.
Signed-off-by: Jeff King <peff@peff.net>
|
|
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).
As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).
Signed-off-by: Jeff King <peff@peff.net>
|
|
If we have a tree that points to a .gitmodules blob but
don't have that blob, we can't check its contents. This
produces an fsck error when we encounter it.
But in the case of a promisor object, this absence is
expected, and we must not complain. Note that this can
technically circumvent our transfer.fsckObjects check.
Imagine a client fetches a tree, but not the matching
.gitmodules blob. An fsck of the incoming objects will show
that we don't have enough information. Later, we do fetch
the actual blob. But we have no idea that it's a .gitmodules
file.
The only ways to get around this would be to re-scan all of
the existing trees whenever new ones enter (which is
expensive), or to somehow persist the gitmodules_found set
between fsck runs (which is complicated).
In practice, it's probably OK to ignore the problem. Any
repository which has all of the objects (including the one
serving the promisor packs) can perform the checks. Since
promisor packs are inherently about a hierarchical topology
in which clients rely on upstream repositories, those
upstream repositories can protect all of their downstream
clients from broken objects.
Signed-off-by: Jeff King <peff@peff.net>
|
|
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.
Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.
In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:
1. git-fsck may show us the objects in arbitrary order
(loose objects are fed in sha1 order, but we may also
have multiple packs, and we process each pack fully in
sequence).
2. The type ordering is just what git-pack-objects happens
to write now. The pack format does not require a
specific order, and it's possible that future versions
of Git (or a custom version trying to fool official
Git's fsck checks!) may order it differently.
3. We may not even be fscking all of the relevant objects
at once. Consider pushing with transfer.fsckObjects,
where one push adds a blob at path "foo", and then a
second push adds the same blob at path ".gitmodules".
The blob is not part of the second push at all, but we
need to mark and check it.
So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.
We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.
This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.
Signed-off-by: Jeff King <peff@peff.net>
|
|
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:
1. The fsck_object() function just returns success for any
blob. Let's a noop fsck_blob(), which we can fill in
with actual logic later.
2. The fsck_loose() function in builtin/fsck.c
just threw away blob content after loading it. Let's
hold onto it until after we've called fsck_object().
The easiest way to do this is to just drop the
parse_loose_object() helper entirely. Incidentally,
this also fixes a memory leak: if we successfully
loaded the object data but did not parse it, we would
have left the function without freeing it.
3. When fsck_loose() loads the object data, it
does so with a custom read_loose_object() helper. This
function streams any blobs, regardless of size, under
the assumption that we're only checking the sha1.
Instead, let's actually load blobs smaller than
big_file_threshold, as the normal object-reading
code-paths would do. This lets us fsck small files, and
a NULL return is an indication that the blob was so big
that it needed to be streamed, and we can pass that
information along to fsck_blob().
Signed-off-by: Jeff King <peff@peff.net>
|
|
There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).
Signed-off-by: Jeff King <peff@peff.net>
|
|
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.
Signed-off-by: Jeff King <peff@peff.net>
|
|
* jk/submodule-name-verify-fix:
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_path: drop clever fallthrough
skip_prefix: add icase-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
path: match NTFS short names for more .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
Note that this includes two bits of evil-merge:
- there's a new call to verify_path() that doesn't actually
have a mode available. It should be OK to pass "0" here,
since we're just manipulating the untracked cache, not an
actual index entry.
- the lstat() in builtin/update-index.c:update_one() needs
to be updated to handle the fsmonitor case (without this
it still behaves correctly, but does an unnecessary
lstat).
|
|
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Signed-off-by: Jeff King <peff@peff.net>
|
|
In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.
This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.
Signed-off-by: Jeff King <peff@peff.net>
|
|
We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.
Signed-off-by: Jeff King <peff@peff.net>
|
|
We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.
Signed-off-by: Jeff King <peff@peff.net>
|
|
We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.
Signed-off-by: Jeff King <peff@peff.net>
|
|
This tests primarily for NTFS issues, but also adds one example of an
HFS+ issue.
Thanks go to Congyi Wu for coming up with the list of examples where
NTFS would possibly equate the filename with `.gitmodules`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
|
|
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.
However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.
It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.
For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).
We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.
To accommodate for that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.
These can be recomputed by running the following Perl script:
-- snip --
use warnings;
use strict;
sub compute_short_name_hash ($) {
my $checksum = 0;
foreach (split('', $_[0])) {
$checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
}
$checksum = ($checksum * 314159269) & 0xffffffff;
$checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
$checksum -= (($checksum * 1152921497) >> 60) * 1000000007;
return scalar reverse sprintf("%x", $checksum & 0xffff);
}
print compute_short_name_hash($ARGV[0]);
-- snap --
E.g., running that with the argument ".gitignore" will
result in "250a" (which then becomes "gi250a" in the code).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
|
|
Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.
Signed-off-by: Jeff King <peff@peff.net>
|
|
We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.
Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.
Signed-off-by: Jeff King <peff@peff.net>
|
|
Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).
Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:
1. What should the allowed syntax be?
It's tempting to reuse verify_path(), since submodule
names typically come from in-repo paths. But there are
two reasons not to:
a. It's technically more strict than what we need, as
we really care only about breaking out of the
$GIT_DIR/modules/ hierarchy. E.g., having a
submodule named "foo/.git" isn't actually
dangerous, and it's possible that somebody has
manually given such a funny name.
b. Since we'll eventually use this checking logic in
fsck to prevent downstream repositories, it should
be consistent across platforms. Because
verify_path() relies on is_dir_sep(), it wouldn't
block "foo\..\bar" on a non-Windows machine.
2. Where should we enforce it? These days most of the
.gitmodules reads go through submodule-config.c, so
I've put it there in the reading step. That should
cover all of the C code.
We also construct the name for "git submodule add"
inside the git-submodule.sh script. This is probably
not a big deal for security since the name is coming
from the user anyway, but it would be polite to remind
them if the name they pick is invalid (and we need to
expose the name-checker to the shell anyway for our
test scripts).
This patch issues a warning when reading .gitmodules
and just ignores the related config entry completely.
This will generally end up producing a sensible error,
as it works the same as a .gitmodules file which is
missing a submodule entry (so "submodule update" will
barf, but "git clone --recurse-submodules" will print
an error but not abort the clone.
There is one minor oddity, which is that we print the
warning once per malformed config key (since that's how
the config subsystem gives us the entries). So in the
new test, for example, the user would see three
warnings. That's OK, since the intent is that this case
should never come up outside of malicious repositories
(and then it might even benefit the user to see the
message multiple times).
Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.
Signed-off-by: Jeff King <peff@peff.net>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
l10n for Git 2.17.0 round 1
* tag 'l10n-2.17.0-rnd1' of git://github.com/git-l10n/git-po:
l10n: de.po: translate 132 new messages
l10n: zh_CN: review for git v2.17.0 l10n round 1
l10n: zh_CN: for git v2.17.0 l10n round 1
l10n: ko.po: Update Korean translation
l10n: fr.po: v2.17.0 no fuzzy
l10n: sv.po: Update Swedish translation (3376t0f0u)
l10n: Update Catalan translation
l10n: fr.po v2.17.0 round 1
l10n: vi.po(3376t): Updated Vietnamese translation for v2.17
l10n: bg.po: Updated Bulgarian translation (3376t)
l10n: es.po: Update Spanish translation 2.17.0
l10n: git.pot: v2.17.0 round 1 (132 new, 44 removed)
l10n: es.po: fixes to Spanish translation
|
|
Hotfix.
* pw/add-p-single:
add -p: fix 2.17.0-rc* regression due to moved code
|
|
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.
In Perl, regex variables like $1 always refer to the last regex
match. When the aforementioned change added a new regex match between
the old match and the corresponding code that was expecting $1, the $1
variable would always be undef, since the newly inserted regex match
doesn't have any captures.
As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:
$ perl -pi -e 's/Git/Tig/g' README.md
$ ./git --exec-path=$PWD add -p
[..]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
Split into 4 hunks.
[...]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
Use of uninitialized value $1 in string eq at /home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
search for regex? Many
I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex, now it works as intended again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Translate 132 new messages came from git.pot update in abc8de64d (l10n:
git.pot: v2.17.0 round 1 (132 new, 44 removed)).
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
|
|
Hotfix.
* jh/partial-clone:
upload-pack: disable object filtering when disabled by config
unpack-trees: release oid_array after use in check_updates()
|
|
When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
2017-12-08), it was guarded by the uploadpack.allowFilter config item
to allow server operators to control when they start supporting it.
That config item didn't go far enough, though: it controls whether the
'filter' capability is advertised, but if a (custom) client ignores
the capability advertisement and passes a filter specification anyway,
the server would handle that despite allowFilter being false.
This is particularly significant if a security bug is discovered in
this new experimental partial clone code. Installations without
uploadpack.allowFilter ought not to be affected since they don't
intend to support partial clone, but they would be swept up into being
vulnerable.
Simplify and limit the attack surface by making uploadpack.allowFilter
disable the feature, not just the advertisement of it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Ray Chen <oldsharp@gmail.com>
|
|
Translate 132 new messages (3376t0f0u) for git 2.17.0-rc0.
Reviewed-by: 依云 <lilydjwg@gmail.com>
Reviewed-by: Fangyi Zhou <fangyi.zhou@yuriko.moe>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
|