Age | Commit message (Collapse) | Author | Files | Lines |
|
We don't actually look at the tree struct in fsck_tree() beyond its oid
and type (which is obviously OBJ_TREE). Just taking an oid gives our
callers more flexibility to avoid creating a struct, and makes it clear
that we are fscking just what is in the buffer, not any pre-parsed bits
from the struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We don't actually look at the commit struct in fsck_commit() beyond its
oid and type (which is obviously OBJ_COMMIT). Just taking an oid gives
our callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We don't actually look at the tag struct in fsck_tag() beyond its oid
and type (which is obviously OBJ_TAG). Just taking an oid gives our
callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In fsck_commit() and fsck_tag(), we have local "oid" variables used for
parsing parent and tagged-object oids. Let's give these more specific
names in preparation for the functions taking an "oid" parameter for the
object we're checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We only need the oid and type to pass on to report(). Let's accept the
broken-out parameters to give our callers more flexibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The only thing we do with the struct is pass its oid and type to
report(). We can just take those explicitly, which gives our callers
more flexibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since fsck_blob() no longer requires us to have a "struct blob", we
don't need to create one. Which also means we don't need to worry about
handling the case that lookup_blob() returns NULL (we'll still catch
wrongly-identified blobs when we read the actual object contents and
type from disk).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We don't actually need any information from the object struct except its
oid (and the type, of course, but that's implicitly OBJ_BLOB). This
gives our callers more flexibility to drop the object structs, too.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The report() function really only cares about the oid and type of the
object, not the full object struct. Let's convert it to take those two
items separately, which gives our callers more flexibility.
This makes some already-long lines even longer. I've mostly left them,
as our eventual goal is to shrink these down as we continue refactoring
(e.g., "&item->object" becomes "&item->object.oid, item->object.type",
but will eventually shrink down to "oid, type").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The skiplist is inherently an oidset, so we don't need a full object
struct. Let's take just the oid to give our callers more flexibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
None of the callbacks actually care about having a "struct object";
they're happy with just the oid and type information. So let's give
ourselves more flexibility to avoid having a "struct object" by just
passing the broken-down fields.
Note that the callback already takes a "type" field for the fsck message
type. We'll rename that to "msg_type" (and use "object_type" for the
object type) to make the distinction explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We don't actually care about having object structs; we only need to look
up decorations by oid. Let's accept this more limited form, which will
give our callers more flexibility.
Note that the decoration API we rely on uses object structs itself (even
though it only looks at their oids). We can solve this by switching to
a kh_oid_map (we could also use the hashmap oidmap, but it's more
awkward for the simple case of just storing a void pointer).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This isolates the implementation detail of using the decoration code to
our put/get functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).
Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).
We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.
We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:
1. We leak many small strings. add_decoration() has a last-one-wins
approach: it updates the decoration to the new string and returns
the old one. But we ignore the return value, leaking the old
string. This is quite common to trigger, since we look at reflogs:
the tip of any ref will be described both by looking at the actual
ref, as well as the latest reflog entry. So we'd always end up
leaking one of those strings.
2. The last-one-wins approach gives us lousy names. For instance, we
first look at all of the refs, and then all of the reflogs. So
rather than seeing "refs/heads/master", we're likely to overwrite
it with "HEAD@{12345678}". We're generally better off using the
first name we find.
And indeed, the test in t1450 expects this ugly HEAD@{} name. After
this patch, we've switched to using fsck_put_object_name()'s
first-one-wins semantics, and we output the more human-friendly
"refs/tags/julius" (and the test is updated accordingly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The fsck_object() function takes in a buffer, but also a "struct
object". The rules for using these vary between types:
- for a commit, we'll use the provided buffer; if it's NULL, we'll
fall back to get_commit_buffer(), which loads from either an
in-memory cache or from disk. If the latter fails, we'd die(), which
is non-ideal for fsck.
- for a tag, a NULL buffer will fall back to loading the object from
disk (and failure would lead to an fsck error)
- for a tree, we _never_ look at the provided buffer, and always use
tree->buffer
- for a blob, we usually don't look at the buffer at all, unless it
has been marked as a .gitmodule file. In that case we check the
buffer given to us, or assume a NULL buffer is a very large blob
(and complain about it)
This is much more complex than it needs to be. It turns out that nobody
ever feeds a NULL buffer that isn't a blob:
- git-fsck calls fsck_object() only from fsck_obj(). That in turn is
called by one of:
- fsck_obj_buffer(), which is a callback to verify_pack(), which
unpacks everything except large blobs into a buffer (see
pack-check.c, lines 131-141).
- fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer
(builtin/fsck.c, lines 639-640)
And in either case, we'll have just called parse_object_buffer()
anyway, which would segfault on a NULL buffer for commits or tags
(not for trees, but it would install a NULL tree->buffer which would
later cause a segfault)
- git-index-pack asserts that the buffer is non-NULL unless the object
is a blob (see builtin/index-pack.c, line 832)
- git-unpack-objects always writes a non-NULL buffer into its
obj_buffer hash, which is then fed to fsck_object(). (There is
actually a funny thing here where it does not store blob buffers at
all, nor does it call fsck on them; it does check any needed blobs
via fsck_finish() though).
Let's make the rules simpler, which reduces the amount of code and gives
us more flexibility in refactoring the fsck code. The new rules are:
- only blobs are allowed to pass a NULL buffer
- we always use the provided buffer, never pulling information from
the object struct
We don't have to adjust any callers, because they were already adhering
to these. Note that we do drop a few fsck identifiers for missing tags,
but that was all dead code (because nobody passed a NULL tag buffer).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Way back in 92d4c85d24 (fsck-cache: fix SIGSEGV on bad tag object,
2005-05-03), we added an fsck check that the "tagged" field of a tag
struct isn't NULL. But that was mainly protecting the printing code for
"--tags", and that code wasn't moved along with the check as part of
ba002f3b28 (builtin-fsck: move common object checking code to fsck.c,
2008-02-25).
It could also serve to detect type mismatch problems (where a tag points
to object X as a commit, but really X is a blob), but it couldn't do so
reliably (we'd call lookup_commit(X), but it will only notice the
problem if we happen to have previously called lookup_blob(X) in the
same process). And as of a commit earlier in this series, we'd consider
that a parse error and complain about the object even before getting to
this point anyway.
So let's drop this "tag->tagged" check. It's not helping anything, and
getting rid of it makes the function conceptually cleaner, as it really
is just checking the buffer we feed it. In fact, we can get rid of our
one-line wrapper and just unify fsck_tag() and fsck_tag_buffer().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 4516338243 (builtin-fsck: reports missing parent commits,
2008-02-25), we added code to check that fsck found the same number of
parents from parsing the commit itself as we see in the commit struct we
got from parse_commit_buffer(). Back then the rationale was that the
normal commit parser might skip some bad parents.
But earlier in this series, we started treating that reliably as a
parsing error, meaning that we'd complain about it before we even hit
the code in fsck.c.
Let's drop this code, which now makes fsck_commit_buffer() completely
independent of any parsed values in the commit struct (that's
conceptually cleaner, and also opens up more refactoring options).
Note that we can also drop the MISSING_PARENT and MISSING_GRAFT fsck
identifiers. This is no loss, as these would not trigger reliably
anyway. We'd hit them only when lookup_commit() failed, which occurs
only if we happen to have seen the object with another type already in
the same process. In most cases, we'd actually run into the problem
during the connectivity walk, not here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We check in fsck_commit_buffer() that commit->tree isn't NULL, which in
turn generally comes from a previous parse by parse_commit(). But this
isn't really accomplishing anything. The two things we might care about
are:
- was there a syntactically valid "tree <oid>" line in the object? But
we've just done our own parse in fsck_commit_buffer() to check this.
- does it point to a valid tree object? But checking the "tree"
pointer here doesn't actually accomplish that; it just shows that
lookup_tree() didn't return NULL, which only means that we haven't
yet seen that oid as a non-tree in this process.
A real connectivity check would exhaustively walk all graph links,
and we do that already in a separate function.
So this code isn't helping anything. And it makes the fsck code slightly
more confusing and rigid (e.g., it requires that any commit structs have
already been parsed). Let's drop it.
As a bit of history, the presence of this code looks like a leftover
from early fsck code (which did rely on parse_commit() to do most of the
parsing). The check comes from ff5ebe39b0 (Port fsck-cache to use
parsing functions, 2005-04-18), but we later added an explicit walk in
355885d531 (add generic, type aware object chain walker, 2008-02-25).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git blame" learned to "ignore" commits in the history, whose
effects (as well as their presence) get ignored.
* br/blame-ignore:
t8014: remove unnecessary braces
blame: drop some unused function parameters
blame: add a test to cover blame_coalesce()
blame: use the fingerprint heuristic to match ignored lines
blame: add a fingerprint heuristic to match ignored lines
blame: optionally track line fingerprints during fill_blame_origin()
blame: add config options for the output of ignored or unblamable lines
blame: add the ability to ignore commits and their changes
blame: use a helper function in blame_chunk()
Move oidset_parse_file() to oidset.c
fsck: rename and touch up init_skiplist()
|
|
There are no callers left of lookup_unknown_object() that aren't just
passing us the "hash" member of a "struct object_id". Let's take the
whole struct, which gets us closer to removing all raw sha1 variables.
It also matches the existing conversions of lookup_blob(), etc.
The conversions of callers were done by hand, but they're all mechanical
one-liners.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
init_skiplist() took a file consisting of SHA-1s and comments and added
the objects to an oidset. This functionality is useful for other
commands and will be moved to oidset.c in a future commit.
In preparation for that move, this commit renames it to
oidset_parse_file() to reflect its more generic usage and cleans up a
few of the names.
Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.
Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code cleanup.
* jc/cocci-preincr:
fsck: s/++i > 1/i++/
cocci: simplify "if (++u > 1)" to "if (u++)"
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update fsck.skipList implementation and documentation.
* ab/fsck-skiplist:
fsck: support comments & empty lines in skipList
fsck: use oidset instead of oid_array for skipList
fsck: use strbuf_getline() to read skiplist file
fsck: add a performance test for skipList
fsck: add a performance test
fsck: document that skipList input must be unabbreviated
fsck: document and test commented & empty line skipList input
fsck: document and test sorted skipList input
fsck tests: add a test for no skipList input
fsck tests: setup of bogus commit object
|
|
* maint-2.18:
Git 2.18.1
Git 2.17.2
fsck: detect submodule paths starting with dash
fsck: detect submodule urls starting with dash
Git 2.16.5
Git 2.15.3
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
|
|
* maint-2.17:
Git 2.17.2
fsck: detect submodule paths starting with dash
fsck: detect submodule urls starting with dash
Git 2.16.5
Git 2.15.3
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
|
|
As with urls, submodule paths with dashes are ignored by
git, but may end up confusing older versions. Detecting them
via fsck lets us prevent modern versions of git from being a
vector to spread broken .gitmodules to older versions.
Compared to blocking leading-dash urls, though, this
detection may be less of a good idea:
1. While such paths provide confusing and broken results,
they don't seem to actually work as option injections
against anything except "cd". In particular, the
submodule code seems to canonicalize to an absolute
path before running "git clone" (so it passes
/your/clone/-sub).
2. It's more likely that we may one day make such names
actually work correctly. Even after we revert this fsck
check, it will continue to be a hassle until hosting
servers are all updated.
On the other hand, it's not entirely clear that the behavior
in older versions is safe. And if we do want to eventually
allow this, we may end up doing so with a special syntax
anyway (e.g., writing "./-sub" in the .gitmodules file, and
teaching the submodule code to canonicalize it when
comparing).
So on balance, this is probably a good protection.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Urls with leading dashes can cause mischief on older
versions of Git. We should detect them so that they can be
rejected by receive.fsckObjects, preventing modern versions
of git from being a vector by which attacks can spread.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).
Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.
There is no notable performance impact from this change, using the
test setup described a couple of commits back:
Test HEAD~ HEAD
----------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.69(7.27+0.42) 7.86(7.48+0.37) +2.2%
1450.5: fsck with 1 skipped bad commits 7.69(7.30+0.38) 7.83(7.47+0.36) +1.8%
1450.7: fsck with 10 skipped bad commits 7.76(7.38+0.38) 7.79(7.38+0.41) +0.4%
1450.9: fsck with 100 skipped bad commits 7.76(7.38+0.38) 7.74(7.36+0.38) -0.3%
1450.11: fsck with 1000 skipped bad commits 7.71(7.30+0.41) 7.72(7.34+0.38) +0.1%
1450.13: fsck with 10000 skipped bad commits 7.74(7.34+0.40) 7.72(7.34+0.38) -0.3%
1450.15: fsck with 100000 skipped bad commits 7.75(7.40+0.35) 7.70(7.29+0.40) -0.6%
1450.17: fsck with 1000000 skipped bad commits 7.12(6.86+0.26) 7.13(6.87+0.26) +0.1%
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.
This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use. I.e. only once
per invocation, but note that for "clone --recurse-submodules" each
submodule will re-parse the list, in addition to the main project, and
it will be re-parsed when checking .gitmodules blobs, see
fb16287719 ("fsck: check skiplist for object in fsck_blob()",
2018-06-27).
Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to make
its ownership clear (no hidden sharing) and avoid unnecessary pointer
indirection.
The cumulative impact on performance of this & the preceding change,
using the test setup described in the previous commit:
Test HEAD~2 HEAD~ HEAD
----------------------------------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0%
1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7%
1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4%
1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3%
1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6%
1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0%
1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6%
1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6%
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The buffer is unlikely to contain a NUL character, so printing its
contents using %s in a die() format is unsafe (detected with ASan).
Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.
This fixes a bug where emitting an error about an invalid line on say
line 1 would continue printing subsequent lines, and usually continue
into uninitialized memory.
The performance impact of this, on a CentOS 7 box with RedHat GCC
4.8.5-28:
$ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh
Test HEAD~ HEAD
----------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.75(7.39+0.35) 7.68(7.29+0.39) -0.9%
1450.5: fsck with 1 skipped bad commits 7.70(7.30+0.40) 7.80(7.42+0.37) +1.3%
1450.7: fsck with 10 skipped bad commits 7.77(7.37+0.40) 7.87(7.47+0.40) +1.3%
1450.9: fsck with 100 skipped bad commits 7.82(7.41+0.40) 7.88(7.43+0.44) +0.8%
1450.11: fsck with 1000 skipped bad commits 7.88(7.49+0.39) 7.84(7.43+0.40) -0.5%
1450.13: fsck with 10000 skipped bad commits 8.02(7.63+0.39) 8.07(7.67+0.39) +0.6%
1450.15: fsck with 100000 skipped bad commits 8.01(7.60+0.41) 8.08(7.70+0.38) +0.9%
1450.17: fsck with 1000000 skipped bad commits 7.60(7.10+0.50) 7.37(7.18+0.19) -3.0%
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.
* sb/object-store-lookup: (32 commits)
commit.c: allow lookup_commit_reference to handle arbitrary repositories
commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
tag.c: allow deref_tag to handle arbitrary repositories
object.c: allow parse_object to handle arbitrary repositories
object.c: allow parse_object_buffer to handle arbitrary repositories
commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
commit.c: allow set_commit_buffer to handle arbitrary repositories
commit.c: migrate the commit buffer to the parsed object store
commit-slabs: remove realloc counter outside of slab struct
commit.c: allow parse_commit_buffer to handle arbitrary repositories
tag: allow parse_tag_buffer to handle arbitrary repositories
tag: allow lookup_tag to handle arbitrary repositories
commit: allow lookup_commit to handle arbitrary repositories
tree: allow lookup_tree to handle arbitrary repositories
blob: allow lookup_blob to handle arbitrary repositories
object: allow lookup_object to handle arbitrary repositories
object: allow object_as_type to handle arbitrary repositories
tag: add repository argument to deref_tag
tag: add repository argument to parse_tag_buffer
tag: add repository argument to lookup_tag
...
|
|
Recent "security fix" to pay attention to contents of ".gitmodules"
while accepting "git push" was a bit overly strict than necessary,
which has been adjusted.
* jk/fsck-gitmodules-gently:
fsck: downgrade gitmodulesParse default to "info"
fsck: split ".gitmodules too large" error from parse failure
fsck: silence stderr when parsing .gitmodules
config: add options parameter to git_config_from_mem
config: add CONFIG_ERROR_SILENT handler
config: turn die_on_error into caller-facing enum
|
|
"fsck.skipList" did not prevent a blob object listed there from
being inspected for is contents (e.g. we recently started to
inspect the contents of ".gitmodules" for certain malicious
patterns), which has been corrected.
* rj/submodule-fsck-skip:
fsck: check skiplist for object in fsck_blob()
|
|
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
|
|
We added an fsck check in ed8b10f631 (fsck: check
.gitmodules content, 2018-05-02) as a defense against the
vulnerability from 0383bbb901 (submodule-config: verify
submodule names as paths, 2018-04-30). With the idea that
up-to-date hosting sites could protect downstream unpatched
clients that fetch from them.
As part of that defense, we reject any ".gitmodules" entry
that is not syntactically valid. The theory is that if we
cannot even parse the file, we cannot accurately check it
for vulnerabilities. And anybody with a broken .gitmodules
file would eventually want to know anyway.
But there are a few reasons this is a bad tradeoff in
practice:
- for this particular vulnerability, the client has to be
able to parse the file. So you cannot sneak an attack
through using a broken file, assuming the config parsers
for the process running fsck and the eventual victim are
functionally equivalent.
- a broken .gitmodules file is not necessarily a problem.
Our fsck check detects .gitmodules in _any_ tree, not
just at the root. And the presence of a .gitmodules file
does not necessarily mean it will be used; you'd have to
also have gitlinks in the tree. The cgit repository, for
example, has a file named .gitmodules from a
pre-submodule attempt at sharing code, but does not
actually have any gitlinks.
- when the fsck check is used to reject a push, it's often
hard to work around. The pusher may not have full control
over the destination repository (e.g., if it's on a
hosting server, they may need to contact the hosting
site's support). And the broken .gitmodules may be too
far back in history for rewriting to be feasible (again,
this is an issue for cgit).
So we're being unnecessarily restrictive without actually
improving the security in a meaningful way. It would be more
convenient to downgrade this check to "info", which means
we'd still comment on it, but not reject a push. Site admins
can already do this via config, but we should ship sensible
defaults.
There are a few counterpoints to consider in favor of
keeping the check as an error:
- the first point above assumes that the config parsers for
the victim and the fsck process are equivalent. This is
pretty true now, but as time goes on will become less so.
Hosting sites are likely to upgrade their version of Git,
whereas vulnerable clients will be stagnant (if they did
upgrade, they'd cease to be vulnerable!). So in theory we
may see drift over time between what two config parsers
will accept.
In practice, this is probably OK. The config format is
pretty established at this point and shouldn't change a
lot. And the farther we get from the announcement of the
vulnerability, the less interesting this extra layer of
protection becomes. I.e., it was _most_ valuable on day
0, when everybody's client was still vulnerable and
hosting sites could protect people. But as time goes on
and people upgrade, the population of vulnerable clients
becomes smaller and smaller.
- In theory this could protect us from other
vulnerabilities in the future. E.g., .gitmodules are the
only way for a malicious repository to feed data to the
config parser, so this check could similarly protect
clients from a future (to-be-found) bug there.
But that's trading a hypothetical case for real-world
pain today. If we do find such a bug, the hosting site
would need to be updated to fix it, too. At which point
we could figure out whether it's possible to detect
_just_ the malicious case without hurting existing
broken-but-not-evil cases.
- Until recently, we hadn't made any restrictions on
.gitmodules content. So now in tightening that we're
hitting cases where certain things used to work, but
don't anymore. There's some moderate pain now. But as
time goes on, we'll see more (and more varied) cases that
will make tightening harder in the future. So there's
some argument for putting rules in place _now_, before
users grow more cases that violate them.
Again, this is trading pain now for hypothetical benefit
in the future. And if we try hard in the future to keep
our tightening to a minimum (i.e., rejecting true
maliciousness without hurting broken-but-not-evil repos),
then that reduces even the hypothetical benefit.
Considering both sets of arguments, it makes sense to loosen
this check for now.
Note that we have to tweak the test in t7415 since fsck will
no longer consider this a fatal error. But we still check
that it reports the warning, and that we don't get the
spurious error from the config code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since ed8b10f631 (fsck: check .gitmodules content,
2018-05-02), we'll report a gitmodulesParse error for two
conditions:
- a .gitmodules entry is not syntactically valid
- a .gitmodules entry is larger than core.bigFileThreshold
with the intent that we can detect malicious files and
protect downstream clients. E.g., from the issue in
0383bbb901 (submodule-config: verify submodule names as
paths, 2018-04-30).
But these conditions are actually quite different with
respect to that bug:
- a syntactically invalid file cannot trigger the problem,
as the victim would barf before hitting the problematic
code
- a too-big .gitmodules _can_ trigger the problem. Even
though it is obviously silly to have a 500MB .gitmodules
file, the submodule code will happily parse it if you
have enough memory.
So it may be reasonable to configure their severity
separately. Let's add a new class for the "too large" case
to allow that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
fsck will issue an error message for '.gitmodules' content that cannot
be parsed correctly. This is the case, even when the corresponding blob
object has been included on the skiplist. For example, using the cgit
repository, we see the following:
$ git fsck
Checking object directories: 100% (256/256), done.
error: bad config line 5 in blob .gitmodules
error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
Checking objects: 100% (6626/6626), done.
$
$ git config fsck.skiplist '.git/skip'
$ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
$
$ git fsck
Checking object directories: 100% (256/256), done.
error: bad config line 5 in blob .gitmodules
Checking objects: 100% (6626/6626), done.
$
Note that the error message issued by the config parser is still
present, despite adding the object-id of the blob to the skiplist.
One solution would be to provide a means of suppressing the messages
issued by the config parser. However, given that (logically) we are
asking fsck to ignore this object, a simpler approach is to just not
call the config parser if the object is to be skipped. Add a check to
the 'fsck_blob()' processing function, to determine if the object is
on the skiplist and, if so, exit the function early.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If there's a parsing error we'll already report it via the
usual fsck report() function (or not, if the user has asked
to skip this object or warning type). The error message from
the config parser just adds confusion. Let's suppress it.
Note that we didn't test this case at all, so I've added
coverage in t7415. We may end up toning down or removing
this fsck check in the future. So take this test as checking
what happens now with a focus on stderr, and not any
ironclad guarantee that we must detect and report parse
failures in the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The underlying config parser knows how to handle a
config_options struct, but git_config_from_mem() always
passes NULL. Let's allow our callers to specify the options
struct.
We could add a "_with_options" variant, but since there are
only a handful of callers, let's just update them to pass
NULL.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a repository argument to allow the callers of lookup_tree
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a repository argument to allow the callers of lookup_blob
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a repository argument to allow the callers of parse_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
|
|
Continuing with the idea to programatically enumerate various
pieces of data required for command line completion, teach the
codebase to report the list of configuration variables
subcommands care about to help complete them.
* nd/complete-config-vars:
completion: complete general config vars in two steps
log-tree: allow to customize 'grafted' color
completion: support case-insensitive config vars
completion: keep other config var completion in camelCase
completion: drop the hard coded list of config vars
am: move advice.amWorkDir parsing back to advice.c
advice: keep config name in camelCase in advice_config[]
fsck: produce camelCase config key names
help: add --config to list all available config
fsck: factor out msg_id_info[] lazy initialization code
grep: keep all colors in an array
Add and use generic name->id mapping code for color slot parsing
|
|
Finishing touches to a topic that already is in 'maint'.
* jk/submodule-fsck-loose-fixup:
fsck: avoid looking at NULL blob->object
t7415: don't bother creating commit for symlink test
|
|
Commit 159e7b080b (fsck: detect gitmodules files,
2018-05-02) taught fsck to look at the content of
.gitmodules files. If the object turns out not to be a blob
at all, we just complain and punt on checking the content.
And since this was such an obvious and trivial code path, I
didn't even bother to add a test.
Except it _does_ do one non-trivial thing, which is call the
report() function, which wants us to pass a pointer to a
"struct object". Which we don't have (we have only a "struct
object_id"). So we erroneously pass a NULL object to
report(), which gets dereferenced and causes a segfault.
It seems like we could refactor report() to just take the
object_id itself. But we pass the object pointer along to
a callback function, and indeed this ends up in
builtin/fsck.c's objreport() which does want to look at
other parts of the object (like the type).
So instead, let's just use lookup_unknown_object() to get
the real "struct object", and pass that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|