Age | Commit message (Collapse) | Author | Files | Lines |
|
Async processes can be implemented as separate forked
processes, or as threads (depending on the NO_PTHREADS
setting). In the latter case, if an async thread gets
SIGPIPE, it takes down the whole process. This is obviously
bad if the main process was not otherwise going to die, but
even if we were going to die, it means the main process does
not have a chance to report a useful error message.
There's also the small matter that forked async processes
will not take the main process down on a signal, meaning git
will behave differently depending on the NO_PTHREADS
setting.
This patch fixes it by adding a new flag to "struct async"
to block SIGPIPE just in the async thread. In theory, this
should always be on (which makes async threads behave more
like async processes), but we would first want to make sure
that each async process we spawn is careful about checking
return codes from write() and would not spew endlessly into
a dead pipe. So let's start with it as optional, and we can
enable it for specific sites in future patches.
The natural name for this option would be "ignore_sigpipe",
since that's what it does for the threaded case. But since
that name might imply that we are ignoring it in all cases
(including the separate-process one), let's call it
"isolate_sigpipe". What we are really asking for is
isolation. I.e., not to have our main process taken down by
signals spawned by the async process. How that is
implemented is up to the run-command code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This fixes a deadlock on the client side when pushing a
large number of refs from a corrupted repo. There's a
reproduction script below, but let's start with a
human-readable explanation.
The client side of a push goes something like this:
1. Start an async process to demux sideband coming from
the server.
2. Run pack-objects to send the actual pack, and wait for
its status via finish_command().
3. If pack-objects failed, abort immediately.
4. If pack-objects succeeded, read the per-ref status from
the server, which is actually coming over a pipe from
the demux process started in step 1.
We run finish_async() to wait for and clean up the demux
process in two places. In step 3, if we see an error, we
want it to end early. And after step 4, it should be done
writing any data and we are just cleaning it up.
Let's focus on the error case first. We hand the output
descriptor to the server over to pack-objects. So by the
time it has returned an error to us, it has closed the
descriptor and the server has gotten EOF. The server will
mark all refs as failed with "unpacker error" and send us
back the status for each (followed by EOF).
This status goes to the demuxer thread, which relays it over
a pipe to the main thread. But the main thread never even
tries reading the status. It's trying to bail because of the
pack-objects error, and is waiting for the demuxer thread to
finish. If there are a small number of refs, that's OK; the
demuxer thread writes into the pipe buffer, sees EOF from
the server, and quits. But if there are a large number of
refs, it may block on write() back to the main thread,
leading to a deadlock (the main thread is waiting for the
demuxer to finish, the demuxer is waiting for the main
thread to read).
We can break this deadlock by closing the pipe between the
demuxer and the main thread before calling finish_async().
Then the demuxer gets a write() error and exits.
The non-error case usually just works, because we will have
read all of the data from the other side. We do close
demux.out already, but we only do so _after_ calling
finish_async(). This is OK because there shouldn't be any
more data coming from the server. But technically we've only
read to a flush packet, and a broken or malicious server
could be sending more cruft. In such a case, we would hit
the same deadlock. Closing the pipe first doesn't affect the
normal case, and means that for a cruft-sending server,
we'll notice a write() error rather than deadlocking.
Note that when write() sees this error, we'll actually
deliver SIGPIPE to the thread, which will take down the
whole process (unless we're compiled with NO_PTHREADS). This
isn't ideal, but it's an improvement over the status quo,
which is deadlocking. And SIGPIPE handling in async threads
is a bigger problem that we can deal with separately.
A simple reproduction for the error case is below. It's
technically racy (we could exit the main process and take
down the async thread with us before it even reads the
status), though in practice it seems to fail pretty
consistently.
git init repo &&
cd repo &&
# make some commits; we need two so we can simulate corruption
# in the history later.
git commit --allow-empty -m one &&
one=$(git rev-parse HEAD) &&
git commit --allow-empty -m two &&
two=$(git rev-parse HEAD) &&
# now make a ton of refs; our goal here is to overflow the pipe buffer
# when reporting the ref status, which will cause the demuxer to block
# on write()
for i in $(seq 20000); do
echo "create refs/heads/this-is-a-really-long-branch-name-$i $two"
done |
git update-ref --stdin &&
# now make a corruption in the history such that pack-objects will fail
rm -vf .git/objects/$(echo $one | sed 's}..}&/}') &&
# and then push the result
git init --bare dst.git &&
git push --mirror dst.git
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* mm/doc-hooks-linkgit-fix:
Documentation: fix broken linkgit to git-config
|
|
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* es/st-add4-gcc-4.2-workaround:
git-compat-util: st_add4: work around gcc 4.2.x compiler crash
|
|
Although changes by 5b442c4 (tree-diff: catch integer overflow in
combine_diff_path allocation, 2016-02-19) are perfectly valid, they
unfortunately trigger an internal compiler error in gcc 4.2.x:
combine-diff.c: In function 'diff_tree_combined':
combine-diff.c:1391: internal compiler error: Segmentation fault: 11
Experimentation reveals that changing st_add4()'s argument evaluation
order is sufficient to sidestep this problem.
Although st_add3() does not trigger the compiler bug, for style
consistency, change its argument evaluation order to match.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.6:
Git 2.6.6
Git 2.5.5
Git 2.4.11
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.5:
Git 2.5.5
Git 2.4.11
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.4:
Git 2.4.11
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Bugfix patches were backported from the 'master' front to plug heap
corruption holes, to catch integer overflow in the computation of
pathname lengths, and to get rid of the name_path API. Both of
these would have resulted in writing over an under-allocated buffer
when formulating pathnames while tree traversal.
* jk/path-name-safety-2.4:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
* jk/path-name-safety-2.7:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
* jk/path-name-safety-2.6:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
* jk/path-name-safety-2.5:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
* jk/path-name-safety-2.4:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.
So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.
This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "struct name_path" data is examined in only two places:
we generate it in process_tree(), and we convert it to a
single string in path_name(). Everyone else just passes it
through to those functions.
We can further note that process_tree() already keeps a
single strbuf with the leading tree path, for use with
tree_entry_interesting().
Instead of building a separate name_path linked list, let's
just use the one we already build in "base". This reduces
the amount of code (especially tricky code in path_name()
which did not check for integer overflows caused by deep
or large pathnames).
It is also more efficient in some instances. Any time we
were using tree_entry_interesting, we were building up the
strbuf anyway, so this is an immediate and obvious win
there. In cases where we were not, we trade off storing
"pathname/" in a strbuf on the heap for each level of the
path, instead of two pointers and an int on the stack (with
one pointer into the tree object). On a 64-bit system, the
latter is 20 bytes; so if path components are less than that
on average, this has lower peak memory usage. In practice
it probably doesn't matter either way; we are already
holding in memory all of the tree objects leading up to each
pathname, and for normal-depth pathnames, we are only
talking about hundreds of bytes.
This patch leaves "struct name_path" as a thin wrapper
around the strbuf, to avoid disrupting callbacks. We should
fix them, but leaving it out makes this diff easier to view.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When "git rev-list" shows an object with its associated path
name, it does so by walking the name_path linked list and
printing each component (stopping at any embedded NULs or
newlines).
We'd like to eventually get rid of name_path entirely in
favor of a single buffer, and dropping this custom printing
code is part of that. As a first step, let's use path_name()
to format the list into a single buffer, and print that.
This is strictly less efficient than the original, but it's
a temporary step in the refactoring; our end game will be to
get the fully formatted name in the first place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Performing computations on size_t variables that we feed to
xmalloc and friends can be dangerous, as an integer overflow
can cause us to allocate a much smaller chunk than we
realized.
We already have unsigned_add_overflows(), but let's add
unsigned_mult_overflows() to that. Furthermore, rather than
have each site manually check and die on overflow, we can
provide some helpers that will:
- promote the arguments to size_t, so that we know we are
doing our computation in the same size of integer that
will ultimately be fed to xmalloc
- check and die on overflow
- return the result so that computations can be done in
the parameter list of xmalloc.
These functions are a lot uglier to use than normal
arithmetic operators (you have to do "st_add(foo, bar)"
instead of "foo + bar"). To at least limit the damage, we
also provide multi-valued versions. So rather than:
st_add(st_add(a, b), st_add(c, d));
you can write:
st_add4(a, b, c, d);
This isn't nearly as elegant as a varargs function, but it's
a lot harder to get it wrong. You don't have to remember to
add a sentinel value at the end, and the compiler will
complain if you get the number of arguments wrong. This
patch adds only the numbered variants required to convert
the current code base; we can easily add more later if
needed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The graph traversal code here passes along a name_path to
build up the pathname at which we find each blob. But we
never actually do anything with the resulting names, making
it a waste of code and memory.
This usage came in aa1dbc9 (Update http-push functionality,
2006-03-07), and originally the result was passed to
"add_object" (which stored it, but didn't really use it,
either). But we stopped using that function in 1f1e895 (Add
"named object array" concept, 2006-06-19) in favor of
storing just the objects themselves.
Moreover, the generation of the name in process_tree() is
buggy. It sticks "name" onto the end of the name_path linked
list, and then passes it down again as it recurses (instead
of "entry.path"). So it's a good thing this was unused, as
the resulting path for "a/b/c/d" would end up as "a/a/a/a".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A combine_diff_path struct has two "flex" members allocated
alongside the struct: a string to hold the pathname, and an
array of parent pointers. We use an "int" to compute this,
meaning we may easily overflow it if the pathname is
extremely long.
We can fix this by using size_t, and checking for overflow
with the st_add helper.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ma/update-hooks-sample-typofix:
templates/hooks: fix minor typo in the sample update-hook
|
|
* dt/initial-ref-xn-commit-doc:
refs: document transaction semantics
|
|
* ps/plug-xdl-merge-leak:
xdiff/xmerge: fix memory leak in xdl_merge
|
|
Code simplification.
* ak/git-strip-extension-from-dashed-command:
git.c: simplify stripping extension of a file in handle_builtin()
|
|
Code simplification.
* ak/extract-argv0-last-dir-sep:
exec_cmd.c: use find_last_dir_sep() for code simplification
|
|
The code to read the pack data using the offsets stored in the pack
idx file has been made more carefully check the validity of the
data in the idx.
* jk/pack-idx-corruption-safety:
sha1_file.c: mark strings for translation
use_pack: handle signed off_t overflow
nth_packed_object_offset: bounds-check extended offset
t5313: test bounds-checks of corrupted/malicious pack/idx files
|
|
"git config section.var value" to set a value in per-repository
configuration file failed when it was run outside any repository,
but didn't say the reason correctly.
* js/config-set-in-non-repository:
git config: report when trying to modify a non-existing repo config
|
|
A helper function "git submodule" uses since v2.7.0 to list the
modules that match the pathspec argument given to its subcommands
(e.g. "submodule add <repo> <path>") has been fixed.
* sb/submodule-module-list-fix:
submodule helper list: respect correct path prefix
|
|
Recent versions of GNU grep are pickier when their input contains
arbitrary binary data, which some of our tests uses. Rewrite the
tests to sidestep the problem.
* jk/grep-binary-workaround-in-test:
t9200: avoid grep on non-ASCII data
t8005: avoid grep on non-ASCII data
|
|
The documentation did not clearly state that the 'simple' mode is
now the default for "git push" when push.default configuration is
not set.
* mm/push-simple-doc:
Documentation/git-push: document that 'simple' is the default
|
|
* jk/tighten-alloc: (23 commits)
compat/mingw: brown paper bag fix for 50a6c8e
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
...
|
|
The memory ownership rule of fill_textconv() API, which was a bit
tricky, has been documented a bit better.
* jk/more-comments-on-textconv:
diff: clarify textconv interface
|
|
"git merge-tree" used to mishandle "both sides added" conflict with
its own "create a fake ancestor file that has the common parts of
what both sides have added and do a 3-way merge" logic; this has
been updated to use the usual "3-way merge with an empty blob as
the fake common ancestor file" approach used in the rest of the
system.
* jk/no-diff-emit-common:
xdiff: drop XDL_EMIT_COMMON
merge-tree: drop generate_common strategy
merge-one-file: use empty blob for add/add base
|
|
The "v(iew)" subcommand of the interactive "git am -i" command was
broken in 2.6.0 timeframe when the command was rewritten in C.
* jc/am-i-v-fix:
am -i: fix "v"iew
pager: factor out a helper to prepare a child process to run the pager
pager: lose a separate argv[]
|
|
"git rev-parse --git-common-dir" used in the worktree feature
misbehaved when run from a subdirectory.
* nd/git-common-dir-fix:
rev-parse: take prefix into account in --git-common-dir
|
|
"git show 'HEAD:Foo[BAR]Baz'" did not interpret the argument as a
rev, i.e. the object named by the the pathname with wildcard
characters in a tree object.
* nd/dwim-wildcards-as-pathspecs:
get_sha1: don't die() on bogus search strings
check_filename: tighten dwim-wildcard ambiguity
checkout: reorder check_filename conditional
|
|
Handling of errors while writing into our internal asynchronous
process has been made more robust, which reduces flakiness in our
tests.
* jk/epipe-in-async:
t5504: handle expected output from SIGPIPE death
test_must_fail: report number of unexpected signal
fetch-pack: ignore SIGPIPE in sideband demuxer
write_or_die: handle EPIPE in async threads
|
|
Many codepaths forget to check return value from git_config_set();
the function is made to die() to make sure we do not proceed when
setting a configuration variable failed.
* ps/config-error:
config: rename git_config_set_or_die to git_config_set
config: rename git_config_set to git_config_set_gently
compat: die when unable to set core.precomposeunicode
sequencer: die on config error when saving replay opts
init-db: die on config errors when initializing empty repo
clone: die on config error in cmd_clone
remote: die on config error when manipulating remotes
remote: die on config error when setting/adding branches
remote: die on config error when setting URL
submodule--helper: die on config error when cloning module
submodule: die on config error when linking modules
branch: die on config error when editing branch description
branch: die on config error when unsetting upstream
branch: report errors in tracking branch setup
config: introduce set_or_die wrappers
|
|
Traditionally, the tests that try commands that work on the
contents in the working tree were named with "worktree" in their
filenames, but with the recent addition of "git worktree"
subcommand, whose tests are also named similarly, it has become
harder to tell them apart. The traditional tests have been renamed
to use "work-tree" instead in an attempt to differentiate them.
* mg/work-tree-tests:
tests: rename work-tree tests to *work-tree*
|
|
Help those who debug http(s) part of the system.
* sp/remote-curl-ssl-strerror:
remote-curl: include curl_errorstr on SSL setup failures
|
|
Commit 50a6c8e (use st_add and st_mult for allocation size
computation, 2016-02-22) fixed up many xmalloc call-sites
including ones in compat/mingw.c.
But I screwed up one of them, which was half-converted to
ALLOC_ARRAY, using a very early prototype of the function.
And I never caught it because I don't build on Windows.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 8bf4bec (add "ok=sigpipe" to test_must_fail and use
it to fix flaky tests, 2015-11-27) taught t5504 to handle
"git push" racily exiting with SIGPIPE rather than failing.
However, one of the tests checks the output of the command,
as well. In the SIGPIPE case, we will not have produced any
output. If we want the test to be truly non-flaky, we have
to accept either output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If a command is marked as test_must_fail but dies with a
signal, we consider that a problem and report the error to
stderr. However, we don't say _which_ signal; knowing that
can make debugging easier. Let's share as much as we know.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|