Age | Commit message (Collapse) | Author | Files | Lines |
|
Traditionally cat-file's batch-mode does not do any output
buffering. The reason is that a caller may have pipes
connected to its input and output, and would want to use
cat-file interactively, getting output immediately for each
input it sends.
This may involve a lot of small write() calls, which can be
slow. So we introduced --buffer to improve this, but we
can't turn it on by default, as it would break the
interactive case above.
However, when --batch-all-objects is used, we do not read
stdin at all. We generate the output ourselves as quickly as
possible, and then exit. In this case buffering is a strict
win, and it is simply a hassle for the user to have to
remember to specify --buffer.
This patch makes --buffer the default when --batch-all-objects
is used. Specifying "--buffer" manually is still OK, and you
can even override it with "--no-buffer" if you're a
masochist (or debugging).
For some real numbers, running:
git cat-file --batch-all-objects --batch-check='%(objectname)'
on torvalds/linux goes from:
real 0m1.464s
user 0m1.208s
sys 0m0.252s
to:
real 0m1.230s
user 0m1.172s
sys 0m0.056s
for a 16% speedup.
Suggested-by: Charles Bailey <charles@hashpling.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is not unreasonable to ask cat-file for a batch-check
format of simply "%(objectname)". At first glance this seems
like a noop (you are generally already feeding the object
names on stdin!), but it has a few uses:
1. With --batch-all-objects, you can generate a listing of
the sha1s present in the repository, without any input.
2. You do not have to feed sha1s; you can feed arbitrary
sha1 expressions and have git resolve them en masse.
3. You can even feed a raw sha1, with the result that git
will tell you whether we actually have the object or
not.
In case 3, the call to sha1_object_info is useful; it tells
us whether the object exists or not (technically we could
swap this out for has_sha1_file, but the cost is roughly the
same).
In case 2, the existence check is of debatable value. A
mass-resolution might prefer performance to safety (against
outputting a value for a corrupted ref, for example).
However, the object lookup cost is likely not as noticeable
compared to the resolution cost. And since we have provided
that safety in the past, the conservative choice is to keep
it.
In case 1, though, the object lookup is a definite noop; we
know about the object because we found it in the object
database. There is no new information gained by making the
call.
This patch detects that case and optimizes out the call.
Here are best-of-five timings for linux.git:
[before]
$ time git cat-file --buffer \
--batch-all-objects \
--batch-check='%(objectname)'
real 0m2.117s
user 0m2.044s
sys 0m0.072s
[after]
$ time git cat-file --buffer \
--batch-all-objects \
--batch-check='%(objectname)'
real 0m1.230s
user 0m1.176s
sys 0m0.052s
There are two implementation details to note here.
One is that we detect the noop case by seeing that "struct
object_info" does not request any information. But besides
object existence, there is one other piece of information
which sha1_object_info may fill in: whether the object is
cached, loose, or packed. We don't currently provide that
information in the output, but if we were to do so later,
we'd need to take note and disable the optimization in that
case.
And that leads to the second note. If we were to output
that information, a better implementation would be to
remember where we saw the object in --batch-all-objects in
the first place, and avoid looking it up again by sha1.
In fact, we could probably squeeze out some extra
performance for less-trivial cases, too, by remembering the
pack location where we saw the object, and going directly
there to find its information (like type, size, etc). That
would in theory make this optimization unnecessary.
I didn't pursue that path here for two reasons:
1. It's non-trivial to implement, and has memory
implications. Because we sort and de-dup the list of
output sha1s, we'd have to record the pack information
for each object, too.
2. It doesn't save as much as you might hope. It saves the
find_pack_entry() call, but getting the size and type
for deltified objects requires walking down the delta
chain (for the real type) or reading the delta data
header (for the size). These costs tend to dominate the
non-trivial cases.
By contrast, this optimization is easy and self-contained,
and speeds up a real-world case I've used.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.5:
Git 2.5.5
Git 2.4.11
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint-2.4:
Git 2.4.11
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Bugfix patches were backported from the 'master' front to plug heap
corruption holes, to catch integer overflow in the computation of
pathname lengths, and to get rid of the name_path API. Both of
these would have resulted in writing over an under-allocated buffer
when formulating pathnames while tree traversal.
* jk/path-name-safety-2.4:
list-objects: pass full pathname to callbacks
list-objects: drop name_path entirely
list-objects: convert name_path to a strbuf
show_object_with_name: simplify by using path_name()
http-push: stop using name_path
tree-diff: catch integer overflow in combine_diff_path allocation
add helpers for detecting size_t overflow
|
|
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.
So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.
This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "struct name_path" data is examined in only two places:
we generate it in process_tree(), and we convert it to a
single string in path_name(). Everyone else just passes it
through to those functions.
We can further note that process_tree() already keeps a
single strbuf with the leading tree path, for use with
tree_entry_interesting().
Instead of building a separate name_path linked list, let's
just use the one we already build in "base". This reduces
the amount of code (especially tricky code in path_name()
which did not check for integer overflows caused by deep
or large pathnames).
It is also more efficient in some instances. Any time we
were using tree_entry_interesting, we were building up the
strbuf anyway, so this is an immediate and obvious win
there. In cases where we were not, we trade off storing
"pathname/" in a strbuf on the heap for each level of the
path, instead of two pointers and an int on the stack (with
one pointer into the tree object). On a 64-bit system, the
latter is 20 bytes; so if path components are less than that
on average, this has lower peak memory usage. In practice
it probably doesn't matter either way; we are already
holding in memory all of the tree objects leading up to each
pathname, and for normal-depth pathnames, we are only
talking about hundreds of bytes.
This patch leaves "struct name_path" as a thin wrapper
around the strbuf, to avoid disrupting callbacks. We should
fix them, but leaving it out makes this diff easier to view.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When "git rev-list" shows an object with its associated path
name, it does so by walking the name_path linked list and
printing each component (stopping at any embedded NULs or
newlines).
We'd like to eventually get rid of name_path entirely in
favor of a single buffer, and dropping this custom printing
code is part of that. As a first step, let's use path_name()
to format the list into a single buffer, and print that.
This is strictly less efficient than the original, but it's
a temporary step in the refactoring; our end game will be to
get the fully formatted name in the first place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The graph traversal code here passes along a name_path to
build up the pathname at which we find each blob. But we
never actually do anything with the resulting names, making
it a waste of code and memory.
This usage came in aa1dbc9 (Update http-push functionality,
2006-03-07), and originally the result was passed to
"add_object" (which stored it, but didn't really use it,
either). But we stopped using that function in 1f1e895 (Add
"named object array" concept, 2006-06-19) in favor of
storing just the objects themselves.
Moreover, the generation of the name in process_tree() is
buggy. It sticks "name" onto the end of the name_path linked
list, and then passes it down again as it recurses (instead
of "entry.path"). So it's a good thing this was unused, as
the resulting path for "a/b/c/d" would end up as "a/a/a/a".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A combine_diff_path struct has two "flex" members allocated
alongside the struct: a string to hold the pathname, and an
array of parent pointers. We use an "int" to compute this,
meaning we may easily overflow it if the pathname is
extremely long.
We can fix this by using size_t, and checking for overflow
with the st_add helper.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Performing computations on size_t variables that we feed to
xmalloc and friends can be dangerous, as an integer overflow
can cause us to allocate a much smaller chunk than we
realized.
We already have unsigned_add_overflows(), but let's add
unsigned_mult_overflows() to that. Furthermore, rather than
have each site manually check and die on overflow, we can
provide some helpers that will:
- promote the arguments to size_t, so that we know we are
doing our computation in the same size of integer that
will ultimately be fed to xmalloc
- check and die on overflow
- return the result so that computations can be done in
the parameter list of xmalloc.
These functions are a lot uglier to use than normal
arithmetic operators (you have to do "st_add(foo, bar)"
instead of "foo + bar"). To at least limit the damage, we
also provide multi-valued versions. So rather than:
st_add(st_add(a, b), st_add(c, d));
you can write:
st_add4(a, b, c, d);
This isn't nearly as elegant as a varargs function, but it's
a lot harder to get it wrong. You don't have to remember to
add a sentinel value at the end, and the compiler will
complain if you get the number of arguments wrong. This
patch adds only the numbered variants required to convert
the current code base; we can easily add more later if
needed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
History traversal with "git log --source" that starts with an
annotated tag failed to report the tag as "source", due to an
old regression in the command line parser back in v2.2 days.
* jk/pending-keep-tag-name:
revision.c: propagate tag names from pending array
|
|
"git symbolic-ref" forgot to report a failure with its exit status.
* jk/symbolic-ref-maint:
t1401: test reflog creation for git-symbolic-ref
symbolic-ref: propagate error code from create_symref()
|
|
When getpwuid() on the system returned NULL (e.g. the user is not
in the /etc/passwd file or other uid-to-name mappings), the
codepath to find who the user is to record it in the reflog barfed
and died. Loosen the check in this codepath, which already accepts
questionable ident string (e.g. host part of the e-mail address is
obviously bogus), and in general when we operate fmt_ident() function
in non-strict mode.
* jk/ident-loosen-getpwuid:
ident: loosen getpwuid error in non-strict mode
ident: keep a flag for bogus default_email
ident: make xgetpwuid_self() a static local helper
|
|
Improve error reporting when SMTP TLS fails.
* jk/send-email-ssl-errors:
send-email: enable SSL level 1 debug output
|
|
The completion script (in contrib/) used to list "git column"
(which is not an end-user facing command) as one of the choices
* sg/completion-no-column:
completion: remove 'git column' from porcelain commands
|
|
The current code writes a reflog entry whenever we update a
symbolic ref, but we never test that this is so. Let's add a
test to make sure upcoming refactoring doesn't cause a
regression.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If create_symref() fails, git-symbolic-ref will still exit
with code 0, and our caller has no idea that the command did
nothing.
This appears to have been broken since the beginning of time
(e.g., it is not a regression where create_symref() stopped
calling die() or something similar).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we unwrap a tag to find its commit for a traversal, we
do not propagate the "name" field of the tag in the pending
array (i.e., the ref name the user gave us in the first
place) to the commit (instead, we use an empty string). This
means that "git log --source" will never show the tag-name
for commits we reach through it.
This was broken in 2073949 (traverse_commit_list: support
pending blobs/trees with paths, 2014-10-15). That commit
tried to be careful and avoid propagating the path
information for a tag (which would be nonsensical) to trees
and blobs. But it should not have cut off the "name" field,
which should carry forward to children.
Note that this does mean that the "name" field will carry
forward to blobs and trees, too. Whereas prior to 2073949,
we always gave them an empty string. This is the right thing
to do, but in practice no callers probably use it (since now
we have an explicit separate "path" field, which was the
point of 2073949).
We add tests here not only for the broken case, but also a
basic sanity test of "log --source" in general, which did
not have any coverage in the test suite.
Reported-by: Raymundo <gypark@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* sg/lock-file-commit-error:
credential-store: don't pass strerror to die_errno()
|
|
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The exit code of git-fsck didnot reflect some types of errors found
in packed objects, which has been corrected.
* dt/fsck-verify-pack-error:
verify_pack: do not ignore return value of verification function
|
|
A fix-up for recent topic.
* ep/ident-with-getaddrinfo:
ident: fix undefined variable when NO_IPV6 is set
ident.c: add support for IPv6
|
|
"git p4" used to import Perforce CLs that touch only paths outside
the client spec as empty commits. It has been corrected to ignore
them instead, with a new configuration git-p4.keepEmptyCommits as a
backward compatibility knob.
* ls/p4-keep-empty-commits:
git-p4: add option to keep empty commits
|
|
The helper used to iterate over loose object directories to prune
stale objects did not closedir() immediately when it is done with a
directory--a callback such as the one used for "git prune" may want
to do rmdir(), but it would fail on open directory on platforms
such as WinXP.
* jk/prune-mtime:
prune: close directory earlier during loose-object directory traversal
|
|
Commit 00bce77 (ident.c: add support for IPv6, 2015-11-27)
moved the "gethostbyname" call out of "add_domainname" and
into the helper function "canonical_name". But when moving
the code, it forgot that the "buf" variable is passed as
"host" in the helper.
Reported-by: johan defries <johandefries@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If the user has not specified an identity and we have to
turn to getpwuid() to find the username or gecos field, we
die immediately when getpwuid fails (e.g., because the user
does not exist). This is OK for making a commit, where we
have set IDENT_STRICT and would want to bail on bogus input.
But for something like a reflog, where the ident is "best
effort", it can be pain. For instance, even running "git
clone" with a UID that is not in /etc/passwd will result in
git barfing, just because we can't find an ident to put in
the reflog.
Instead of dying in xgetpwuid_self, we can instead return a
fallback value, and set a "bogus" flag. For the username in
an email, we already have a "default_email_is_bogus" flag.
For the name field, we introduce (and check) a matching
"default_name_is_bogus" flag. As a bonus, this means you now
get the usual "tell me who you are" advice instead of just a
"no such user" error.
No tests, as this is dependent on configuration outside of
git's control. However, I did confirm that it behaves
sensibly when I delete myself from the local /etc/passwd
(reflogs get written, and commits complain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
'git column' is an internal helper, so it should not be offered on
'git <TAB>' along with porcelain commands.
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This back-merges hopefully the last batch of trivially correct fixes
to the 2.6.x maintenance track from the master branch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
mark_tree_uninteresting() has code to handle the case where it gets
passed a NULL pointer in its 'tree' parameter, but the function had
'object = &tree->object' assignment before checking if tree is
NULL. This gives a compiler an excuse to declare that tree will
never be NULL and apply a wrong optimization. Avoid it.
* sn/null-pointer-arith-in-mark-tree-uninteresting:
revision.c: fix possible null pointer arithmetic
|
|
Cosmetic improvement to lock-file error messages.
* sg/lock-file-commit-error:
Make error message after failing commit_lock_file() less confusing
|
|
* cb/t3404-shellquote:
t3404: fix quoting of redirect for some versions of bash
|
|
* sb/doc-submodule-sync-recursive:
document submodule sync --recursive
|
|
* nd/doc-check-ref-format-typo:
git-check-ref-format.txt: typo, s/avoids/avoid/
|
|
Code simplification.
* rs/show-branch-argv-array:
show-branch: use argv_array for default arguments
|
|
Code simplification.
* rs/pop-commit:
use pop_commit() for consuming the first entry of a struct commit_list
|
|
Update "git subtree" (in contrib/) so that it can take whitespaces
in the pathnames, not only in the in-tree pathname but the name of
the directory that the repository is in.
* as/subtree-with-spaces:
contrib/subtree: respect spaces in a repository path
t7900-subtree: test the "space in a subdirectory name" case
|
|
Because "test_when_finished" in our test framework queues the
clean-up tasks to be done in a shell variable, it should not be
used inside a subshell. Add a mechanism to allow 'bash' to catch
such uses, and fix the ones that were found.
* jk/test-lint-forbid-when-finished-in-subshell:
test-lib-functions: detect test_when_finished in subshell
t7800: don't use test_config in a subshell
test-lib-functions: support "test_config -C <dir> ..."
t5801: don't use test_when_finished in a subshell
t7610: don't use test_config in a subshell
|
|
If a server's certificate isn't accepted by send-email, the output is:
Unable to initialize SMTP properly. Check config and use --smtp-debug.
but adding --smtp-debug=1 just produces the same output since we don't
get as far as talking SMTP.
Turning on SSL debug at level 1 gives:
DEBUG: .../IO/Socket/SSL.pm:1796: SSL connect attempt failed error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
DEBUG: .../IO/Socket/SSL.pm:673: fatal SSL error: SSL connect attempt failed error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
DEBUG: .../IO/Socket/SSL.pm:1780: IO::Socket::IP configuration failed
IO::Socket::SSL defines level 1 debug as "print out errors from
IO::Socket::SSL and ciphers from Net::SSLeay". In fact, it aliases
Net::SSLeay::trace which is defined to guarantee silence at level 0 and
only emit error messages at level 1, so let's enable it by default.
The modification of warnings is needed to avoid a warning about:
Name "IO::Socket::SSL::DEBUG" used only once: possible typo
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we have to deduce the user's email address and can't come
up with something plausible for the hostname, we simply
write "(none)" or ".(none)" in the hostname.
Later, our strict-check is forced to use strstr to look for
this magic string. This is probably not a problem in
practice, but it's rather ugly. Let's keep an extra flag
that tells us the email is bogus, and check that instead.
We could get away with simply setting the global in
add_domainname(); it only gets called to write into
git_default_email. However, let's make the code a little
more obvious to future readers by actually passing a pointer
to our "bogus" flag down the call-chain.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This function is defined in wrapper.c, but nobody besides
ident.c uses it. And nobody is likely to in the future,
either, as anything that cares about the user's name should
be going through the ident code.
Moving it here is a cleanup of the global namespace, but it
will also enable further cleanups inside ident.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Split index related options should appear in the 'SYNOPSIS'
section.
These options are already documented in the 'OPTIONS' section.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A changelist that contains only excluded files due to a client spec was
imported as an empty commit. Fix that issue by ignoring these commits.
Add option "git-p4.keepEmptyCommits" to make the previous behavior
available.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Helped-by: Pete Harlan
Acked-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|