Age | Commit message (Collapse) | Author | Files | Lines |
|
Code clean-up.
* jk/pack-name-cleanups:
index-pack: make pointer-alias fallbacks safer
replace snprintf with odb_pack_name()
odb_pack_keep(): stop generating keepfile name
sha1_file.c: make pack-name helper globally accessible
move odb_* declarations out of git-compat-util.h
|
|
"git branch @" created refs/heads/@ as a branch, and in general the
code that handled @{-1} and @{upstream} was a bit too loose in
disambiguating.
* jk/interpret-branch-name:
checkout: restrict @-expansions when finding branch
strbuf_check_ref_format(): expand only local branches
branch: restrict @-expansions when deleting
t3204: test git-branch @-expansion corner cases
interpret_branch_name: allow callers to restrict expansions
strbuf_branchname: add docstring
strbuf_branchname: drop return value
interpret_branch_name: move docstring to header file
interpret_branch_name(): handle auto-namelen for @{-1}
|
|
The "parse_config_key()" API function has been cleaned up.
* jk/parse-config-key-cleanup:
parse_hide_refs_config: tell parse_config_key we don't want a subsection
parse_config_key: allow matching single-level config
parse_config_key: use skip_prefix instead of starts_with
refs: parse_hide_refs_config to use parse_config_key
|
|
Code cleanup.
* rj/remove-unused-mktemp:
wrapper.c: remove unused gitmkstemps() function
wrapper.c: remove unused git_mkstemp() function
|
|
The odb_pack_keep() function generates the name of a .keep
file and opens it. This has two problems:
1. It requires a fixed-size buffer to create the filename
and doesn't notice when the result is truncated.
2. Of the two callers, one sometimes wants to open a
filename it already has, which makes things awkward (it
has to do so manually, and skips the leading-directory
creation).
Instead, let's have odb_pack_keep() just open the file.
Generating the name isn't hard, and a future patch will
switch callers over to odb_pack_name() anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We provide sha1_pack_name() and sha1_pack_index_name(), but
the more generic form (which takes its own strbuf and an
arbitrary extension) is only used to implement the other
two. Let's make it available, but clean up a few things:
1. Name it odb_pack_name(), as the original
sha1_get_pack_name() is long but not all that
descriptive.
2. Switch the strbuf argument to the beginning, so that it
matches similar path-building functions like
git_path_buf().
3. Clean up the out-dated docstring and move it to the
public declaration.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These functions were originally conceived as wrapper
functions similar to xmkstemp(). They were later moved by
463db9b10 (wrapper: move odb_* to environment.c,
2010-11-06). The more appropriate place for a declaration is
in cache.h.
While we're at it, let's add some basic docstrings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 4ac9006f832 (real_path: have callers use real_pathdup and
strbuf_realpath, 2016-12-12), we changed the xstrdup(real_path())
pattern to use real_pathdup() directly.
The problem with this change is that real_path() calls
strbuf_realpath() with die_on_error = 1 while real_pathdup() calls
it with die_on_error = 0. Meaning that in cases where real_path()
causes Git to die() with an error message, real_pathdup() is silent
and returns NULL instead.
The callers, however, are ill-prepared for that change, as they expect
the return value to be non-NULL (and otherwise the function died
with an appropriate error message).
Fix this by extending real_pathdup()'s signature to accept the
die_on_error flag and simply pass it through to strbuf_realpath(),
and then adjust all callers after a careful audit whether they would
handle NULLs well.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The interpret_branch_name() function converts names like
@{-1} and @{upstream} into branch names. The expanded ref
names are not fully qualified, and may be outside of the
refs/heads/ namespace (e.g., "@" expands to "HEAD", and
"@{upstream}" is likely to be in "refs/remotes/").
This is OK for callers like dwim_ref() which are primarily
interested in resolving the resulting name, no matter where
it is. But callers like "git branch" treat the result as a
branch name in refs/heads/. When we expand to a ref outside
that namespace, the results are very confusing (e.g., "git
branch @" tries to create refs/heads/HEAD, which is
nonsense).
Callers can't know from the returned string how the
expansion happened (e.g., did the user really ask for a
branch named "HEAD", or did we do a bogus expansion?). One
fix would be to return some out-parameters describing the
types of expansion that occurred. This has the benefit that
the caller can generate precise error messages ("I
understood @{upstream} to mean origin/master, but that is a
remote tracking branch, so you cannot create it as a local
name").
However, out-parameters make the function interface somewhat
cumbersome. Instead, let's do the opposite: let the caller
tell us which elements to expand. That's easier to pass in,
and none of the callers give more precise error messages
than "@{upstream} isn't a valid branch name" anyway (which
should be sufficient).
The strbuf_branchname() function needs a similar parameter,
as most of the callers access interpret_branch_name()
through it.
We can break the callers down into two groups:
1. Callers that are happy with any kind of ref in the
result. We pass "0" here, so they continue to work
without restrictions. This includes merge_name(),
the reflog handling in add_pending_object_with_path(),
and substitute_branch_name(). This last is what powers
dwim_ref().
2. Callers that have funny corner cases (mostly in
git-branch and git-checkout). These need to make use of
the new parameter, but I've left them as "0" in this
patch, and will address them individually in follow-on
patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We generally put docstrings with function declarations,
because it's the callers who need to know how the function
works. Let's do so for interpret_branch_name().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The last caller of git_mkstemp() was removed in commit 6fec0a89
("verify_signed_buffer: use tempfile object", 16-06-2016). Since
the introduction of the 'tempfile' APIs, along with git_mkstemp_mode,
it is unlikely that new callers will materialize. Remove the dead
code.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The parse_config_key() function was introduced to make it
easier to match "section.subsection.key" variables. It also
handles the simpler "section.key", and the caller is
responsible for distinguishing the two from its
out-parameters.
Most callers who _only_ want "section.key" would just use a
strcmp(var, "section.key"), since there is no parsing
required. However, they may still use parse_config_key() if
their "section" variable isn't a constant (an example of
this is in parse_hide_refs_config).
Using the parse_config_key is a bit clunky, though:
const char *subsection;
int subsection_len;
const char *key;
if (!parse_config_key(var, section, &subsection, &subsection_len, &key) &&
!subsection) {
/* matched! */
}
Instead, let's treat a NULL subsection as an indication that
the caller does not expect one. That lets us write:
const char *key;
if (!parse_config_key(var, section, NULL, NULL, &key)) {
/* matched! */
}
Existing callers should be unaffected, as passing a NULL
subsection would currently segfault.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "core.logAllRefUpdates" that used to be boolean has been
enhanced to take 'always' as well, to record ref updates to refs
other than the ones that are expected to be updated (i.e. branches,
remote-tracking branches and notes).
* cw/log-updates-for-all-refs-really:
doc: add note about ignoring '--no-create-reflog'
update-ref: add test cases for bare repository
refs: add option core.logAllRefUpdates = always
config: add markup to core.logAllRefUpdates doc
|
|
When a submodule "A", which has another submodule "B" nested within
it, is "absorbed" into the top-level superproject, the inner
submodule "B" used to be left in a strange state. The logic to
adjust the .git pointers in these submodules has been corrected.
* sb/submodule-recursive-absorb:
submodule absorbing: fix worktree/gitdir pointers recursively for non-moves
cache.h: expose the dying procedure for reading gitlinks
setup: add gentle version of resolve_git_dir
|
|
Code cleanup.
* rs/absolute-pathdup:
use absolute_pathdup()
abspath: add absolute_pathdup()
|
|
Documentation and in-code comments updates.
* sb/in-core-index-doc:
documentation: retire unfinished documentation
cache.h: document add_[file_]to_index
cache.h: document remove_index_entry_at
cache.h: document index_name_pos
|
|
Documentation and in-code comments updates.
* sb/in-core-index-doc:
documentation: retire unfinished documentation
cache.h: document add_[file_]to_index
cache.h: document remove_index_entry_at
cache.h: document index_name_pos
|
|
"git fsck" inspects loose objects more carefully now.
* jk/loose-object-fsck:
fsck: detect trailing garbage in all object types
fsck: parse loose object paths directly
sha1_file: add read_loose_object() function
t1450: test fsck of packed objects
sha1_file: fix error message for alternate objects
t1450: refactor loose-object removal
|
|
When core.logallrefupdates is true, we only create a new reflog for refs
that are under certain well-known hierarchies. The reason is that we
know that some hierarchies (like refs/tags) are not meant to change, and
that unknown hierarchies might not want reflogs at all (e.g., a
hypothetical refs/foo might be meant to change often and drop old
history immediately).
However, sometimes it is useful to override this decision and simply log
for all refs, because the safety and audit trail is more important than
the performance implications of keeping the log around.
This patch introduces a new "always" mode for the core.logallrefupdates
option which will log updates to everything under refs/, regardless
where in the hierarchy it is (we still will not log things like
ORIG_HEAD and FETCH_HEAD, which are known to be transient).
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a function that returns a buffer containing the absolute path of its
argument and a semantic patch for its intended use. It avoids an extra
string copy to a static buffer.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a later patch we want to react to only a subset of errors, defaulting
the rest to die as usual. Separate the block that takes care of dying
into its own function so we have easy access to it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This follows a93bedada (setup: add gentle version of read_gitfile,
2015-06-09), and assumes the same reasoning. resolve_git_dir is unsuited
for speculative calls, so we want to use the gentle version to find out
about potential errors.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* bw/read-blob-data-does-not-modify-index-state:
index: improve constness for reading blob data
|
|
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Do this by moving the existing documentation from
read-cache.c to cache.h.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up in the pathspec API.
* bw/pathspec-cleanup:
pathspec: rename prefix_pathspec to init_pathspec_item
pathspec: small readability changes
pathspec: create strip submodule slash helpers
pathspec: create parse_element_magic helper
pathspec: create parse_long_magic function
pathspec: create parse_short_magic function
pathspec: factor global magic into its own function
pathspec: simpler logic to prefix original pathspec elements
pathspec: always show mnemonic and name in unsupported_magic
pathspec: remove unused variable from unsupported_magic
pathspec: copy and free owned memory
pathspec: remove the deprecated get_pathspec function
ls-tree: convert show_recursive to use the pathspec struct interface
dir: convert fill_directory to use the pathspec struct interface
dir: remove struct path_simplify
mv: remove use of deprecated 'get_pathspec()'
|
|
"git grep" has been taught to optionally recurse into submodules.
* bw/grep-recurse-submodules:
grep: search history of moved submodules
grep: enable recurse-submodules to work on <tree> objects
grep: optionally recurse into submodules
grep: add submodules as a grep source type
submodules: load gitmodules file from commit sha1
submodules: add helper to determine if a submodule is initialized
submodules: add helper to determine if a submodule is populated
real_path: canonicalize directory separators in root parts
real_path: have callers use real_pathdup and strbuf_realpath
real_path: create real_pathdup
real_path: convert real_path_internal to strbuf_realpath
real_path: resolve symlinks by hand
|
|
It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.
However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.
The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.
We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().
Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Improve constness of the index_state parameter to the
'read_blob_data_from_index' function.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.
* jc/git-open-cloexec:
sha1_file: stop opening files with O_NOATIME
git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
git_open(): untangle possible NOATIME and CLOEXEC interactions
|
|
Now that all callers of the old 'get_pathspec' interface have been
migrated to use the new pathspec struct interface it can be removed
from the codebase.
Since there are no more users of the '_raw' field in the pathspec struct
it can also be removed. This patch also removes the old functionality
of modifying the const char **argv array that was passed into
parse_pathspec. Instead the constructed 'match' string (which is a
pathspec element with the prefix prepended) is only stored in its
corresponding pathspec_item entry.
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
teach submodules to load a '.gitmodules' file from a commit sha1. This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Create real_pathdup which returns a caller owned string of the resolved
realpath based on the provide path.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the name of real_path_internal to strbuf_realpath. In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.
This change makes strbuf_realpath reentrant.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile is compressed. Unlike zlib_compression_level that controls
the loose object compression, however, this variable was static to
each of these codepaths. Two of them read the pack.compression
configuration variable, using core.compression as the default, and
one of them also allowed overriding it from the command line.
The other codepath in bulk-checkin did not pay any attention to the
configuration.
Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression
and make the former override the latter, by moving code to parse
pack.compression and also allow core.compression to give default to
this variable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects. But a few things have changed since then:
1. In June 2005, git learned about packfiles, which means
we would do a lot fewer atime updates (rather than one
per object access, we'd generally get one per packfile).
2. In late 2006, Linux learned about "relatime", which is
generally the default on modern installs. So
performance around atimes updates is a non-issue there
these days.
All the world isn't Linux, but as it turns out, Linux
is the only platform to implement O_NOATIME in the
first place.
So it's very unlikely that this code is helping anybody
these days.
Helped-by: Jeff King <peff@peff.net>
[jc: took idea and log message from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git generally does not explicitly close file descriptors that were
open in the parent process when spawning a child process, but most
of the time the child does not want to access them. As Windows does
not allow removing or renaming a file that has a file descriptor
open, a slow-to-exit child can even break the parent process by
holding onto them. Use O_CLOEXEC flag to open files in various
codepaths.
* ls/git-open-cloexec:
read-cache: make sure file handles are not inherited by child processes
sha1_file: open window into packfiles with O_CLOEXEC
sha1_file: rename git_open_noatime() to git_open()
|
|
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.
* jk/fetch-quick-tag-following:
fetch: use "quick" has_sha1_file for tag following
|
|
The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME. A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.
Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.
Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor. open(2) itself does not
cause atime to be updated according to Linus [*1*].
The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.
*1* <CA+55aFw83E+zOd+z5h-CA-3NhrLjVr-anL6pubrSWttYx3zu8g@mail.gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update "git diff --no-index" codepath not to try to peek into .git/
directory that happens to be under the current directory, when we
know we are operating outside any repository.
* jk/no-looking-at-dotgit-outside-repo:
diff: handle sha1 abbreviations outside of repository
diff_aligned_abbrev: use "struct oid"
diff_unique_abbrev: rename to diff_aligned_abbrev
find_unique_abbrev: use 4-buffer ring
test-*-cache-tree: setup git dir
read info/{attributes,exclude} only when in repository
|
|
Updates the way approximate count of total objects is computed
while attempting to come up with a unique abbreviated object name,
which in turn needs to estimate how many hexdigits are necessary to
ensure uniqueness.
* jk/abbrev-auto:
find_unique_abbrev: move logic out of get_short_sha1()
|
|
Allow the default abbreviation length, which has historically been
7, to scale as the repository grows. The logic suggests to use 12
hexdigits for the Linux kernel, and 9 to 10 for Git itself.
* lt/abbrev-auto:
abbrev: auto size the default abbreviation
abbrev: prepare for new world order
abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
|
|
Some code paths want to format multiple abbreviated sha1s in
the same output line. Because we use a single static buffer
for our return value, they have to either break their output
into several calls or allocate their own arrays and use
find_unique_abbrev_r().
Intead, let's mimic sha1_to_hex() and use a ring of several
buffers, so that the return value stays valid through
multiple calls. This shortens some of the callers, and makes
it harder to for them to make a silly mistake.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.
* jk/fetch-quick-tag-following:
fetch: use "quick" has_sha1_file for tag following
|
|
"git ls-files" learned "--recurse-submodules" option that can be
used to get a listing of tracked files across submodules (i.e. this
only works with "--cached" option, not for listing untracked or
ignored files). This would be a useful tool to sit on the upstream
side of a pipe that is read with xargs to work on all working tree
files from the top-level superproject.
* bw/ls-files-recurse-submodules:
ls-files: add pathspec matching for submodules
ls-files: pass through safe options for --recurse-submodules
ls-files: optionally recurse into submodules
git: make super-prefix option
|
|
This function is meant to be used when reading from files in the
object store, and the original objective was to avoid smudging atime
of loose object files too often, hence its name. Because we'll be
extending its role in the next commit to also arrange the file
descriptors they return auto-closed in the child processes, rename
it to lose "noatime" part that is too specific.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Codepaths involved in interacting alternate object store have
been cleaned up.
* jk/alt-odb-cleanup:
alternates: use fspathcmp to detect duplicates
sha1_file: always allow relative paths to alternates
count-objects: report alternates via verbose mode
fill_sha1_file: write into a strbuf
alternates: store scratch buffer as strbuf
fill_sha1_file: write "boring" characters
alternates: use a separate scratch space
alternates: encapsulate alt->base munging
alternates: provide helper for allocating alternate
alternates: provide helper for adding to alternates list
link_alt_odb_entry: refactor string handling
link_alt_odb_entry: handle normalize_path errors
t5613: clarify "too deep" recursion tests
t5613: do not chdir in main process
t5613: whitespace/style cleanups
t5613: use test_must_fail
t5613: drop test_valid_repo function
t5613: drop reachable_via function
|
|
In order for the receiving end of "git push" to inspect the
received history and decide to reject the push, the objects sent
from the sending end need to be made available to the hook and
the mechanism for the connectivity check, and this was done
traditionally by storing the objects in the receiving repository
and letting "git gc" to expire it. Instead, store the newly
received objects in a temporary area, and make them available by
reusing the alternate object store mechanism to them only while we
decide if we accept the check, and once we decide, either migrate
them to the repository or purge them immediately.
* jk/quarantine-received-objects:
tmp-objdir: do not migrate files starting with '.'
tmp-objdir: put quarantine information in the environment
receive-pack: quarantine objects until pre-receive accepts
tmp-objdir: introduce API for temporary object directories
check_connected: accept an env argument
|
|
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.
This has gone unnoticed for several years because it
requires a fairly unique setup to matter:
1. You need to have a lot of packs on the client side to
make reprepare_packed_git() expensive (the most
expensive part is finding duplicates in an unsorted
list, which is currently quadratic).
2. You need a large number of tag refs on the server side
that are candidates for auto-following (i.e., that the
client doesn't have). Each one triggers a re-read of
the pack directory.
3. Under normal circumstances, the client would
auto-follow those tags and after one large fetch, (2)
would no longer be true. But if those tags point to
history which is disconnected from what the client
otherwise fetches, then it will never auto-follow, and
those candidates will impact it on every fetch.
So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).
This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.
Here are results from the included perf script, which sets
up a situation similar to the one described above:
Test HEAD^ HEAD
----------------------------------------------------------
5550.4: fetch 11.21(10.42+0.78) 0.08(0.04+0.02) -99.3%
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|