Age | Commit message (Collapse) | Author | Files | Lines |
|
If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.
This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.
Here is an easy way to trigger this problem:
git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
tag=$(perl -e 'print "bla" x 30')
for i in `seq 50000`; do
echo $sha refs/tags/$tag-$i >> .git/packed-refs
done
Then share this repo over the smart HTTP protocol and try cloning it:
$ git clone http://localhost/.../too-many-refs/.git
Cloning into 'too-many-refs'...
fatal: cannot exec 'fetch-pack': Argument list too long
50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.
This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.
Four possible solutions were discussed on the Git mailing list (in no
particular order):
1) Call fetch-pack multiple times with smaller batches of refs.
This was dismissed as inefficient and inelegant.
2) Add option --refs-fd=$n to pass a an fd from where to read the refs.
This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.
3) Add option --refs-from=$tmpfile to pass the refs using a temp file.
This was not favored because of the temp file requirement.
4) Add option --stdin to pass the refs on stdin, one per line.
In the end this option was chosen as the most efficient and most
desirable from scripting perspective.
There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.
If we are going to sneak refs on stdin line by line, it would have to be
done very carefully in the presence of --stateless-rpc, because when
reading refs line by line we might read ahead too much data into our
buffer and eat some of the remote protocol data which is also coming on
stdin.
One way to solve this would be to refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.
In the end we settled on the following solution:
If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.
However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.
This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.
Signed-off-by: Ivan Todoroski <grnch@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
Update draft release notes to 1.7.8.4
Update draft release notes to 1.7.7.6
Update draft release notes to 1.7.6.6
thin-pack: try harder to use preferred base objects as base
|
|
* maint-1.7.7:
Update draft release notes to 1.7.7.6
Update draft release notes to 1.7.6.6
thin-pack: try harder to use preferred base objects as base
|
|
* maint-1.7.6:
Update draft release notes to 1.7.6.6
thin-pack: try harder to use preferred base objects as base
|
|
When creating a pack using objects that reside in existing packs, we try
to avoid recomputing futile delta between an object (trg) and a candidate
for its base object (src) if they are stored in the same packfile, and trg
is not recorded as a delta already. This heuristics makes sense because it
is likely that we tried to express trg as a delta based on src but it did
not produce a good delta when we created the existing pack.
As the pack heuristics prefer producing delta to remove data, and Linus's
law dictates that the size of a file grows over time, we tend to record
the newest version of the file as inflated, and older ones as delta
against it.
When creating a thin-pack to transfer recent history, it is likely that we
will try to send an object that is recorded in full, as it is newer. But
the heuristics to avoid recomputing futile delta effectively forbids us
from attempting to express such an object as a delta based on another
object. Sending an object in full is often more expensive than sending a
suboptimal delta based on other objects, and it is even more so if we
could use an object we know the receiving end already has (i.e. preferred
base object) as the delta base.
Tweak the recomputation avoidance logic, so that we do not punt on
computing delta against a preferred base object.
The effect of this change can be seen on two simulated upload-pack
workloads. The first is based on 44 reflog entries from my git.git
origin/master reflog, and represents the packs that kernel.org sent me git
updates for the past month or two. The second workload represents much
larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to
v1.2.0, and so on.
The table below shows the average generated pack size and the average CPU
time consumed for each dataset, both before and after the patch:
dataset
| reflog | tags
---------------------------------
before | 53358 | 2750977
size after | 32398 | 2668479
change | -39% | -3%
---------------------------------
before | 0.18 | 1.12
CPU after | 0.18 | 1.15
change | +0% | +3%
This patch makes a much bigger difference for packs with a shorter slice
of history (since its effect is seen at the boundaries of the pack) though
it has some benefit even for larger packs.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* mh/ref-api-less-extra-refs:
write_head_info(): handle "extra refs" locally
show_ref(): remove unused "flag" and "cb_data" arguments
receive-pack: move more work into write_head_info()
|
|
* jc/show-sig:
log --show-signature: reword the common two-head merge case
log-tree: show mergetag in log --show-signature output
log-tree.c: small refactor in show_signature()
commit --amend -S: strip existing gpgsig headers
verify_signed_buffer: fix stale comment
gpg-interface: allow use of a custom GPG binary
pretty: %G[?GS] placeholders
test "commit -S" and "log --show-signature"
log: --show-signature
commit: teach --gpg-sign option
Conflicts:
builtin/commit-tree.c
builtin/commit.c
builtin/merge.c
notes-cache.c
pretty.c
|
|
* jh/fetch-head-update:
write first for-merge ref to FETCH_HEAD first
|
|
The old code basically did:
generate array of SHA1s for alternate refs
for each unique SHA1 in array:
add_extra_ref(".have", sha1)
for each ref (including real refs and extra refs):
show_ref(refname, sha1)
But there is no need to stuff the alternate refs in extra_refs; we can
call show_ref() directly when iterating over the array, then handle
real refs separately. So change the code to:
generate array of SHA1s for alternate refs
for each unique SHA1 in array:
show_ref(".have", sha1)
for each ref (this now only includes real refs):
show_ref(refname, sha1)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function is not used as a callback, so it doesn't need these
arguments. Also change its return type to void.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move some more code from the calling site into write_head_info(), and
inline add_alternate_refs() there. (Some more simplification is
coming, and it is easier if all this code is in the same place.)
Move some helper functions to avoid the need for forward declarations.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Any existing commit signature was made against the contents of the old
commit, including its committer date that is about to change, and will
become invalid by amending it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
They both use the extended headers in commit objects, and the former has
necessary infrastructure to show them that is useful to view the result of
the latter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The FETCH_HEAD refname is supposed to refer to the ref that was fetched
and should be merged. However all fetched refs are written to
.git/FETCH_HEAD in an arbitrary order, and resolve_ref_unsafe simply
takes the first ref as the FETCH_HEAD, which is often the wrong one,
when other branches were also fetched.
The solution is to write the for-merge ref(s) to FETCH_HEAD first.
Then, unless --append is used, the FETCH_HEAD refname behaves as intended.
If the user uses --append, they presumably are doing so in order to
preserve the old FETCH_HEAD.
While we are at it, update an old example in the read-tree documentation
that implied that each entry in FETCH_HEAD only has the object name, which
is not true for quite a while.
[jc: adjusted tests]
Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jv/maint-config-set:
Fix an incorrect reference to --set-all.
|
|
* jc/checkout-m-twoway:
t/t2023-checkout-m.sh: fix use of test_must_fail
checkout_merged(): squelch false warning from some gcc
Test 'checkout -m -- path'
checkout -m: no need to insist on having all 3 stages
|
|
* jk/maint-strbuf-missing-init:
commit, merge: initialize static strbuf
|
|
* jn/maint-sequencer-fixes:
revert: stop creating and removing sequencer-old directory
Revert "reset: Make reset remove the sequencer state"
revert: do not remove state until sequence is finished
revert: allow single-pick in the middle of cherry-pick sequence
revert: pass around rev-list args in already-parsed form
revert: allow cherry-pick --continue to commit before resuming
revert: give --continue handling its own function
|
|
* jk/maint-mv:
mv: be quiet about overwriting
mv: improve overwrite warning
mv: make non-directory destination error more clear
mv: honor --verbose flag
docs: mention "-k" for both forms of "git mv"
|
|
* jk/fetch-no-tail-match-refs:
connect.c: drop path_match function
fetch-pack: match refs exactly
t5500: give fully-qualified refs to fetch-pack
drop "match" parameter from get_remote_heads
|
|
* ci/stripspace-docs:
Update documentation for stripspace
|
|
* jn/branch-move-to-self:
Allow checkout -B <current-branch> to update the current branch
branch: allow a no-op "branch -M <current-branch> HEAD"
|
|
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ab/sun-studio-portability:
Appease Sun Studio by renaming "tmpfile"
Fix a bitwise negation assignment issue spotted by Sun Studio
Fix an enum assignment issue spotted by Sun Studio
|
|
* rr/revert-cherry-pick:
t3502, t3510: clarify cherry-pick -m failure
t3510 (cherry-pick-sequencer): use exit status
revert: simplify getting commit subject in format_todo()
revert: tolerate extra spaces, tabs in insn sheet
revert: make commit subjects in insn sheet optional
revert: free msg in format_todo()
|
|
* jk/maint-strbuf-missing-init:
commit, merge: initialize static strbuf
Conflicts:
builtin/merge.c
|
|
* rs/diff-tree-combined-clean-up:
submodule: use diff_tree_combined_merge() instead of diff_tree_combined()
pass struct commit to diff_tree_combined_merge()
use struct sha1_array in diff_tree_combined()
|
|
* tr/grep-threading:
grep: disable threading in non-worktree case
grep: enable threading with -p and -W using lazy attribute lookup
grep: load funcname patterns for -W
|
|
* nd/war-on-nul-in-commit:
commit_tree(): refuse commit messages that contain NULs
Convert commit_tree() to take strbuf as message
merge: abort if fails to commit
Conflicts:
builtin/commit.c
commit.c
commit.h
|
|
|
|
* bc/maint-apply-check-no-patch:
builtin/apply.c: report error on failure to recognize input
t/t4131-apply-fake-ancestor.sh: fix broken test
|
|
It is to give an alternate <name> instead of "origin" to the remote
we are cloning from.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"abbrev" and "commit_format" in struct rev_info get initialized in
init_revisions - no need to reinit in cmd_log_init_defaults.
Signed-off-by: Michael Schubert <mschub@elegosoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ms/commit-cc-option-helpstring:
builtin/commit: add missing '/' in help message
|
|
On Solaris the system headers define the "tmpfile" name, which'll
cause Git compiled with Sun Studio 12 Update 1 to whine about us
redefining the name:
"pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC)
"sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC)
"fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC)
"builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC)
Just renaming the "tmpfile" variable to "tmp_file" in the relevant
places is the easiest way to fix this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In builtin/fast-export.c we'd assign to variables of the
tag_of_filtered_mode enum type with constants defined for the
signed_tag_mode enum.
We'd get the intended value since both the value we were assigning
with and the one we actually wanted had the same positional within
their respective enums, but doing it this way makes no sense.
This issue was spotted by Sun Studio 12 Update 1:
"builtin/fast-export.c", line 54: warning: enum type mismatch: op "=" (E_ENUM_TYPE_MISMATCH_OP)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
builtin/init-db.c: eliminate -Wformat warning on Solaris
|
|
On Solaris systems we'd warn about an implicit cast of mode_t when we
printed things out with the %d format. We'd get this warning under GCC
4.6.0 with Solaris headers:
builtin/init-db.c: In function ‘separate_git_dir’:
builtin/init-db.c:354:4: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘mode_t’ [-Wformat]
We've been doing this ever since v1.7.4.1-296-gb57fb80. Just work
around this by adding an explicit cast.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/pull-signed-tag:
commit: do not lose mergetag header when not amending
|
|
The earlier ed7a42a (commit: teach --amend to carry forward extra headers,
2011-11-08) broke "git merge/pull; edit to fix conflict; git commit"
workflow by forgetting that commit_tree_extended() takes the whole extra
header list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/checkout-m-twoway:
checkout_merged(): squelch false warning from some gcc
Test 'checkout -m -- path'
checkout -m: no need to insist on having all 3 stages
|
|
* jk/fetch-no-tail-match-refs:
connect.c: drop path_match function
fetch-pack: match refs exactly
t5500: give fully-qualified refs to fetch-pack
drop "match" parameter from get_remote_heads
|
|
* nd/resolve-ref:
Rename resolve_ref() to resolve_ref_unsafe()
Convert resolve_ref+xstrdup to new resolve_refdup function
revert: convert resolve_ref() to read_ref_full()
|
|
* jn/maint-sequencer-fixes:
revert: stop creating and removing sequencer-old directory
Revert "reset: Make reset remove the sequencer state"
revert: do not remove state until sequence is finished
revert: allow single-pick in the middle of cherry-pick sequence
revert: pass around rev-list args in already-parsed form
revert: allow cherry-pick --continue to commit before resuming
revert: give --continue handling its own function
|
|
* jk/maint-mv:
mv: be quiet about overwriting
mv: improve overwrite warning
mv: make non-directory destination error more clear
mv: honor --verbose flag
docs: mention "-k" for both forms of "git mv"
|
|
* ci/stripspace-docs:
Update documentation for stripspace
|
|
* tr/cache-tree:
reset: update cache-tree data when appropriate
commit: write cache-tree data when writing index anyway
Refactor cache_tree_update idiom from commit
Test the current state of the cache-tree optimization
Add test-scrap-cache-tree
|
|
|
|
Maintaining an array of hashes is easier using sha1_array than
open-coding it. This patch also fixes a leak of the SHA1 array
in diff_tree_combined_merge().
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|