path: root/fetch-pack.c
Age | Commit message | Author | Files | Lines
2014-09-11 | Merge branch 'rs/child-process-init' | Junio C Hamano | 1 | -2/+1
Code clean-up.

* rs/child-process-init:
  run-command: inline prepare_run_command_v_opt()
  run-command: call run_command_v_opt_cd_env() instead of duplicating it
  run-command: introduce child_process_init()
  run-command: introduce CHILD_PROCESS_INIT
2014-08-20 | run-command: introduce CHILD_PROCESS_INIT | René Scharfe | 1 | -2/+1
Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
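The static-initializer pattern described above can be sketched in C as follows. The struct `demo_process` and the macros `DEMO_ARGV_INIT` and `DEMO_PROCESS_INIT` are hypothetical, trimmed-down stand-ins. They are not git's real `struct child_process` definition. The point is only that a brace initializer in the style of `STRBUF_INIT` leaves the object in the same state as declaring it and then calling `memset()`. In C, any members the initializer omits are zero-initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's argv_array and child_process. */
struct demo_argv {
	const char **argv;
	size_t argc;
};

struct demo_process {
	const char **argv;
	struct demo_argv args;
	int in, out, err;
	unsigned no_stdin : 1;
};

/* Static initializers in the style of STRBUF_INIT / CHILD_PROCESS_INIT.
 * Members not listed are zero-initialized by the language. */
#define DEMO_ARGV_INIT { NULL, 0 }
#define DEMO_PROCESS_INIT { NULL, DEMO_ARGV_INIT }

/* The older style: declare first, then clear with memset(). */
static void demo_process_clear(struct demo_process *cp)
{
	assert(cp != NULL);
	memset(cp, 0, sizeof(*cp));
}
```

With the macro, `struct demo_process cp = DEMO_PROCESS_INIT;` replaces a declaration followed by `demo_process_clear(&cp);`. Per the merge message above, the same series also adds a `child_process_init()` function for cases where a static initializer cannot be used.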
2014-08-07 | fetchpack.c: replace `git_config()` with `git_config_get_*()` family | Tanay Abhra | 1 | -27/+8
Use `git_config_get_*()` family instead of `git_config()` to take advantage of the config-set API which provides a cleaner control flow.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-09 | Merge branch 'jk/skip-prefix' | Junio C Hamano | 1 | -10/+14
* jk/skip-prefix:
  http-push: refactor parsing of remote object names
  imap-send: use skip_prefix instead of using magic numbers
  use skip_prefix to avoid repeated calculations
  git: avoid magic number with skip_prefix
  fetch-pack: refactor parsing in get_ack
  fast-import: refactor parsing of spaces
  stat_opt: check extra strlen call
  daemon: use skip_prefix to avoid magic numbers
  fast-import: use skip_prefix for parsing input
  use skip_prefix to avoid repeating strings
  use skip_prefix to avoid magic numbers
  transport-helper: avoid reading past end-of-string
  fast-import: fix read of uninitialized argv memory
  apply: use skip_prefix instead of raw addition
  refactor skip_prefix to return a boolean
  avoid using skip_prefix as a boolean
  daemon: mark some strings as const
  parse_diff_color_slot: drop ofs parameter
2014-06-20 | fetch-pack: refactor parsing in get_ack | Jeff King | 1 | -6/+9
There are several uses of the magic number "line+45" when parsing ACK lines from the server, and it's rather unclear why 45 is the correct number. We can make this more clear by keeping a running pointer as we parse, using skip_prefix to jump past the first "ACK ", then adding 40 to jump past get_sha1_hex (which is still magical, but hopefully 40 is less magical to readers of git code).

Note that this actually puts us at line+44. The original required some character between the sha1 and further ACK flags (it is supposed to be a space, but we never enforced that). We start our search for flags at line+44, which means we are slightly more liberal than the old code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
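The running-pointer style of ACK parsing described above can be sketched in C. This is a simplified, hypothetical parser and not git's actual get_ack(). It does not read packets from a file descriptor, and it only validates the 40 hex digits instead of converting them to a binary SHA-1. The enum values and the `demo_` helpers are illustrative names. The flag search starts at line+44 (4 bytes of "ACK " plus 40 hex digits), so a missing separator before the flag is tolerated, matching the "slightly more liberal" behavior noted above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for the ACK variants mentioned above. */
enum demo_ack {
	DEMO_NAK, DEMO_ACK, DEMO_ACK_CONTINUE, DEMO_ACK_COMMON,
	DEMO_ACK_READY, DEMO_ACK_ERROR
};

/* Local skip_prefix()-style helper: on a match, *out points past the prefix. */
static int demo_skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t n = strlen(prefix);
	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/* Accept exactly 40 hex digits; the binary conversion is omitted here. */
static int demo_is_sha1_hex(const char *s)
{
	for (int i = 0; i < 40; i++)
		if (!s[i] || !strchr("0123456789abcdefABCDEF", s[i]))
			return 0;
	return 1;
}

static enum demo_ack demo_parse_ack(const char *line)
{
	const char *p;

	assert(line != NULL);
	if (!strcmp(line, "NAK"))
		return DEMO_NAK;
	if (!demo_skip_prefix(line, "ACK ", &p) || !demo_is_sha1_hex(p))
		return DEMO_ACK_ERROR;
	p += 40; /* now at line + 44: "ACK " (4 bytes) + 40 hex digits */
	if (strstr(p, "continue"))
		return DEMO_ACK_CONTINUE;
	if (strstr(p, "common"))
		return DEMO_ACK_COMMON;
	if (strstr(p, "ready"))
		return DEMO_ACK_READY;
	return DEMO_ACK;
}
```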
2014-06-20 | use skip_prefix to avoid magic numbers | Jeff King | 1 | -4/+5
It's a common idiom to match a prefix and then skip past it with a magic number, like:

  if (starts_with(foo, "bar"))
          foo += 3;

This is easy to get wrong, since you have to count the prefix string yourself, and there's no compiler check if the string changes. We can use skip_prefix to avoid the magic numbers here.

Note that some of these conversions could be much shorter. For example:

  if (starts_with(arg, "--foo=")) {
          bar = arg + 6;
          continue;
  }

could become:

  if (skip_prefix(arg, "--foo=", &bar))
          continue;

However, I have left it as:

  if (skip_prefix(arg, "--foo=", &v)) {
          bar = v;
          continue;
  }

to visually match nearby cases which need to actually process the string. Like:

  if (skip_prefix(arg, "--foo=", &v)) {
          bar = atoi(v);
          continue;
  }

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
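The skip_prefix() idiom above can be sketched in C. The code uses a local reimplementation of the three-argument, boolean-returning `skip_prefix(str, prefix, &out)` signature from the examples, so it can run on its own. `demo_parse_option()` and the `--num=` option are hypothetical and only illustrate the "match, then consume the remainder" style.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local reimplementation of the boolean skip_prefix(): on a match,
 * *out points just past the prefix and 1 is returned; otherwise 0. */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	do {
		if (!*prefix) {
			*out = str;
			return 1;
		}
	} while (*str++ == *prefix++);
	return 0;
}

/* Hypothetical option parser in the style recommended above: no magic
 * numbers, and the remainder is reached through the out-parameter. */
static int demo_parse_option(const char *arg, const char **bar, int *num)
{
	const char *v;

	assert(arg && bar && num);
	if (skip_prefix(arg, "--foo=", &v)) {
		*bar = v;
		return 1;
	}
	if (skip_prefix(arg, "--num=", &v)) {
		*num = atoi(v);
		return 1;
	}
	return 0;
}
```

Because the prefix length is never written out, changing "--foo=" to a different string cannot leave a stale `+ 6` behind.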
2014-06-09 | Use starts_with() for C strings instead of memcmp() | René Scharfe | 1 | -1/+1
Convert three cases of checking for a constant prefix using memcmp() to starts_with(). This way there is no need for magic string length constants and we avoid running over the end of the string should it be shorter than the prefix. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
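The safety argument above can be shown with a small C sketch modeled on starts_with(). It walks both strings and stops at the first mismatch. When `str` is shorter than `prefix`, its terminating NUL differs from the next prefix character, so the loop returns 0 without reading past the end. A fixed-length `memcmp(str, prefix, n)` gives no such guarantee. This is an illustration, not a copy of git's source.

```c
#include <assert.h>

/* Modeled on starts_with(): returns 1 if str begins with prefix, else 0,
 * never reading beyond the NUL terminator of either string. */
static int starts_with(const char *str, const char *prefix)
{
	assert(str && prefix);
	for (; ; str++, prefix++)
		if (!*prefix)
			return 1;
		else if (*str != *prefix)
			return 0;
}
```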
2014-04-03 | Merge branch 'nd/log-show-linear-break' | Junio C Hamano | 1 | -0/+1
Attempts to show where a single strand of pearls breaks in "git log" output.

* nd/log-show-linear-break:
  log: add --show-linear-break to help see non-linear history
  object.h: centralize object flag allocation
2014-03-25 | object.h: centralize object flag allocation | Nguyễn Thái Ngọc Duy | 1 | -0/+1
While the field "flags" is mainly used by the revision walker, it is also used in many other places. Centralize the whole flag allocation to one place for a better overview (and easier to move flags if we have to).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-21 | Merge branch 'jk/shallow-update-fix' | Junio C Hamano | 1 | -11/+0
Serving objects from a shallow repository needs to write a new file to hold the temporary shallow boundaries, but it was not cleaned up when we exited due to die() or a signal.

* jk/shallow-update-fix:
  shallow: verify shallow file after taking lock
  shallow: automatically clean up shallow tempfiles
  shallow: use stat_validity to check for up-to-date file
2014-02-27 | shallow: automatically clean up shallow tempfiles | Jeff King | 1 | -11/+0
We sometimes write tempfiles of the form "shallow_XXXXXX" during fetch/push operations with shallow repositories. Under normal circumstances, we clean up the result when we are done. However, we do not take steps to clean up after ourselves when we exit due to die() or signal death.

This patch teaches the tempfile creation code to register handlers to clean up after ourselves. To handle this, we change the ownership semantics of the filename returned by setup_temporary_shallow. It now keeps a copy of the filename itself, and returns only a const pointer to it.

We can also do away with explicit tempfile removal in the callers. They all exit not long after finishing with the file, so they can rely on the auto-cleanup, simplifying the code.

Note that we keep things simple and maintain only a single filename to be cleaned. This is sufficient for the current caller, but we future-proof it with a die("BUG").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
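The ownership model above can be sketched in C: one module-owned filename, a const pointer handed back to callers, cleanup registered with `atexit()`, and a hard failure if a second tempfile is requested. Several parts are assumptions rather than git's code. The function names are hypothetical. The template under `/tmp` stands in for a path inside the repository. Only normal process exit is covered, because the signal-death half of the real change is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* The single filename this module owns; empty means "nothing to clean". */
static char demo_shallow_path[64];

static void demo_remove_shallow_tempfile(void)
{
	if (demo_shallow_path[0])
		unlink(demo_shallow_path);
	demo_shallow_path[0] = '\0';
}

/* Hypothetical analogue of setup_temporary_shallow(): create the file,
 * register cleanup once, and return only a const pointer to our copy. */
static const char *demo_setup_temporary_shallow(void)
{
	static int registered;
	int fd;

	if (demo_shallow_path[0]) {
		/* mirrors the die("BUG") guard: only one tempfile is supported */
		fprintf(stderr, "BUG: only one shallow tempfile is supported\n");
		abort();
	}
	strcpy(demo_shallow_path, "/tmp/shallow_XXXXXX"); /* assumed location */
	fd = mkstemp(demo_shallow_path);
	if (fd < 0) {
		demo_shallow_path[0] = '\0';
		return NULL;
	}
	close(fd);
	if (!registered && atexit(demo_remove_shallow_tempfile) == 0)
		registered = 1;
	return demo_shallow_path;
}
```

Callers then no longer need an explicit unlink, because exiting removes the file.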
2014-02-10 | fetch-pack: fix deepen shallow over smart http with no-done cap | Nguyễn Thái Ngọc Duy | 1 | -1/+2
In smart http, upload-pack adds new shallow lines at the beginning of each rpc response. Only shallow lines from the first rpc call are useful. After that they are thrown away. It's designed this way because upload-pack is stateless and has no idea when its shallow lines are helpful or not.

So after refs are negotiated with multi_ack_detailed and the server thinks it learned enough, it sends "ACK obj-id ready", terminates the rpc call and waits for the final rpc round. The client sends "done". The server sends another response, which also has shallow lines at the beginning, and the last "ACK obj-id" line.

When no-done is active, the last round is cut out: the server sends "ACK obj-id ready" and "ACK obj-id" in the same rpc response. fetch-pack is updated to recognize this and not send "done". However, it still tries to consume shallow lines, which are never sent.

Update the code to skip consuming shallow lines when no-done is enabled.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27 | Merge branch 'jk/allow-fetch-onelevel-refname' | Junio C Hamano | 1 | -1/+1
"git clone" would fail to clone from a repository that has a ref directly under "refs/", e.g. "refs/stash", because different validation paths do different things on such a refname. Loosen the client side's validation to allow such a ref.

* jk/allow-fetch-onelevel-refname:
  fetch-pack: do not filter out one-level refs
2014-01-17 | Merge branch 'nd/shallow-clone' | Junio C Hamano | 1 | -12/+119
Fetching from a shallow-cloned repository used to be forbidden, primarily because the codepaths involved were not carefully vetted and we did not bother supporting such usage. This attempts to allow object transfer out of a shallow-cloned repository in a controlled way (i.e. the receiver becomes a shallow repository with truncated history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-15 | fetch-pack: do not filter out one-level refs | Jeff King | 1 | -1/+1
Currently fetching a one-level ref like "refs/foo" does not work consistently. The outer "git fetch" program filters the list of refs, checking each against check_refname_format. Then it feeds the result to do_fetch_pack to actually negotiate the haves/wants and get the pack. The fetch-pack code does its own filter, and it behaves differently.

The fetch-pack filter looks for refs in "refs/", and then feeds everything _after_ the slash (i.e., just "foo") into check_refname_format. But check_refname_format is not designed to look at a partial refname. It complains that the ref has only one component, thinking it is at the root (i.e., alongside "HEAD"), when in reality we just fed it a partial refname.

As a result, we omit a ref like "refs/foo" from the pack request, even though "git fetch" then tries to store the resulting ref. If we happen to get the object anyway (e.g., because the ref is contained in another ref we are fetching), then the fetch succeeds. But if it is a unique object, we fail when trying to update "refs/foo".

We can fix this by just passing the whole refname into check_refname_format; we know the part we were omitting is "refs/", which is acceptable in a refname. This at least makes the checks consistent with each other.

This problem happens most commonly with "refs/stash", which is the only one-level ref in wide use. However, our test does not use "refs/stash", as we may later want to restrict it specifically (not because it is one-level, but because of the semantics of stashes).

We may also want to do away with the multiple levels of filtering (which can cause problems when they are out of sync), or even forbid one-level refs entirely. However, those decisions can come later; this fixes the most immediate problem, which is the mismatch between the two.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
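The filter mismatch above can be reproduced with a deliberately tiny C model. `demo_check_refname_levels()` is a hypothetical stand-in for check_refname_format() that enforces only the "at least two components" rule. The real function checks many more rules. The two filters differ only in whether they pass the refname with "refs/" stripped (old) or the whole refname (fixed).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for check_refname_format() without one-level
 * allowance: 0 if the name has at least two '/'-separated components. */
static int demo_check_refname_levels(const char *refname)
{
	assert(refname != NULL);
	return strchr(refname, '/') ? 0 : -1;
}

/* Old fetch-pack filter: check only the part after "refs/". */
static int demo_old_filter_accepts(const char *ref)
{
	return !strncmp(ref, "refs/", 5) && !demo_check_refname_levels(ref + 5);
}

/* Fixed filter: check the whole refname, as "git fetch" does. */
static int demo_new_filter_accepts(const char *ref)
{
	return !strncmp(ref, "refs/", 5) && !demo_check_refname_levels(ref);
}
```

The old filter sees "stash" and rejects it as a root-level name. The fixed filter sees "refs/stash" and accepts it. Both agree on ordinary refs such as "refs/heads/master".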
2014-01-06 | shallow: remove unused code | Ramsay Jones | 1 | -1/+0
Commit 58babfff ("shallow.c: the 8 steps to select new commits for .git/shallow", 05-12-2013) added a function to implement step 5 of the quoted eight steps, namely 'remove_nonexistent_ours_in_pack()'. This function implements an optional optimization step in the new shallow commit selection algorithm. However, this function has no callers. (The commented-out call sites would need to change, in order to provide information required by the function.)

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-17 | Merge branch 'cc/starts-n-ends-with' | Junio C Hamano | 1 | -6/+6
Remove a few duplicate implementations of prefix/suffix comparison functions, and rename them to starts_with and ends_with.

* cc/starts-n-ends-with:
  replace {pre,suf}fixcmp() with {starts,ends}_with()
  strbuf: introduce starts_with() and ends_with()
  builtin/remote: remove postfixcmp() and use suffixcmp() instead
  environment: normalize use of prefixcmp() by removing " != 0"
2013-12-10 | fetch: add --update-shallow to accept refs that update .git/shallow | Nguyễn Thái Ngọc Duy | 1 | -0/+27
The same steps are done as when --update-shallow is not given. The only difference is that we now add all shallow commits in "ours" and "theirs" to .git/shallow (aka "step 8").

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | fetch: support fetching from a shallow repository | Nguyễn Thái Ngọc Duy | 1 | -2/+30
This patch just puts together pieces from the 8 steps patch. We stop at step 7 and reject refs that require new shallow commits.

Note that, by rejecting refs that require new shallow commits, we leave dangling objects in the repo, which become "object islands" by the next "git fetch" of the same source.

If in the first fetch our "ours" set is empty and we do practically nothing at step 7, "ours" is full at the next fetch and we may need to walk through commits for the reachability test. Room for improvement.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | clone: support remote shallow repository | Nguyễn Thái Ngọc Duy | 1 | -3/+51
Cloning from a shallow repository does not follow the "8 steps for new .git/shallow" because if it did, we would need to get through step 6 for all refs. That means walking commits down to the bottom.

Instead the rule to create .git/shallow is simpler and, more importantly, cheap: if a shallow commit is found in the pack, it's probably used (i.e. reachable from some refs), so we add it. Others are dropped.

One may notice this method seems flawed by the word "probably". A shallow commit may not be reachable from any refs at all if it's attached to an object island (a group of objects that are not reachable by any refs).

If that object island is not complete, a new fetch request may send more objects to connect it to some ref. At that time, because we incorrectly installed the shallow commit in this island, the user will not see anything after that commit (fsck is still ok). This is not desired.

Given that object islands are rare (C Git never sends such islands for security reasons) and do not really harm the repository integrity, a tradeoff is made to surprise the user occasionally but work faster every day.

A new option --strict could be added later that follows exactly the 8 steps. "git prune" can also learn to remove dangling objects _and_ the shallow commits that are attached to them from .git/shallow.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | fetch-pack.c: move shallow update code out of fetch_pack() | Nguyễn Thái Ngọc Duy | 1 | -9/+13
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 | shallow.c: extend setup_*_shallow() to accept extra shallow commits | Nguyễn Thái Ngọc Duy | 1 | -2/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 | replace {pre,suf}fixcmp() with {starts,ends}_with() | Christian Couder | 1 | -6/+6
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions.

The change can be recreated by mechanically applying this:

  $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
    grep -v strbuf\\.c |
    xargs perl -pi -e '
      s|!prefixcmp\(|starts_with\(|g;
      s|prefixcmp\(|!starts_with\(|g;
      s|!suffixcmp\(|ends_with\(|g;
      s|suffixcmp\(|!ends_with\(|g;
    '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
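The rewrite rules in the perl script above rely on one equivalence: `!prefixcmp(s, p)` means the same as `starts_with(s, p)`, and a bare truthy `prefixcmp(s, p)` means `!starts_with(s, p)`. The C sketch below checks that with simplified local versions of both helpers. `demo_prefixcmp()` keeps only the "0 on match" contract and is not the original implementation. The script applies the `!prefixcmp(` substitution first. Otherwise the second rule would turn it into the clumsier `!!starts_with(`.

```c
#include <assert.h>
#include <string.h>

/* Simplified prefixcmp(): 0 when str begins with prefix, nonzero otherwise. */
static int demo_prefixcmp(const char *str, const char *prefix)
{
	assert(str && prefix);
	return strncmp(str, prefix, strlen(prefix));
}

/* starts_with(): the same test with the sense flipped to a boolean. */
static int demo_starts_with(const char *str, const char *prefix)
{
	for (; ; str++, prefix++)
		if (!*prefix)
			return 1;
		else if (*str != *prefix)
			return 0;
}
```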
2013-12-05 | Merge branch 'jk/robustify-parse-commit' | Junio C Hamano | 1 | -5/+3
* jk/robustify-parse-commit:
  checkout: do not die when leaving broken detached HEAD
  use parse_commit_or_die instead of custom message
  use parse_commit_or_die instead of segfaulting
  assume parse_commit checks for NULL commit
  assume parse_commit checks commit->object.parsed
  log_tree_diff: die when we fail to parse a commit
2013-10-30 | Merge branch 'jl/pack-transfer-avoid-double-close' | Junio C Hamano | 1 | -0/+4
The codepath where send_pack() calls pack_objects() mistakenly closed the same file descriptor twice, potentially closing the wrong file descriptor, one that had been opened in the meantime.

* jl/pack-transfer-avoid-double-close:
  Clear fd after closing to avoid double-close error
2013-10-24 | assume parse_commit checks commit->object.parsed | Jeff King | 1 | -5/+3
The parse_commit function will check the "parsed" flag of the object and do nothing if it is set. There is no need for callers to check the flag themselves, and doing so only clutters the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-23 | Merge branch 'nd/fetch-into-shallow' into maint | Junio C Hamano | 1 | -52/+1
When there is no sufficient overlap between old and new history during a "git fetch" into a shallow repository, objects that the sending side knows the receiving end has were unnecessarily sent.

* nd/fetch-into-shallow:
  Add testcase for needless objects during a shallow fetch
  list-objects: mark more commits as edges in mark_edges_uninteresting
  list-objects: reduce one argument in mark_edges_uninteresting
  upload-pack: delegate rev walking in shallow fetch to pack-objects
  shallow: add setup_temporary_shallow()
  shallow: only add shallow graft points to new shallow file
  move setup_alternate_shallow and write_shallow_commits to shallow.c
2013-10-23 | Clear fd after closing to avoid double-close error | Jens Lindstrom | 1 | -0/+4
In send_pack(), clear the fd passed to pack_objects() by setting it to -1, since pack_objects() closes the fd (via a call to run_command()). Likewise, in get_pack(), clear the fd passed to run_command(). Not doing so risks having git_transport_push(), caller of send_pack(), closing the fd again, possibly incorrectly closing some other open file; or similarly with fetch_refs_from_pack(), indirect caller of get_pack(). Signed-off-by: Jens Lindström <jl@opera.com> Acked-by: Jeff King <peff@peff.net> Acked-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
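The fd-clearing discipline described above can be sketched in C. All three functions are hypothetical stand-ins. `demo_pack_objects()` plays the callee that closes the descriptor it is given, as the message says pack_objects() does via run_command(). `demo_send_pack()` plays the direct caller, which marks that slot as -1. `demo_transport_cleanup()` plays the outer caller, which closes only descriptors that are still valid. Without the `fd[1] = -1` line, the outer close could hit a descriptor number that has since been reused for another file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Stand-in for pack_objects(): consumes (closes) the descriptor it is given. */
static void demo_pack_objects(int out)
{
	/* ... the pack would be written to out here ... */
	close(out);
}

/* Stand-in for send_pack(): after handing fd[1] away, record that it is gone. */
static void demo_send_pack(int fd[2])
{
	assert(fd[1] >= 0);
	demo_pack_objects(fd[1]);
	fd[1] = -1; /* the fix: never leave a stale descriptor number behind */
}

/* Stand-in for the transport-level caller: close only what is still open. */
static int demo_transport_cleanup(int fd[2])
{
	int ret = 0;
	for (int i = 0; i < 2; i++) {
		if (fd[i] >= 0 && close(fd[i]) < 0)
			ret = -1;
		fd[i] = -1;
	}
	return ret;
}
```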
2013-09-24 | Merge branch 'nd/fetch-pack-error-reporting-fix' | Jonathan Nieder | 1 | -5/+5
* nd/fetch-pack-error-reporting-fix:
  fetch-pack.c: show correct command name that fails
2013-09-20 | Merge branch 'nd/fetch-into-shallow' | Junio C Hamano | 1 | -52/+1
When there is no sufficient overlap between old and new history during a fetch into a shallow repository, we unnecessarily sent objects the sending side knows the receiving end has.

* nd/fetch-into-shallow:
  Add testcase for needless objects during a shallow fetch
  list-objects: mark more commits as edges in mark_edges_uninteresting
  list-objects: reduce one argument in mark_edges_uninteresting
  upload-pack: delegate rev walking in shallow fetch to pack-objects
  shallow: add setup_temporary_shallow()
  shallow: only add shallow graft points to new shallow file
  move setup_alternate_shallow and write_shallow_commits to shallow.c
2013-09-18 | fetch-pack.c: show correct command name that fails | Nguyễn Thái Ngọc Duy | 1 | -5/+5
When --shallow-file is added to the command line, it has to come before the subcommand name, so the first argument is no longer the command name. Stop assuming that and keep track of the command name explicitly.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-09 | Merge branch 'jc/push-cas' | Junio C Hamano | 1 | -0/+1
Allow a safer "rewind of the remote tip" push than blind "--force", by requiring the overwritten remote ref to be unchanged since the new history to replace it was prepared.

The machinery is more or less ready. The "--force" option is again the big red button to override any safety, thanks to J6t's sanity (the original round allowed --lockref to defeat --force).

The logic to choose the default implemented here is fragile (e.g. "git fetch" after seeing a failure will update the remote-tracking branch and will make the next "push" pass, defeating the safety pretty easily). It is suitable only for the simplest workflows, and it may hurt users more than it helps them.

* jc/push-cas:
  push: teach --force-with-lease to smart-http transport
  send-pack: fix parsing of --force-with-lease option
  t5540/5541: smart-http does not support "--force-with-lease"
  t5533: test "push --force-with-lease"
  push --force-with-lease: tie it all together
  push --force-with-lease: implement logic to populate old_sha1_expect[]
  remote.c: add command line option parser for "--force-with-lease"
  builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
  cache.h: move remote/connect API out of it
2013-09-05 | Merge branch 'nd/fetch-pack-shallow-fix' into maint | Junio C Hamano | 1 | -1/+3
The recent "short-cut clone connectivity check" topic broke a shallow repository when a fetch operation tries to auto-follow tags.

* nd/fetch-pack-shallow-fix:
  fetch-pack: do not remove .git/shallow file when --depth is not specified
2013-08-30 | Merge branch 'nd/fetch-pack-shallow-fix' | Junio C Hamano | 1 | -1/+3
The recent "short-cut clone connectivity check" topic broke a shallow repository when a fetch operation tries to auto-follow tags.

* nd/fetch-pack-shallow-fix:
  fetch-pack: do not remove .git/shallow file when --depth is not specified
2013-08-25 | fetch-pack: do not remove .git/shallow file when --depth is not specified | Nguyễn Thái Ngọc Duy | 1 | -1/+3
fetch_pack() can remove .git/shallow file when a shallow repository becomes a full one again. This behavior is triggered incorrectly when tags are also fetched because fetch_pack() will be called twice. At the first fetch_pack() call:

 - shallow_lock is set up
 - alternate_shallow_file points to shallow_lock.filename, which is "shallow.lock"
 - commit_lock_file is called, which sets shallow_lock.filename to "". alternate_shallow_file also becomes "" because it points to the same memory.

At the second call, setup_alternate_shallow() is not called and alternate_shallow_file remains "". It's mistaken as unshallow case and .git/shallow is removed. The end result is a broken repository.

Fix this by always initializing alternate_shallow_file when fetch_pack() is called. As an extra measure, check if args->depth > 0 before commit/rollback shallow file.

Reported-by: Kacper Kornet <kornet@camk.edu.pl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
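The aliasing bug above can be reproduced with a small C model. Everything in it is hypothetical and heavily simplified. The lock struct has just a name buffer, and committing the lock empties that buffer. `demo_fetch_pack()` returns 1 when it would treat the call as "unshallow" and delete .git/shallow. The `fixed` flag switches between the buggy flow and the fixed one: the fixed flow always reinitializes the pointer and only touches the shallow file when a depth was requested.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of a lock file whose name buffer is cleared on commit. */
struct demo_lock {
	char filename[64];
};

static struct demo_lock demo_shallow_lock;
static const char *demo_alternate_shallow_file;

static void demo_commit_lock(struct demo_lock *lk)
{
	assert(lk != NULL);
	/* ... rename into place ... */
	lk->filename[0] = '\0';
}

static void demo_setup_alternate_shallow(void)
{
	strcpy(demo_shallow_lock.filename, "shallow.lock");
	demo_alternate_shallow_file = demo_shallow_lock.filename; /* aliases the buffer */
}

/* Returns 1 if this call would delete .git/shallow (the "unshallow" case). */
static int demo_fetch_pack(int depth, int fixed)
{
	if (depth > 0)
		demo_setup_alternate_shallow();
	else if (fixed)
		demo_alternate_shallow_file = NULL; /* fix: never reuse a stale pointer */

	if ((fixed && depth <= 0) || !demo_alternate_shallow_file)
		return 0;
	if (*demo_alternate_shallow_file == '\0')
		return 1; /* an empty name is read as "become a full repository" */
	demo_commit_lock(&demo_shallow_lock); /* empties the shared buffer */
	return 0;
}
```

In the buggy flow, the second call (the tag auto-follow) finds the pointer aimed at the now-empty buffer and reports the unshallow case. The fixed flow does not.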
2013-08-18 | move setup_alternate_shallow and write_shallow_commits to shallow.c | Nguyễn Thái Ngọc Duy | 1 | -52/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-08 | cache.h: move remote/connect API out of it | Junio C Hamano | 1 | -0/+1
The definition of "struct ref" in "cache.h", a header file so central to the system, always confused me. This structure is not about the local ref used by sha1-name API to name local objects.

It is what refspecs are expanded into, after finding out what refs the other side has, to define what refs are updated after object transfer succeeds to what values. It belongs to "remote.h" together with "struct refspec".

While we are at it, also move the types and functions related to the Git transport connection to a new header file connect.h.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-02 | fetch-pack: avoid quadratic behavior in rev_list_push | Jeff King | 1 | -7/+6
When we call find_common to start finding common ancestors with the remote side of a fetch, the first thing we do is insert the tip of each ref into our rev_list linked list. We keep the list sorted the whole time with commit_list_insert_by_date, which means our insertion ends up doing O(n^2) timestamp comparisons.

We could teach rev_list_push to use an unsorted list, and then sort it once after we have added each ref. However, in get_rev, we process the list by popping commits off the front and adding parents back in timestamp-sorted order. So that procedure would still operate on the large list. Instead, we can replace the linked list with a heap-based priority queue, which can do O(log n) insertion, making the whole insertion procedure O(n log n).

As a result of switching to the prio_queue struct, we fix two minor bugs:

  1. When we "pop" a commit in get_rev, and when we clear the rev_list in find_common, we do not take care to free the "struct commit_list", and just leak its memory. With the prio_queue implementation, the memory management is handled for us.

  2. In get_rev, we look at the head commit of the list, possibly push its parents onto the list, and then "pop" the front of the list off, assuming it is the same element that we just peeked at. This is typically going to be the case, but would not be in the face of clock skew: the parents are inserted by date, and could potentially be inserted at the head of the list if they have a timestamp newer than their descendant. In this case, we would accidentally pop the parent, and never process it at all. The new implementation pulls the commit off of the queue as we examine it, and so does not suffer from this problem.

With this patch, a fetch of a single commit into a repository with 50,000 refs went from:

  real    0m7.984s
  user    0m7.852s
  sys     0m0.120s

to:

  real    0m2.017s
  user    0m1.884s
  sys     0m0.124s

Before this patch, a larger case with 370K refs still had not completed after tens of minutes; with this patch, it completes in about 12 seconds.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
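The heap-based priority queue described above can be sketched in C. This is a minimal binary max-heap keyed on commit date, so the newest commit comes out first. It follows the spirit of git's prio_queue with a by-date comparator, but the struct and function names are hypothetical and allocation failure is handled by a plain assert. Both insertion and removal cost O(log n), so pushing n ref tips costs O(n log n) instead of the O(n^2) of sorted list insertion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical commit record: only the field the queue orders on. */
struct demo_commit {
	const char *name;
	long date;
};

/* Binary max-heap on date: array[0] is always the newest commit. */
struct demo_queue {
	struct demo_commit **array;
	size_t nr, alloc;
};

static void demo_swap(struct demo_queue *q, size_t i, size_t j)
{
	struct demo_commit *tmp = q->array[i];
	q->array[i] = q->array[j];
	q->array[j] = tmp;
}

/* O(log n): append at the end, then sift up while newer than the parent. */
static void demo_queue_put(struct demo_queue *q, struct demo_commit *c)
{
	size_t ix;

	if (q->nr == q->alloc) {
		q->alloc = q->alloc ? 2 * q->alloc : 16;
		q->array = realloc(q->array, q->alloc * sizeof(*q->array));
		assert(q->array != NULL); /* sketch: no real OOM handling */
	}
	ix = q->nr++;
	q->array[ix] = c;
	while (ix > 0) {
		size_t parent = (ix - 1) / 2;
		if (q->array[parent]->date >= q->array[ix]->date)
			break;
		demo_swap(q, parent, ix);
		ix = parent;
	}
}

/* O(log n): remove the root, move the last element up, sift it down. */
static struct demo_commit *demo_queue_get(struct demo_queue *q)
{
	struct demo_commit *result;
	size_t ix = 0;

	if (!q->nr)
		return NULL;
	result = q->array[0];
	q->array[0] = q->array[--q->nr];
	for (;;) {
		size_t child = 2 * ix + 1, newest = ix;
		if (child < q->nr && q->array[child]->date > q->array[newest]->date)
			newest = child;
		if (child + 1 < q->nr && q->array[child + 1]->date > q->array[newest]->date)
			newest = child + 1;
		if (newest == ix)
			break;
		demo_swap(q, ix, newest);
		ix = newest;
	}
	return result;
}
```

Because `demo_queue_get()` removes a commit before the caller pushes that commit's parents, a parent carrying a skewed, newer timestamp cannot be popped in the child's place. That is the second bug the message describes.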
2013-07-02 | fetch-pack: avoid quadratic list insertion in mark_complete | Jeff King | 1 | -1/+2
We insert the commit pointed to by each ref one-by-one into the "complete" commit_list using insert_by_date. Because each insertion is O(n), we end up with O(n^2) behavior.

This typically doesn't matter, because the number of refs is reasonably small. And even if there are a lot of refs, they often point to a smaller set of objects (in which case the optimization in commit ea5f220 keeps our "n" small).

However, in pathological repositories (hundreds of thousands of refs, each pointing to a unique commit), this quadratic behavior can make a difference. Since we do not care about the list order until we have finished building it, we can simply keep it unsorted during the insertion phase, then sort it afterwards.

On a repository like the one described above, this dropped the time to do a no-op fetch from 2.0s to 1.7s. On normal repositories, it probably does not matter at all, but it does not hurt to protect ourselves from pathological cases.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
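The "append unsorted, sort once" pattern above can be sketched in C. The commit message sorts a linked commit_list after building it. This sketch swaps in a growable array and `qsort()`, a different data structure chosen only to keep the example short. The names are hypothetical. Each append is O(1) amortized and there is a single O(n log n) sort at the end, instead of n sorted insertions at O(n) each.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical commit record carrying only the sort key. */
struct demo_commit {
	const char *name;
	long date;
};

/* Growable array; order is irrelevant until demo_list_finish(). */
struct demo_list {
	struct demo_commit **items;
	size_t nr, alloc;
};

/* O(1) amortized: append without any ordering work per ref. */
static void demo_list_append(struct demo_list *l, struct demo_commit *c)
{
	if (l->nr == l->alloc) {
		l->alloc = l->alloc ? 2 * l->alloc : 16;
		l->items = realloc(l->items, l->alloc * sizeof(*l->items));
		assert(l->items != NULL); /* sketch: no real OOM handling */
	}
	l->items[l->nr++] = c;
}

/* qsort() comparator: newer commits first, the order insert-by-date keeps. */
static int demo_by_date_desc(const void *a, const void *b)
{
	const struct demo_commit *ca = *(struct demo_commit *const *)a;
	const struct demo_commit *cb = *(struct demo_commit *const *)b;
	return (ca->date < cb->date) - (ca->date > cb->date);
}

/* One O(n log n) sort after every ref has been added. */
static void demo_list_finish(struct demo_list *l)
{
	if (l->nr > 1)
		qsort(l->items, l->nr, sizeof(*l->items), demo_by_date_desc);
}
```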
2013-05-28 | clone: open a shortcut for connectivity check | Nguyễn Thái Ngọc Duy | 1 | -1/+10
In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case.

In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...":

 - all refs point to an object in the pack
 - there are no dangling pointers in any object in the pack
 - no objects in the pack point to objects outside the pack

The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack.

"index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious.

Cloning linux-2.6 over file://

          before      after
  real    3m25.693s   2m53.050s
  user    5m2.037s    4m42.396s
  sys     0m13.750s   0m16.574s

A more realistic test with ssh:// over wireless

          before      after
  real    11m26.629s  10m4.213s
  user    5m43.196s   5m19.444s
  sys     0m35.812s   0m37.630s

This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved.

This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 | fetch-pack: prepare updated shallow file before fetching the pack | Nguyễn Thái Ngọc Duy | 1 | -36/+37
index-pack --strict looks up and follows parent commits. If shallow information is not ready by the time index-pack is run, index-pack may be led to non-existent objects. Make fetch-pack save shallow file to disk before invoking index-pack.

git learns a new global option --shallow-file to pass on the alternate shallow file path. Undocumented (it does not even support the --shallow-file= syntax) because it's unlikely to be used again elsewhere.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-01 | Merge branch 'jk/pkt-line-cleanup' | Junio C Hamano | 1 | -9/+9
Clean up pkt-line API, implementation and its callers to make them more robust.

* jk/pkt-line-cleanup:
  do not use GIT_TRACE_PACKET=3 in tests
  remote-curl: always parse incoming refs
  remote-curl: move ref-parsing code up in file
  remote-curl: pass buffer straight to get_remote_heads
  teach get_remote_heads to read from a memory buffer
  pkt-line: share buffer/descriptor reading implementation
  pkt-line: provide a LARGE_PACKET_MAX static buffer
  pkt-line: move LARGE_PACKET_MAX definition from sideband
  pkt-line: teach packet_read_line to chomp newlines
  pkt-line: provide a generic reading function with options
  pkt-line: drop safe_write function
  pkt-line: move a misplaced comment
  write_or_die: raise SIGPIPE when we get EPIPE
  upload-archive: use argv_array to store client arguments
  upload-archive: do not copy repo name
  send-pack: prefer prefixcmp over memcmp in receive_status
  fetch-pack: fix out-of-bounds buffer offset in get_ack
  upload-pack: remove packet debugging harness
  upload-pack: do not add duplicate objects to shallow list
  upload-pack: use get_sha1_hex to parse "shallow" lines
2013-03-21 | Merge branch 'jc/fetch-raw-sha1' | Junio C Hamano | 1 | -32/+69
Allows requests to fetch objects at any tip of refs (including hidden ones). It seems that there may be use cases even outside Gerrit (e.g. $gmane/215701).

* jc/fetch-raw-sha1:
  fetch: fetch objects by their exact SHA-1 object names
  upload-pack: optionally allow fetching from the tips of hidden refs
  fetch: use struct ref to represent refs to be fetched
  parse_fetch_refspec(): clarify the codeflow a bit
2013-02-20 | pkt-line: provide a LARGE_PACKET_MAX static buffer | Jeff King | 1 file | -6/+6
Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because:

  1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters.

  2. When sending ref lines, git-core always limits itself to 1000-byte packets.

However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000-byte limit is mentioned only in pack-protocol.txt, and then only to describe what we write, not as a specific limit for readers.

This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this:

  1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names.

  2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of the buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now.

Just bumping all of the reading buffers to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
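A reader that hands out a pointer to one shared LARGE_PACKET_MAX buffer might look roughly like the sketch below. It parses from a memory buffer rather than a file descriptor to stay self-contained, and packet_read_line_buf() and hexval() are hypothetical names, not git's pkt-line API. LARGE_PACKET_MAX is 65520 in git; the sketch treats it as the maximum total packet length, including the 4-byte header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LARGE_PACKET_MAX 65520

/* One scratch buffer shared by every line-oriented reader. */
static char packet_buffer[LARGE_PACKET_MAX];

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical reader: parse one pkt-line from src (srclen bytes). On success
 * the payload is copied into packet_buffer, NUL-terminated, and *line points
 * at it; the return value is the payload length and *consumed is the number
 * of bytes used from src. A flush packet ("0000") returns 0 with *line set
 * to NULL. Malformed or oversized packets return -1.
 */
static int packet_read_line_buf(const char *src, size_t srclen,
                                size_t *consumed, char **line)
{
    int len = 0;

    if (srclen < 4)
        return -1;
    for (int i = 0; i < 4; i++) {
        int v = hexval(src[i]);
        if (v < 0)
            return -1;
        len = (len << 4) | v;
    }
    if (len == 0) {                    /* flush packet */
        *consumed = 4;
        *line = NULL;
        return 0;
    }
    if (len < 4 || len > LARGE_PACKET_MAX || (size_t)len > srclen)
        return -1;
    len -= 4;                          /* the length prefix counts itself */
    memcpy(packet_buffer, src + 4, (size_t)len);
    packet_buffer[len] = '\0';         /* fits: len <= LARGE_PACKET_MAX - 4 */
    *consumed = (size_t)len + 4;
    *line = packet_buffer;
    return len;
}
```

Returning static storage is what saves memory, but it also means every call overwrites the previous line: a caller that needs to keep a line must copy it before reading the next packet.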
2013-02-20 | pkt-line: teach packet_read_line to chomp newlines | Jeff King | 1 file | -2/+0
The packets sent during ref negotiation are all terminated by newline; even though the code to chomp these newlines is short, we end up doing it in a lot of places.

This patch teaches packet_read_line to auto-chomp the trailing newline; this lets us get rid of a lot of inline chomping code. As a result, some call-sites which are not reading line-oriented data (e.g., when reading chunks of packfiles alongside sideband) transition away from packet_read_line to the generic packet_read interface. This patch converts all of the existing callsites.

Since the function signature of packet_read_line does not change (but its behavior does), there is a possibility of new callsites being introduced in later commits, silently introducing an incompatibility. However, since a later patch in this series will change the signature, such a commit would have to be merged directly into this commit, not to the tip of the series; we can therefore ignore the issue.

This is an internal cleanup and should produce no change of behavior in the normal case. However, there is one corner case to note. Callers of packet_read_line have never been able to tell the difference between a flush packet ("0000") and an empty packet ("0004"), as both cause packet_read_line to return a length of 0. Readers treat them identically, even though Documentation/technical/protocol-common.txt says we must not; it also says that implementations should not send an empty pkt-line.

By stripping out the newline before the result gets to the caller, we will now treat the newline-only packet ("0005\n") the same as an empty packet, which in turn gets treated like a flush packet. In practice this doesn't matter, as neither empty nor newline-only packets are part of git's protocols (at least not for the line-oriented bits, and readers who are not expecting line-oriented packets will be calling packet_read directly, anyway). But even if we do decide to care about the distinction later, it is orthogonal to this patch. The right place to tighten would be to stop treating empty packets as flush packets, and this change does not make doing so any harder.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
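The corner case above is easy to reproduce with a toy line reader. In this hedged sketch, read_line_chomped(), pkt_len() and hexval() are hypothetical helpers, not git's functions, and the input is assumed to be NUL-terminated with no embedded NULs (reasonable for line-oriented data). Once one trailing newline is chomped, "0000", "0004" and "0005\n" all come back with length 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the 4-hex-digit pkt-len prefix, or return -1 if it is malformed. */
static int pkt_len(const char *p)
{
    int len = 0;
    for (int i = 0; i < 4; i++) {
        int v = hexval(p[i]);   /* a NUL fails here, so short input is safe */
        if (v < 0)
            return -1;
        len = (len << 4) | v;
    }
    return len;
}

/*
 * Hypothetical line-oriented reader: copy the payload of the pkt-line in pkt
 * into out, chomp one trailing newline and return the resulting length,
 * or -1 on malformed input.
 */
static int read_line_chomped(const char *pkt, char *out, size_t outsz)
{
    int len = pkt_len(pkt);

    if (len < 0)
        return -1;
    if (len == 0) {                    /* flush packet "0000" */
        out[0] = '\0';
        return 0;
    }
    if (len < 4 || strlen(pkt) < (size_t)len)
        return -1;
    len -= 4;                          /* "0004" is an empty packet */
    if ((size_t)len >= outsz)
        return -1;
    memcpy(out, pkt + 4, (size_t)len);
    if (len > 0 && out[len - 1] == '\n')
        len--;                         /* "0005\n" now looks empty too */
    out[len] = '\0';
    return len;
}
```

A caller that only sees the returned length has no way to distinguish the three cases, which is why the commit message points at "stop treating empty packets as flush packets" as the place to tighten.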
2013-02-20 | pkt-line: drop safe_write function | Jeff King | 1 file | -1/+1
This is just write_or_die by another name. The one distinction is that write_or_die will treat EPIPE specially by suppressing error messages. That's fine, as we die by SIGPIPE anyway (and in the off chance that it is disabled, write_or_die will simulate it).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
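The EPIPE handling mentioned above can be illustrated with a small write loop. This is a sketch under stated assumptions, not git's write_or_die(): the name write_all_or_die() is hypothetical, and the exit codes (141 for the fallback after raising SIGPIPE, 128 for other errors) are conventions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical write loop: write all len bytes to fd or terminate.
 * We only see EPIPE when SIGPIPE is ignored, so on EPIPE restore the default
 * action and raise SIGPIPE to die quietly; any other error is reported.
 */
static void write_all_or_die(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EPIPE) {
                signal(SIGPIPE, SIG_DFL);
                raise(SIGPIPE);
                exit(141);             /* fallback if raise() returns */
            }
            perror("write");
            exit(128);
        }
        p += n;
        len -= (size_t)n;
    }
}
```

Simulating the signal, rather than printing an error, keeps the behaviour identical whether or not the process had SIGPIPE ignored, which is why a separate safe_write variant adds nothing.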
2013-02-20 | fetch-pack: fix out-of-bounds buffer offset in get_ack | Jeff King | 1 file | -0/+2
When we read acks from the remote, we expect either:

  ACK <sha1>

or

  ACK <sha1> <multi-ack-flag>

We parse the "ACK <sha1>" bit from the line, and then start looking for the flag strings at "line+45"; if we don't have them, we assume it's of the first type. But if we do have the first type, then line+45 is not necessarily inside our string at all!

It turns out that this works most of the time due to the way we parse the packets. They should come in with a newline, and packet_read puts an extra NUL into the buffer, so we end up with:

  ACK <sha1>\n\0

with the newline at offset 44 and the NUL at offset 45. We then strip the newline, putting a NUL at offset 44. So when we look at "line+45", we are looking past the end of our string; but it's OK, because we hit the terminator from the original string.

This breaks down, however, if the other side does not terminate their packets with a newline. In that case, our packet is one character shorter, and we start looking through uninitialized memory for the flag. No known implementation sends such a packet, so it has never come up in practice.

This patch tightens the check by looking for a short, flagless ACK before trying to parse the flag.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
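The tightened check can be sketched as follows. The enum and parse_ack() are hypothetical stand-ins for fetch-pack's get_ack(), and they assume the line has already been newline-chomped and NUL-terminated. The key step is returning early when the line is exactly 44 bytes ("ACK " plus 40 hex digits), so the code never looks at line + 45 on a flagless ACK. Treating an unknown flag as a plain ACK is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes modelled on the ACK forms described above. */
enum ack_type { NAK, ACK, ACK_continue, ACK_common, ACK_ready, ACK_error };

/*
 * Parse a chomped, NUL-terminated line of length len. On any ACK, the
 * 40-hex object name is copied into hex (41 bytes, NUL-terminated).
 */
static enum ack_type parse_ack(const char *line, size_t len, char hex[41])
{
    const char *flag;

    if (len == 3 && !strcmp(line, "NAK"))
        return NAK;
    if (len < 44 || strncmp(line, "ACK ", 4))
        return ACK_error;
    for (size_t i = 4; i < 44; i++)
        if (!isxdigit((unsigned char)line[i]))
            return ACK_error;
    memcpy(hex, line + 4, 40);
    hex[40] = '\0';

    if (len == 44)                     /* short, flagless ACK: stop here */
        return ACK;
    if (line[44] != ' ')
        return ACK_error;
    flag = line + 45;                  /* len >= 45, so at worst the NUL */
    if (!strcmp(flag, "continue"))
        return ACK_continue;
    if (!strcmp(flag, "common"))
        return ACK_common;
    if (!strcmp(flag, "ready"))
        return ACK_ready;
    return ACK;                        /* unknown flag: treat as plain ACK */
}
```

Because every offset is checked against len rather than against an assumed trailing newline, a peer that omits the newline can no longer push the parser into uninitialized memory.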
2013-02-07 | fetch: fetch objects by their exact SHA-1 object names | Junio C Hamano | 1 file | -1/+21
Teach "git fetch" to accept, on the LHS of a refspec, an exact SHA-1 object name that the user may obtain out of band, and send it in a "want" message when the server side advertises the allow-tip-sha1-in-want capability.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
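As a rough illustration of the wire format, a "want" line for an exact object name is a pkt-line whose 4-hex-digit length prefix counts itself: 4 + 5 ("want ") + 40 + 1 (newline) = 50 bytes, so the line starts with "0032". The helper below is hypothetical: it omits the capability list that real clients append to the first want line, and it models the allow-tip-sha1-in-want check as a plain flag.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a "want <sha1>\n" pkt-line for an exact object
 * name. Returns the number of bytes written, or -1 if the server did not
 * advertise allow-tip-sha1-in-want, the name is not 40 hex digits, or out
 * is too small.
 */
static int format_want_line(char *out, size_t outsz, const char *hex,
                            int server_allows_tip_sha1)
{
    int n;

    if (!server_allows_tip_sha1)
        return -1;
    if (strlen(hex) != 40)
        return -1;
    for (size_t i = 0; i < 40; i++)
        if (!isxdigit((unsigned char)hex[i]))
            return -1;
    /* pkt-len counts its own 4 bytes: 4 + strlen("want ") + 40 + 1 = 50. */
    n = snprintf(out, outsz, "%04xwant %s\n", (unsigned)(4 + 5 + 40 + 1), hex);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```

The client-side flag check only mirrors the negotiation; the server still decides which objects it is willing to send for such a request.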
2013-02-07 | fetch: use struct ref to represent refs to be fetched | Junio C Hamano | 1 file | -31/+48
Even though "git fetch" has full infrastructure to parse refspecs to be fetched and match them against the list of refs to come up with the final list of refs to be fetched, the list of refs requested to be fetched was internally converted to a plain list of strings at the transport layer and then passed to the underlying fetch-pack driver.

Stop this conversion and instead pass around an array of refs.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-01 | Merge branch 'jk/gc-auto-after-fetch' | Junio C Hamano | 1 file | -0/+3
Help "fetch only" repositories that do not trigger "gc --auto" often enough.

* jk/gc-auto-after-fetch:
  fetch-pack: avoid repeatedly re-scanning pack directory
  fetch: run gc --auto after fetching