summaryrefslogtreecommitdiff
path: root/connected.c
AgeCommit message (Collapse)AuthorFilesLines
2020-03-15connected.c: reprepare packs for corner casesLibravatar Derrick Stolee1-0/+4
While updating the microsoft/git fork on top of v2.26.0-rc0 and consuming that build into Scalar, I noticed a corner case bug around partial clone. The "scalar clone" command can create a Git repository with the proper config for using partial clone with the "blob:none" filter. Instead of calling "git clone", it runs "git init" then sets a few more config values before running "git fetch". In our builds on v2.26.0-rc0, we noticed that our "git fetch" command was failing with error: https://github.com/microsoft/scalar did not send all necessary objects This does not happen if you copy the config file from a repository created by "git clone --filter=blob:none <url>", but it does happen when adding the config option "core.logAllRefUpdates = true". By debugging, I was able to see that the loop inside check_connnected() that checks if all refs are contained in promisor packs actually did not have any packfiles in the packed_git list. I'm not sure what corner-case issues caused this config option to prevent the reprepare_packed_git() from being called at the proper spot during the fetch operation. This approach requires a situation where we use the remote helper process, which makes it difficult to test. It is possible to place a reprepare_packed_git() call in the fetch code closer to where we receive a pack, but that leaves an opening for a later change to re-introduce this problem. Further, a concurrent repack operation could replace the pack-file list we already loaded into memory, causing this issue in an even harder to reproduce scenario. It is really the responsibility of anyone looping through the list of pack-files for a certain object to fall back to reprepare_packed_git() on a fail-to-find. The loop in check_connected() does not have this fallback, leading to this bug. We _could_ try looping through the packs and only reprepare the packs after a miss, but that change is more involved and has little value. Since this case is isolated to the case when opt->check_refs_are_promisor_objects_only is true, we are confident that we are verifying the refs after downloading new data. This implies that calling reprepare_packed_git() in advance is not a huge cost compared to the rest of the operations already made. Helped-by: Jeff King <peff@peff.net> Helped-by: Junio Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30connected: verify promisor-ness of partial cloneLibravatar Jonathan Tan1-5/+14
Commit dfa33a298d ("clone: do faster object check for partial clones", 2019-04-21) optimized the connectivity check done when cloning with --filter to check only the existence of objects directly pointed to by refs. But this is not sufficient: they also need to be promisor objects. Make this check more robust by instead checking that these objects are promisor objects, that is, they appear in a promisor pack. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13clone: remove fetch_if_missing=0Libravatar Jonathan Tan1-1/+2
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08) strove to remove the need for fetch_if_missing=0 from the fetching mechanism, so it is plausible to attempt removing fetch_if_missing=0 from clone as well. But doing so reveals a bug - when the server does not send an object directly pointed to by a ref, this should be an error, not a trigger for a lazy fetch. (This case in the fetching mechanism was covered by a test using "git clone", not "git fetch", which is why the aforementioned commit didn't uncover the bug.) The bug can be fixed by suppressing lazy-fetching during the connectivity check. Fix this bug, and remove fetch_if_missing from clone. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-11Merge branch 'bc/object-id-part17'Libravatar Junio C Hamano1-3/+4
Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...
2019-09-18Merge branch 'cc/multi-promisor'Libravatar Junio C Hamano1-1/+2
Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands
2019-08-19connected: switch GIT_SHA1_HEXSZ to the_hash_algoLibravatar brian m. carlson1-3/+4
Switch various uses of GIT_SHA1_HEXSZ to reference the_hash_algo instead. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-01check_everything_connected: assume alternate ref tips are validLibravatar Jeff King1-0/+1
When we receive a remote ref update to sha1 "X", we want to check that we have all of the objects needed by "X". We can assume that our repository is not currently corrupted, and therefore if we have a ref pointing at "Y", we have all of its objects. So we can stop our traversal from "X" as soon as we hit "Y". If we make the same non-corruption assumption about any repositories we use to store alternates, then we can also use their ref tips to shorten the traversal. This is especially useful when cloning with "--reference", as we otherwise do not have any local refs to check against, and have to traverse the whole history, even though the other side may have sent us few or no objects. Here are results for the included perf test (which shows off more or less the maximal savings, getting one new commit and sharing the whole history): Test HEAD^ HEAD -------------------------------------------------------------------- [on git.git] 5600.3: clone --reference 2.94(2.86+0.08) 0.09(0.08+0.01) -96.9% [on linux.git] 5600.3: clone --reference 45.74(45.34+0.41) 0.36(0.30+0.08) -99.2% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25Use promisor_remote_get_direct() and has_promisor_remote()Libravatar Christian Couder1-1/+2
Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-21clone: do faster object check for partial clonesLibravatar Josh Steadmon1-0/+17
For partial clones, doing a full connectivity check is wasteful; we skip promisor objects (which, for a partial clone, is all known objects), and enumerating them all to exclude them from the connectivity check can take a significant amount of time on large repos. At most, we want to make sure that we get the objects referred to by any wanted refs. For partial clones, just check that these objects were transferred. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03fetch-pack: write shallow, then check connectivityLibravatar Jonathan Tan1-2/+4
When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13Merge branch 'jh/partial-clone'Libravatar Junio C Hamano1-0/+2
The machinery to clone & fetch, which in turn involves packing and unpacking objects, have been told how to omit certain objects using the filtering mechanism introduced by the jh/object-filtering topic, and also mark the resulting pack as a promisor pack to tolerate missing objects, taking advantage of the mechanism introduced by the jh/fsck-promisors topic. * jh/partial-clone: t5616: test bulk prefetch after partial fetch fetch: inherit filter-spec from partial clone t5616: end-to-end tests for partial clone fetch-pack: restore save_commit_buffer after use unpack-trees: batch fetching of missing blobs clone: partial clone partial-clone: define partial clone settings in config fetch: support filters fetch: refactor calculation of remote list fetch-pack: test support excluding large blobs fetch-pack: add --no-filter fetch-pack, index-pack, transport: partial clone upload-pack: add object filtering for partial clone
2017-12-08fetch: support filtersLibravatar Jeff Hostetler1-0/+2
Teach fetch to support filters. This is only allowed for the remote configured in extensions.partialcloneremote. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16Convert check_connected to use struct object_idLibravatar brian m. carlson1-9/+9
Convert check_connected and the callbacks it takes to use struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23pack: move {,re}prepare_packed_git and approximate_object_countLibravatar Jonathan Tan1-1/+1
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23pack: move add_packed_git()Libravatar Jonathan Tan1-0/+1
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-10check_connected: accept an env argumentLibravatar Jeff King1-0/+1
This lets callers influence the environment seen by rev-list, which will be useful when we start providing quarantined objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20check_connected: add progress flagLibravatar Jeff King1-0/+3
Connectivity checks have to traverse the entire object graph in the worst case (e.g., a full clone or a full push). For large repositories like linux.git, this can take 30-60 seconds, during which time git may produce little or no output. Let's add the option of showing progress, which is taken care of by rev-list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20check_connected: relay errors to alternate descriptorLibravatar Jeff King1-2/+9
Unless the "quiet" flag is given, check_connected sends any errors to the stderr of the caller (because the child rev-list inherits that descriptor). However, server-side callers may want to send these over a sideband channel instead. Let's make that possible. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20check_everything_connected: use a struct with named optionsLibravatar Jeff King1-28/+11
The number of variants of check_everything_connected has grown over the years, so that the "real" function takes several possibly-zero, possibly-NULL arguments. We hid the complexity behind some wrapper functions, but this doesn't scale well when we want to add new options. If we add more wrapper variants to handle the new options, then we can get a combinatorial explosion when those options might be used together (right now nobody wants to use both "shallow" and "transport" together, so we get by with just a few wrappers). If instead we add new parameters to each function, each of which can have a default value, then callers who want the defaults end up with confusing invocations like: check_everything_connected(fn, 0, data, -1, 0, NULL); where it is unclear which parameter is which (and every caller needs updated when we add new options). Instead, let's add a struct to hold all of the optional parameters. This is a little more verbose for the callers (who have to declare the struct and fill it in), but it makes their code much easier to follow, because every option is named as it is set (and unused options do not have to be mentioned at all). Note that we could also stick the iteration function and its callback data into the option struct, too. But since those are required for each call, by avoiding doing so, we can let very simple callers just pass "NULL" for the options and not worry about the struct at all. While we're touching each site, let's also rename the function to check_connected(). The existing name was quite long, and not all of the wrappers even used the full name. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20check_everything_connected: convert to argv_arrayLibravatar Jeff King1-12/+9
This avoids the magic "9" array-size which we must avoid overflowing, making further patches simpler. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20check_everything_connected: always pass --quiet to rev-listLibravatar Jeff King1-2/+1
The check_everything_connected function takes a "quiet" parameter which does two things if non-zero: 1. redirect rev-list's stderr to /dev/null to avoid showing errors to the user 2. pass "--quiet" to rev-list Item (1) is obviously useful. But item (2) is surprisingly not. For rev-list, "--quiet" does not have anything to do with chattiness on stderr; it tells rev-list not to bother writing the list of traversed objects to stdout, for efficiency. And since we always redirect rev-list's stdout to /dev/null in this function, there is no point in asking it to ever write anything to stdout. The efficiency gains are modest; a best-of-five run of "git rev-list --objects --all" on linux.git dropped from 32.013s to 30.502s when adding "--quiet". That's only about 5%, but given how easy it is, it's worth doing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09connected.c: use error_errno()Libravatar Nguyễn Thái Ngọc Duy1-6/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-20run-command: introduce CHILD_PROCESS_INITLibravatar René Scharfe1-2/+1
Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30use strip_suffix instead of ends_with in simple casesLibravatar Jeff King1-3/+3
When stripping a suffix like: if (ends_with(str, "foo")) buf = xmemdupz(str, strlen(str) - 3); we can instead use strip_suffix to avoid the constant 3, which must match the literal "foo" (we sometimes use strlen("foo") instead, but that means we are repeating ourselves). The example above becomes: if (strip_suffix(str, "foo", &len)) buf = xmemdupz(str, len); This also saves a strlen(), since we calculate the string length when detecting the suffix. Note that in some cases we also switch from xstrndup to xmemdupz, which saves a further strlen call. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-17Merge branch 'nd/shallow-clone'Libravatar Junio C Hamano1-8/+34
Fetching from a shallow-cloned repository used to be forbidden, primarily because the codepaths involved were not carefully vetted and we did not bother supporting such usage. This attempts to allow object transfer out of a shallow-cloned repository in a controlled way (i.e. the receiver become a shallow repository with truncated history). * nd/shallow-clone: (31 commits) t5537: fix incorrect expectation in test case 10 shallow: remove unused code send-pack.c: mark a file-local function static git-clone.txt: remove shallow clone limitations prune: clean .git/shallow after pruning objects clone: use git protocol for cloning shallow repo locally send-pack: support pushing from a shallow clone via http receive-pack: support pushing to a shallow clone via http smart-http: support shallow fetch/clone remote-curl: pass ref SHA-1 to fetch-pack as well send-pack: support pushing to a shallow clone receive-pack: allow pushes that update .git/shallow connected.c: add new variant that runs with --shallow-file add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses receive/send-pack: support pushing from a shallow clone receive-pack: reorder some code in unpack() fetch: add --update-shallow to accept refs that update .git/shallow upload-pack: make sure deepening preserves shallow roots fetch: support fetching from a shallow repository clone: support remote shallow repository ...
2013-12-10connected.c: add new variant that runs with --shallow-fileLibravatar Nguyễn Thái Ngọc Duy1-8/+34
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Libravatar Christian Couder1-1/+1
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28clone: open a shortcut for connectivity checkLibravatar Nguyễn Thái Ngọc Duy1-1/+33
In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-15fetch/receive: remove over-pessimistic connectivity checkLibravatar Junio C Hamano1-4/+4
Git 1.7.8 introduced an object and history re-validation step after "fetch" or "push" causes new history to be added to a receiving repository. This is to protect a malicious server or pushing client from corrupting the repository by taking advantage of an existing corrupt object that is unconnected to existing history. But this check is way over-pessimistic. During "fetch" or "receive-pack" (the server side of "push"), unpack-objects and index-pack already validate individual objects that are received, and the only thing we would want to catch are corrupted objects that already happen to exist in our repository but are not referenced from our refs. Such objects must have been written by an earlier run of our codepaths that write out loose objects or packfiles, and they must have done the validation of individual objects when they did so. The only thing left to worry about is the connectivity integrity, which can be checked with "rev-list --objects", which is much cheaper. We have been paying the 5x to 8x runtime overhead the --verify-objects often adds for no real gain. Revert check_everything_connected() not to use this over-pessimistic check. Credit goes to Nguyễn Thái Ngọc Duy, who originally identified the performance regression and endured multiple rounds of reviews to fix it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-09check_everything_connected(): libifyLibravatar Junio C Hamano1-0/+62
Extract the helper function and the type definition of the iterator function it uses out of builtin/fetch.c into a separate source and a header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>