path: root/builtin/fetch.c
Age | Commit message | Author | Files | Lines
2022-02-23 | Merge branch 'ps/fetch-optim-with-commit-graph' | Junio C Hamano | 1 | -2/+6
A couple of optimizations to "git fetch".

* ps/fetch-optim-with-commit-graph:
  fetch: skip computing output width when not printing anything
  fetch-pack: use commit-graph when computing cutoff
2022-02-18 | Merge branch 'js/short-help-outside-repo-fix' | Junio C Hamano | 1 | -2/+4
"git cmd -h" outside a repository should error out cleanly for many commands, but instead it hit a BUG(), which has been corrected.

* js/short-help-outside-repo-fix:
  t0012: verify that built-ins handle `-h` even without gitdir
  checkout/fetch/pull/pack-objects: allow `-h` outside a repository
2022-02-18 | Merge branch 'ab/release-transport-ls-refs-options' | Junio C Hamano | 1 | -1/+1
* ab/release-transport-ls-refs-options:
  ls-remote & transport API: release "struct transport_ls_refs_options"
2022-02-11 | Merge branch 'tg/fetch-prune-exit-code-fix' | Junio C Hamano | 1 | -4/+6
When "git fetch --prune" failed to prune the refs it wanted to prune, the command issued error messages but exited with exit status 0, which has been corrected.

* tg/fetch-prune-exit-code-fix:
  fetch --prune: exit with error if pruning fails
2022-02-11 | Merge branch 'rc/negotiate-only-typofix' | Junio C Hamano | 1 | -1/+1
Typofix.

* rc/negotiate-only-typofix:
  fetch: fix negotiate-only error message
2022-02-10 | fetch: skip computing output width when not printing anything | Patrick Steinhardt | 1 | -2/+6
When updating references via git-fetch(1), then by default we report to the user which references have been changed. This output is formatted in a nice table such that the different columns are aligned. Because the first column contains abbreviated object IDs we thus need to iterate over all refs which have changed and compute the minimum length for their respective abbreviated hashes. While this effort makes sense in most cases, it is wasteful when the user passes the `--quiet` flag: we don't print the summary, but still compute the length.

Skip computing the summary width when the user asked for us to be quiet. This gives us a speedup of nearly 10% when doing a mirror-fetch in a repository with thousands of references being updated:

    Benchmark 1: git fetch --quiet +refs/*:refs/* (HEAD~)
      Time (mean ± σ):     96.078 s ±  0.508 s    [User: 91.378 s, System: 10.870 s]
      Range (min … max):   95.449 s … 96.760 s    5 runs

    Benchmark 2: git fetch --quiet +refs/*:refs/* (HEAD)
      Time (mean ± σ):     88.214 s ±  0.192 s    [User: 83.274 s, System: 10.978 s]
      Range (min … max):   87.998 s … 88.446 s    5 runs

    Summary
      'git fetch --quiet +refs/*:refs/* (HEAD)' ran
        1.09 ± 0.01 times faster than 'git fetch --quiet +refs/*:refs/* (HEAD~)'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
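The optimization described in this commit, skipping the width scan when nothing will be printed, can be sketched in a few lines of C. This is a minimal illustration and not the code in builtin/fetch.c: `summary_width_sketch()` and its parameters are hypothetical names, and abbreviated object IDs are assumed to arrive as plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's API): return the column width needed for
 * the abbreviated object IDs, or 0 when no summary will be printed.
 */
static size_t summary_width_sketch(const char *const *abbrev_ids, size_t nr,
				   int quiet)
{
	size_t width = 0;

	assert(abbrev_ids || !nr);
	if (quiet)
		return 0; /* no table is printed, so skip the scan entirely */

	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(abbrev_ids[i]);
		if (len > width)
			width = len;
	}
	return width;
}
```

The early return is the whole point: with many updated refs the loop is the expensive part, and under `--quiet` its result is never used.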
2022-02-09 | Merge branch 'gc/fetch-negotiate-only-early-return' | Junio C Hamano | 1 | -3/+38
"git fetch --negotiate-only" is an internal command used by "git push" to figure out which part of our history is missing from the other side. It should never recurse into submodules even when the fetch.recursesubmodules configuration variable is set, nor should it trigger "gc". The code has been tightened up to ensure it only does common ancestry discovery and nothing else.

* gc/fetch-negotiate-only-early-return:
  fetch: help translators by reusing the same message template
  fetch --negotiate-only: do not update submodules
  fetch: skip tasks related to fetching objects
  fetch: use goto cleanup in cmd_fetch()
2022-02-08 | checkout/fetch/pull/pack-objects: allow `-h` outside a repository | Johannes Schindelin | 1 | -2/+4
When we taught these commands about the sparse index, we did not account for the fact that the `cmd_*()` functions _can_ be called without a gitdir, namely when `-h` is passed to show the usage.

A plausible approach to address this is to move the `prepare_repo_settings()` calls right after the `parse_options()` calls: The latter will never return when it handles `-h`, and therefore it is safe to assume that we have a `gitdir` at that point, as long as the built-in is marked with the `RUN_SETUP` flag.

However, it is unfortunately not that simple. In `cmd_pack_objects()`, for example, the repo settings need to be fully populated so that the command-line options `--sparse`/`--no-sparse` can override them, not the other way round.

Therefore, we choose to imitate the strategy taken in `cmd_diff()`, where we simply do not bother to prepare and initialize the repo settings unless we have a `gitdir`.

This fixes https://github.com/git-for-windows/git/issues/3688

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-06 | ls-remote & transport API: release "struct transport_ls_refs_options" | Ævar Arnfjörð Bjarmason | 1 | -1/+1
Fix a memory leak in codepaths that use the "struct transport_ls_refs_options" API. Since the introduction of the struct in 39835409d10 (connect, transport: encapsulate arg in struct, 2021-02-05) the caller has been responsible for freeing it.

That commit in turn migrated code originally added in 402c47d9391 (clone: send ref-prefixes when using protocol v2, 2018-07-20) and b4be74105fe (ls-remote: pass ref prefixes when requesting a remote's refs, 2018-03-15). Only some of those codepaths were releasing the allocated resources of the struct, now all of them will.

Mark the "t/t5511-refspec.sh" test as passing when git is compiled with SANITIZE=leak. They'll now be listed as running under the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks" CI target). Previously 24/47 tests would fail.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-31 | fetch --prune: exit with error if pruning fails | Thomas Gummerer | 1 | -4/+6
When pruning refs fails, we print an error to stderr, but still exit 0 from 'git fetch'. Since this is a genuine error, fetch should be exiting with some non-zero exit code. Make it so.

The --prune option was introduced in f360d844de ("builtin-fetch: add --prune option", 2009-11-10). Unfortunately it's unclear from that commit whether ignoring the exit code was an oversight or intentional, but it feels like an oversight.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-28 | fetch: fix negotiate-only error message | Robert Coup | 1 | -1/+1
The error message when invoking a negotiate-only fetch without providing any tips incorrectly refers to a --negotiate-tip=* argument. Fix this to use the actual argument, --negotiation-tip=*.

Signed-off-by: Robert Coup <robert@coup.net.nz>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-20 | fetch: help translators by reusing the same message template | Junio C Hamano | 1 | -1/+2
Follow the example set by 12909b6b (i18n: turn "options are incompatible" into "cannot be used together", 2022-01-05) and use the same message string to reduce the need for translation.

Reported-by: Jiang Xin <worldhello.net@gmail.com>
Helped-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-18 | fetch --negotiate-only: do not update submodules | Glen Choo | 1 | -1/+23
`git fetch --negotiate-only` is an implementation detail of push negotiation and, unlike most `git fetch` invocations, does not actually update the main repository. Thus it should not update submodules even if submodule recursion is enabled.

This is not just slow, it is wrong, e.g. push negotiation with "submodule.recurse=true" will cause submodules to be updated because it invokes `git fetch --negotiate-only`.

Fix this by disabling submodule recursion if --negotiate-only was given. Since this makes --negotiate-only and --recurse-submodules incompatible, check for this invalid combination and die.

This does not use the "goto cleanup" introduced in the previous commit because we want to recurse through submodules whenever a ref is fetched, and this can happen without introducing new objects.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
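The rule this commit introduces (force recursion off under --negotiate-only, and reject an explicit --recurse-submodules) might look roughly like the following. It is a simplified sketch under stated assumptions: the enum and the function `resolve_recursion_sketch()` are hypothetical, and Git's real recursion settings have more states than the three modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the submodule recursion settings. */
enum recurse_sketch { RECURSE_DEFAULT, RECURSE_OFF, RECURSE_ON };

/*
 * Decide the effective recursion mode. Returns 0 on success, or -1 when
 * --negotiate-only is combined with an explicit request to recurse.
 */
static int resolve_recursion_sketch(int negotiate_only,
				    enum recurse_sketch from_cli,
				    enum recurse_sketch from_config,
				    enum recurse_sketch *effective)
{
	assert(effective != NULL);

	if (negotiate_only) {
		if (from_cli == RECURSE_ON)
			return -1; /* invalid combination: the caller should die */
		*effective = RECURSE_OFF; /* configuration is ignored entirely */
		return 0;
	}
	/* Normal fetch: the command line wins over configuration. */
	*effective = from_cli != RECURSE_DEFAULT ? from_cli : from_config;
	return 0;
}
```

Note that a configured value alone never triggers the error; only the explicit command-line request does, which matches the "invalid combination" wording of the commit message.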
2022-01-18 | fetch: skip tasks related to fetching objects | Glen Choo | 1 | -0/+11
cmd_fetch() does the following with the assumption that objects are fetched:

* Run gc
* Write commit graphs (if enabled by fetch.writeCommitGraph=true)

However, neither of these tasks makes sense if objects are not fetched e.g. `git fetch --negotiate-only` never fetches objects.

Speed up cmd_fetch() by bailing out early if we know for certain that objects will not be fetched. cmd_fetch() can bail out early whenever objects are not fetched, but for now this only considers --negotiate-only.

The same optimization does not apply to `git fetch --dry-run` because that actually fetches objects; the dry run refers to not updating refs.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-18 | fetch: use goto cleanup in cmd_fetch() | Glen Choo | 1 | -3/+4
Replace an early return with 'goto cleanup' in cmd_fetch() so that the string_list is always cleared (the string_list_clear() call is purely cleanup; the string_list is not reused). This makes cleanup consistent so that a subsequent commit can use 'goto cleanup' to bail out early.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
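For readers unfamiliar with the idiom, a 'goto cleanup' exit path can be sketched as below. This is an illustration of the general pattern only: `run_sketch()`, its parameters, and the `scratch` buffer (standing in for the string_list) are hypothetical and not taken from cmd_fetch().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical function with a single exit path. "released" is set once
 * the resource has been freed, so callers can observe the cleanup.
 */
static int run_sketch(int bail_out_early, int *released)
{
	int result = 0;
	char *scratch = malloc(64); /* stands in for the string_list */

	assert(released != NULL);
	if (!scratch)
		return -1; /* nothing has been acquired yet */

	if (bail_out_early) {
		result = 1;
		goto cleanup; /* an early "return" here would skip the free() */
	}

	/* ... the remaining work would go here ... */

cleanup:
	free(scratch);
	*released = 1;
	return result;
}
```

Because every path funnels through the `cleanup` label, later commits can add further early exits without having to remember which resources need releasing.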
2022-01-12 | Merge branch 'ps/lockfile-cleanup-fix' | Junio C Hamano | 1 | -6/+11
Some lockfile code called free() in signal-death code path, which has been corrected.

* ps/lockfile-cleanup-fix:
  fetch: fix deadlock when cleaning up lockfiles in async signals
2022-01-10 | Merge branch 'ja/i18n-similar-messages' | Junio C Hamano | 1 | -4/+4
Similar message templates have been consolidated so that translators need to work on a smaller number of messages.

* ja/i18n-similar-messages:
  i18n: turn even more messages into "cannot be used together" ones
  i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"
  i18n: factorize "--foo outside a repository"
  i18n: refactor "unrecognized %(foo) argument" strings
  i18n: factorize "no directory given for --foo"
  i18n: factorize "--foo requires --bar" and the like
  i18n: tag.c factorize i18n strings
  i18n: standardize "cannot open" and "cannot read"
  i18n: turn "options are incompatible" into "cannot be used together"
  i18n: refactor "%s, %s and %s are mutually exclusive"
  i18n: refactor "foo and bar are mutually exclusive"
2022-01-10 | Merge branch 'ds/fetch-pull-with-sparse-index' | Junio C Hamano | 1 | -0/+2
"git fetch" and "git pull" are now declared sparse-index clean. Also "git ls-files" learns the "--sparse" option to help debugging.

* ds/fetch-pull-with-sparse-index:
  test-read-cache: remove --table, --expand options
  t1091/t3705: remove 'test-tool read-cache --table'
  t1092: replace 'read-cache --table' with 'ls-files --sparse'
  ls-files: add --sparse option
  fetch/pull: use the sparse index
2022-01-07 | fetch: fix deadlock when cleaning up lockfiles in async signals | Patrick Steinhardt | 1 | -6/+11
When fetching packfiles, we write a bunch of lockfiles for the packfiles we're writing into the repository. In order to not leave behind any cruft in case we exit or receive a signal, we register both an exit handler as well as signal handlers for common signals like SIGINT. These handlers will then unlink the locks and free the data structure tracking them. We have observed a deadlock in this logic though:

    (gdb) bt
    #0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    #1  0x00007f4932bea2cd in _int_free (av=0x7f4932f2eb20 <main_arena>, p=0x3e3e4200, have_lock=0) at malloc.c:3969
    #2  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
    #3  0x0000000000662ab1 in string_list_clear ()
    #4  0x000000000044f5bc in unlock_pack_on_signal ()
    #5  <signal handler called>
    #6  _int_free (av=0x7f4932f2eb20 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4024
    #7  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
    #8  0x000000000065afd5 in strbuf_release ()
    #9  0x000000000066ddb9 in delete_tempfile ()
    #10 0x0000000000610d0b in files_transaction_cleanup.isra ()
    #11 0x0000000000611718 in files_transaction_abort ()
    #12 0x000000000060d2ef in ref_transaction_abort ()
    #13 0x000000000060d441 in ref_transaction_prepare ()
    #14 0x000000000060e0b5 in ref_transaction_commit ()
    #15 0x00000000004511c2 in fetch_and_consume_refs ()
    #16 0x000000000045279a in cmd_fetch ()
    #17 0x0000000000407c48 in handle_builtin ()
    #18 0x0000000000408df2 in cmd_main ()
    #19 0x00000000004078b5 in main ()

The process was killed with a signal, which caused the signal handler to kick in and try to free the data structures after we have unlinked the locks. It then deadlocks while calling free(3P).

The root cause of this is that it is not allowed to call certain functions in async-signal handlers, as specified by signal-safety(7). Next to most I/O functions, this list of disallowed functions also includes memory-handling functions like malloc(3P) and free(3P) because they may not be reentrant. As a result, if we execute such functions in the signal handler, then they may operate on inconsistent state and fail in unexpected ways.

Fix this bug by not calling non-async-signal-safe functions when running in the signal handler. We're about to re-raise the signal anyway and will thus exit, so it's not much of a problem to keep the string list of lockfiles untouched. Note that it's fine though to call unlink(2), so we'll still clean up the lockfiles correctly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
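The shape of the fix described above, calling only unlink(2) when running inside a signal handler and leaving free(3) to the normal exit path, can be sketched as follows. This is an illustration under assumptions, not the code in builtin/fetch.c: the registry, `register_lock_sketch()` and `unlock_all_sketch()` are hypothetical names, and a stricter implementation would also consider how the shared state is accessed from the handler.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_LOCKS_SKETCH 8

/* Hypothetical registry of lockfile paths (not Git's data structure). */
static char *lock_paths[MAX_LOCKS_SKETCH];
static size_t lock_nr;

static int register_lock_sketch(const char *path)
{
	size_t len;
	char *copy;

	assert(path != NULL);
	if (lock_nr >= MAX_LOCKS_SKETCH)
		return -1;
	len = strlen(path) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, path, len);
	lock_paths[lock_nr++] = copy;
	return 0;
}

/*
 * Remove all registered lockfiles. Inside a signal handler only unlink(2)
 * is called, which is async-signal-safe; free(3) and stdio are reserved
 * for the normal exit path.
 */
static void unlock_all_sketch(int in_signal_handler)
{
	for (size_t i = 0; i < lock_nr; i++) {
		if (unlink(lock_paths[i]) < 0 && !in_signal_handler)
			perror(lock_paths[i]);
	}
	if (in_signal_handler)
		return; /* the process is about to die; leave the list alone */

	for (size_t i = 0; i < lock_nr; i++)
		free(lock_paths[i]);
	lock_nr = 0;
}
```

A signal handler would call `unlock_all_sketch(1)` and then re-raise the signal, while an atexit(3) handler would call `unlock_all_sketch(0)`. The memory is deliberately leaked in the signal case because the process terminates immediately afterwards.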
2022-01-05 | i18n: standardize "cannot open" and "cannot read" | Jean-Noël Avila | 1 | -2/+2
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 | i18n: refactor "foo and bar are mutually exclusive" | Jean-Noël Avila | 1 | -2/+2
Use static strings for constant parts of the sentences. They are all turned into "cannot be used together".

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22 | Merge branch 'ab/fetch-set-upstream-while-detached' | Junio C Hamano | 1 | -0/+10
"git fetch --set-upstream" did not check if there is a current branch, leading to a segfault when it is run on a detached HEAD, which has been corrected.

* ab/fetch-set-upstream-while-detached:
  pull, fetch: fix segfault in --set-upstream option
2021-12-22 | fetch/pull: use the sparse index | Derrick Stolee | 1 | -0/+2
The 'git fetch' and 'git pull' commands parse the index in order to determine if submodules exist. Without command_requires_full_index=0, this will expand a sparse index, causing slow performance even when there is no new data to fetch.

The .gitmodules file will never be inside a sparse directory entry, and even if it was, the index_name_pos() method would expand the sparse index if needed as we search for the path by name. These commands do not iterate over the index, which is the typical thing we are careful about when integrating with the sparse index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07 | pull, fetch: fix segfault in --set-upstream option | Ævar Arnfjörð Bjarmason | 1 | -0/+10
Fix a segfault in the --set-upstream option added in 24bc1a12926 (pull, fetch: add --set-upstream option, 2019-08-19) and released in v2.24.0.

The code added there did not do the same checking we do for "git branch" itself since 8efb8899cfe (branch: segfault fixes and validation, 2013-02-23), which in turn fixed the same sort of segfault I'm fixing now in "git branch --set-upstream-to", see 6183d826ba6 (branch: introduce --set-upstream-to, 2012-08-20).

The warning message I'm adding here is an amalgamation of the error added for "git branch" in 8efb8899cfe, and the error output install_branch_config() itself emits, i.e. it trims "refs/heads/" from the name and says "branch X on remote", not "branch refs/heads/X on remote".

I think it would make more sense to simply die() here, but in the other checks for --set-upstream added in 24bc1a12926 we issue a warning() instead. Let's do the same here for consistency for now.

There was an earlier submitted alternate way of fixing this in [1]; because that patch broke threading with the original report at [2], I didn't notice it before authoring this version. I think the more detailed warning message here is better, and we should also have tests for this behavior.

The --no-rebase option to "git pull" is needed as of the recently merged 7d0daf3f12f (Merge branch 'en/pull-conflicting-options', 2021-08-30).

1. https://lore.kernel.org/git/20210706162238.575988-1-clemens@endorphin.org/
2. https://lore.kernel.org/git/CAG6gW_uHhfNiHGQDgGmb1byMqBA7xa8kuH1mP-wAPEe5Tmi2Ew@mail.gmail.com/

Reported-by: Clemens Fruhwirth <clemens@endorphin.org>
Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-01 | fetch: protect branches checked out in all worktrees | Anders Kaseorg | 1 | -35/+40
Refuse to fetch into the currently checked out branch of any working tree, not just the current one.

Fixes this previously reported bug: https://lore.kernel.org/git/cb957174-5e9a-5603-ea9e-ac9b58a2eaad@mathema.de/

As a side effect of using find_shared_symref, we’ll also refuse the fetch when we’re on a detached HEAD because we’re rebasing or bisecting on the branch in question. This seems like a sensible change.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-01 | fetch: lowercase error messages | Anders Kaseorg | 1 | -24/+26
Documentation/CodingGuidelines says “do not end error messages with a full stop” and “do not capitalize the first word”. Clean up existing messages, some of which we will be touching in later steps in the series, that deviate from these rules in this file, as a preparation for the main part of the topic.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-20 | Merge branch 'js/run-command-close-packs' | Junio C Hamano | 1 | -2/+0
The run-command API has been updated so that the callers can easily ask the file descriptors open for packfiles to be closed immediately before spawning commands that may trigger auto-gc.

* js/run-command-close-packs:
  Close object store closer to spawning child processes
  run_auto_maintenance(): implicitly close the object store
  run-command: offer to close the object store before running
  run-command: prettify the `RUN_COMMAND_*` flags
  pull: release packs before fetching
  commit-graph: when closing the graph, also release the slab
2021-09-20 | Merge branch 'ps/fetch-optim' | Junio C Hamano | 1 | -34/+40
Optimize code that handles a large number of refs in the "git fetch" code path.

* ps/fetch-optim:
  fetch: avoid second connectivity check if we already have all objects
  fetch: merge fetching and consuming refs
  fetch: refactor fetch refs to be more extendable
  fetch-pack: optimize loading of refs via commit graph
  connected: refactor iterator to return next object ID directly
  fetch: avoid unpacking headers in object existence check
  fetch: speed up lookup of want refs via commit-graph
2021-09-10 | Merge branch 'ab/retire-advice-config' | Junio C Hamano | 1 | -1/+1
Code clean-up to migrate callers from the older advice_config[] based API to the newer advice_if_enabled() and advice_enabled() API.

* ab/retire-advice-config:
  advice: move advice.graftFileDeprecated squashing to commit.[ch]
  advice: remove use of global advice_add_embedded_repo
  advice: remove read uses of most global `advice_` variables
  advice: add enum variants for missing advice variables
2021-09-10 | Merge branch 'ps/fetch-omit-formatting-under-quiet' | Junio C Hamano | 1 | -5/+12
"git fetch --quiet" optimization to avoid useless computation of info that will never be displayed.

* ps/fetch-omit-formatting-under-quiet:
  fetch: skip formatting updated refs with `--quiet`
2021-09-09 | run_auto_maintenance(): implicitly close the object store | Johannes Schindelin | 1 | -2/+0
Before spawning the auto maintenance, we need to make sure that we release all open file handles to all the `.pack` files (and MIDX files and commit-graph files and...) so that the maintenance process has the freedom to delete those files.

So far, we did this manually every time before calling `run_auto_maintenance()`. With the new `close_object_store` flag, we can do that implicitly in that function, which is more robust because future callers won't be able to forget to close the object store.

Note: this changes behavior slightly, as we previously _always_ closed the object store, but now we only close the object store when actually running the auto maintenance. In practice, this should not matter (if anything, it might speed up operations where auto maintenance is disabled).

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 | fetch: avoid second connectivity check if we already have all objects | Patrick Steinhardt | 1 | -4/+3
When fetching refs, we are doing two connectivity checks:

- The first one is done such that we can skip fetching refs in the case where we already have all objects referenced by the updated set of refs.
- The second one verifies that we have all objects after we have fetched objects.

We always execute both connectivity checks, but this is wasteful in case the first connectivity check already notices that we have all objects locally available.

Skip the second connectivity check in case we already had all objects available. This gives us a nice speedup when doing a mirror-fetch in a repository with about 2.3M refs where the fetching repo already has all objects:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     30.025 s ±  0.081 s    [User: 27.070 s, System: 4.933 s]
      Range (min … max):   29.900 s … 30.111 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     25.574 s ±  0.177 s    [User: 22.855 s, System: 4.683 s]
      Range (min … max):   25.399 s … 25.765 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.17 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 | fetch: merge fetching and consuming refs | Patrick Steinhardt | 1 | -21/+9
The functions `fetch_refs()` and `consume_refs()` must always be called together such that we first obtain all missing objects and then update our local refs to match the remote refs. In a subsequent patch, we'll further require that `fetch_refs()` must always be called before `consume_refs()` such that it can correctly assert that we have all objects after the fetch given that we're about to move the connectivity check.

Make this requirement explicit by merging both functions into a single `fetch_and_consume_refs()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 | fetch: refactor fetch refs to be more extendable | Patrick Steinhardt | 1 | -7/+17
Refactor `fetch_refs()` code to make it more extendable by explicitly handling error cases. The refactored code should behave the same.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 | connected: refactor iterator to return next object ID directly | Patrick Steinhardt | 1 | -4/+3
The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID.

Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:

    Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
      Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
      Range (min … max):   29.934 s … 30.406 s    10 runs

    Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
      Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
      Range (min … max):   29.696 s … 29.996 s    10 runs

    Summary
      '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
        1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'

While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
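The iterator style this commit moves to, returning a pointer to the next item or `NULL` when exhausted, can be illustrated with a small self-contained sketch. All names here (`oid_sketch`, `oid_cursor_sketch`, `next_oid_sketch()`, `count_oids_sketch()`) are hypothetical, and the object ID struct is a simplification rather than Git's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object ID; Git's real struct differs. */
struct oid_sketch {
	unsigned char hash[20];
};

struct oid_cursor_sketch {
	const struct oid_sketch *items;
	size_t nr;
	size_t pos;
};

/* Iterator signature: return the next ID, or NULL when exhausted. */
typedef const struct oid_sketch *(*oid_iterate_sketch_fn)(void *cb_data);

static const struct oid_sketch *next_oid_sketch(void *cb_data)
{
	struct oid_cursor_sketch *cursor = cb_data;

	assert(cursor != NULL);
	if (cursor->pos >= cursor->nr)
		return NULL;
	return &cursor->items[cursor->pos++]; /* a pointer, not a copy */
}

/* Example consumer: count the IDs an iterator yields. */
static size_t count_oids_sketch(oid_iterate_sketch_fn fn, void *cb_data)
{
	size_t n = 0;

	while (fn(cb_data))
		n++;
	return n;
}
```

Compared with an out-parameter plus status code, the consumer loop collapses to a single `while` condition and no object ID is copied per iteration.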
2021-09-01 | fetch: avoid unpacking headers in object existence check | Patrick Steinhardt | 1 | -3/+1
When updating local refs after the fetch has transferred all objects, we do an object existence test as a safety guard to avoid updating a ref to an object which we don't have. We do so via `oid_object_info()`: if it returns an error, then we know the object does not exist.

One side effect of `oid_object_info()` is that it parses the object's type, and to do so it must unpack the object header. This is completely pointless: we don't care for the type, but only want to assert that the object exists.

Refactor the code to use `repo_has_object_file()`, which both makes the code's intent clearer and is also faster because it does not unpack object headers. In a real-world repo with 2.3M refs, this results in a small speedup when doing a mirror-fetch:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     33.686 s ±  0.176 s    [User: 30.119 s, System: 5.262 s]
      Range (min … max):   33.512 s … 33.944 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.247 s ±  0.195 s    [User: 28.135 s, System: 5.066 s]
      Range (min … max):   30.948 s … 31.472 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.08 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 | fetch: speed up lookup of want refs via commit-graph | Patrick Steinhardt | 1 | -6/+18
When updating our local refs based on the refs fetched from the remote, we need to iterate through all requested refs and load their respective commits such that we can determine whether they need to be appended to FETCH_HEAD or not. In cases where we're fetching from a remote with exceedingly many refs, resolving these refs can be quite expensive given that we repeatedly need to unpack object headers for each of the referenced objects.

Speed this up by opportunistically trying to resolve object IDs via the commit graph. We only do so for any refs which are not in "refs/tags": more likely than not, these are going to be a commit anyway, and this lets us avoid having to unpack object headers completely in case the object is a commit that is part of the commit-graph. This significantly speeds up mirror-fetches in a real-world repository with 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     56.482 s ±  0.384 s    [User: 53.340 s, System: 5.365 s]
      Range (min … max):   56.050 s … 57.045 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     33.727 s ±  0.170 s    [User: 30.252 s, System: 5.194 s]
      Range (min … max):   33.452 s … 33.871 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.67 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30 | fetch: skip formatting updated refs with `--quiet` | Patrick Steinhardt | 1 | -5/+12
When fetching, Git will by default print a list of all updated refs in a nicely formatted table. In order to come up with this table, Git needs to iterate refs twice: first to determine the maximum column width, and a second time to actually format these changed refs.

While this table will not be printed in case the user passes `--quiet`, we still go out of our way and do all these steps. In fact, we even do more work compared to not passing `--quiet`: without the flag, we will skip all references in the column width computation which have not been updated, but if it is set we will now compute widths for all refs.

Fix this issue by completely skipping both preparation of the format and formatting data for display in case the user passes `--quiet`, improving performance especially with many refs. The following benchmark shows a nice speedup for a quiet mirror-fetch in a repository with 2.3M refs:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     26.929 s ±  0.145 s    [User: 24.194 s, System: 4.656 s]
      Range (min … max):   26.692 s … 27.068 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     25.189 s ±  0.094 s    [User: 22.556 s, System: 4.606 s]
      Range (min … max):   25.070 s … 25.314 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.07 ± 0.01 times faster than 'HEAD~: git-fetch'

While at it, this patch also fixes `adjust_refcol_width()` such that it skips unchanged refs in case the user passed `--quiet`, where verbosity will be negative. While this function won't be called anymore if so, this brings the comment in line with actual code.

Furthermore, needless `verbosity >= 0` checks are now removed in `store_updated_refs()`: we never print to the `note` buffer anymore in case `verbosity < 0`, so we won't end up in that code block anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 | advice: remove read uses of most global `advice_` variables | Ben Boeckel | 1 | -1/+1
In c4a09cc9ccb (Merge branch 'hw/advise-ng', 2020-03-25), a new API for accessing advice variables was introduced and deprecated `advice_config` in favor of a new array, `advice_setting`.

This patch ports all but two uses which read the status of the global `advice_` variables over to the new `advice_enabled` API. We'll deal with advice_add_embedded_repo and advice_graft_file_deprecated separately.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24 | Merge branch 'jt/push-negotiation-fixes' | Junio C Hamano | 1 | -1/+3
Bugfix for common ancestor negotiation recently introduced in "git push" code path.

* jt/push-negotiation-fixes:
  fetch: die on invalid --negotiation-tip hash
  send-pack: fix push nego. when remote has refs
  send-pack: fix push.negotiate with remote helper
2021-07-16 | Merge branch 'ab/fetch-negotiate-segv-fix' | Junio C Hamano | 1 | -0/+3
Code recently added to support common ancestry negotiation during "git push" did not sanity check its arguments carefully enough.

* ab/fetch-negotiate-segv-fix:
  fetch: fix segfault in --negotiate-only without --negotiation-tip=*
  fetch: document the --negotiate-only option
  send-pack.c: move "no refs in common" abort earlier
2021-07-15 | fetch: die on invalid --negotiation-tip hash | Jonathan Tan | 1 | -1/+3
If a full hexadecimal hash is given as a --negotiation-tip to "git fetch", and that hash does not correspond to an object, "git fetch" will segfault if --negotiate-only is given and will silently ignore that hash otherwise. Make these cases fatal errors, just like the case when an invalid ref name or abbreviated hash is given.

While at it, mark the error messages as translatable.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-08 | fetch: fix segfault in --negotiate-only without --negotiation-tip=* | Ævar Arnfjörð Bjarmason | 1 | -0/+3
The recent --negotiate-only option would segfault in the call to oid_array_for_each() in negotiate_using_fetch() unless one or more --negotiation-tip=* options were provided.

All of the other tests for the feature combine both, but nothing was checking this assumption. Let's do that and add a test for it.

Fixes a bug in 9c1e657a8fd (fetch: teach independent negotiation (no packfile), 2021-05-04).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
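A guard of the kind this fix adds can be sketched as below: refuse to iterate when no tips were supplied instead of dereferencing a missing list. The types and function names are hypothetical and only loosely modelled on an "iterate over an OID array with a callback" API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list of negotiation tips given on the command line. */
struct demo_oid_list {
	const char **hex;
	size_t nr;
};

typedef int (*demo_each_fn)(const char *hex, void *data);

/* Example callback: counts the tips it is shown. */
int demo_count_tip(const char *hex, void *data)
{
	(void)hex;
	(*(int *)data)++;
	return 0;
}

/*
 * Return -1 when no tips were given (the caller should report an error),
 * otherwise the first nonzero callback result, or 0 on success.
 */
int demo_negotiate_only(const struct demo_oid_list *tips, demo_each_fn fn, void *data)
{
	size_t i;

	assert(fn);
	if (!tips || !tips->nr)
		return -1;

	for (i = 0; i < tips->nr; i++) {
		int ret = fn(tips->hex[i], data);
		if (ret)
			return ret;
	}
	return 0;
}
```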
2021-05-20 | fetch: improve grammar of "shallow roots" message | Alex Henrie | 1 | -1/+1
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-16 | Merge branch 'jt/push-negotiation' | Junio C Hamano | 1 | -1/+26
"git push" learns to discover common ancestor with the receiving end over protocol v2.

* jt/push-negotiation:
  send-pack: support push negotiation
  fetch: teach independent negotiation (no packfile)
  fetch-pack: refactor command and capability write
  fetch-pack: refactor add_haves()
  fetch-pack: refactor process_acks()
2021-05-05 | fetch: teach independent negotiation (no packfile) | Jonathan Tan | 1 | -1/+26
Currently, the packfile negotiation step within a Git fetch cannot be done independently of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application: push negotiation.

This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.)

In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied.

In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed: protocol version and capability checks, and the negotiation itself.

There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
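The effect of "wait-for-done" on the server's decision can be modelled as a small predicate. This is a simplified sketch of the rule stated above, not server code; the struct and its field names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one negotiation round, seen from the server. */
struct demo_negotiation_state {
	bool server_ready;  /* server believes enough common commits are known */
	bool client_done;   /* client sent "done" */
	bool wait_for_done; /* client sent the "wait-for-done" argument */
};

/*
 * Without wait-for-done the server may decide on its own to send the
 * packfile; with it, only an explicit "done" from the client does.
 */
bool demo_server_sends_packfile(const struct demo_negotiation_state *s)
{
	assert(s);
	if (s->client_done)
		return true;
	return s->server_ready && !s->wait_for_done;
}
```

A negotiate-only client sets `wait_for_done` and then simply stops sending requests, so in this model the predicate never becomes true and no packfile is transferred.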
2021-04-16 | fetch: add --prefetch option | Derrick Stolee | 1 | -1/+58
The --prefetch option will be used by the 'prefetch' maintenance task instead of sending refspecs explicitly across the command-line. The intention is to modify the refspec to place all results in refs/prefetch/ instead of anywhere else.

Create helper method filter_prefetch_refspec() to modify a given refspec to fit the rules expected of the prefetch task:

* Negative refspecs are preserved.
* Refspecs without a destination are removed.
* Refspecs whose source starts with "refs/tags/" are removed.
* Other refspecs are placed within "refs/prefetch/".

Finally, we add the 'force' option to ensure that prefetch refs are replaced as necessary.

There are some interesting cases that are worth testing.

An earlier version of this change dropped the "i--" from the loop that deletes a refspec item and shifts the remaining entries down. This allowed some refspecs to not be modified. The subtle part about the first --prefetch test is that the "refs/tags/*" refspec appears directly before the "refs/heads/bogus/*" refspec. Without that "i--", this ordering would remove the "refs/tags/*" refspec and leave the last one unmodified, placing the result in "refs/heads/*".

It is possible to have an empty refspec. This is typically the case for remotes other than the origin, where users want to fetch a specific tag or branch. To correctly test this case, we need to further remove the upstream remote for the local branch. Thus, we are testing a refspec that will be deleted, leaving nothing to fetch.

Helped-by: Tom Saeger <tom.saeger@oracle.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
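The four rules, and the role of the "i--" after a removal, can be sketched as follows. This is an illustrative reimplementation with hypothetical types (`struct demo_refspec`, `demo_filter_prefetch()`), not the helper from builtin/fetch.c. In particular, how the destination is rewritten (here, by dropping a leading "refs/" before prepending "refs/prefetch/") is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_DST_MAX 256

/* Hypothetical refspec item; not Git's struct refspec_item. */
struct demo_refspec {
	const char *src;        /* source pattern, e.g. "refs/heads/*" */
	char dst[DEMO_DST_MAX]; /* destination; empty string means none */
	int negative;           /* nonzero for a "^pattern" refspec */
	int force;
};

/* Filter `items` in place and return the new item count. */
int demo_filter_prefetch(struct demo_refspec *items, int nr)
{
	int i;

	assert(items || nr == 0);
	for (i = 0; i < nr; i++) {
		struct demo_refspec *it = &items[i];
		char new_dst[DEMO_DST_MAX];
		const char *tail;

		/* Rule 1: negative refspecs are preserved untouched. */
		if (it->negative)
			continue;

		/* Rules 2 and 3: no destination, or a tag source: remove. */
		if (!it->dst[0] || !strncmp(it->src, "refs/tags/", 10)) {
			memmove(&items[i], &items[i + 1],
				(size_t)(nr - i - 1) * sizeof(*items));
			nr--;
			/* Revisit slot i: it now holds the next item. */
			i--;
			continue;
		}

		/* Rule 4: move the destination under refs/prefetch/ and force. */
		tail = strncmp(it->dst, "refs/", 5) ? it->dst : it->dst + 5;
		snprintf(new_dst, sizeof(new_dst), "refs/prefetch/%s", tail);
		strcpy(it->dst, new_dst);
		it->force = 1;
	}
	return nr;
}
```

Without the `i--`, the item shifted into slot `i` would be skipped by the loop increment, which is exactly the "refs/tags/*" followed by "refs/heads/bogus/*" case the commit message describes.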
2021-02-17 | Merge branch 'jt/clone-unborn-head' | Junio C Hamano | 1 | -7/+11
"git clone" tries to locally check out the branch pointed at by HEAD of the remote repository after it is done, but the protocol did not convey the information necessary to do so when copying an empty repository. The protocol v2 learned how to do so.

* jt/clone-unborn-head:
  clone: respect remote unborn HEAD
  connect, transport: encapsulate arg in struct
  ls-refs: report unborn targets of symrefs
2021-02-05 | connect, transport: encapsulate arg in struct | Jonathan Tan | 1 | -7/+11
In a future patch we plan to return the name of an unborn current branch from deep in the callchain to a caller via a new pointer parameter that points at a variable in the caller when the caller calls get_remote_refs() and transport_get_remote_refs().

In preparation for that, encapsulate the existing ref_prefixes parameter into a struct. The aforementioned unborn current branch will go into this new struct in the future patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
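The refactoring pattern, one options struct carrying both the existing input and a future output, might be sketched like this. The struct, its fields, and `demo_list_refs()` are hypothetical illustrations and do not reproduce the signatures of get_remote_refs() or transport_get_remote_refs().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct passed down the call chain. */
struct demo_ls_refs_options {
	const char **ref_prefixes;      /* input: prefixes to list */
	size_t nr_prefixes;
	const char *unborn_head_target; /* output: set by the callee, or NULL */
};

/* Hypothetical remote: advertised refs plus an optional unborn HEAD. */
struct demo_remote {
	const char **names;
	size_t nr;
	const char *unborn_head;
};

/*
 * Count advertised refs matching any prefix and report the unborn HEAD
 * target through the same struct, so no extra parameter is needed.
 */
size_t demo_list_refs(const struct demo_remote *remote,
		      struct demo_ls_refs_options *opts)
{
	size_t i, j, count = 0;

	assert(remote && opts);
	for (i = 0; i < remote->nr; i++) {
		for (j = 0; j < opts->nr_prefixes; j++) {
			const char *p = opts->ref_prefixes[j];
			if (!strncmp(remote->names[i], p, strlen(p))) {
				count++;
				break;
			}
		}
	}
	opts->unborn_head_target = remote->unborn_head;
	return count;
}
```

Adding another input or output later only touches the struct definition and the code that reads or writes the new field; intermediate callers that merely pass the pointer along stay unchanged.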
2021-01-12 | fetch: implement support for atomic reference updates | Patrick Steinhardt | 1 | -5/+41
When executing a fetch, Git currently allocates one reference transaction per reference update and commits it directly. This means that fetches are non-atomic: even if some of the reference updates fail, others may still succeed and modify local references.

This is fine in many scenarios, but this strategy has its downsides.

- The view of remote references may be inconsistent and may show a bastardized state of the remote repository.

- Batching together updates may improve performance in certain scenarios. While the impact probably isn't as pronounced with loose references, the upcoming reftable backend may benefit as it needs to write fewer files in case the update is batched.

- The reference-transaction hook is currently being executed twice per updated reference. While this doesn't matter when there is no such hook, we have seen severe performance regressions when doing a git-fetch(1) with a reference-transaction hook when the remote repository has hundreds of thousands of references.

Similar to `git push --atomic`, this commit thus introduces atomic fetches. Instead of allocating one reference transaction per updated reference, it causes us to only allocate a single transaction and commit it as soon as all updates were received. If locking of any reference fails, then we abort the complete transaction and don't update any reference, which gives us an all-or-nothing fetch.

Note that this may not completely fix the first of the above downsides, as the consistent view also depends on the server side. If the server doesn't have a consistent view of its own references during the reference negotiation phase, then the client would get the same inconsistent view the server has. This is a separate problem though and, if it actually exists, can be fixed at a later point.

This commit also changes the way we write FETCH_HEAD in case `--atomic` is passed. Instead of writing changes as we go, we need to accumulate all changes first and only commit them at the end when we know that all reference updates succeeded. Ideally, we'd just do so via a temporary file so that we don't need to carry all updates in-memory. This isn't trivially doable though considering the `--append` mode, where we do not truncate the file but simply append to it. And given that we support concurrent processes appending to FETCH_HEAD at the same time without any loss of data, seeding the temporary file with current contents of FETCH_HEAD initially and then doing a rename wouldn't work either. So this commit implements the simple strategy of buffering all changes and appending them to the file on commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
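The difference between per-ref commits and a single all-or-nothing transaction, together with the buffered FETCH_HEAD writes, can be modelled in a few lines. This is a toy model under stated assumptions: `struct demo_update`, `struct demo_fetch_head`, and `demo_fetch()` are hypothetical, a failed lock is simulated by a flag, and the "file" is an in-memory buffer rather than the real FETCH_HEAD.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF 1024

struct demo_update {
	const char *refname;
	int lock_ok; /* zero models a ref that cannot be locked */
	int applied; /* set by demo_fetch() once the ref is updated */
};

struct demo_fetch_head {
	char pending[DEMO_BUF]; /* lines buffered in atomic mode */
	char file[DEMO_BUF];    /* stands in for the on-disk FETCH_HEAD */
};

static void demo_concat(char *dst, const char *src)
{
	size_t used = strlen(dst), len = strlen(src);

	assert(used + len < DEMO_BUF);
	memcpy(dst + used, src, len + 1);
}

/* Return 0 if every update succeeded, -1 otherwise. */
int demo_fetch(struct demo_update *u, size_t nr, int atomic,
	       struct demo_fetch_head *fh)
{
	size_t i;
	int failed = 0;

	fh->pending[0] = '\0';
	for (i = 0; i < nr; i++) {
		/* Atomic mode buffers lines; otherwise write as we go. */
		char *out = atomic ? fh->pending : fh->file;

		if (!u[i].lock_ok) {
			failed = 1;
			if (atomic)
				break; /* abort the whole transaction */
			continue;      /* non-atomic: other refs still update */
		}
		if (!atomic)
			u[i].applied = 1;
		demo_concat(out, u[i].refname);
		demo_concat(out, "\n");
	}

	if (atomic && !failed) {
		/* Commit: apply every update, then append the buffer. */
		for (i = 0; i < nr; i++)
			u[i].applied = 1;
		demo_concat(fh->file, fh->pending);
	}
	fh->pending[0] = '\0';
	return failed ? -1 : 0;
}
```

Because the buffered lines are only ever appended to `file`, pre-existing contents (as with `--append`) are left in place in both modes.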