summaryrefslogtreecommitdiff
path: root/builtin
AgeCommit message (Collapse)AuthorFilesLines
2021-02-12Merge branch 'tb/precompose-prefix-too'Libravatar Junio C Hamano5-5/+5
When commands are started from a subdirectory, they may have to compare the path to the subdirectory (called prefix and found out from $(pwd)) with the tracked paths. On macOS, $(pwd) and readdir() yield decomposed path, while the tracked paths are usually normalized to the precomposed form, causing mismatch. This has been fixed by taking the same approach used to normalize the command line arguments. * tb/precompose-prefix-too: MacOS: precompose_argv_prefix()
2021-02-12Merge branch 'tb/pack-revindex-on-disk'Libravatar Junio C Hamano3-11/+68
Introduce an on-disk file to record revindex for packdata, which traditionally was always created on the fly and only in-core. * tb/pack-revindex-on-disk: t5325: check both on-disk and in-memory reverse index pack-revindex: ensure that on-disk reverse indexes are given precedence t: support GIT_TEST_WRITE_REV_INDEX t: prepare for GIT_TEST_WRITE_REV_INDEX Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' builtin/pack-objects.c: respect 'pack.writeReverseIndex' builtin/index-pack.c: write reverse indexes builtin/index-pack.c: allow stripping arbitrary extensions pack-write.c: prepare to write 'pack-*.rev' files packfile: prepare for the existence of '*.rev' files
2021-02-10Merge branch 'ds/more-index-cleanups'Libravatar Junio C Hamano2-8/+0
Cleaning various codepaths up. * ds/more-index-cleanups: t1092: test interesting sparse-checkout scenarios test-lib: test_region looks for trace2 regions sparse-checkout: load sparse-checkout patterns name-hash: use trace2 regions for init repository: add repo reference to index_state fsmonitor: de-duplicate BUG()s around dirty bits cache-tree: extract subtree_pos() cache-tree: simplify verify_cache() prototype cache-tree: clean up cache_tree_update()
2021-02-10Merge branch 'rs/worktree-list-verbose'Libravatar Junio C Hamano1-75/+35
`git worktree list` now annotates worktrees as prunable, shows locked and prunable attributes in --porcelain mode, and gained a --verbose option. * rs/worktree-list-verbose: worktree: teach `list` verbose mode worktree: teach `list` to annotate prunable worktree worktree: teach `list --porcelain` to annotate locked worktree t2402: ensure locked worktree is properly cleaned up worktree: teach worktree_lock_reason() to gently handle main worktree worktree: teach worktree to lazy-load "prunable" reason worktree: libify should_prune_worktree()
2021-02-10Merge branch 'ab/lose-grep-debug'Libravatar Junio C Hamano1-5/+0
Lose the debugging aid that may have been useful in the past, but no longer is, in the "grep" codepaths. * ab/lose-grep-debug: grep/log: remove hidden --debug and --grep-debug options
2021-02-10Merge branch 'jk/use-oid-pos'Libravatar Junio C Hamano1-5/+5
Code clean-up to ensure our use of hashtables using object names as keys use the "struct object_id" objects, not the raw hash values. * jk/use-oid-pos: oid_pos(): access table through const pointers hash_pos(): convert to oid_pos() rerere: use strmap to store rerere directories rerere: tighten rr-cache dirname check rerere: check dirname format while iterating rr_cache directory commit_graft_pos(): take an oid instead of a bare hash
2021-02-05Merge branch 'jv/pack-objects-narrower-ref-iteration'Libravatar Junio C Hamano1-5/+3
The "pack-objects" command needs to iterate over all the tags when automatic tag following is enabled, but it actually iterated over all refs and then discarded everything outside "refs/tags/" hierarchy, which was quite wasteful. * jv/pack-objects-narrower-ref-iteration: builtin/pack-objects.c: avoid iterating all refs
2021-02-05Merge branch 'ph/use-delete-refs'Libravatar Junio C Hamano2-29/+62
When removing many branches and tags, the code used to do so one ref at a time. There is another API it can use to delete multiple refs, and it makes quite a lot of performance difference when the refs are packed. * ph/use-delete-refs: use delete_refs when deleting tags or branches
2021-02-05Merge branch 'zh/ls-files-deduplicate'Libravatar Junio C Hamano1-31/+50
"git ls-files" can and does show multiple entries when the index is unmerged, which is a source for confusion unless -s/-u option is in use. A new option --deduplicate has been introduced. * zh/ls-files-deduplicate: ls-files.c: add --deduplicate option ls_files.c: consolidate two for loops into one ls_files.c: bugfix for --deleted and --modified
2021-02-05Merge branch 'so/log-diff-merge'Libravatar Junio C Hamano4-22/+17
"git log" learned a new "--diff-merges=<how>" option. * so/log-diff-merge: (32 commits) t4013: add tests for --diff-merges=first-parent doc/git-show: include --diff-merges description doc/rev-list-options: document --first-parent changes merges format doc/diff-generate-patch: mention new --diff-merges option doc/git-log: describe new --diff-merges options diff-merges: add '--diff-merges=1' as synonym for 'first-parent' diff-merges: add old mnemonic counterparts to --diff-merges diff-merges: let new options enable diff without -p diff-merges: do not imply -p for new options diff-merges: implement new values for --diff-merges diff-merges: make -m/-c/--cc explicitly mutually exclusive diff-merges: refactor opt settings into separate functions diff-merges: get rid of now empty diff_merges_init_revs() diff-merges: group diff-merge flags next to each other inside 'rev_info' diff-merges: split 'ignore_merges' field diff-merges: fix -m to properly override -c/--cc t4013: add tests for -m failing to override -c/--cc t4013: support test_expect_failure through ':failure' magic diff-merges: revise revs->diff flag handling diff-merges: handle imply -p on -c/--cc logic for log.c ...
2021-02-03Merge branch 'jk/peel-iterated-oid'Libravatar Junio C Hamano4-5/+5
The peel_ref() API has been replaced with peel_iterated_oid(). * jk/peel-iterated-oid: refs: switch peel_ref() to peel_iterated_oid()
2021-02-03Merge branch 'ds/maintenance-prefetch-cleanup'Libravatar Junio C Hamano1-0/+6
Test clean-up plus UI improvement by hiding extra refs that the prefetch task uses from "log --decorate" output. * ds/maintenance-prefetch-cleanup: t7900: clean up some broken refs maintenance: set log.excludeDecoration durin prefetch
2021-02-03MacOS: precompose_argv_prefix()Libravatar Torsten Bögershausen5-5/+5
The following sequence leads to a "BUG" assertion running under MacOS: DIR=git-test-restore-p Adiarnfd=$(printf 'A\314\210') DIRNAME=xx${Adiarnfd}yy mkdir $DIR && cd $DIR && git init && mkdir $DIRNAME && cd $DIRNAME && echo "Initial" >file && git add file && echo "One more line" >>file && echo y | git restore -p . Initialized empty Git repository in /tmp/git-test-restore-p/.git/ BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-index --cached --numstat [snip] The command `git restore` is run from a directory inside a Git repo. Git needs to split the $CWD into 2 parts: The path to the repo and "the rest", if any. "The rest" becomes a "prefix" later used inside the pathspec code. As an example, "/path/to/repo/dir-inside-repå" would determine "/path/to/repo" as the root of the repo, the place where the configuration file .git/config is found. The rest becomes the prefix ("dir-inside-repå"), from where the pathspec machinery expands the ".", more about this later. If there is a decomposed form, (making the decomposing visible like this), "dir-inside-rep°a" doesn't match "dir-inside-repå". Git commands need to: (a) read the configuration variable "core.precomposeunicode" (b) precocompose argv[] (c) precompose the prefix, if there was any The first commit, 76759c7dff53 "git on Mac OS and precomposed unicode" addressed (a) and (b). The call to precompose_argv() was added into parse-options.c, because that seemed to be a good place when the patch was written. Commands that don't use parse-options need to do (a) and (b) themselfs. The commands `diff-files`, `diff-index`, `diff-tree` and `diff` learned (a) and (b) in commit 90a78b83e0b8 "diff: run arguments through precompose_argv" Branch names (or refs in general) using decomposed code points resulting in decomposed file names had been fixed in commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places" The bug report from above shows 2 things: - more commands need to handle precomposed unicode - (c) should be implemented for all commands using pathspecs Solution: precompose_argv() now handles the prefix (if needed), and is renamed into precompose_argv_prefix(). Inside this function the config variable core.precomposeunicode is read into the global variable precomposed_unicode, as before. This reading is skipped if precomposed_unicode had been read before. The original patch for preocomposed unicode, 76759c7dff53, placed precompose_argv() into parse-options.c Now add it into git.c::run_builtin() as well. Existing precompose calls in diff-files.c and others may become redundant, and if we audit the callflows that reach these places to make sure that they can never be reached without going through the new call added to run_builtin(), we might be able to remove these existing ones. But in this commit, we do not bother to do so and leave these precompose callsites as they are. Because precompose() is idempotent and can be called on an already precomposed string safely, this is safer than removing existing calls without fully vetting the callflows. There is certainly room for cleanups - this change intends to be a bug fix. Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should be done in future commits. [1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) [2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/ Reported-by: Daniel Troger <random_n0body@icloud.com> Helped-By: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-30worktree: teach `list` verbose modeLibravatar Rafael Silva1-2/+12
"git worktree list" annotates each worktree according to its state such as "prunable" or "locked", however it is not immediately obvious why these worktrees are being annotated. For prunable worktrees a reason is available that is returned by should_prune_worktree() and for locked worktrees a reason might be available provided by the user via `lock` command. Let's teach "git worktree list" a --verbose mode that outputs the reason why the worktrees are being annotated. The reason is a text that can take virtually any size and appending the text on the default columned format will make it difficult to extend the command with other annotations and not fit nicely on the screen. In order to address this shortcoming the annotation is then moved to the next line indented followed by the reason If the reason is not available the annotation stays on the same line as the worktree itself. The output of "git worktree list" with verbose becomes like so: $ git worktree list --verbose ... /path/to/locked-no-reason acb124 [branch-a] locked /path/to/locked-with-reason acc125 [branch-b] locked: worktree with a locked reason /path/to/prunable-reason ace127 [branch-d] prunable: gitdir file points to non-existent location ... Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-30worktree: teach `list` to annotate prunable worktreeLibravatar Rafael Silva1-0/+10
The "git worktree list" command shows the absolute path to the worktree, the commit that is checked out, the name of the branch, and a "locked" annotation if the worktree is locked, however, it does not indicate whether the worktree is prunable. The "prune" command will remove a worktree if it is prunable unless `--dry-run` option is specified. This could lead to a worktree being removed without the user realizing before it is too late, in case the user forgets to pass --dry-run for instance. If the "list" command shows which worktree is prunable, the user could verify before running "git worktree prune" and hopefully prevents the working tree to be removed "accidentally" on the worse case scenario. Let's teach "git worktree list" to show when a worktree is a prunable candidate for both default and porcelain format. In the default format a "prunable" text is appended: $ git worktree list /path/to/main aba123 [main] /path/to/linked 123abc [branch-a] /path/to/prunable ace127 (detached HEAD) prunable In the --porcelain format a prunable label is added followed by its reason: $ git worktree list --porcelain ... worktree /path/to/prunable HEAD abc1234abc1234abc1234abc1234abc1234abc12 detached prunable gitdir file points to non-existent location ... Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-30worktree: teach `list --porcelain` to annotate locked worktreeLibravatar Rafael Silva1-0/+13
Commit c57b3367be (worktree: teach `list` to annotate locked worktree, 2020-10-11) taught "git worktree list" to annotate locked worktrees by appending "locked" text to its output, however, this is not listed in the --porcelain format. Teach "list --porcelain" to do the same and add a "locked" attribute followed by its reason, thus making both default and porcelain format consistent. If the locked reason is not available then only "locked" is shown. The output of the "git worktree list --porcelain" becomes like so: $ git worktree list --porcelain ... worktree /path/to/locked HEAD 123abcdea123abcd123acbd123acbda123abcd12 detached locked worktree /path/to/locked-with-reason HEAD abc123abc123abc123abc123abc123abc123abc1 detached locked reason why it is locked ... In porcelain mode, if the lock reason contains special characters such as newlines, they are escaped with backslashes and the entire reason is enclosed in double quotes. For example: $ git worktree list --porcelain ... locked "worktree's path mounted in\nremovable device" ... Furthermore, let's update the documentation to state that some attributes in the porcelain format might be listed alone or together with its value depending whether the value is available or not. Thus documenting the case of the new "locked" attribute. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-30worktree: teach worktree_lock_reason() to gently handle main worktreeLibravatar Rafael Silva1-1/+1
worktree_lock_reason() aborts with an assertion failure when called on the main worktree since locking the main worktree is nonsensical. Not only is this behavior undocumented, thus callers might not even be aware that the call could potentially crash the program, but it also forces clients to be extra careful: if (!is_main_worktree(wt) && worktree_locked_reason(...)) ... Since we know that locking makes no sense in the context of the main worktree, we can simply return false for the main worktree, thus making client code less complex by eliminating the need for the callers to have inside knowledge about the implementation: if (worktree_lock_reason(...)) ... Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-30worktree: libify should_prune_worktree()Libravatar Rafael Silva1-74/+1
As part of teaching "git worktree list" to annotate worktree that is a candidate for pruning, let's move should_prune_worktree() from builtin/worktree.c to worktree.c in order to make part of the worktree public API. should_prune_worktree() knows how to select the given worktree for pruning based on an expiration date, however the expiration value is stored in a static file-scope variable and it is not local to the function. In order to move the function, teach should_prune_worktree() to take the expiration date as an argument and document the new parameter that is not immediately obvious. Also, change the function comment to clearly state that the worktree's path is returned in `wtpath` argument. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-28oid_pos(): access table through const pointersLibravatar Jeff King1-2/+2
When we are looking up an oid in an array, we obviously don't need to write to the array. Let's mark it as const in the function interfaces, as well as in the local variables we use to derference the void pointer (note a few cases use pointers-to-pointers, so we mark everything const). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-28hash_pos(): convert to oid_pos()Libravatar Jeff King1-4/+4
All of our callers are actually looking up an object_id, not a bare hash. Likewise, the arrays they are looking in are actual arrays of object_id (not just raw bytes of hashes, as we might find in a pack .idx; those are handled by bsearch_hash()). Using an object_id gives us more type safety, and makes the callers slightly shorter. It also gets rid of the word "sha1" from several access functions, though we could obviously also rename those with s/sha1/hash/. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-26grep/log: remove hidden --debug and --grep-debug optionsLibravatar Ævar Arnfjörð Bjarmason1-5/+0
Remove the hidden "grep --debug" and "log --grep-debug" options added in 17bf35a3c7b (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in e01b4dab01e (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is c581e4a7499 (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier b65abcafc7a (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug \( -e foo --and -e bar \) --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25t: support GIT_TEST_WRITE_REV_INDEXLibravatar Taylor Blau2-1/+6
Add a new option that unconditionally enables the pack.writeReverseIndex setting in order to run the whole test suite in a mode that generates on-disk reverse indexes. Additionally, enable this mode in the second run of tests under linux-gcc in 'ci/run-build-and-tests.sh'. Once on-disk reverse indexes are proven out over several releases, we can change the default value of that configuration to 'true', and drop this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25builtin/pack-objects.c: respect 'pack.writeReverseIndex'Libravatar Taylor Blau1-0/+7
Now that we have an implementation that can write the new reverse index format, enable writing a .rev file in 'git pack-objects' by consulting the pack.writeReverseIndex configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25builtin/index-pack.c: write reverse indexesLibravatar Taylor Blau1-3/+47
Teach 'git index-pack' to optionally write and verify reverse index with '--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex' configuration option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25builtin/index-pack.c: allow stripping arbitrary extensionsLibravatar Taylor Blau1-8/+8
To derive the filename for a .idx file, 'git index-pack' uses derive_filename() to strip the '.pack' suffix and add the new suffix. Prepare for stripping off suffixes other than '.pack' by making the suffix to strip a parameter of derive_filename(). In order to make this consistent with the "suffix" parameter which does not begin with a ".", an additional check in derive_filename. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25packfile: prepare for the existence of '*.rev' filesLibravatar Taylor Blau1-0/+1
Specify the format of the on-disk reverse index 'pack-*.rev' file, as well as prepare the code for the existence of such files. The reverse index maps from pack relative positions (i.e., an index into the array of object which is sorted by their offsets within the packfile) to their position within the 'pack-*.idx' file. Today, this is done by building up a list of (off_t, uint32_t) tuples for each object (the off_t corresponding to that object's offset, and the uint32_t corresponding to its position in the index). To convert between pack and index position quickly, this array of tuples is radix sorted based on its offset. This has two major drawbacks: First, the in-memory cost scales linearly with the number of objects in a pack. Each 'struct revindex_entry' is sizeof(off_t) + sizeof(uint32_t) + padding bytes for a total of 16. To observe this, force Git to load the reverse index by, for e.g., running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking for a single object in a fresh clone of the kernel, Git needs to allocate 120+ MB of memory in order to hold the reverse index in memory. Second, the cost to sort also scales with the size of the pack. Luckily, this is a linear function since 'load_pack_revindex()' uses a radix sort, but this cost still must be paid once per pack per process. As an example, it takes ~60x longer to print the _size_ of an object as it does to print that entire object's _contents_: Benchmark #1: git.compile cat-file --batch <obj Time (mean ± σ): 3.4 ms ± 0.1 ms [User: 3.3 ms, System: 2.1 ms] Range (min … max): 3.2 ms … 3.7 ms 726 runs Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj Time (mean ± σ): 210.3 ms ± 8.9 ms [User: 188.2 ms, System: 23.2 ms] Range (min … max): 193.7 ms … 224.4 ms 13 runs Instead, avoid computing and sorting the revindex once per process by writing it to a file when the pack itself is generated. The format is relatively straightforward. It contains an array of uint32_t's, the length of which is equal to the number of objects in the pack. The ith entry in this table contains the index position of the ith object in the pack, where "ith object in the pack" is determined by pack offset. One thing that the on-disk format does _not_ contain is the full (up to) eight-byte offset corresponding to each object. This is something that the in-memory revindex contains (it stores an off_t in 'struct revindex_entry' along with the same uint32_t that the on-disk format has). Omit it in the on-disk format, since knowing the index position for some object is sufficient to get a constant-time lookup in the pack-*.idx file to ask for an object's offset within the pack. This trades off between the on-disk size of the 'pack-*.rev' file for runtime to chase down the offset for some object. Even though the lookup is constant time, the constant is heavier, since it can potentially involve two pointer walks in v2 indexes (one to access the 4-byte offset table, and potentially a second to access the double wide offset table). Consider trying to map an object's pack offset to a relative position within that pack. In a cold-cache scenario, more page faults occur while switching between binary searching through the reverse index and searching through the *.idx file for an object's offset. Sure enough, with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after 'sync'ing), printing out the entire object's contents is still marginally faster than printing its size: Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null Time (mean ± σ): 22.6 ms ± 0.5 ms [User: 2.4 ms, System: 7.9 ms] Range (min … max): 21.4 ms … 23.5 ms 41 runs Benchmark #2: git.compile cat-file --batch <obj >/dev/null Time (mean ± σ): 17.2 ms ± 0.7 ms [User: 2.8 ms, System: 5.5 ms] Range (min … max): 15.6 ms … 18.2 ms 45 runs (Numbers taken in the kernel after cheating and using the next patch to generate a reverse index). There are a couple of approaches to improve cold cache performance not pursued here: - We could include the object offsets in the reverse index format. Predictably, this does result in fewer page faults, but it triples the size of the file, while simultaneously duplicating a ton of data already available in the .idx file. (This was the original way I implemented the format, and it did show `--batch-check='%(objectsize:disk)'` winning out against `--batch`.) On the other hand, this increase in size also results in a large block-cache footprint, which could potentially hurt other workloads. - We could store the mapping from pack to index position in more cache-friendly way, like constructing a binary search tree from the table and writing the values in breadth-first order. This would result in much better locality, but the price you pay is trading O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you can no longer directly index the table). So, neither of these approaches are taken here. (Thankfully, the format is versioned, so we are free to pursue these in the future.) But, cold cache performance likely isn't interesting outside of one-off cases like asking for the size of an object directly. In real-world usage, Git is often performing many operations in the revindex (i.e., asking about many objects rather than a single one). The trade-off is worth it, since we will avoid the vast majority of the cost of generating the revindex that the extra pointer chase will look like noise in the following patch's benchmarks. This patch describes the format and prepares callers (like in pack-revindex.c) to be able to read *.rev files once they exist. An implementation of the writer will appear in the next patch, and callers will gradually begin to start using the writer in the patches that follow after that. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25Merge branch 'tb/pack-revindex-api'Libravatar Junio C Hamano2-14/+25
Abstract accesses to in-core revindex that allows enumerating objects stored in a packfile in the order they appear in the pack, in preparation for introducing an on-disk precomputed revindex. * tb/pack-revindex-api: (21 commits) for_each_object_in_pack(): clarify pack vs index ordering pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' pack-revindex: hide the definition of 'revindex_entry' pack-revindex: remove unused 'find_revindex_position()' pack-revindex: remove unused 'find_pack_revindex()' builtin/gc.c: guess the size of the revindex for_each_object_in_pack(): convert to new revindex API unpack_entry(): convert to new revindex API packed_object_info(): convert to new revindex API retry_bad_packed_offset(): convert to new revindex API get_delta_base_oid(): convert to new revindex API rebuild_existing_bitmaps(): convert to new revindex API try_partial_reuse(): convert to new revindex API get_size_by_pos(): convert to new revindex API show_objects_for_type(): convert to new revindex API bitmap_position_packfile(): convert to new revindex API check_object(): convert to new revindex API write_reused_pack_verbatim(): convert to new revindex API write_reused_pack_one(): convert to new revindex API write_reuse_object(): convert to new revindex API ...
2021-01-25Merge branch 'cc/write-promisor-file'Libravatar Junio C Hamano1-5/+3
A bit of code refactoring. * cc/write-promisor-file: pack-write: die on error in write_promisor_file() fetch-pack: refactor writing promisor file fetch-pack: rename helper to create_promisor_file()
2021-01-25Merge branch 'ab/mailmap'Libravatar Junio C Hamano5-18/+6
Clean-up docs, codepaths and tests around mailmap. * ab/mailmap: (22 commits) shortlog: remove unused(?) "repo-abbrev" feature mailmap doc + tests: document and test for case-insensitivity mailmap tests: add tests for empty "<>" syntax mailmap tests: add tests for whitespace syntax mailmap tests: add a test for comment syntax mailmap doc + tests: add better examples & test them tests: refactor a few tests to use "test_commit --append" test-lib functions: add an --append option to test_commit test-lib functions: add --author support to test_commit test-lib functions: document arguments to test_commit test-lib functions: expand "test_commit" comment template mailmap: test for silent exiting on missing file/blob mailmap tests: get rid of overly complex blame fuzzing mailmap tests: add a test for "not a blob" error mailmap tests: remove redundant entry in test mailmap tests: improve --stdin tests mailmap tests: modernize syntax & test idioms mailmap tests: use our preferred whitespace syntax mailmap doc: start by mentioning the comment syntax check-mailmap doc: note config options ...
2021-01-25Merge branch 'ps/fetch-atomic'Libravatar Junio C Hamano1-57/+167
"git fetch" learns to treat ref updates atomically in all-or-none fashion, just like "git push" does, with the new "--atomic" option. * ps/fetch-atomic: fetch: implement support for atomic reference updates fetch: allow passing a transaction to `s_update_ref()` fetch: refactor `s_update_ref` to use common exit path fetch: use strbuf to format FETCH_HEAD updates fetch: extract writing to FETCH_HEAD
2021-01-25Merge branch 'jc/deprecate-pack-redundant'Libravatar Junio C Hamano1-0/+13
Warn loudly when the "pack-redundant" command, which has been left stale with almost unusable performance issues, gets used, as we no longer want to recommend its use (instead just "repack -d" instead). * jc/deprecate-pack-redundant: pack-redundant: gauge the usage before proposing its removal
2021-01-25Merge branch 'ab/branch-sort'Libravatar Junio C Hamano3-4/+6
The implementation of "git branch --sort" wrt the detached HEAD display has always been hacky, which has been cleaned up. * ab/branch-sort: branch: show "HEAD detached" first under reverse sort branch: sort detached HEAD based on a flag ref-filter: move ref_sorting flags to a bitfield ref-filter: move "cmp_fn" assignment into "else if" arm ref-filter: add braces to if/else if/else chain branch tests: add to --sort tests branch: change "--local" to "--list" in comment
2021-01-25Merge branch 'ma/more-opaque-lock-file'Libravatar Junio C Hamano1-4/+4
Code clean-up. * ma/more-opaque-lock-file: read-cache: try not to peek into `struct {lock_,temp}file` refs/files-backend: don't peek into `struct lock_file` midx: don't peek into `struct lock_file` commit-graph: don't peek into `struct lock_file` builtin/gc: don't peek into `struct lock_file`
2021-01-25Merge branch 'ab/mktag'Libravatar Junio C Hamano2-171/+84
"git mktag" validates its input using its own rules before writing a tag object---it has been updated to share the logic with "git fsck". * ab/mktag: (23 commits) mktag: add a --[no-]strict option mktag: mark strings for translation mktag: convert to parse-options mktag: allow omitting the header/body \n separator mktag: allow turning off fsck.extraHeaderEntry fsck: make fsck_config() re-usable mktag: use fsck instead of custom verify_tag() mktag: use puts(str) instead of printf("%s\n", str) mktag: remove redundant braces in one-line body "if" mktag: use default strbuf_read() hint mktag tests: test verify_object() with replaced objects mktag tests: improve verify_object() test coverage mktag tests: test "hash-object" compatibility mktag tests: stress test whitespace handling mktag tests: run "fsck" after creating "mytag" mktag tests: don't create "mytag" twice mktag tests: don't redirect stderr to a file needlessly mktag tests: remove needless SHA-1 hardcoding mktag tests: use "test_commit" helper mktag tests: don't needlessly use a subshell ...
2021-01-23sparse-checkout: load sparse-checkout patternsLibravatar Derrick Stolee1-5/+0
A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23cache-tree: clean up cache_tree_update()Libravatar Derrick Stolee1-3/+0
Make the method safer by allocating a cache_tree member for the given index_state if it is not already present. This is preferrable to a BUG() statement or returning with an error because future callers will want to populate an empty cache-tree using this method. Callers can also remove their conditional allocations of cache_tree. Also drop local variables that can be found directly from the 'istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23ls-files.c: add --deduplicate optionLibravatar ZheNing Hu1-3/+28
During a merge conflict, the name of a file may appear multiple times in "git ls-files" output, once for each stage. If you use both `--delete` and `--modify` at the same time, the output may mention a deleted file twice. When none of the '-t', '-u', or '-s' options is in use, these duplicate entries do not add much value to the output. Introduce a new '--deduplicate' option to suppress them. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: extended doc and rewritten commit log] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23ls_files.c: consolidate two for loops into oneLibravatar ZheNing Hu1-36/+27
This will make it easier to show only one entry per filename in the next step. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: corrected the log message] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23ls_files.c: bugfix for --deleted and --modifiedLibravatar ZheNing Hu1-4/+7
This situation may occur in the original code: lstat() failed but we use `&st` to feed ie_modified() later. Therefore, we can directly execute show_ce without the judgment of ie_modified() when lstat() has failed. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: fixed misindented code] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-22builtin/pack-objects.c: avoid iterating all refsLibravatar Jacob Vosmaer1-5/+3
In git-pack-objects, we iterate over all the tags if the --include-tag option is passed on the command line. For some reason this uses for_each_ref which is expensive if the repo has many refs. We should use for_each_tag_ref instead. Because the add_ref_tag callback will now only visit tags we simplified it a bit. The motivation for this change is that we observed performance issues with a repository on gitlab.com that has 500,000 refs but only 2,000 tags. The fetch traffic on that repo is dominated by CI, and when we changed CI to fetch with 'git fetch --no-tags' we saw a dramatic change in the CPU profile of git-pack-objects. This lead us to this particular ref walk. More details in: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598 Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21use delete_refs when deleting tags or branchesLibravatar Phil Hord2-29/+62
'git tag -d' accepts one or more tag refs to delete, but each deletion is done by calling `delete_ref` on each argv. This is very slow when removing from packed refs. Use delete_refs instead so all the removals can be done inside a single transaction with a single update. Do the same for 'git branch -d'. Since delete_refs performs all the packed-refs delete operations inside a single transaction, if any of the deletes fail then all them will be skipped. In practice, none of them should fail since we verify the hash of each one before calling delete_refs, but some network error or odd permissions problem could have different results after this change. Also, since the file-backed deletions are not performed in the same transaction, those could succeed even when the packed-refs transaction fails. After deleting branches, remove the branch config only if the branch ref was removed and was not subsequently added back in. A manual test deleting 24,000 tags took about 30 minutes using delete_ref. It takes about 5 seconds using delete_refs. Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21refs: switch peel_ref() to peel_iterated_oid()Libravatar Jeff King4-5/+5
The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char *refname, const struct object_id *oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20maintenance: set log.excludeDecoration durin prefetchLibravatar Derrick Stolee1-0/+6
The 'prefetch' task fetches refs from all remotes and places them in the refs/prefetch/<remote>/ refspace. As this task is intended to run in the background, this allows users to keep their local data very close to the remote servers' data while not updating the users' understanding of the remote refs in refs/remotes/<remote>/. However, this can clutter 'git log' decorations with copies of the refs with the full name 'refs/prefetch/<remote>/<branch>'. The log.excludeDecoration config option was added in a6be5e67 (log: add log.excludeDecoration config option, 2020-05-16) for exactly this purpose. Ensure we set this only for users that would benefit from it by assigning it at the beginning of the prefetch task. Other alternatives would be during 'git maintenance register' or 'git maintenance start', but those might assign the config even when the prefetch task is disabled by existing config. Further, users could run 'git maintenance run --task=prefetch' using their own scripting or scheduling. This provides the best coverage to automatically update the config when valuable. It is improbable, but possible, that users might want to run the prefetch task _and_ see these refs in their log decorations. This seems incredibly unlikely to me, but users can always opt-in on a command-by-command basis using --decorate-refs=refs/prefetch/. Test that this works in a few cases. In particular, ensure that our assignment of log.excludeDecoration=refs/prefetch/ is additive to other existing exclusions. Further, ensure we do not add multiple copies in multiple runs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15Merge branch 'ds/for-each-repo-noopfix'Libravatar Junio C Hamano1-0/+7
"git for-each-repo --config=<var> <cmd>" should not run <cmd> for any repository when the configuration variable <var> is not defined even once. * ds/for-each-repo-noopfix: for-each-repo: do nothing on empty config
2021-01-15Merge branch 'ds/maintenance-part-4'Libravatar Junio C Hamano1-27/+395
Follow-up on the "maintenance part-3" which introduced scheduled maintenance tasks to support platforms whose native scheduling methods are not 'cron'. * ds/maintenance-part-4: maintenance: use Windows scheduled tasks maintenance: use launchctl on macOS maintenance: include 'cron' details in docs maintenance: extract platform-specific scheduling
2021-01-15Merge branch 'en/stash-apply-sparse-checkout'Libravatar Junio C Hamano1-49/+116
"git stash" did not work well in a sparsely checked out working tree. * en/stash-apply-sparse-checkout: stash: fix stash application in sparse-checkouts stash: remove unnecessary process forking t7012: add a testcase demonstrating stash apply bugs in sparse checkouts
2021-01-15Merge branch 'zh/arg-help-format'Libravatar Junio C Hamano7-61/+61
Clean up option descriptions in "git cmd --help". * zh/arg-help-format: builtin/*: update usage format parse-options: format argh like error messages
2021-01-15Merge branch 'rs/rebase-commit-validation'Libravatar Junio C Hamano1-1/+3
Diagnose command line error of "git rebase" early. * rs/rebase-commit-validation: rebase: verify commit parameter
2021-01-15Merge branch 'ma/sha1-is-a-hash'Libravatar Junio C Hamano2-3/+3
Retire more names with "sha1" in it. * ma/sha1-is-a-hash: hash-lookup: rename from sha1-lookup sha1-lookup: rename `sha1_pos()` as `hash_pos()` object-file.c: rename from sha1-file.c object-name.c: rename from sha1-name.c
2021-01-15Merge branch 'bc/rev-parse-path-format'Libravatar Junio C Hamano1-12/+94
"git rev-parse" can be explicitly told to give output as absolute or relative path with the `--path-format=(absolute|relative)` option. * bc/rev-parse-path-format: rev-parse: add option for absolute or relative path formatting abspath: add a function to resolve paths with missing components