summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-02-15diffcore-rename: guide inexact rename detection based on basenamesLibravatar Elijah Newren2-8/+52
Make use of the new find_basename_matches() function added in the last two patches, to find renames more rapidly in cases where we can match up files based on basenames. As a quick reminder (see the last two commit messages for more details), this means for example that docs/extensions.txt and docs/config/extensions.txt are considered likely renames if there are no remaining 'extensions.txt' files elsewhere among the added and deleted files, and if a similarity check confirms they are similar, then they are marked as a rename without looking for a better similarity match among other files. This is a behavioral change, as covered in more detail in the previous commit message. We do not use this heuristic together with either break or copy detection. The point of break detection is to say that filename similarity does not imply file content similarity, and we only want to know about file content similarity. The point of copy detection is to use more resources to check for additional similarities, while this is an optimization that uses far less resources but which might also result in finding slightly fewer similarities. So the idea behind this optimization goes against both of those features, and will be turned off for both. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 13.815 s ± 0.062 s 13.294 s ± 0.103 s mega-renames: 1799.937 s ± 0.493 s 187.248 s ± 0.882 s just-one-mega: 51.289 s ± 0.019 s 5.557 s ± 0.017 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15diffcore-rename: complete find_basename_matches()Libravatar Elijah Newren1-3/+79
It is not uncommon in real world repositories for the majority of file renames to not change the basename of the file; i.e. most "renames" are just a move of files into different directories. We can make use of this to avoid comparing all rename source candidates with all rename destination candidates, by first comparing sources to destinations with the same basenames. If two files with the same basename are sufficiently similar, we record the rename; if not, we include those files in the more exhaustive matrix comparison. This means we are adding a set of preliminary additional comparisons, but for each file we only compare it with at most one other file. For example, if there was a include/media/device.h that was deleted and a src/module/media/device.h that was added, and there are no other device.h files in the remaining sets of added and deleted files after exact rename detection, then these two files would be compared in the preliminary step. This commit does not yet actually employ this new optimization, it merely adds a function which can be used for this purpose. The next commit will do the necessary plumbing to make use of it. Note that this optimization might give us different results than without the optimization, because it's possible that despite files with the same basename being sufficiently similar to be considered a rename, there's an even better match between files without the same basename. I think that is okay for four reasons: (1) it's easy to explain to the users what happened if it does ever occur (or even for them to intuitively figure out), (2) as the next patch will show it provides such a large performance boost that it's worth the tradeoff, and (3) it's somewhat unlikely that despite having unique matching basenames that other files serve as better matches. Reason (4) takes a full paragraph to explain... If the previous three reasons aren't enough, consider what rename detection already does. Break detection is not the default, meaning that if files have the same _fullname_, then they are considered related even if they are 0% similar. In fact, in such a case, we don't even bother comparing the files to see if they are similar let alone comparing them to all other files to see what they are most similar to. Basically, we override content similarity based on sufficient filename similarity. Without the filename similarity (currently implemented as an exact match of filename), we swing the pendulum the opposite direction and say that filename similarity is irrelevant and compare a full N x M matrix of sources and destinations to find out which have the most similar contents. This optimization just adds another form of filename similarity comparison, but augments it with a file content similarity check as well. Basically, if two files have the same basename and are sufficiently similar to be considered a rename, mark them as such without comparing the two to all other rename candidates. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15diffcore-rename: compute basenames of source and dest candidatesLibravatar Elijah Newren1-0/+63
We want to make use of unique basenames among remaining source and destination files to help inform rename detection, so that more likely pairings can be checked first. (src/moduleA/foo.txt and source/module/A/foo.txt are likely related if there are no other 'foo.txt' files among the remaining deleted and added files.) Add a new function, not yet used, which creates a map of the unique basenames within rename_src and another within rename_dst, together with the indices within rename_src/rename_dst where those basenames show up. Non-unique basenames still show up in the map, but have an invalid index (-1). This function was inspired by the fact that in real world repositories, files are often moved across directories without changing names. Here are some sample repositories and the percentage of their historical renames (as of early 2020) that preserved basenames: * linux: 76% * gcc: 64% * gecko: 79% * webkit: 89% These statistics alone don't prove that an optimization in this area will help or how much it will help, since there are also unpaired adds and deletes, restrictions on which basenames we consider, etc., but it certainly motivated the idea to try something in this area. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15t4001: add a test comparing basename similarity and content similarityLibravatar Elijah Newren1-0/+23
Add a simple test where a removed file is similar to two different added files; one of them has the same basename, and the other has a slightly higher content similarity. In the current test, content similarity is weighted higher than filename similarity. Subsequent commits will add a new rule that weighs a mixture of filename similarity and content similarity in a manner that will change the outcome of this testcase. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15diffcore-rename: filter rename_src list when possibleLibravatar Elijah Newren1-8/+51
We have to look at each entry in rename_src a total of rename_dst_nr times. When we're not detecting copies, any exact renames or ignorable rename paths will just be skipped over. While checking that these can be skipped over is a relatively cheap check, it's still a waste of time to do that check more than once, let alone rename_dst_nr times. When rename_src_nr is a few thousand times bigger than the number of relevant sources (such as when cherry-picking a commit that only touched a handful of files, but from a side of history that has different names for some high level directories), this time can add up. First make an initial pass over the rename_src array and move all the relevant entries to the front, so that we can iterate over just those relevant entries. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 14.119 s ± 0.101 s 13.815 s ± 0.062 s mega-renames: 1802.044 s ± 0.828 s 1799.937 s ± 0.493 s just-one-mega: 51.391 s ± 0.028 s 51.289 s ± 0.019 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-12diffcore-rename: no point trying to find a match better than exactLibravatar Elijah Newren1-6/+14
diffcore_rename() had some code to avoid having destination paths that already had an exact rename detected from being re-checked for other renames. Source paths, however, were re-checked because we wanted to allow the possibility of detecting copies. But if copy detection isn't turned on, then this merely amounts to attempting to find a better-than-exact match, which naturally ends up being an expensive no-op. In particular, copy detection is never turned on by the merge machinery. For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 14.263 s ± 0.053 s 14.119 s ± 0.101 s mega-renames: 5504.231 s ± 5.150 s 1802.044 s ± 0.828 s just-one-mega: 158.534 s ± 0.498 s 51.391 s ± 0.028 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11Sync with maintLibravatar Junio C Hamano0-0/+0
2021-02-11Merge branch 'en/merge-ort-perf'Libravatar Junio C Hamano2-1/+94
The "ort" merge strategy. * en/merge-ort-perf: merge-ort: begin performance work; instrument with trace2_region_* calls merge-ort: ignore the directory rename split conflict for now merge-ort: fix massive leak
2021-02-11Merge branch 'en/ort-directory-rename'Libravatar Junio C Hamano1-19/+811
ORT merge strategy learns to infer "renamed directory" while merging. * en/ort-directory-rename: merge-ort: fix a directory rename detection bug merge-ort: process_renames() now needs more defensiveness merge-ort: implement apply_directory_rename_modifications() merge-ort: add a new toplevel_dir field merge-ort: implement handle_path_level_conflicts() merge-ort: implement check_for_directory_rename() merge-ort: implement apply_dir_rename() and check_dir_renamed() merge-ort: implement compute_collisions() merge-ort: modify collect_renames() for directory rename handling merge-ort: implement handle_directory_level_conflicts() merge-ort: implement compute_rename_counts() merge-ort: copy get_renamed_dir_portion() from merge-recursive.c merge-ort: add outline of get_provisional_directory_renames() merge-ort: add outline for computing directory renames merge-ort: collect which directories are removed in dirs_removed merge-ort: initialize and free new directory rename data structures merge-ort: add new data structures for directory rename detection
2021-02-11Merge branch 'tb/ci-run-cocci-with-18.04' into maintLibravatar Junio C Hamano1-1/+1
* tb/ci-run-cocci-with-18.04: .github/workflows/main.yml: run static-analysis on bionic
2021-02-10Merge branch 'tb/ci-run-cocci-with-18.04'Libravatar Junio C Hamano1-1/+1
The version of Ubuntu Linux used by default at GitHub Actions CI has been updated to one that lack coccinelle; until it gets fixed, work it around by sticking to the previous release (18.04). * tb/ci-run-cocci-with-18.04: .github/workflows/main.yml: run static-analysis on bionic
2021-02-10The seventh batchLibravatar Junio C Hamano1-34/+25
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10Merge branch 'ab/detox-gettext-tests'Libravatar Junio C Hamano22-173/+29
Get rid of "GETTEXT_POISON" support altogether, which may or may not be controversial. * ab/detox-gettext-tests: tests: remove uses of GIT_TEST_GETTEXT_POISON=false tests: remove support for GIT_TEST_GETTEXT_POISON ci: remove GETTEXT_POISON jobs
2021-02-10Merge branch 'ab/grep-pcre-invalid-utf8'Libravatar Junio C Hamano7-8/+82
Update support for invalid UTF-8 in PCRE2. * ab/grep-pcre-invalid-utf8: grep/pcre2: better support invalid UTF-8 haystacks grep/pcre2 tests: don't rely on invalid UTF-8 data test
2021-02-10Merge branch 'ab/retire-pcre1'Libravatar Junio C Hamano8-216/+18
The support for deprecated PCRE1 library has been dropped. * ab/retire-pcre1: Remove support for v1 of the PCRE library config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag
2021-02-10Merge branch 'jk/pretty-lazy-load-commit'Libravatar Junio C Hamano2-12/+13
Some pretty-format specifiers do not need the data in commit object (e.g. "%H"), but we were over-eager to load and parse it, which has been made even lazier. * jk/pretty-lazy-load-commit: pretty: lazy-load commit data when expanding user-format
2021-02-10Merge branch 'ds/more-index-cleanups'Libravatar Junio C Hamano15-53/+408
Cleaning various codepaths up. * ds/more-index-cleanups: t1092: test interesting sparse-checkout scenarios test-lib: test_region looks for trace2 regions sparse-checkout: load sparse-checkout patterns name-hash: use trace2 regions for init repository: add repo reference to index_state fsmonitor: de-duplicate BUG()s around dirty bits cache-tree: extract subtree_pos() cache-tree: simplify verify_cache() prototype cache-tree: clean up cache_tree_update()
2021-02-10Merge branch 'rs/worktree-list-verbose'Libravatar Junio C Hamano5-80/+314
`git worktree list` now annotates worktrees as prunable, shows locked and prunable attributes in --porcelain mode, and gained a --verbose option. * rs/worktree-list-verbose: worktree: teach `list` verbose mode worktree: teach `list` to annotate prunable worktree worktree: teach `list --porcelain` to annotate locked worktree t2402: ensure locked worktree is properly cleaned up worktree: teach worktree_lock_reason() to gently handle main worktree worktree: teach worktree to lazy-load "prunable" reason worktree: libify should_prune_worktree()
2021-02-10Merge branch 'js/rebase-i-commit-cleanup-fix'Libravatar Junio C Hamano2-1/+20
When "git rebase -i" processes "fixup" insn, there is no reason to clean up the commit log message, but we did the usual stripspace processing. This has been corrected. * js/rebase-i-commit-cleanup-fix: rebase -i: do leave commit message intact in fixup! chains
2021-02-10Merge branch 'jk/t0000-cleanups'Libravatar Junio C Hamano1-286/+284
Code clean-up. * jk/t0000-cleanups: t0000: consistently use single quotes for outer tests t0000: run cleaning test inside sub-test t0000: run prereq tests inside sub-test t0000: keep clean-up tests together
2021-02-10Merge branch 'sg/t7800-difftool-robustify'Libravatar Junio C Hamano1-19/+19
Test fix. * sg/t7800-difftool-robustify: t7800-difftool: don't accidentally match tmp dirs
2021-02-10Merge branch 'ab/lose-grep-debug'Libravatar Junio C Hamano4-107/+2
Lose the debugging aid that may have been useful in the past, but no longer is, in the "grep" codepaths. * ab/lose-grep-debug: grep/log: remove hidden --debug and --grep-debug options
2021-02-10Merge branch 'jk/use-oid-pos'Libravatar Junio C Hamano10-94/+87
Code clean-up to ensure our use of hashtables using object names as keys use the "struct object_id" objects, not the raw hash values. * jk/use-oid-pos: oid_pos(): access table through const pointers hash_pos(): convert to oid_pos() rerere: use strmap to store rerere directories rerere: tighten rr-cache dirname check rerere: check dirname format while iterating rr_cache directory commit_graft_pos(): take an oid instead of a bare hash
2021-02-08Sync with 2.30.1Libravatar Junio C Hamano1-0/+8
2021-02-08.github/workflows/main.yml: run static-analysis on bionicLibravatar Taylor Blau1-1/+1
GitHub Actions is transitioning workflow steps that run on 'ubuntu-latest' from 18.04 to 20.04 [1]. This works fine in all steps except the static-analysis one, since Coccinelle isn't available on Ubuntu focal (it is only available in the universe suite). Until Coccinelle can be installed from 20.04's main suite, pin the static-analysis build to run on 18.04, where it can be installed by default. [1]: https://github.com/actions/virtual-environments/issues/1816 Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-08Git 2.30.1Libravatar Junio C Hamano2-1/+9
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-08Merge branch 'pb/ci-matrix-wo-shortcut' into maintLibravatar Junio C Hamano1-0/+4
Our setting of GitHub CI test jobs were a bit too eager to give up once there is even one failure found. Tweak the knob to allow other jobs keep running even when we see a failure, so that we can find more failures in a single run. * pb/ci-matrix-wo-shortcut: ci: do not cancel all jobs of a matrix if one fails
2021-02-08Merge branch 'pb/blame-funcname-range-userdiff' into maintLibravatar Junio C Hamano1-4/+4
Test fix. * pb/blame-funcname-range-userdiff: annotate-tests: quote variable expansions containing path names
2021-02-08Merge branch 'jk/p5303-sed-portability-fix' into maintLibravatar Junio C Hamano1-4/+8
A perf script was made more portable. * jk/p5303-sed-portability-fix: p5303: avoid sed GNU-ism
2021-02-08Merge branch 'ab/branch-sort' into maintLibravatar Junio C Hamano8-44/+111
The implementation of "git branch --sort" wrt the detached HEAD display has always been hacky, which has been cleaned up. * ab/branch-sort: branch: show "HEAD detached" first under reverse sort branch: sort detached HEAD based on a flag ref-filter: move ref_sorting flags to a bitfield ref-filter: move "cmp_fn" assignment into "else if" arm ref-filter: add braces to if/else if/else chain branch tests: add to --sort tests branch: change "--local" to "--list" in comment
2021-02-08Merge branch 'ma/more-opaque-lock-file' into maintLibravatar Junio C Hamano5-15/+15
Code clean-up. * ma/more-opaque-lock-file: read-cache: try not to peek into `struct {lock_,temp}file` refs/files-backend: don't peek into `struct lock_file` midx: don't peek into `struct lock_file` commit-graph: don't peek into `struct lock_file` builtin/gc: don't peek into `struct lock_file`
2021-02-08Merge branch 'dl/p4-encode-after-kw-expansion' into maintLibravatar Junio C Hamano1-1/+1
Text encoding fix for "git p4". * dl/p4-encode-after-kw-expansion: git-p4: fix syncing file types with pattern
2021-02-08Merge branch 'ar/t6016-modernise' into maintLibravatar Junio C Hamano1-187/+167
Test update. * ar/t6016-modernise: t6016: move to lib-log-graph.sh framework
2021-02-08Merge branch 'zh/arg-help-format' into maintLibravatar Junio C Hamano8-64/+64
Clean up option descriptions in "git cmd --help". * zh/arg-help-format: builtin/*: update usage format parse-options: format argh like error messages
2021-02-08Merge branch 'ma/doc-pack-format-varint-for-sizes' into maintLibravatar Junio C Hamano1-1/+16
Doc update. * ma/doc-pack-format-varint-for-sizes: pack-format.txt: document sizes at start of delta data
2021-02-08Merge branch 'ma/t1300-cleanup' into maintLibravatar Junio C Hamano1-40/+32
Code clean-up. * ma/t1300-cleanup: t1300: don't needlessly work with `core.foo` configs t1300: remove duplicate test for `--file no-such-file` t1300: remove duplicate test for `--file ../foo`
2021-02-08Merge branch 'fc/t6030-bisect-reset-removes-auxiliary-files' into maintLibravatar Junio C Hamano1-8/+8
A 3-year old test that was not testing anything useful has been corrected. * fc/t6030-bisect-reset-removes-auxiliary-files: test: bisect-porcelain: fix location of files
2021-02-05Sync with maintLibravatar Junio C Hamano1-0/+47
2021-02-05The sixth batchLibravatar Junio C Hamano1-0/+40
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-05Merge branch 'sg/test-stress-jobs'Libravatar Junio C Hamano1-4/+4
Test framework fix. * sg/test-stress-jobs: test-lib: prevent '--stress-jobs=X' from being ignored
2021-02-05Merge branch 'jk/weather-balloon-require-variadic-macro'Libravatar Junio C Hamano1-2/+5
We've carried compatibility codepaths for compilers without variadic macros for quite some time, but the world may be ready for them to be removed. Force compilation failure on exotic platforms where variadic macros are not available to find out who screams in such a way that we can easily revert if it turns out that the world is not yet ready. * jk/weather-balloon-require-variadic-macro: git-compat-util: always enable variadic macros
2021-02-05Merge branch 'pb/ci-matrix-wo-shortcut'Libravatar Junio C Hamano1-0/+4
Our setting of GitHub CI test jobs were a bit too eager to give up once there is even one failure found. Tweak the knob to allow other jobs keep running even when we see a failure, so that we can find more failures in a single run. * pb/ci-matrix-wo-shortcut: ci: do not cancel all jobs of a matrix if one fails
2021-02-05Merge branch 'pb/blame-funcname-range-userdiff'Libravatar Junio C Hamano1-4/+4
Test fix. * pb/blame-funcname-range-userdiff: annotate-tests: quote variable expansions containing path names
2021-02-05Merge branch 'jk/p5303-sed-portability-fix'Libravatar Junio C Hamano1-4/+8
A perf script was made more portable. * jk/p5303-sed-portability-fix: p5303: avoid sed GNU-ism
2021-02-05Merge branch 'jv/pack-objects-narrower-ref-iteration'Libravatar Junio C Hamano1-5/+3
The "pack-objects" command needs to iterate over all the tags when automatic tag following is enabled, but it actually iterated over all refs and then discarded everything outside "refs/tags/" hierarchy, which was quite wasteful. * jv/pack-objects-narrower-ref-iteration: builtin/pack-objects.c: avoid iterating all refs
2021-02-05Merge branch 'ph/use-delete-refs'Libravatar Junio C Hamano2-29/+62
When removing many branches and tags, the code used to do so one ref at a time. There is another API it can use to delete multiple refs, and it makes quite a lot of performance difference when the refs are packed. * ph/use-delete-refs: use delete_refs when deleting tags or branches
2021-02-05Merge branch 'tb/ls-refs-optim'Libravatar Junio C Hamano4-73/+103
The ls-refs protocol operation has been optimized to narrow the sub-hierarchy of refs/ it walks to produce response. * tb/ls-refs-optim: ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets ls-refs.c: initialize 'prefixes' before using it refs: expose 'for_each_fullref_in_prefixes'
2021-02-05Merge branch 'zh/ls-files-deduplicate'Libravatar Junio C Hamano3-31/+124
"git ls-files" can and does show multiple entries when the index is unmerged, which is a source for confusion unless -s/-u option is in use. A new option --deduplicate has been introduced. * zh/ls-files-deduplicate: ls-files.c: add --deduplicate option ls_files.c: consolidate two for loops into one ls_files.c: bugfix for --deleted and --modified
2021-02-05Merge branch 'ds/cache-tree-basics'Libravatar Junio C Hamano5-18/+94
Document, clean-up and optimize the code around the cache-tree extension in the index. * ds/cache-tree-basics: cache-tree: speed up consecutive path comparisons cache-tree: use ce_namelen() instead of strlen() index-format: discuss recursion of cache-tree better index-format: update preamble to cache tree extension index-format: use 'cache tree' over 'cached tree' cache-tree: trace regions for prime_cache_tree cache-tree: trace regions for I/O cache-tree: use trace2 in cache_tree_update() unpack-trees: add trace2 regions tree-walk: report recursion counts
2021-02-05Merge branch 'en/ort-conflict-handling'Libravatar Junio C Hamano1-18/+653
ORT merge strategy learns more support for merge conflicts. * en/ort-conflict-handling: merge-ort: add handling for different types of files at same path merge-ort: copy find_first_merges() implementation from merge-recursive.c merge-ort: implement format_commit() merge-ort: copy and adapt merge_submodule() from merge-recursive.c merge-ort: copy and adapt merge_3way() from merge-recursive.c merge-ort: flesh out implementation of handle_content_merge() merge-ort: handle book-keeping around two- and three-way content merge merge-ort: implement unique_path() helper merge-ort: handle directory/file conflicts that remain merge-ort: handle D/F conflict where directory disappears due to merge