summaryrefslogtreecommitdiff
path: root/t/t6423-merge-rename-directories.sh
AgeCommit message (Collapse)AuthorFilesLines
2021-04-08Merge branch 'en/ort-perf-batch-9'Libravatar Junio C Hamano1-0/+71
The ort merge backend has been optimized by skipping irrelevant renames. * en/ort-perf-batch-9: diffcore-rename: avoid doing basename comparisons for irrelevant sources merge-ort: skip rename detection entirely if possible merge-ort: use relevant_sources to filter possible rename sources merge-ort: precompute whether directory rename detection is needed merge-ort: introduce wrappers for alternate tree traversal merge-ort: add data structures for an alternate tree traversal merge-ort: precompute subset of sources for which we need rename detection diffcore-rename: enable filtering possible rename sources
2021-03-10merge-ort: precompute subset of sources for which we need rename detectionLibravatar Elijah Newren1-0/+71
rename detection works by trying to pair all file deletions (or "sources") with all file additions (or "destinations"), checking similarity, and then marking the sufficiently similar ones as renames. This can be expensive if there are many sources and destinations on a given side of history as it results in an N x M comparison matrix. However, there are many cases where we can compute in advance that detecting renames for some of the sources provides no useful information and thus that we can exclude those sources from the matrix. To see why, first note that the merge machinery uses detected renames in two ways: * directory rename detection: when one side of history renames a directory, and the other side of history adds new files to that directory, we want to be able to warn the user about the need to chose whether those new files stay in the old directory or move to the new one. * three-way content merging: in order to do three-way content merging of files, we need three different file versions. If one side of history renamed a file, then some of the content for the file is found under a different path than in the merge base or on the other side of history. Add a simple testcase showing the two kinds of reasons renames are relevant; it's a testcase that will only pass if we detect both kinds of needed renames. Other than the testcase added above, this commit concentrates just on the three-way content merging; it will punt and mark all sources as needed for directory rename detection, and leave it to future commits to narrow that down more. The point of three-way content merging is to reconcile changes made on *both* sides of history. What if the file wasn't modified on both sides? There are two possibilities: * If it wasn't modified on the renamed side: -> then we get to do exact rename detection, which is cheap. * If it wasn't modified on the unrenamed side: -> then detection of a rename for that source file is irrelevant That latter claim might be surprising at first, so let's walk through a case to show why rename detection for that source file is irrelevant. Let's use two filenames, old.c & new.c, with the following abbreviated object ids (and where the value '000000' is used to denote that the file is missing in that commit): old.c new.c MERGE_BASE: 01d01d 000000 MERGE_SIDE1: 01d01d 000000 MERGE_SIDE2: 000000 5e1ec7 If the rename *isn't* detected: then old.c looks like it was unmodified on one side and deleted on the other and should thus be removed. new.c looks like a new file we should keep as-is. If the rename *is* detected: then a three-way content merge is done. Since the version of the file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge will produce exactly the version of the file whose abbreviated object id is 5e1ec7. It will record that file at the path new.c, while removing old.c from the directory. Note that these two results are identical -- a single file named 'new.c' with object id 5e1ec7. In other words, it doesn't matter if the rename is detected in the case where the file is unmodified on the unrenamed side. Use this information to compute whether we need rename detection for each source created in add_pair(). It's probably worth noting that there used to be a few other edge or corner cases besides three-way content merges and directory rename detection where lack of rename detection could have affected the result, but those cases actually highlighted where conflict resolution methods were not consistent with each other. Fixing those inconsistencies were thus critically important to enabling this optimization. That work involved the following: * bringing consistency to add/add, rename/add, and rename/rename conflict types, as done back in the topic merged at commit ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04), and further extended in commits 2a7c16c980 ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10) and e8eb99d4a6 ("t642[23]: be more flexible for add/add conflicts involving pair renames", 2020-08-10) * making rename/delete more consistent with modify/delete as done in commits 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26) Since the set of relevant_sources we compute has not yet been narrowed down for directory rename detection, we do not pass it to diffcore_rename_extended() yet. That will be done after subsequent commits narrow down the list of relevant_sources needed for directory rename detection reasons. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10tests: remove most uses of C_LOCALE_OUTPUTLibravatar Ævar Arnfjörð Bjarmason1-1/+1
As a follow-up to d162b25f956 (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20) remove those uses of the now always true C_LOCALE_OUTPUT prerequisite from those tests which declare it as an argument to test_expect_{success,failure}. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26t6423: add more details about direct resolution of directoriesLibravatar Elijah Newren1-16/+23
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26t6423: note improved ort handling with untracked filesLibravatar Elijah Newren1-71/+124
Similar to the previous commit, since the "recursive" backend relies on unpack_trees() to check if unstaged or untracked files would be overwritten by a merge, and unpack_trees() does not understand renames -- it has false positives and false negatives. Once it has run, since it updates as it goes, merge-recursive then has to handle completing the merge as best it can despite extra changes in the working copy. However, this is not just an issue for dirty files, but also for untracked files because directory renames can cause file contents to need to be written to a location that was not tracked on either side of history. Since the "ort" backend does the complete merge inmemory, and only updates the index and working copy as a post-processing step, if there are untracked files in the way it can simply abort the merge much like checkout does. Update t6423 to reflect the better merge abilities and expectations for ort, while still leaving the best-case-as-good-as-recursive-can-do expectations there for the recursive backend so we retain its stability until we are ready to deprecate and remove it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26t6423, t6436: note improved ort handling with dirty filesLibravatar Elijah Newren1-114/+152
The "recursive" backend relies on unpack_trees() to check if unstaged changes would be overwritten by a merge, but unpack_trees() does not understand renames -- and once it returns, it has already written many updates to the working tree and index. As such, "recursive" had to do a special 4-way merge where it would need to also treat the working copy as an extra source of differences that we had to carefully avoid overwriting and resulting in moving files to new locations to avoid conflicts. The "ort" backend, by contrast, does the complete merge inmemory, and only updates the index and working copy as a post-processing step. If there are dirty files in the way, it can simply abort the merge. Update t6423 and t6436 to reflect the better merge abilities and expectations we have for ort, while still leaving the best-case-as-good-as-recursive-can-do expectations there for the recursive backend so we retain its stability until we are ready to deprecate and remove it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26t6423: expect improved conflict markers labels in the ort backendLibravatar Elijah Newren1-10/+28
Conflict markers carry an extra annotation of the form REF-OR-COMMIT:FILENAME to help distinguish where the content is coming from, with the :FILENAME piece being left off if it is the same for both sides of history (thus only renames with content conflicts carry that part of the annotation). However, there were cases where the :FILENAME annotation was accidentally left off, due to merge-recursive's every-codepath-needs-a-copy-of-all-special-case-code format. Update a few tests to have the correct :FILENAME extension on relevant paths with the ort backend, while leaving the expectation for merge-recursive the same to avoid destabilizing it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26t6404, t6423: expect improved rename/delete handling in ort backendLibravatar Elijah Newren1-20/+50
When a file is renamed and has content conflicts, merge-recursive does not have some stages for the old filename and some stages for the new filename in the index; instead it copies all the stages corresponding to the old filename over to the corresponding locations for the new filename, so that there are three higher order stages all corresponding to the new filename. Doing things this way makes it easier for the user to access the different versions and to resolve the conflict (no need to manually 'git rm' the old version as well as 'git add' the new one). rename/deletes should be handled similarly -- there should be two stages for the renamed file rather than just one. We do not want to destabilize merge-recursive right now, so instead update relevant tests to have different expectations depending on whether the "recursive" or "ort" merge strategies are in use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26merge tests: expect improved directory/file conflict handling in ortLibravatar Elijah Newren1-15/+38
merge-recursive.c is built on the idea of running unpack_trees() and then "doing minor touch-ups" to get the result. Unfortunately, unpack_trees() was run in an update-as-it-goes mode, leading merge-recursive.c to follow suit and end up with an immediate evaluation and fix-it-up-as-you-go design. Some things like directory/file conflicts are not well representable in the index data structure, and required special extra code to handle. But then when it was discovered that rename/delete conflicts could also be involved in directory/file conflicts, the special directory/file conflict handling code had to be copied to the rename/delete codepath. ...and then it had to be copied for modify/delete, and for rename/rename(1to2) conflicts, ...and yet it still missed some. Further, when it was discovered that there were also file/submodule conflicts and submodule/directory conflicts, we needed to copy the special submodule handling code to all the special cases throughout the codebase. And then it was discovered that our handling of directory/file conflicts was suboptimal because it would create untracked files to store the contents of the conflicting file, which would not be cleaned up if someone were to run a 'git merge --abort' or 'git rebase --abort'. It was also difficult or scary to try to add or remove the index entries corresponding to these files given the directory/file conflict in the index. But changing merge-recursive.c to handle these correctly was a royal pain because there were so many sites in the code with similar but not identical code for handling directory/file/submodule conflicts that would all need to be updated. I have worked hard to push all directory/file/submodule conflict handling in merge-ort through a single codepath, and avoid creating untracked files for storing tracked content (it does record things at alternate paths, but makes sure they have higher-order stages in the index). Since updating merge-recursive is too much work and we don't want to destabilize it, instead update the testsuite to have different expectations for relevant directory/file/submodule conflict tests. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26t/: new helper for tests that pass with ort but fail with recursiveLibravatar Elijah Newren1-6/+7
There are a number of tests that the "recursive" backend does not handle correctly but which the redesign in "ort" will. Add a new helper in lib-merge.sh for selecting a different test expectation based on the setting of GIT_TEST_MERGE_ALGORITHM, and use it in various testcases to document which ones we expect to fail under recursive but pass under ort. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16t6423: more involved rules for renaming directories into each otherLibravatar Elijah Newren1-26/+227
Testcases 12b and 12c were both slightly weird; they were marked as having a weird resolution, but with the note that even straightforward simple rules can give weird results when the input is bizarre. However, during optimization work for merge-ort, I discovered a significant speedup that is possible if we add one more fairly straightforward rule: we don't bother doing directory rename detection if there are no new files added to the directory on the other side of the history to be affected by the directory rename. This seems like an obvious and straightforward rule, but there was one funny corner case where directory rename detection could affect only existing files: the funny corner case where two directories are renamed into each other on opposite sides of history. In other words, it only results in a different output for testcases 12b and 12c. Since we already thought testcases 12b and 12c were weird anyway, and because the optimization often has a significant effect on common cases (but is entirely prevented if we can't change how 12b and 12c function), let's add the additional rule and tweak how 12b and 12c work. Split both testcases into two (one where we add no new files, and one where the side that doesn't rename a given directory will add files to it), and mark them with the new expectation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16t6423: update directory rename detection tests with new ruleLibravatar Elijah Newren1-21/+123
While investigating the issues highlighted by the testcase in the previous patch, I also found a shortcoming in the directory rename detection rules. Split testcase 6b into two to explain this issue and update directory-rename-detection.txt to remove one of the previous rules that I know believe to be detrimental. Also, update the wording around testcase 8e; while we are not modifying the results of that testcase, we were previously unsure of the appropriate resolution of that test and the new rule makes the previously chosen resolution for that testcase a bit more solid. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16t6423: more involved directory rename testLibravatar Elijah Newren1-0/+195
Add a new testcase modelled on a real world repository example that served multiple purposes: * it uncovered a bug in the current directory rename detection implementation. * it is a good test of needing to do directory rename detection for a series of commits instead of just one (and uses rebase instead of just merge like all the other tests in this testfile). * it is an excellent stress test for some of the optimizations in my new merge-ort engine I can expand on the final item later when I have submitted more of merge-ort, but the bug is the main immediate concern. It arises as follows: * dir/subdir/ has several files * almost all files in dir/subdir/ are renamed to folder/subdir/ * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/ * If the other side of history (that doesn't do the renames) adds a new file to dir/subdir/, where should it be placed after the merge? The most obvious two choices are: (1) leave the new file in dir/subdir/, don't make it follow the rename, and (2) move the new file to folder/subdir/, following the rename of most the files. However, there's a possible third choice here: (3) move the new file to folder/subdir/newsubdir/. The choice reinforce the fact that merge.directoryRenames=conflict is a good default, but when the merge machinery needs to stick it somewhere and notify the user of the possibility that they might want to place it elsewhere. Surprisingly, the current code would always choose (3), while the real world repository was clearly expecting (2) -- move the file along with where the herd of files was going, not with the special exception. The problem here is that for the majority of the file renames, dir/subdir/ -> folder/subdir/ is actually represented as dir/ -> folder/ This directory rename would have a big weight associated with it since most the files followed that rename. However, we always consult the most immediate directory first, and there is only one rename rule for it: dir/subdir/ -> folder/subdir/newsubdir/ Since this rule is the only one for mapping from dir/subdir/, it automatically wins and that directory rename was followed instead of the desired dir/subdir/ -> folder/subdir/. Unfortunately, the fix is a bit involved so for now just add the testcase documenting the issue. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-10t642[23]: be more flexible for add/add conflicts involving pair renamesLibravatar Elijah Newren1-2/+2
Much like the last commit accepted 'add/add' and 'rename/add' interchangably, we also want to do the same for 'add/add' and 'rename/rename'. This also allows us to avoid the ambiguity in meaning with 'rename/rename' (is it two separate files renamed to the same location, or one file renamed on both sides but differently)? Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-10t6423: add an explanation about why one of the tests does not passLibravatar Elijah Newren1-0/+8
I had long since forgotten the idea behind this test and why it failed, and took a little while to figure it out. To prevent others from having to spend a similar time on it, add an explanation in the comments. However, the reasoning in the explanation makes me question why I considered it a failure at all. I'm not sure if I had a better reason when I originally wrote it, but for now just add commentary about the possible expectations and why it behaves the way it does right now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-10t6416, t6423: clarify some comments and fix some typosLibravatar Elijah Newren1-13/+12
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-10t6423: fix test setup for a couple testsLibravatar Elijah Newren1-0/+2
Commit da1e295e00 ("t604[236]: do not run setup in separate tests", 2019-10-22) removed approximately half the tests (which were setup-only tests) in t6043 by turning them into functions that the subsequent test would call as their first step. This ensured that any test from this file could be run entirely independently of all the other tests in the file. Unfortunately, the call to the new setup function was missed in two of the test_expect_failure cases. Add them in. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-10Collect merge-related tests to t64xxLibravatar Elijah Newren1-0/+4693
The tests for the merge machinery are spread over several places. Collect them into t64xx for simplicity. Some notes: t60[234]*.sh: Merge tests started in t602*, overgrew bisect and remote tracking tests in t6030, t6040, and t6041, and nearly overtook replace tests in t6050. This made picking out relevant tests that I wanted to run in a tighter loop slightly more annoying for years. t303*.sh: These started out as tests for the 'merge-recursive' toplevel command, but did not restrict to that and had lots of overlap with the underlying merge machinery. t7405, t7613: submodule-specific merge logic started out in submodule.c but was moved to merge-recursive.c in commit 18cfc08866 ("submodule.c: move submodule merging to merge-recursive.c", 2018-05-15). Since these tests are about the logic found in the merge machinery, moving these tests to be with the merge tests makes sense. t7607, t7609: Having tests spread all over the place makes it more likely that additional tests related to a certain piece of logic grow in all those other places. Much like t303*.sh, these two tests were about the underlying merge machinery rather than outer levels. Tests that were NOT moved: t76[01]*.sh: Other than the four tests mentioned above, the remaining tests in t76[01]*.sh are related to non-recursive merge strategies, parameter parsing, and other stuff associated with the highlevel builtin/merge.c rather than the recursive merge machinery. t3[45]*.sh: The rebase testcases in t34*.sh also test the merge logic pretty heavily; sometimes changes I make only trigger failures in the rebase tests. The rebase tests are already nicely coupled together, though, and I didn't want to mess that up. Similar comments apply for the cherry-pick tests in t35*.sh. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>