summaryrefslogtreecommitdiff
path: root/git-add--interactive.perl
AgeCommit message (Collapse)AuthorFilesLines
2018-03-14Merge branch 'jk/add-i-diff-filter'Libravatar Junio C Hamano1-0/+8
The "interactive.diffFilter" used by "git add -i" must retain one-to-one correspondence between its input and output, but it was not enforced and caused end-user confusion. We now at least make sure the filtered result has the same number of lines as its input to detect a broken filter. * jk/add-i-diff-filter: add--interactive: detect bogus diffFilter output t3701: add a test for interactive.diffFilter
2018-03-14Merge branch 'pw/add-p-recount'Libravatar Junio C Hamano1-22/+86
"git add -p" has been lazy in coalescing split patches before passing the result to underlying "git apply", leading to corner case bugs; the logic to prepare the patch to be applied after hunk selections has been tightened. * pw/add-p-recount: add -p: don't rely on apply's '--recount' option add -p: fix counting when splitting and coalescing add -p: calculate offset delta for edited patches add -p: adjust offsets of subsequent hunks when one is skipped t3701: add failing test for pathological context lines t3701: don't hard code sha1 hash values t3701: use test_write_lines and write_script t3701: indent here documents add -i: add function to format hunk header
2018-03-05add--interactive: detect bogus diffFilter outputLibravatar Jeff King1-0/+8
It's important that the diff-filter only filter the individual lines, and that there remain a one-to-one mapping between the input and output lines. Otherwise, things like hunk-splitting will behave quite unexpectedly (e.g., you think you are splitting at one point, but it has a different effect in the text patch we apply). We can't detect all problematic cases, but we can at least catch the obvious case where we don't even have the correct number of lines. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05add -p: don't rely on apply's '--recount' optionLibravatar Phillip Wood1-1/+1
Now that add -p counts patches properly it should be possible to turn off the '--recount' option when invoking 'git apply' Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05add -p: fix counting when splitting and coalescingLibravatar Phillip Wood1-0/+11
When a file has no trailing new line at the end diff records this by appending "\ No newline at end of file" below the last line of the file. This line should not be counted in the hunk header. Fix the splitting and coalescing code to count files without a trailing new line properly and change one of the tests to test splitting without a trailing new line. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-05add -p: calculate offset delta for edited patchesLibravatar Phillip Wood1-9/+50
Recount the number of preimage and postimage lines in a hunk after it has been edited so any change in the number of insertions or deletions can be used to adjust the offsets of subsequent hunks. If an edited hunk is subsequently split then the offset correction will be lost. It would be possible to fix this if it is a problem, however the code here is still an improvement on the status quo for the common case where an edited hunk is applied without being split. This is also a necessary step to removing '--recount' and '--allow-overlap' from the invocation of 'git apply'. Before '--recount' can be removed the splitting and coalescing counting needs to be fixed to handle a missing newline at the end of a file. In order to remove '--allow-overlap' there needs to be i) some way of verifying the offset data in the edited hunk (probably by correlating the preimage (or postimage if the patch is going to be applied in reverse) lines of the edited and unedited versions to see if they are offset or if any leading/trailing context lines have been removed) and ii) a way of dealing with edited hunks that change context lines that are shared with neighbouring hunks. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-01add -p: adjust offsets of subsequent hunks when one is skippedLibravatar Phillip Wood1-2/+13
Since commit 8cbd431082 ("git-add--interactive: replace hunk recounting with apply --recount", 2008-7-2) if a hunk is skipped then we rely on the context lines to apply subsequent hunks in the right place. While this works most of the time it is possible for hunks to end up being applied in the wrong place. To fix this adjust the offset of subsequent hunks to correct for any change in the number of insertions or deletions due to the skipped hunk. The change in offset due to edited hunks that have the number of insertions or deletions changed is ignored here, it will be fixed in the next commit. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-20add -i: add function to format hunk headerLibravatar Phillip Wood1-10/+11
This code is duplicated in a couple of places so make it into a function as we're going to add some more callers shortly. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13add -p: improve error messagesLibravatar Phillip Wood1-3/+15
If the user presses a key that isn't currently active then explain why it isn't active rather than just listing all the keys. It already did this for some keys, this patch does the same for the those that weren't already handled. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13add -p: only bind search key if there's more than one hunkLibravatar Phillip Wood1-23/+27
If there is only a single hunk then disable searching as there is nothing to search for. Also print a specific error message if the user tries to search with '/' when there's only a single hunk rather than just listing the key bindings. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13add -p: only display help for active keysLibravatar Phillip Wood1-1/+7
If the user presses a key that add -p wasn't expecting then it prints a list of key bindings. Although the prompt only lists the active bindings the help was printed for all bindings. Fix this by using the list of keys in the prompt to filter the help. Note that the list of keys was already passed to help_patch_cmd() by the caller so there is no change needed to the call site. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16add--interactive: ignore submodule changes except HEADLibravatar Nguyễn Thái Ngọc Duy1-1/+1
For 'add -i' and 'add -p', the only action we can take on a dirty submodule entry is update the index with a new value from its HEAD. The content changes inside (from its own index, untracked files...) do not matter, at least until 'git add -i' learns about launching a new interactive add session inside a submodule. Ignore all other submodules changes except HEAD. This reduces the number of entries the user has to check through in 'git add -i', and the number of 'no' they have to answer to 'git add -p' when dirty submodules are present. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-10Merge branch 'pw/unquote-path-in-git-pm'Libravatar Junio C Hamano1-42/+1
Code refactoring. * pw/unquote-path-in-git-pm: t9700: add tests for Git::unquote_path() Git::unquote_path(): throw an exception on bad path Git::unquote_path(): handle '\a' add -i: move unquote_path() to Git.pm
2017-06-30add -i: move unquote_path() to Git.pmLibravatar Phillip Wood1-42/+1
Move unquote_path() from git-add--interactive to Git.pm so it can be used by other scripts. Note this is a straight copy, it does not handle '\a'. That will be fixed in the next commit. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26Merge branch 'jk/add-p-commentchar-fix'Libravatar Junio C Hamano1-1/+2
"git add -p" were updated in 2.12 timeframe to cope with custom core.commentchar but the implementation was buggy and a metacharacter like $ and * did not work. * jk/add-p-commentchar-fix: add--interactive: quote commentChar regex add--interactive: handle EOF in prompt_yesno
2017-06-21add--interactive: quote commentChar regexLibravatar Jeff King1-1/+1
Since c9d961647 (i18n: add--interactive: mark edit_hunk_manually message for translation, 2016-12-14), when the user asks to edit a hunk manually, we respect core.commentChar in generating the edit instructions. However, when we then strip out comment lines, we use a simple regex like: /^$commentChar/ If your chosen comment character is a regex metacharacter, then that will behave in a confusing manner ("$", for instance, would only eliminate blank lines, not actual comment lines). We can fix that by telling perl not to respect metacharacters. Reported-by: Christian Rösch <christian@croesch.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21add--interactive: handle EOF in prompt_yesnoLibravatar Jeff King1-0/+1
The prompt_yesno function loops indefinitely waiting for a "y" or "n" response. But it doesn't handle EOF, meaning that we can end up in an infinite loop of reading EOF from stdin. One way to simulate that is with: echo e | GIT_EDITOR='echo corrupt >' git add -p Let's break out of the loop and propagate the undef to the caller. Without modifying the callers that effectively turns it into a "no" response. This is reasonable for both of the current callers, and it leaves room for any future caller to check for undef explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-09add--interactive: drop diff.indentHeuristic handlingLibravatar Jeff King1-4/+0
Now that diff.indentHeuristic is handled automatically by the plumbing commands, there's no need to propagate it manually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Marc Branchaud <marcnarc@xiplink.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19Merge branch 'va/i18n-perl-scripts'Libravatar Junio C Hamano1-1/+1
Message fix. * va/i18n-perl-scripts: git-add--interactive.perl: add missing dot in a message
2017-04-13git-add--interactive.perl: add missing dot in a messageLibravatar Ralf Thielow1-1/+1
One message appears twice in the translations and the only difference is a dot at the end. So add this dot to make the messages being identical. Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17Merge branch 'jk/add-i-use-pathspecs'Libravatar Junio C Hamano1-11/+2
"git add -p <pathspec>" unnecessarily expanded the pathspec to a list of individual files that matches the pathspec by running "git ls-files <pathspec>", before feeding it to "git diff-index" to see which paths have changes, because historically the pathspec language supported by "diff-index" was weaker. These days they are equivalent and there is no reason to internally expand it. This helps both performance and avoids command line argument limit on some platforms. * jk/add-i-use-pathspecs: add--interactive: do not expand pathspecs with ls-files
2017-03-14add--interactive: do not expand pathspecs with ls-filesLibravatar Jeff King1-11/+2
When we want to get the list of modified files, we first expand any user-provided pathspecs with "ls-files", and then feed the resulting list of paths as arguments to "diff-index" and "diff-files". If your pathspec expands into a large number of paths, you may run into one of two problems: 1. The OS may complain about the size of the argument list, and refuse to run. For example: $ (ulimit -s 128 && git add -p drivers) Can't exec "git": Argument list too long at .../git-add--interactive line 177. Died at .../git-add--interactive line 177. That's on the linux.git repository, which has about 20K files in the "drivers" directory (none of them modified in this case). The "ulimit -s" trick is necessary to show the problem on Linux even for such a gigantic set of paths. Other operating systems have much smaller limits (e.g., a real-world case was seen with only 5K files on OS X). 2. Even when it does work, it's really slow. The pathspec code is not optimized for huge numbers of paths. Here's the same case without the ulimit: $ time git add -p drivers No changes. real 0m16.559s user 0m53.140s sys 0m0.220s We can improve this by skipping "ls-files" completely, and just feeding the original pathspecs to the diff commands. This solution was discussed in 2010: http://public-inbox.org/git/20100105041438.GB12574@coredump.intra.peff.net/ but at the time the diff code's pathspecs were more primitive than those used by ls-files (e.g., they did not support globs). Making the change would have caused a user-visible regression, so we didn't. Since then, the pathspec code has been unified, and the diff commands natively understand pathspecs like '*.c'. This patch implements that solution. That skips the argument-list limits, and the result runs much faster: $ time git add -p drivers No changes. real 0m0.149s user 0m0.116s sys 0m0.080s There are two new tests. The first just exercises the globbing behavior to confirm that we are not causing a regression there. The second checks the actual argument behavior using GIT_TRACE. We _could_ do it with the "ulimit -s" trick, as above. But that would mean the test could only run where "ulimit -s" works. And tests of that sort are expensive, because we have to come up with enough files to actually bust the limit (we can't just shrink the "128" down infinitely, since it is also the in-program stack size). Finally, two caveats and possibilities for future work: a. This fixes one argument-list expansion, but there may be others. In fact, it's very likely that if you run "git add -i" and select a large number of modified files that the script would try to feed them all to a single git command. In practice this is probably fine. The real issue here is that the argument list was growing with the _total_ number of files, not the number of modified or selected files. b. If the repository contains filenames with literal wildcard characters (e.g., "foo*"), the original code expanded them via "ls-files" and then fed those wildcard names to "diff-index", which would have treated them as wildcards. This was a bug, which is now fixed (though unless you really go through some contortions with ":(literal)", it's likely that your original pathspec would match whatever the accidentally-expanded wildcard would anyway). So this takes us one step closer to working correctly with files whose names contain wildcard characters, but it's likely that others remain (e.g., if "git add -i" feeds the selected paths to "git add"). Reported-by: Wincent Colaiuta <win@wincent.com> Reported-by: Mislav Marohnić <mislav.marohnic@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02add--interactive: fix missing file prompt for patch mode with "-i"Libravatar Jeff King1-4/+4
When invoked as "git add -i", each menu interactive menu option prompts the user to select a list of files. This includes the "patch" option, which gets the list before starting the hunk-selection loop. As "git add -p", it behaves differently, and jumps straight to the hunk selection loop. Since 0539d5e6d (i18n: add--interactive: mark patch prompt for translation, 2016-12-14), the "add -i" case mistakenly jumps to straight to the hunk-selection loop. Prior to that commit the distinction between the two cases was managed by the $patch_mode variable. That commit used $patch_mode for something else, and moved the old meaning to the "$cmd" variable. But it forgot to update the $patch_mode check inside patch_update_cmd() which controls the file-list behavior. The simplest fix would be to change that line to check $cmd. But while we're here, let's use a less obscure name for this flag: $patch_mode_only, a boolean which tells whether we are in full-interactive mode or only in patch-mode. Reported-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-24Merge branch 'rt/align-add-i-help-text'Libravatar Junio C Hamano1-1/+1
Doc update. * rt/align-add-i-help-text: git add -i: replace \t with blanks in the help message
2017-02-22git add -i: replace \t with blanks in the help messageLibravatar Ralf Thielow1-1/+1
Within the help message of 'git add -i', the 'diff' command uses one tab character and blanks to create the space between the name and the description while the others use blanks only. So if the tab size is not at 4 characters, this description will not be in range. Replace the tab character with blanks. Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-10Merge branch 'jc/retire-compaction-heuristics'Libravatar Junio C Hamano1-3/+0
"git diff" and its family had two experimental heuristics to shift the contents of a hunk to make the patch easier to read. One of them turns out to be better than the other, so leave only the "--indent-heuristic" option and remove the other one. * jc/retire-compaction-heuristics: diff: retire "compaction" heuristics
2016-12-27Merge branch 'va/i18n-perl-scripts'Libravatar Junio C Hamano1-114/+215
Porcelain scripts written in Perl are getting internationalized. * va/i18n-perl-scripts: i18n: difftool: mark warnings for translation i18n: send-email: mark composing message for translation i18n: send-email: mark string with interpolation for translation i18n: send-email: mark warnings and errors for translation i18n: send-email: mark strings for translation i18n: add--interactive: mark status words for translation i18n: add--interactive: remove %patch_modes entries i18n: add--interactive: mark edit_hunk_manually message for translation i18n: add--interactive: i18n of help_patch_cmd i18n: add--interactive: mark patch prompt for translation i18n: add--interactive: mark plural strings i18n: clean.c: match string with git-add--interactive.perl i18n: add--interactive: mark strings with interpolation for translation i18n: add--interactive: mark simple here-documents for translation i18n: add--interactive: mark strings for translation Git.pm: add subroutines for commenting lines
2016-12-23diff: retire "compaction" heuristicsLibravatar Junio C Hamano1-3/+0
When a patch inserts a block of lines, whose last lines are the same as the existing lines that appear before the inserted block, "git diff" can choose any place between these existing lines as the boundary between the pre-context and the added lines (adjusting the end of the inserted block as appropriate) to come up with variants of the same patch, and some variants are easier to read than others. We have been trying to improve the choice of this boundary, and Git 2.11 shipped with an experimental "compaction-heuristic". Since then another attempt to improve the logic further resulted in a new "indent-heuristic" logic. It is agreed that the latter gives better result overall, and the former outlived its usefulness. Retire "compaction", and keep "indent" as an experimental feature. The latter hopefully will be turned on by default in a future release, but that should be done as a separate step. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark status words for translationLibravatar Vasco Almeida1-6/+6
Mark words 'nothing', 'unchanged' and 'binary' used to display what has been staged or not, in "git add -i" status command. Alternatively one could mark N__('nothing') no-op in order to xgettext(1) extract the string and then trigger the translation at run time only with __($print->{FILE}), but that has the side effect of triggering retrieval of translations for the changes indicator too (e.g. +2/-1) which may or may not be a problem. To avoid that potential problem, mark only where there is certain to trigger translation only of those words but in this case we must also retrieve the translation for the eq tests, since the value assigned was of the translation, not the English source. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: remove %patch_modes entriesLibravatar Vasco Almeida1-21/+0
Remove unnecessary entries from %patch_modes. After the i18n conversion, these entries are not used anymore. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark edit_hunk_manually message for translationLibravatar Vasco Almeida1-13/+39
Mark message of edit_hunk_manually displayed in the editing file when user chooses 'e' option. The message had to be unfolded to allow translation of the $participle verb. Some messages end up being exactly the same for some use cases, but left it for easier change in the future, e.g., wanting to change wording of one particular use case. The comment character is now used according to the git configuration core.commentchar. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: i18n of help_patch_cmdLibravatar Vasco Almeida1-8/+46
Mark help message of help_patch_cmd for translation. The message must be unfolded to be free of variables so we can have high quality translations. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark patch prompt for translationLibravatar Vasco Almeida1-8/+46
Mark prompt message assembled in place for translation, unfolding each use case for each entry in the %patch_modes hash table. Previously, this script relied on whether $patch_mode was set to run the command patch_update_cmd() or show status and loop the main loop. Now, it uses $cmd to indicate we must run patch_update_cmd() and $patch_mode is used to tell which flavor of the %patch_modes are we on. This is introduced in order to be able to mark and unfold the message prompt knowing in which context we are. The tracking of context was done previously by point %patch_mode_flavour hash table to the correct entry of %patch_modes, focusing only on value of %patch_modes. Now, we are also interested in the key ('staged', 'stash', 'checkout_head', ...). Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark plural stringsLibravatar Vasco Almeida1-9/+18
Mark plural strings for translation. Unfold each action case in one entire sentence. Pass new keyword for xgettext to extract. Update test to include new subroutine __n() for plural strings handling. Update documentation to include a description of the new __n() subroutine. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark strings with interpolation for translationLibravatar Vasco Almeida1-12/+13
Since at this point Git::I18N.perl lacks support for Perl i18n placeholder substitution, use of sprintf following die or error_msg is necessary for placeholder substitution take place. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark simple here-documents for translationLibravatar Vasco Almeida1-3/+5
Mark messages in here-documents without interpolation for translation. The here-document delimiter \EOF, which is the same as 'EOF', indicates that the text is to be treated literally without interpolation of its content. Unfortunately xgettext is not able to extract here-documents delimited with \EOF but it is with delimiter enclosed in single quotes. So change \EOF to 'EOF', although in this case does not make difference what variation of here-document to use since there is nothing to interpolate. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-14i18n: add--interactive: mark strings for translationLibravatar Vasco Almeida1-34/+42
Mark simple strings (without interpolation) for translation. Brackets around first parameter of ternary operator is necessary because otherwise xgettext fails to extract strings marked for translation from the rest of the file. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19diff: improve positioning of add/delete blocks in diffsLibravatar Michael Haggerty1-1/+4
Some groups of added/deleted lines in diffs can be slid up or down, because lines at the edges of the group are not unique. Picking good shifts for such groups is not a matter of correctness but definitely has a big effect on aesthetics. For example, consider the following two diffs. The first is what standard Git emits: --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) { } if (!$smtp_server) { + $smtp_server = $repo->config('sendemail.smtpserver'); +} +if (!$smtp_server) { foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) { if (-x $_) { $smtp_server = $_; The following diff is equivalent, but is obviously preferable from an aesthetic point of view: --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) { $initial_reply_to =~ s/(^\s+|\s+$)//g; } +if (!$smtp_server) { + $smtp_server = $repo->config('sendemail.smtpserver'); +} if (!$smtp_server) { foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) { if (-x $_) { This patch teaches Git to pick better positions for such "diff sliders" using heuristics that take the positions of nearby blank lines and the indentation of nearby lines into account. The existing Git code basically always shifts such "sliders" as far down in the file as possible. The only exception is when the slider can be aligned with a group of changed lines in the other file, in which case Git favors depicting the change as one add+delete block rather than one add and a slightly offset delete block. This naive algorithm often yields ugly diffs. Commit d634d61ed6 improved the situation somewhat by preferring to position add/delete groups to make their last line a blank line, when that is possible. This heuristic does more good than harm, but (1) it can only help if there are blank lines in the right places, and (2) always picks the last blank line, even if there are others that might be better. The end result is that it makes perhaps 1/3 as many errors as the default Git algorithm, but that still leaves a lot of ugly diffs. This commit implements a new and much better heuristic for picking optimal "slider" positions using the following approach: First observe that each hypothetical positioning of a diff slider introduces two splits: one between the context lines preceding the group and the first added/deleted line, and the other between the last added/deleted line and the first line of context following it. It tries to find the positioning that creates the least bad splits. Splits are evaluated based only on the presence and locations of nearby blank lines, and the indentation of lines near the split. Basically, it prefers to introduce splits adjacent to blank lines, between lines that are indented less, and between lines with the same level of indentation. In more detail: 1. It measures the following characteristics of a proposed splitting position in a `struct split_measurement`: * the number of blank lines above the proposed split * whether the line directly after the split is blank * the number of blank lines following that line * the indentation of the nearest non-blank line above the split * the indentation of the line directly below the split * the indentation of the nearest non-blank line after that line 2. It combines the measured attributes using a bunch of empirically-optimized weighting factors to derive a `struct split_score` that measures the "badness" of splitting the text at that position. 3. It combines the `split_score` for the top and the bottom of the slider at each of its possible positions, and selects the position that has the best `split_score`. I determined the initial set of weighting factors by collecting a corpus of Git histories from 29 open-source software projects in various programming languages. I generated many diffs from this corpus, and determined the best positioning "by eye" for about 6600 diff sliders. I used about half of the repositories in the corpus (corresponding to about 2/3 of the sliders) as a training set, and optimized the weights against this corpus using a crude automated search of the parameter space to get the best agreement with the manually-determined values. Then I tested the resulting heuristic against the full corpus. The results are summarized in the following table, in column `indent-1`: | repository | count | Git 2.9.0 | compaction | compaction-fixed | indent-1 | indent-2 | | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- | | afnetworking | 109 | 89 (81.7%) | 37 (33.9%) | 37 (33.9%) | 2 (1.8%) | 2 (1.8%) | | alamofire | 30 | 18 (60.0%) | 14 (46.7%) | 15 (50.0%) | 0 (0.0%) | 0 (0.0%) | | angular | 184 | 127 (69.0%) | 39 (21.2%) | 23 (12.5%) | 5 (2.7%) | 5 (2.7%) | | animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | | ant | 380 | 356 (93.7%) | 152 (40.0%) | 148 (38.9%) | 15 (3.9%) | 15 (3.9%) | * | bugzilla | 306 | 263 (85.9%) | 109 (35.6%) | 99 (32.4%) | 14 (4.6%) | 15 (4.9%) | * | corefx | 126 | 91 (72.2%) | 22 (17.5%) | 21 (16.7%) | 6 (4.8%) | 6 (4.8%) | | couchdb | 78 | 44 (56.4%) | 26 (33.3%) | 28 (35.9%) | 6 (7.7%) | 6 (7.7%) | * | cpython | 937 | 158 (16.9%) | 50 (5.3%) | 49 (5.2%) | 5 (0.5%) | 5 (0.5%) | * | discourse | 160 | 95 (59.4%) | 42 (26.2%) | 36 (22.5%) | 18 (11.2%) | 13 (8.1%) | | docker | 307 | 194 (63.2%) | 198 (64.5%) | 253 (82.4%) | 8 (2.6%) | 8 (2.6%) | * | electron | 163 | 132 (81.0%) | 38 (23.3%) | 39 (23.9%) | 6 (3.7%) | 6 (3.7%) | | git | 536 | 470 (87.7%) | 73 (13.6%) | 78 (14.6%) | 16 (3.0%) | 16 (3.0%) | * | gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | | ionic | 133 | 89 (66.9%) | 29 (21.8%) | 38 (28.6%) | 1 (0.8%) | 1 (0.8%) | | ipython | 482 | 362 (75.1%) | 167 (34.6%) | 169 (35.1%) | 11 (2.3%) | 11 (2.3%) | * | junit | 161 | 147 (91.3%) | 67 (41.6%) | 66 (41.0%) | 1 (0.6%) | 1 (0.6%) | * | lighttable | 15 | 5 (33.3%) | 0 (0.0%) | 2 (13.3%) | 0 (0.0%) | 0 (0.0%) | | magit | 88 | 75 (85.2%) | 11 (12.5%) | 9 (10.2%) | 1 (1.1%) | 0 (0.0%) | | neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | | nodejs | 781 | 649 (83.1%) | 118 (15.1%) | 111 (14.2%) | 4 (0.5%) | 5 (0.6%) | * | phpmyadmin | 491 | 481 (98.0%) | 75 (15.3%) | 48 (9.8%) | 2 (0.4%) | 2 (0.4%) | * | react-native | 168 | 130 (77.4%) | 79 (47.0%) | 81 (48.2%) | 0 (0.0%) | 0 (0.0%) | | rust | 171 | 128 (74.9%) | 30 (17.5%) | 27 (15.8%) | 16 (9.4%) | 14 (8.2%) | | spark | 186 | 149 (80.1%) | 52 (28.0%) | 52 (28.0%) | 2 (1.1%) | 2 (1.1%) | | tensorflow | 115 | 66 (57.4%) | 48 (41.7%) | 48 (41.7%) | 5 (4.3%) | 5 (4.3%) | | test-more | 19 | 15 (78.9%) | 2 (10.5%) | 2 (10.5%) | 1 (5.3%) | 1 (5.3%) | * | test-unit | 51 | 34 (66.7%) | 14 (27.5%) | 8 (15.7%) | 2 (3.9%) | 2 (3.9%) | * | xmonad | 23 | 22 (95.7%) | 2 (8.7%) | 2 (8.7%) | 1 (4.3%) | 1 (4.3%) | * | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- | | totals | 6668 | 4391 (65.9%) | 1496 (22.4%) | 1491 (22.4%) | 150 (2.2%) | 144 (2.2%) | | totals (training set) | 4552 | 3195 (70.2%) | 1053 (23.1%) | 1061 (23.3%) | 86 (1.9%) | 88 (1.9%) | | totals (test set) | 2116 | 1196 (56.5%) | 443 (20.9%) | 430 (20.3%) | 64 (3.0%) | 56 (2.6%) | In this table, the numbers are the count and percentage of human-rated sliders that the corresponding algorithm got *wrong*. The columns are * "repository" - the name of the repository used. I used the diffs between successive non-merge commits on the HEAD branch of the corresponding repository. * "count" - the number of sliders that were human-rated. I chose most, but not all, sliders to rate from those among which the various algorithms gave different answers. * "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0. * "compaction" - the heuristic used by `git diff --compaction-heuristic` in Git 2.9.0. * "compaction-fixed" - the heuristic used by `git diff --compaction-heuristic` after the fixes from earlier in this patch series. Note that the results are not dramatically different than those for "compaction". Both produce non-ideal diffs only about 1/3 as often as the default `git diff`. * "indent-1" - the new `--indent-heuristic` algorithm, using the first set of weighting factors, determined as described above. * "indent-2" - the new `--indent-heuristic` algorithm, using the final set of weighting factors, determined as described below. * `*` - indicates that repo was part of training set used to determine the first set of weighting factors. The fact that the heuristic performed nearly as well on the test set as on the training set in column "indent-1" is a good indication that the heuristic was not over-trained. Given that fact, I ran a second round of optimization, using the entire corpus as the training set. The resulting set of weights gave the results in column "indent-2". These are the weights included in this patch. The final result gives consistently and significantly better results across the whole corpus than either `git diff` or `git diff --compaction-heuristic`. It makes only about 1/30 as many errors as the former and about 1/10 as many errors as the latter. (And a good fraction of the remaining errors are for diffs that involve weirdly-formatted code, sometimes apparently machine-generated.) The tools that were used to do this optimization and analysis, along with the human-generated data values, are recorded in a separate project [1]. This patch adds a new command-line option `--indent-heuristic`, and a new configuration setting `diff.indentHeuristic`, that activate this heuristic. This interface is only meant for testing purposes, and should be finalized before including this change in any release. [1] https://github.com/mhagger/diff-slider-tools Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-16add--interactive: respect diff.compactionHeuristicLibravatar Jeff King1-0/+4
We use plumbing to generate the diff, so it doesn't automatically pick up UI config like compactionHeuristic. Let's forward it on, since interactive adding is porcelain. Note that we only need to handle the "true" case. There's no point in passing --no-compaction-heuristic when the variable is false, since nothing else could have turned it on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-28add--interactive: allow custom diff highlighting programsLibravatar Jeff King1-2/+10
The patch hunk selector of add--interactive knows how ask git for colorized diffs, and correlate them with the uncolored diffs we apply. But there's not any way for somebody who uses a diff-filter tool like contrib's diff-highlight to see their normal highlighting. This patch lets users define an arbitrary shell command to pipe the colorized diff through. The exact output shouldn't matter (since we just show the result to humans) as long as it is line-compatible with the original diff (so that hunk-splitting can split the colorized version, too). I left two minor issues with the new system that I don't think are worth fixing right now, but could be done later: 1. We only filter colorized diffs. Theoretically a user could want to filter a non-colorized diff, but I find it unlikely in practice. Users who are doing things like diff-highlighting are likely to want color, too. 2. add--interactive will re-colorize a diff which has been hand-edited, but it won't have run through the filter. Fixing this is conceptually easy (just pipe the diff through the filter), but practically hard to do without using tempfiles (it would need to feed data to and read the result from the filter without deadlocking; this raises portability questions with respect to Windows). I've punted on both issues for now, and if somebody really cares later, they can do a patch on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-17Merge branch 'ak/add-i-empty-candidates'Libravatar Junio C Hamano1-0/+5
The interactive "show a list and let the user choose from it" interface "add -i" used showed and prompted to the user even when the candidate list was empty, against which the only "choice" the user could have made was to choose nothing. * ak/add-i-empty-candidates: add -i: return from list_and_choose if there is no candidate
2015-01-22add -i: return from list_and_choose if there is no candidateLibravatar Alexander Kuleshov1-0/+5
The list_and_choose() helper is given a prompt and a list, asks the user to make selection from the list, and then returns a list of items chosen. Even when it is given an empty list as the original candidate set to choose from, it gave a prompt to the user, who can only say "I am done choosing". Return an empty result when the input is an empty list without bothering the user. The existing caller must already have a logic to say "Nothing to do" or an equivalent when the returned list is empty (i.e. the user chose to select nothing) if it is necessary, so no change to the callers is necessary. This fixes the case where "add untracked" is asked in "git add -i" and there is no untracked files in the working tree. We used to give an empty list of files to choose from with a prompt, but with this change, we no longer do. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-15add--interactive: leave main loop on read errorLibravatar Jeff King1-0/+1
The main hunk loop for add--interactive will loop if it does not get a known input. This is a good thing if the user typed some invalid input. However, if we have an uncorrectable read error, we'll end up looping infinitely. We can fix this by noticing read errors (i.e., <STDIN> returns undef) and breaking out of the loop. One easy way to trigger this is if you have an editor that does not take over the terminal (e.g., one that spawns a window in an existing process and waits), start the editor with the hunk-edit command, and hit ^C to send SIGINT. The editor process dies due to SIGINT, but the perl add--interactive process does not (perl suspends SIGINT for the duration of our system() call). We return to the main loop, but further reads from stdin don't work. The SIGINT _also_ killed our parent git process, which orphans our process group, meaning that further reads from the terminal will always fail. We loop infinitely, getting EIO on each read. Note that there are several other spots where we read from stdin, too. However, in each of those cases, we do something sane when the read returns undef (breaking out of the loop, taking the input as "no", etc). They don't need similar treatment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-08Merge branch 'jl/nor-or-nand-and'Libravatar Junio C Hamano1-2/+2
Eradicate mistaken use of "nor" (that is, essentially "nor" used not in "neither A nor B" ;-)) from in-code comments, command output strings, and documentations. * jl/nor-or-nand-and: code and test: fix misuses of "nor" comments: fix misuses of "nor" contrib: fix misuses of "nor" Documentation: fix misuses of "nor"
2014-03-31code and test: fix misuses of "nor"Libravatar Justin Lebar1-2/+2
Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03git-add--interactive: warn if module for interactive.singlekey is missingLibravatar Simon Ruderich1-0/+3
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Simon Ruderich <simon@ruderich.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-25add-interactive: handle unborn branch in patch modeLibravatar Jeff King1-9/+13
The list_modified function already knows how to handle an unborn branch by diffing against the empty tree. However, the diff we perform to get the actual hunks does not. Let's use the same logic for both diffs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04add--interactive: fix external command invocation on WindowsLibravatar Johannes Sixt1-1/+1
Back in 21e9757e (Hack git-add--interactive to make it work with ActiveState Perl, 2007-08-01), the invocation of external commands was changed to use qx{} on Windows. The rationale was that the command interpreter on Windows is not a POSIX shell, but rather Windows's CMD. That patch was wrong to include 'msys' in the check whether to use qx{} or not: 'msys' identifies MSYS perl as shipped with Git for Windows, which does not need the special treatment; qx{} should be used only with ActiveState perl, which is identified by 'MSWin32'. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23add -i: add extra options at the right place in "diff" command lineLibravatar Junio C Hamano1-1/+1
Appending "--diff-algorithm=histogram" at the end of canned command line for various modes of "diff" is correct for most of them but not for "stash" that has a non-option already wired in, like so: 'stash' => { DIFF => 'diff-index -p HEAD', Appending an extra option after non-option may happen to work due to overly lax command line parser, but that is not something we should rely on. Instead, splice in the extra argument immediately after the command name (i.e. 'diff-index', 'diff-files', etc.). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-12add--interactive: respect diff.algorithmLibravatar John Keeping1-0/+5
When staging hunks interactively it is sometimes useful to use an alternative diff algorithm which splits the changes into hunks in a more logical manner. This is not possible because the plumbing commands called by add--interactive ignore the "diff.algorithm" configuration option (as they should). Since add--interactive is a porcelain command it should respect this configuration variable. To do this, make it read diff.algorithm and pass its value to the underlying diff-index and diff-files invocations. At this point, do not add options to "git add", "git reset" or "git checkout" (all of which can call git-add--interactive). If a user wants to override the value on the command line they can use: git -c diff.algorithm=$ALGO ... Signed-off-by: John Keeping <john@keeping.me.uk> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>