summaryrefslogtreecommitdiff
path: root/t/t4034-diff-words.sh
AgeCommit message (Collapse)AuthorFilesLines
2012-03-14diff: tweak a _copy_ of diff_options with word-diffLibravatar Thomas Rast1-1/+1
When using word diff, the code sets the word_regex from various defaults if it was not set already. The problem is that it does this on the original diff_options, which will also be used in subsequent diffs. This means that when the word_regex is not given on the command line, only the first diff for which a setting for word_regex (either from attributes or diff.wordRegex) ever takes effect. This value then propagates to the rest of the diff runs and in particular prevents further attribute lookups. Fix the problem of changing diff state once and for all, by working with a _copy_ of the diff_options. Noticed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-14t4034: diff.*.wordregex should not be "sticky" in --word-diffLibravatar Johannes Sixt1-0/+36
The test case applies a custom wordRegex to one file in a diff, and expects that the default word splitting applies to the second file in the diff. But the custom wordRegex is also incorrectly used for the second file. Helped-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-18Merge branch 'tr/maint-word-diff-incomplete-line'Libravatar Junio C Hamano1-0/+14
* tr/maint-word-diff-incomplete-line: word-diff: ignore '\ No newline at eof' marker
2012-01-12word-diff: ignore '\ No newline at eof' markerLibravatar Thomas Rast1-0/+14
The word-diff logic accumulates + and - lines until another line type appears (normally [ @\]), at which point it generates the word diff. This is usually correct, but it breaks when the preimage does not have a newline at EOF: $ printf "%s" "a a a" >a $ printf "%s\n" "a ab a" >b $ git diff --no-index --word-diff a b diff --git 1/a 2/b index 9f68e94..6a7c02f 100644 --- 1/a +++ 2/b @@ -1 +1 @@ [-a a a-] No newline at end of file {+a ab a+} Because of the order of the lines in a unified diff @@ -1 +1 @@ -a a a \ No newline at end of file +a ab a the '\' line flushed the buffers, and the - and + lines were never matched with each other. A proper fix would defer such markers until the end of the hunk. However, word-diff is inherently whitespace-ignoring, so as a cheap fix simply ignore the marker (and hide it from the output). We use a prefix match for '\ ' to parallel the logic in apply.c:parse_fragment(). We currently do not localize this string (just accept other variants of it in git-apply), but this should be future-proof. Noticed-by: Ivan Shirokoff <shirokoff@yandex-team.ru> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15Add built-in diff patterns for MATLAB codeLibravatar Gustaf Hendeby1-0/+1
MATLAB is often used in industry and academia for scientific computations motivating it being included as a built-in pattern. Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20do not read beyond end of malloc'd bufferLibravatar Jim Meyering1-0/+26
With diff.suppress-blank-empty=true, "git diff --word-diff" would output data that had been read from uninitialized heap memory. The problem was that fn_out_consume did not account for the possibility of a line with length 1, i.e., the empty context line that diff.suppress-blank-empty=true converts from " \n" to "\n". Since it assumed there would always be a prefix character (the space), it decremented "len" unconditionally, thus passing len=0 to emit_line, which would then blindly call emit_line_0 with len=-1 which would pass that value on to fwrite as SIZE_MAX. Boom. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18t4034 (diff --word-diff): add a minimum Perl drier test vectorLibravatar Junio C Hamano1-0/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18t4034 (diff --word-diff): style suggestionsLibravatar Jonathan Nieder1-245/+205
Rearrange code to be easier to browse: - first data - then functions - then test assertions Mark up inline test vectors as cat >vector <<-\EOF data data EOF for visual scannability. Use words like "set up" for tests that set up for other tests, to make it obvious which tests are safe to skip. Use repeated function calls instead of a loop for the language-specific tests, so the invocations can be easily tweaked individually (for example if one starts to fail). This means if you add a new subdirectory to t4034/, it will not be automatically used. I think that's worth it for the added explicitness. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18t4034: bulk verify builtin word regex sanityLibravatar Thomas Rast1-0/+15
The builtin word regexes should be tested with some simple examples against simple issues. Do this in bulk. Mainly due to a lack of language knowledge and inspiration, most of the test cases (cpp, csharp, java, objc, pascal, php, python, ruby) are directly based off a C operator precedence table to verify that all operators are split correctly. This means that they are probably incomplete or inaccurate except for 'cpp' itself. Still, they are good enough to already have uncovered a typo in the python and ruby patterns. 'fortran' is based on my anecdotal knowledge of the DO10I parsing rules, and thus probably useless. The rest (bibtex, html, tex) are an ad-hoc test of what I consider important splits in those languages. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-24Merge branch 'en/and-cascade-tests'Libravatar Junio C Hamano1-2/+2
* en/and-cascade-tests: (25 commits) t4124 (apply --whitespace): use test_might_fail t3404: do not use 'describe' to implement test_cmp_rev t3404 (rebase -i): introduce helper to check position of HEAD t3404 (rebase -i): move comment to description t3404 (rebase -i): unroll test_commit loops t3301 (notes): use test_expect_code for clarity t1400 (update-ref): use test_must_fail t1502 (rev-parse --parseopt): test exit code from "-h" t6022 (renaming merge): chain test commands with && test-lib: introduce test_line_count to measure files tests: add missing &&, batch 2 tests: add missing && Introduce sane_unset and use it to ensure proper && chaining t7800 (difftool): add missing && t7601 (merge-pull-config): add missing && t7001 (mv): add missing && t6016 (rev-list-graph-simplify-history): add missing && t5602 (clone-remote-exec): add missing && t4026 (color): remove unneeded and unchained command t4019 (diff-wserror): add lots of missing && ... Conflicts: t/t7006-pager.sh
2010-11-09tests: add missing &&Libravatar Jonathan Nieder1-2/+2
Breaks in a test assertion's && chain can potentially hide failures from earlier commands in the chain. Commands intended to fail should be marked with !, test_must_fail, or test_might_fail. The examples in this patch do not require that. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-20test-lib: extend test_decode_color to handle more color codesLibravatar Kevin Ballard1-36/+36
Enhance the test_decode_color function to handle all common color codes, including background colors and escapes that contain multiple codes. This change necessitates changing <WHITE> to <BOLD>, so update t4034 as well. This change is necessary for the next commit in order to test background colors properly. Signed-off-by: Kevin Ballard <kevin@sb.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-14diff: add --word-diff option that generalizes --color-wordsLibravatar Thomas Rast1-0/+122
This teaches the --color-words engine a more general interface that supports two new modes: * --word-diff=plain, inspired by the 'wdiff' utility (most similar to 'wdiff -n <old> <new>'): uses delimiters [-removed-] and {+added+} * --word-diff=porcelain, which generates an ad-hoc machine readable format: - each diff unit is prefixed by [-+ ] and terminated by newline as in unified diff - newlines in the input are output as a line consisting only of a tilde '~' Both of these formats still support color if it is enabled, using it to highlight the differences. --color-words becomes a synonym for --word-diff=color, which is the color-only format. Also adds some compatibility/convenience options. Thanks to Junio C Hamano and Miles Bader for good ideas. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-27Merge branch 'jk/1.7.0-status'Libravatar Junio C Hamano1-19/+9
* jk/1.7.0-status: status/commit: do not suggest "reset HEAD <path>" while merging commit/status: "git add <path>" is not necessarily how to resolve commit/status: check $GIT_DIR/MERGE_HEAD only once t7508-status: test all modes with color t7508-status: status --porcelain ignores relative paths setting status: reduce duplicated setup code status: disable color for porcelain format status -s: obey color.status builtin-commit: refactor short-status code into wt-status.c t7508-status.sh: Add tests for status -s status -s: respect the status.relativePaths option docs: note that status configuration affects only long format commit: support alternate status formats status: add --porcelain output format status: refactor format option parsing status: refactor short-mode printing to its own function status: typo fix in usage git status: not "commit --dry-run" anymore git stat -s: short status output git stat: the beginning of "status that is not a dry-run of commit" Conflicts: t/t4034-diff-words.sh wt-status.c
2009-12-08t7508-status: test all modes with colorLibravatar Michael J Gruber1-16/+7
Move a useful script function to decode colored output to text form from t4034 and use it in this test as well. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-28Give the hunk comment its own colorLibravatar Bert Wesarg1-1/+3
Inspired by the coloring of quilt. Introduce a separate color and paint the hunk comment part, i.e. the name of the function, in a separate color "diff.func" (defaults to plain). Whitespace between hunk header and hunk comment is printed in plain color. Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-27emit_line(): don't emit an empty <SET><RESET> followed by a newlineLibravatar Junio C Hamano1-4/+4
When emit_line() is called with an empty line (but non-zero length, as we send line terminating LF or CRLF to the function), it used to emit <SET><RESET> followed by a newline. Stop the wastefulness. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-30diff --color-words -U0: fix the location of hunk headersLibravatar Johannes Schindelin1-1/+1
Colored word diff without context lines firstly printed all the hunk headers among each other and then printed the diff. This was due to the code relying on getting at least one context line at the end of each hunk, where the colored words would be flushed (it is done that way to be able to ignore rewrapped lines). Noticed by Markus Heidelberg. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-30t4034-diff-words: add a test for word diff without contextLibravatar Markus Heidelberg1-0/+20
Signed-off-by: Markus Heidelberg <markus.heidelberg@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-21Change the spelling of "wordregex".Libravatar Boyd Stephen Smith Jr1-4/+4
Use "wordRegex" for configuration variable names. Use "word_regex" for C language tokens. Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-21color-words: Support diff.wordregex config optionLibravatar Boyd Stephen Smith Jr1-2/+43
When diff is invoked with --color-words (w/o =regex), use the regular expression the user has configured as diff.wordregex. diff drivers configured via attributes take precedence over the diff.wordregex-words setting. If the user wants to change them, they have their own configuration variables. Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17color-words: make regex configurable via attributesLibravatar Thomas Rast1-0/+36
Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17color-words: take an optional regular expression describing wordsLibravatar Johannes Schindelin1-0/+57
In some applications, words are not delimited by white space. To allow for that, you can specify a regular expression describing what makes a word with git diff --color-words='[A-Za-z0-9]+' Note that words cannot contain newline characters. As suggested by Thomas Rast, the words are the exact matches of the regular expression. Note that a regular expression beginning with a '^' will match only a word at the beginning of the hunk, not a word at the beginning of a line, and is probably not what you want. This commit contains a quoting fix by Thomas Rast. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17color-words: change algorithm to allow for 0-character word boundariesLibravatar Johannes Schindelin1-0/+66
Up until now, the color-words code assumed that word boundaries are identical to white space characters. Therefore, it could get away with a very simple scheme: it copied the hunks, substituted newlines for each white space character, called libxdiff with the processed text, and then identified the text to output by the offsets (which agreed since the original text had the same length). This code was ugly, for a number of reasons: - it was impossible to introduce 0-character word boundaries, - we had to print everything word by word, and - the code needed extra special handling of newlines in the removed part. Fix all of these issues by processing the text such that - we build word lists, separated by newlines, - we remember the original offsets for every word, and - after calling libxdiff on the wordlists, we parse the hunk headers, and find the corresponding offsets, and then - we print the removed/added parts in one go. The pre and post samples in the test were provided by Santi BĂ©jar. Note that there is some strange special handling of hunk headers where one line range is 0 due to POSIX: in this case, the start is one too low. In other words a hunk header '@@ -1,0 +2 @@' actually means that the line must be added after the _second_ line of the pre text, _not_ the first. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>