summaryrefslogtreecommitdiff
path: root/t/t4034/cpp/pre
AgeCommit message (Collapse)AuthorFilesLines
2021-10-25userdiff-cpp: back out the digit-separators in numbersLibravatar Johannes Sixt1-4/+4
The implementation of digit-separating single-quotes introduced a note-worthy regression: the change of a character literal with a digit would splice the digit and the closing single-quote. For example, the change from 'a' to '2' is now tokenized as '[-a'-]{+2'+} instead of '[-a-]{+2+}'. The options to fix the regression are: - Tighten the regular expression such that the single-quote can only occur between digits (that would match the official syntax). - Remove support for digit separators. I chose to remove support, because - I have not seen a lot of code make use of digit separators. - If code does use digit separators, then the numbers are typically long. If a change in one of the segments occurs, it is actually better visible if only that segment is highlighted as the word that changed instead of the whole long number. This choice does introduce another minor regression, though, which is highlighted in the test case: when a change occurs in the second or later segment of a hexadecimal number where the segment begins with a digit, but also has letters, the segment is mistaken as consisting of a number and an identifier. I can live with that. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-10userdiff-cpp: prepare test cases with yet unsupported featuresLibravatar Johannes Sixt1-5/+5
We are going to add support for C++'s digit-separating single-quote and the spaceship operator. By adding the test cases in this separate commit, the effect on the word highlighting will become more obvious as the features are implemented and the file cpp/expect is updated. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08t4034: add tests showing problematic cpp tokenizationsLibravatar Johannes Sixt1-1/+15
The word regex is too loose and matches long streaks of characters that should actually be separate tokens. Add these problematic test cases. Separate the lines with text that will remain identical in the pre- and post-image so that the diff algorithm will not lump removals and additions of consecutive lines together. This makes the expected output easier to read. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08t4034/cpp: actually test that operator tokens are not splitLibravatar Johannes Sixt1-14/+11
8d96e7288f2b (t4034: bulk verify builtin word regex sanity, 2010-12-18) added many tests with the intent to verify that operators consisting of more than one symbol are kept together. These are tested by probing a transition from, e.g., a!=b to x!=y, which results in the word-diff [-a-]{+x+}!=[-b-]{+y+} But that proves only that the letters and operators are separate tokens. To prove that != is an unseparable token, we have to probe a transition from, e.g., a=b to a!=b having a word-diff a[-=-]{+!=+}b that proves that the ! is not separate from the =. In the post-image, add to or remove from operators a character that turns it into another valid operator. Change the identifiers used around operators such that the diff algorithm does not have an incentive to match, e.g., a<b in one spot in the pre-image with a<b elsewhere in the post-image. Adjust the expected output to match the new differences. Notice that there are some undesirable tokenizations around e, ., and -. This will be addressed in a later change. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18t4034: bulk verify builtin word regex sanityLibravatar Thomas Rast1-0/+19
The builtin word regexes should be tested with some simple examples against simple issues. Do this in bulk. Mainly due to a lack of language knowledge and inspiration, most of the test cases (cpp, csharp, java, objc, pascal, php, python, ruby) are directly based off a C operator precedence table to verify that all operators are split correctly. This means that they are probably incomplete or inaccurate except for 'cpp' itself. Still, they are good enough to already have uncovered a typo in the python and ruby patterns. 'fortran' is based on my anecdotal knowledge of the DO10I parsing rules, and thus probably useless. The rest (bibtex, html, tex) are an ad-hoc test of what I consider important splits in those languages. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>