tgif.git - Terin's Improved Git Fork


author	Johannes Sixt <j6t@kdbg.org>	2021-10-08 19:09:55 +0000
committer	Junio C Hamano <gitster@pobox.com>	2021-10-08 13:04:07 -0700
commit	350b87cd658553598a269fdd320ca05ee4789a10 (patch)
tree	06a7cd33e6eab78d7d3ca4bffaf9b8956791384f /t/t4034/cpp/post
parent	t4034: add tests showing problematic cpp tokenizations (diff)
download	tgif-350b87cd658553598a269fdd320ca05ee4789a10.tar.xz

userdiff-cpp: tighten word regex

Generally, word regex can be written such that they match tokens liberally and need not model the actual syntax because it can be assumed that the regex will only be applied to syntactically correct text.

The regex for cpp (C/C++) is too liberal, though. It regards these sequences as single tokens:

   1+2
   1.5-e+2+f

and the following amalgams as one token:

   .l      as in str.length
   .f      as in str.find
   .e      as in str.erase

Tighten the regex in the following way:

- Accept + and - only in one position in the exponent. + and - are no longer regarded as the sign of a number and are treated by the catcher-all that is not visible in the driver's regex.

- Accept a leading decimal point only when it is followed by a digit.

For readability, factor hex- and binary numbers into an own term.

As a drive-by, this fixes that floating point numbers such as 12E5 (with upper-case E) were split into two tokens.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
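The effect described in the message can be reproduced with a small sketch. The two patterns below are hypothetical simplifications written for illustration, not the patterns in userdiff.c: LIBERAL treats any run of digits, signs, dots and 'e' as a number, while TIGHT follows the rules listed above (sign only inside the exponent, leading decimal point only before a digit, hex and binary numbers in their own term). The trailing \S alternative stands in for the catch-all that the commit message says is not visible in the driver's regex.

```python
import re

# Hypothetical "liberal" word regex: a number is any run of
# digits, signs, dots and 'e', optionally followed by a suffix.
LIBERAL = re.compile(
    r"[a-zA-Z_][a-zA-Z0-9_]*"      # identifiers
    r"|[-+0-9.e]+[fFlL]?"          # overly liberal "numbers"
    r"|\S"                         # stand-in for the catch-all
)

# Hypothetical "tightened" word regex following the commit message.
TIGHT = re.compile(
    r"[a-zA-Z_][a-zA-Z0-9_]*"                              # identifiers
    r"|0[xXbB][0-9a-fA-F]+[lLuU]*"                         # hex and binary numbers
    r"|[0-9]+(?:\.[0-9]*)?(?:[eE][-+]?[0-9]+)?[fFlLuU]*"   # integers and floats
    r"|\.[0-9]+(?:[eE][-+]?[0-9]+)?[fFlL]?"                # floats with a leading point
    r"|\S"                                                 # stand-in for the catch-all
)


def tokens(pattern, text):
    """Split text into the tokens that the given word regex would produce."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for sample in ["1+2", "1.5-e+2+f", "str.length", "12E5"]:
        print(sample, tokens(LIBERAL, sample), tokens(TIGHT, sample))
```

With these assumed patterns, the liberal variant keeps 1+2 as one token and splits str.length into str, .l and ength, whereas the tightened variant yields 1, +, 2 and str, ., length, and keeps 12E5 together.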

Diffstat (limited to 't/t4034/cpp/post')

0 files changed, 0 insertions, 0 deletions

