tgif.git - Terin's Improved Git Fork

author	Ævar Arnfjörð Bjarmason <avarab@gmail.com>	2019-06-28 01:39:05 +0200
committer	Junio C Hamano <gitster@pobox.com>	2019-06-28 09:11:09 -0700
commit	44570188a0e324048decf06b845d34c45b08a4fa (patch)
tree	34599afabc8325f61c8da6d4b3eba0b2807fc666 /t/t4013/diff.diff-tree_--pretty_-p_side
parent	log tests: test regex backends in "--encode=<enc>" tests (diff)
download	tgif-44570188a0e324048decf06b845d34c45b08a4fa.tar.xz

grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"

Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was missed due to a blindspot in our tests, as discussed in the previous commit. I then blindly copied the same bug in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when adding the PCRE v2 code.

We should not tell PCRE that we're processing UTF-8 just because we're dealing with non-ASCII. In the case of e.g. "log --encoding=<...>" under is_utf8_locale() the haystack might be in ISO-8859-1, and the needle might be in a non-UTF-8 encoding.

Maybe we should be more strict here and die earlier? Should we also be converting the needle to the encoding in question, and failing if it's not a string that's valid in that encoding? Maybe.

But for now matching this as non-UTF8 at least has some hope of producing sensible results, since we know that our default heuristic of assuming the text to be matched is in the user locale encoding isn't true when we've explicitly encoded it to be in a different encoding.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
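
The concern about treating ISO-8859-1 data as UTF-8 can be illustrated outside of git. The following is a minimal sketch in Python, not the code changed by this commit: Python's "re" module stands in for PCRE, and the function name match_haystack and its treat_as_utf8 parameter are hypothetical names chosen for illustration. It assumes that "UTF mode" means both needle and haystack must be valid UTF-8, while byte mode compares raw bytes.

```python
import re


def match_haystack(needle: bytes, haystack: bytes, treat_as_utf8: bool) -> bool:
    """Search for a literal needle in haystack (hypothetical helper).

    treat_as_utf8=True mimics a regex engine told that its input is UTF-8:
    both inputs must decode as UTF-8, otherwise decoding raises an error.
    treat_as_utf8=False compares raw bytes and makes no encoding assumption.
    """
    if treat_as_utf8:
        # Decoding fails with UnicodeDecodeError on bytes that are not UTF-8.
        pattern = re.compile(re.escape(needle.decode("utf-8")))
        return pattern.search(haystack.decode("utf-8")) is not None
    # Byte mode: the pattern and the subject are both plain byte strings.
    return re.search(re.escape(needle), haystack) is not None


# A haystack and needle that were re-encoded to ISO-8859-1,
# as could happen with "log --encoding=ISO-8859-1".
latin1_line = "Author: Ævar".encode("iso-8859-1")
latin1_needle = "Ævar".encode("iso-8859-1")

# Byte-wise matching finds the needle.
found_as_bytes = match_haystack(latin1_needle, latin1_line, treat_as_utf8=False)

# Claiming the same bytes are UTF-8 fails before any matching happens.
try:
    match_haystack(latin1_needle, latin1_line, treat_as_utf8=True)
    utf8_mode_failed = False
except UnicodeDecodeError:
    utf8_mode_failed = True
```

In this sketch the byte-mode search succeeds while the UTF-8 path rejects the input, which mirrors the reasoning above: once the output has been explicitly re-encoded, the locale no longer says anything reliable about the bytes being matched.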

Diffstat (limited to 't/t4013/diff.diff-tree_--pretty_-p_side')

0 files changed, 0 insertions, 0 deletions

