diff options
author | René Scharfe <l.s.r@web.de> | 2021-12-18 20:50:02 +0100 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2021-12-20 12:45:02 -0800 |
commit | dc2c44fbb100fa609174d9069a70e2b54b0591ca (patch) | |
tree | ed54967a4db81bca347a379d243850586a035c80 | |
parent | Git 2.34.1 (diff) | |
download | tgif-dc2c44fbb100fa609174d9069a70e2b54b0591ca.tar.xz |
grep/pcre2: use PCRE2_UTF even with ASCII patterns
compile_pcre2_pattern() currently uses the option PCRE2_UTF only for
patterns with non-ASCII characters. Patterns with ASCII wildcards can
match non-ASCII strings, though. Without that option PCRE2 mishandles
UTF-8 input, though -- it matches parts of multi-byte characters. Fix
that by using PCRE2_UTF even for ASCII-only patterns.
This is a remake of the reverted ae39ba431a (grep/pcre2: fix an edge
case concerning ascii patterns and UTF-8 data, 2021-10-15). The change
to the condition and the test are simplified and more targeted.
Original-patch-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-rw-r--r-- | grep.c | 2 | ||||
-rwxr-xr-x | t/t7812-grep-icase-non-ascii.sh | 6 |
2 files changed, 7 insertions, 1 deletions
@@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt } options |= PCRE2_CASELESS; } - if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) && + if (!opt->ignore_locale && is_utf8_locale() && !(!opt->ignore_case && (p->fixed || p->is_fixed))) options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF); diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh index e5d1e4ea68..ca3f24f807 100755 --- a/t/t7812-grep-icase-non-ascii.sh +++ b/t/t7812-grep-icase-non-ascii.sh @@ -123,4 +123,10 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE2,PCRE2_MATCH_INVALID_UTF 'PCRE v2: gr test_cmp invalid-0xe5 actual ' +test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: grep non-literal ASCII from UTF-8' ' + git grep --perl-regexp -h -o -e ll. file >actual && + echo "lló" >expected && + test_cmp expected actual +' + test_done |