path: root/grep.c
Age | Commit message | Author | Files | Lines
2017-06-24 | Merge branch 'ab/pcre-v2' | Junio C Hamano | 1 | -1/+1
Hotfix for a topic already in 'master'.

* ab/pcre-v2:
  grep: fix erroneously copy/pasted variable in check/assert pattern
2017-06-21 | grep: fix erroneously copy/pasted variable in check/assert pattern | Ævar Arnfjörð Bjarmason | 1 | -1/+1
Fix an erroneously copy/pasted check for the pcre2_jit_stack variable to check pcre2_match_context instead. The former was already checked in the preceding "if" statement. This is a trivial and obvious error introduced in my commit 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01). In practice if pcre2_match_context_create() returned NULL we were likely in a situation where malloc() was returning NULL, and were thus screwed anyway, but if only the pcre2_match_context_create() call returned NULL (through some transitory bug) PCRE v2 would just allocate and supply its own context object when matching, and we'd run normally at the trivial expense of not getting a slight speedup by sharing the context object between successive matches. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
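The slip described above can be sketched without PCRE at all. The struct names and the use of malloc() below are hypothetical stand-ins for the real pcre2_jit_stack and pcre2_match_context objects; the point is only that each pointer is tested right after its own allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two objects named in the commit message. */
struct jit_stack_sketch { int unused; };
struct match_context_sketch { int unused; };

struct pattern_sketch {
    struct jit_stack_sketch *jit_stack;
    struct match_context_sketch *match_context;
};

/*
 * Allocate both objects and check each pointer after its own allocation.
 * The bug described above re-tested jit_stack where match_context
 * should have been tested.
 * Returns 0 on success, -1 on allocation failure.
 */
static int setup_pattern_sketch(struct pattern_sketch *p)
{
    p->jit_stack = malloc(sizeof(*p->jit_stack));
    if (!p->jit_stack)
        return -1;

    p->match_context = malloc(sizeof(*p->match_context));
    if (!p->match_context) { /* not: if (!p->jit_stack) */
        free(p->jit_stack);
        p->jit_stack = NULL;
        return -1;
    }
    return 0;
}
```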
2017-06-19 | Merge branch 'bw/object-id' | Junio C Hamano | 1 | -10/+9
Conversion from uchar[20] to struct object_id continues.

* bw/object-id: (33 commits)
  diff: rename diff_fill_sha1_info to diff_fill_oid_info
  diffcore-rename: use is_empty_blob_oid
  tree-diff: convert path_appendnew to object_id
  tree-diff: convert diff_tree_paths to struct object_id
  tree-diff: convert try_to_follow_renames to struct object_id
  builtin/diff-tree: cleanup references to sha1
  diff-tree: convert diff_tree_sha1 to struct object_id
  notes-merge: convert write_note_to_worktree to struct object_id
  notes-merge: convert verify_notes_filepair to struct object_id
  notes-merge: convert find_notes_merge_pair_ps to struct object_id
  notes-merge: convert merge_from_diffs to struct object_id
  notes-merge: convert notes_merge* to struct object_id
  tree-diff: convert diff_root_tree_sha1 to struct object_id
  combine-diff: convert find_paths_* to struct object_id
  combine-diff: convert diff_tree_combined to struct object_id
  diff: convert diff_flush_patch_id to struct object_id
  patch-ids: convert to struct object_id
  diff: finish conversion for prepare_temp_file to struct object_id
  diff: convert reuse_worktree_file to struct object_id
  diff: convert fill_filespec to struct object_id
  ...
2017-06-02 | diff: convert fill_filespec to struct object_id | Brandon Williams | 1 | -1/+1
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 | grep: convert to struct object_id | Brandon Williams | 1 | -9/+8
Convert the remaining parts of grep to use struct object_id. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-02 | grep: add support for PCRE v2 | Ævar Arnfjörð Bjarmason | 1 | -0/+145
Add support for v2 of the PCRE API. This is a new major version of PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is similar, pretty much every function is either renamed or takes different arguments. Thus using it via entirely new functions makes sense, as opposed to trying to e.g. have one compile_pcre_pattern() that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the release versions of both libraries is almost exactly the same, with PCRE v2 being around 1% slower.

However, after I reported this to the pcre-dev mailing list[2] I got a lot of help with the API use from Zoltán Herczeg, who subsequently optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest Subversion trunk of both, with both them and git compiled as -O3, and the test run against linux.git, gives the following results. Just the /perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh
    [...]
    Test                                            HEAD~5            HEAD~                     HEAD
    -----------------------------------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.31(1.10+0.48)   0.21(0.35+0.56) -32.3%    0.21(0.34+0.55) -32.3%
    7820.7: perl grep '^how to'                     0.56(2.70+0.40)   0.24(0.64+0.52) -57.1%    0.20(0.28+0.60) -64.3%
    7820.11: perl grep '[how] to'                   0.56(2.66+0.38)   0.29(0.95+0.45) -48.2%    0.23(0.45+0.54) -58.9%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.02(5.77+0.42)   0.31(1.02+0.54) -69.6%    0.23(0.50+0.54) -77.5%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.38(1.57+0.42)   0.27(0.85+0.46) -28.9%    0.21(0.33+0.57) -44.7%

See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on.

Here HEAD~5 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run just of HEAD~ and HEAD (PCRE v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:

    [...]
    Test                                            HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.21(0.42+0.52)   0.21(0.31+0.58) +0.0%
    7820.7: perl grep '^how to'                     0.25(0.65+0.50)   0.20(0.31+0.57) -20.0%
    7820.11: perl grep '[how] to'                   0.30(0.90+0.50)   0.23(0.46+0.53) -23.3%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.30(1.19+0.38)   0.23(0.51+0.51) -23.3%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.27(0.84+0.48)   0.21(0.34+0.57) -22.2%

I.e. the two are either neck and neck or PCRE v2 pulls ahead, which it usually does; when it does it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3) the compiled pattern can be shared between threads, but not some of the JIT context. However, the grep threading support does all pattern & JIT compilation in separate threads, so this code doesn't need to concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the initial addition of PCRE v1. This change follows some of the same patterns it did (and which were discussed on list at the time), e.g. mocking up types with typedef instead of ifdef-ing them out when USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
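The two build-time conventions mentioned above (rejecting both flags at once, and mocking up types with typedef when USE_LIBPCRE2 is not defined) might look roughly like this. The struct and its field names are hypothetical, and the exact wording of the error is an assumption; only the USE_LIBPCRE1/USE_LIBPCRE2 names and the pcre2_code/pcre2_match_data type names come from the message and the PCRE v2 API.

```c
#include <stddef.h>

#if defined(USE_LIBPCRE1) && defined(USE_LIBPCRE2)
#error "Only one of USE_LIBPCRE1 and USE_LIBPCRE2 may be set"
#endif

#ifdef USE_LIBPCRE2
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#else
/* Placeholder types so the struct below keeps the same shape either way. */
typedef int pcre2_code;
typedef int pcre2_match_data;
#endif

/* Hypothetical, trimmed-down pattern struct for illustration only. */
struct grep_pat_sketch {
    const char *pattern;
    pcre2_code *pcre2_pat;      /* stays NULL without USE_LIBPCRE2 */
    pcre2_match_data *pcre2_md; /* stays NULL without USE_LIBPCRE2 */
};
```

The cost of the typedef approach is a couple of unused pointers per pattern; the benefit is that code touching the struct does not need its own #ifdef blocks.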
2017-05-26 | grep: un-break building with PCRE < 8.32 | Ævar Arnfjörð Bjarmason | 1 | -3/+3
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but it wasn't until 8.32 released on 2012-11-30 that the fast code path to use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT, but supporting it on those versions would add to the already verbose macro soup around JIT support, and I don't expect that the use-case of compiling a brand new git against a 5 year old PCRE is particularly common. If someone does that they can just get the existing pre-JIT slow codepath.

So just take the easy way out and disable the JIT on any version older than 8.32.

The reason this change isn't part of the initial PCRE JIT support change is to have a cleaner history showing which parts of the implementation are only used for ancient PCRE versions. This also makes it easier to revert this change if we ever decide to stop supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a native interface for JIT. Through this interface, the compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
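A version gate of the kind described above could be sketched as follows. PCRE_MAJOR and PCRE_MINOR are the version macros normally provided by pcre.h; the fallback values, the PCRE1_HAS_FAST_JIT helper and the SKETCH_USE_PCRE1_JIT name are assumptions made so the sketch builds without the library, and the actual macro layout in grep.h may differ.

```c
/*
 * PCRE_MAJOR and PCRE_MINOR normally come from <pcre.h>. The fallback
 * values below are assumptions so this sketch compiles on its own.
 */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 31
#endif

/* Hypothetical helper: true for PCRE v1 releases 8.32 and later. */
#define PCRE1_HAS_FAST_JIT(major, minor) \
    ((major) > 8 || ((major) == 8 && (minor) >= 32))

/* Only use the JIT when pcre_jit_exec() is expected to exist. */
#if PCRE1_HAS_FAST_JIT(PCRE_MAJOR, PCRE_MINOR)
#define SKETCH_USE_PCRE1_JIT 1
#else
#define SKETCH_USE_PCRE1_JIT 0
#endif
```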
2017-05-26 | grep: add support for the PCRE v1 JIT API | Ævar Arnfjörð Bjarmason | 1 | -4/+36
Change the grep PCRE v1 code to use JIT when available. When PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) PCRE had no JIT support; it was integrated into 8.20 released on 2011-10-21.

Enabling JIT support usually improves performance by more than 40%. The pattern compilation times are relatively slower, but those relative numbers are tiny, and are easily made back in all but the most trivial cases of grep. Detailed benchmarks & an overview of compilation times are at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run is, with just the /perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh
    Test                                            HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.35(1.11+0.43)   0.23(0.42+0.46) -34.3%
    7820.7: perl grep '^how to'                     0.64(2.71+0.36)   0.27(0.66+0.44) -57.8%
    7820.11: perl grep '[how] to'                   0.63(2.51+0.42)   0.33(0.98+0.39) -47.6%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.17(5.61+0.35)   0.34(1.08+0.46) -70.9%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.43(1.52+0.44)   0.30(0.88+0.42) -30.2%

The conditional support for JIT is implemented as suggested in the pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's not present.

The implementation is relatively verbose because even if PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine if the JIT is available, and if so the faster pcre_jit_exec() function should be called instead of pcre_exec(), and a different (but not complementary!) function needs to be called to free pcre1_extra_info.

There's no graceful fallback if pcre_jit_stack_alloc() fails under PCRE_CONFIG_JIT, instead the program will simply abort. I don't think this is worth handling gracefully, it'll only fail in cases where malloc() doesn't work, in which case we're screwed anyway.

That there's no assignment of `p->pcre1_jit_on = 0` when PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat() function allocates the grep_pat with calloc(), so it's guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined.

If you're bisecting and find this change, check that your PCRE isn't older than 8.32. This change intentionally broke really old versions of PCRE, but that's fixed in follow-up commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
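Two of the points above lend themselves to a small sketch: defining PCRE_STUDY_JIT_COMPILE to 0 when the header does not provide it, and relying on calloc() to leave a "JIT on" flag at 0. The struct and function names here are hypothetical and much smaller than the real grep_pat and create_grep_pat().

```c
#include <stdlib.h>

/* As suggested in pcrejit(3): make the flag a no-op when it is missing. */
#ifndef PCRE_STUDY_JIT_COMPILE
#define PCRE_STUDY_JIT_COMPILE 0
#endif

/* Hypothetical, trimmed-down pattern struct. */
struct pat_sketch {
    const char *pattern;
    int jit_on; /* stays 0 unless JIT setup code sets it */
};

/* calloc() zero-fills the struct, so jit_on needs no explicit "= 0". */
static struct pat_sketch *create_pat_sketch(const char *pattern)
{
    struct pat_sketch *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    p->pattern = pattern;
    return p;
}
```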
2017-05-26 | grep: move is_fixed() earlier to avoid forward declaration | Ævar Arnfjörð Bjarmason | 1 | -12/+12
Move the is_fixed() function, which is currently only used in compile_regexp(), earlier so it can be used in the PCRE family of functions in a later change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 | grep: change internal *pcre* variable & function names to be *pcre1* | Ævar Arnfjörð Bjarmason | 1 | -26/+26
Change the internal PCRE variable & function names to have a "1" suffix. This is in preparation for libpcre2 support, where having non-versioned names would be confusing.

An earlier change in this series ("grep: change the internal PCRE macro names to be PCRE1", 2017-04-07) elaborates on the motivations behind this change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 | grep: change the internal PCRE macro names to be PCRE1 | Ævar Arnfjörð Bjarmason | 1 | -3/+3
Change the internal USE_LIBPCRE define, & build options flag to use a naming convention ending in PCRE1, without changing the long-standing USE_LIBPCRE Makefile flag which enables this code.

This is in preparation for libpcre2 support where having things like USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely need to for backwards compatibility with old Makefile arguments would be confusing.

In some ways it would be better to change everything that now uses USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their build scripts.

However I'd like to leave the door open to making USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease, i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it the default.

This code and the USE_LIBPCRE Makefile argument were added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was no indication that the PCRE project would release an entirely new & incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 | grep: factor test for \0 in grep patterns into a function | Ævar Arnfjörð Bjarmason | 1 | -7/+15
Factor the test for \0 in grep patterns into a function. Since commit 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a \0 is considered fixed as regcomp() can't handle it. This change makes later changes that make use of either has_null() or is_fixed() (but not both) smaller. While I'm at it make the comment conform to the style guide, i.e. add an opening "/*\n". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
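A minimal sketch of the two helpers named above, assuming has_null() is a plain memchr() test over a length-delimited pattern. The is_fixed_sketch() function and its list of special characters are illustrative assumptions, not a copy of git's is_fixed().

```c
#include <string.h>

/* Nonzero if the len-byte pattern contains a NUL byte. */
static int has_null(const char *s, size_t len)
{
    return memchr(s, '\0', len) != NULL;
}

/*
 * Nonzero if the pattern contains no regex metacharacter, so a plain
 * substring search would do. The character set is an assumption.
 */
static int is_fixed_sketch(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* strchr() would also match the terminator, so skip NULs. */
        if (s[i] != '\0' && strchr("$()*+.?[\\^{|", s[i]))
            return 0;
    }
    return 1;
}
```

Keeping the two tests separate lets a caller ask "does this contain a NUL?" without also asking "is this free of metacharacters?", which is the split the message says later changes rely on.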
2017-05-26 | grep: remove redundant regflags assignments | Ævar Arnfjörð Bjarmason | 1 | -5/+1
Remove redundant assignments to the "regflags" variable. This variable is only set under GREP_PATTERN_TYPE_ERE, so there's no need to un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}.

Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon seeing "-G" on the command line and flipped the bit on upon seeing "-E", but I think that was perfectly sensible and it would have been a bug if we didn't. They were part of the command line parsing that could have seen "-E" on the command line earlier.

When cca2c172 ("git-grep: do not die upon -F/-P when grep.extendedRegexp is set.", 2011-05-09) switched the command line parsing to "read into a 'tentatively this is what we saw the last' variable and then finally commit just once", we didn't touch opt.regflags for PCRE and FIXED, but we still had to flip regflags between BRE and ERE, because parsing of grep.extendedregexp configuration variable directly touched opt.regflags back then, which was done by b22520a3 ("grep: allow -E and -n to be turned on by default via configuration", 2011-03-30).

When 84befcd0 ("grep: add a grep.patternType configuration setting", 2012-08-03) introduced extended_regexp_option field, we stopped flipping regflags while reading the configuration, and that was when we should have noticed and stopped dropping REG_EXTENDED bit in the "now we can commit what type to use" helper function.

There is no reason to do this anymore, so stop doing it, more to reduce "wait this is used under fixed/BRE/PCRE how?" confusion when reading the code, than to save ourselves trivial CPU cycles by removing one assignment.

1. "built-in "git grep"", 2006-04-30.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-30 | convert unchecked snprintf into xsnprintf | Jeff King | 1 | -2/+2
These calls to snprintf should always succeed, because their input is small and fixed. Let's use xsnprintf to make sure this is the case (and to make auditing for actual truncation easier).

These could be candidates for turning into heap buffers, but they fall into a few broad categories that make it not worth doing:

  - formatting single numbers is simple enough that we can see the result should fit

  - the size of a sha1 is likewise well-known, and I didn't want to cause unnecessary conflicts with the ongoing process to convert these constants to GIT_MAX_HEXSZ

  - the interface for curl_errorstr is dictated by curl

Signed-off-by: Jeff King <peff@peff.net>
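The idea behind a checked snprintf can be sketched as a thin wrapper that treats truncation as a programming error. This is an illustration only: the name xsnprintf_sketch and the messages are made up, and git's own xsnprintf() reports failures through its own error routines rather than abort().

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into dst (max bytes) and abort if the result did not fit.
 * Returns the number of characters written, excluding the NUL.
 */
static int xsnprintf_sketch(char *dst, size_t max, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(dst, max, fmt, ap);
    va_end(ap);

    if (len < 0) {
        fprintf(stderr, "BUG: snprintf reported an error\n");
        abort();
    }
    if ((size_t)len >= max) {
        fprintf(stderr, "BUG: attempt to snprintf into too-small buffer\n");
        abort();
    }
    return len;
}
```

A caller formatting a single number into a fixed buffer would write `xsnprintf_sketch(buf, sizeof(buf), "%d", n)` and never has to inspect the return value for truncation.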
2017-03-17 | grep: set default output method | Brandon Williams | 1 | -5/+7
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22 | grep: add submodules as a grep source type | Brandon Williams | 1 | -1/+15
Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new type in the various switch statements in grep.c. When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the identifier can either be NULL (to indicate that the working tree will be used) or a SHA1 (the REV of the submodule to be grep'd). If the identifier is a SHA1 then we want to fall through to the `GREP_SOURCE_SHA1` case to handle the copying of the SHA1. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
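The fall-through described above can be sketched with a reduced source struct. All names below (source_type, source_init, SKETCH_RAWSZ) are hypothetical stand-ins for the real grep_source code, which has more source types and fields.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* raw SHA-1 size in bytes */

enum source_type { SOURCE_SHA1, SOURCE_FILE, SOURCE_SUBMODULE };

struct source {
    enum source_type type;
    void *identifier; /* owned copy, or NULL */
};

/* Return a malloc()ed copy of len bytes, or NULL on failure. */
static void *copy_bytes(const void *src, size_t len)
{
    void *dst = malloc(len);

    if (dst)
        memcpy(dst, src, len);
    return dst;
}

static void source_init(struct source *gs, enum source_type type,
                        const void *identifier)
{
    gs->type = type;
    gs->identifier = NULL;

    switch (type) {
    case SOURCE_FILE:
        gs->identifier = copy_bytes(identifier, strlen(identifier) + 1);
        break;
    case SOURCE_SUBMODULE:
        if (!identifier)
            break; /* NULL means "use the working tree" */
        /* fall through: a non-NULL identifier is a SHA-1 */
    case SOURCE_SHA1:
        gs->identifier = copy_bytes(identifier, SKETCH_RAWSZ);
        break;
    }
}
```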
2016-09-26 | Merge branch 'js/regexec-buf' | Junio C Hamano | 1 | -12/+2
Some codepaths in "git diff" used regexec(3) on a buffer that was mmap(2)ed, which may not have a terminating NUL, leading to a read beyond the end of the mapped region. This was fixed by introducing a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-21 | regex: use regexec_buf() | Johannes Schindelin | 1 | -12/+2
The new regexec_buf() function operates on buffers with an explicitly specified length, rather than NUL-terminated strings. We need to use this function whenever the buffer we want to pass to regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated). Note: the original motivation for this patch was to fix a bug where `git diff -G <regex>` would crash. This patch converts more callers, though, some of which allocated to construct NUL-terminated strings, or worse, modified buffers to temporarily insert NULs while calling regexec(3). By converting them to use regexec_buf(), the code has become much cleaner. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
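A length-aware wrapper in the spirit of regexec_buf() might look like the sketch below. REG_STARTEND is a non-POSIX extension (available in glibc and the BSDs), so the sketch guards it with #ifdef and falls back to a temporary NUL-terminated copy. The function name, the single-match signature and the fallback are assumptions for illustration, not git's implementation.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Match preg against exactly size bytes starting at buf, which need not
 * be NUL-terminated. On success *match holds offsets relative to buf.
 */
static int regexec_buf_sketch(const regex_t *preg, const char *buf,
                              size_t size, regmatch_t *match, int eflags)
{
#ifdef REG_STARTEND
    /* With REG_STARTEND, match[0] passes the range to search on input. */
    match->rm_so = 0;
    match->rm_eo = (regoff_t)size;
    return regexec(preg, buf, 1, match, eflags | REG_STARTEND);
#else
    /* Fallback: copy and terminate (stops at embedded NUL bytes). */
    char *copy = malloc(size + 1);
    int ret;

    if (!copy)
        return REG_ESPACE;
    memcpy(copy, buf, size);
    copy[size] = '\0';
    ret = regexec(preg, copy, 1, match, eflags);
    free(copy);
    return ret;
#endif
}
```

The REG_STARTEND path avoids both the allocation and the temporary in-place NUL insertion that the message describes some older callers doing.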
2016-08-10 | Merge branch 'js/am-3-merge-recursive-direct' | Junio C Hamano | 1 | -4/+4
"git am -3" calls "git merge-recursive" when it needs to fall back to a three-way merge; this call has been turned into an internal subroutine call instead of spawning a separate subprocess.

* js/am-3-merge-recursive-direct:
  merge-recursive: flush output buffer even when erroring out
  merge_trees(): ensure that the callers release output buffer
  merge-recursive: offer an option to retain the output in 'obuf'
  merge-recursive: write the commit title in one go
  merge-recursive: flush output buffer before printing error messages
  am -3: use merge_recursive() directly again
  merge-recursive: switch to returning errors instead of dying
  merge-recursive: handle return values indicating errors
  merge-recursive: allow write_tree_from_memory() to error out
  merge-recursive: avoid returning a wholesale struct
  merge_recursive: abort properly upon errors
  prepare the builtins for a libified merge_recursive()
  merge-recursive: clarify code in was_tracked()
  die(_("BUG")): avoid translating bug messages
  die("bug"): report bugs consistently
  t5520: verify that `pull --rebase` shows the helpful advice when failing
2016-08-04 | Merge branch 'jc/grep-commandline-vs-configuration' | Junio C Hamano | 1 | -11/+11
"git -c grep.patternType=extended log --basic-regexp" misbehaved because the internal API to access the grep machinery was not designed well.

* jc/grep-commandline-vs-configuration:
  grep: further simplify setting the pattern type
2016-07-26 | die("bug"): report bugs consistently | Johannes Schindelin | 1 | -4/+4
The vast majority of error messages in Git's source code which report a bug use the convention to prefix the message with "BUG:". As part of cleaning up merge-recursive to stop die()ing except in case of detected bugs, let's just make the remainder of the bug reports consistent with the de facto rule. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-25 | grep: further simplify setting the pattern type | Junio C Hamano | 1 | -11/+11
When c5c31d33 (grep: move pattern-type bits support to top-level grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper function, the intention was to allow the users of grep API to have to fiddle only with .pattern_type_option (which can be set to "fixed", "basic", "extended", and "pcre"), and then immediately before compiling the pattern strings for use, call grep_commit_pattern_type() to have it prepare various bits in the grep_opt structure (like .fixed, .regflags, etc.).

However, the grep_set_pattern_type_option() helper function that the grep API uses internally was left as an external function by mistake. This function shouldn't have been made callable by the users of the API.

Later when the grep API was used in revision traversal machinery, the caller then mistakenly started calling the function around 34a4ae55 (log --grep: use the same helper to set -E/-F options as "git grep", 2012-10-03), instead of setting the .pattern_type_option field and letting grep_commit_pattern_type() take care of the details.

This caused an unnecessary bug that made a configured grep.patternType take precedence over the command line options (e.g. --basic-regexp, --fixed-strings) in "git log" family of commands.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
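The intended precedence (command line first, then configuration, then a default) can be sketched as a single "commit" step that derives the individual bits. The enum, struct and function below are simplified, hypothetical versions of the grep API pieces named above, and defaulting to basic regexps is an assumption of the sketch.

```c
enum pattern_type {
    PATTERN_TYPE_UNSPECIFIED = 0,
    PATTERN_TYPE_BRE,
    PATTERN_TYPE_ERE,
    PATTERN_TYPE_FIXED,
    PATTERN_TYPE_PCRE
};

struct opt_sketch {
    enum pattern_type pattern_type_option; /* filled in from configuration */
    int fixed;
    int extended;
    int pcre;
};

/*
 * Decide the effective type once, right before patterns are compiled.
 * A type given on the command line wins over the configured one.
 */
static void commit_pattern_type_sketch(enum pattern_type from_cmdline,
                                       struct opt_sketch *opt)
{
    enum pattern_type type = from_cmdline;

    if (type == PATTERN_TYPE_UNSPECIFIED)
        type = opt->pattern_type_option;
    if (type == PATTERN_TYPE_UNSPECIFIED)
        type = PATTERN_TYPE_BRE; /* assumed default */

    opt->fixed = (type == PATTERN_TYPE_FIXED);
    opt->extended = (type == PATTERN_TYPE_ERE);
    opt->pcre = (type == PATTERN_TYPE_PCRE);
}
```

Callers that set the derived bits directly, instead of going through one such step, are how a configured value can end up overriding the command line.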
2016-07-19 | Merge branch 'nd/icase' | Junio C Hamano | 1 | -7/+57
"git grep -i" has been taught to fold case in non-ascii locales correctly.

* nd/icase:
  grep.c: reuse "icase" variable
  diffcore-pickaxe: support case insensitive match on non-ascii
  diffcore-pickaxe: Add regcomp_or_die()
  grep/pcre: support utf-8
  gettext: add is_utf8_locale()
  grep/pcre: prepare locale-dependent tables for icase matching
  grep: rewrite an if/else condition to avoid duplicate expression
  grep/icase: avoid kwsset when -F is specified
  grep/icase: avoid kwsset on literal non-ascii strings
  test-regex: expose full regcomp() to the command line
  test-regex: isolate the bug test code
  grep: break down an "if" stmt in preparation for next changes
2016-07-01 | grep.c: reuse "icase" variable | Nguyễn Thái Ngọc Duy | 1 | -4/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 | grep/pcre: support utf-8 | Nguyễn Thái Ngọc Duy | 1 | -0/+2
In the previous change in this function, we added locale support for single-byte encodings only. It looks like pcre only supports utf-* as multibyte encodings, the others are left in the cold (which is fine).

We need to enable PCRE_UTF8 so pcre can find character boundaries correctly. It's needed for case folding (when --ignore-case is used) or when '*', '+' or similar syntax is used.

The "has_non_ascii()" check is to be on the conservative side. If there's non-ascii in the pattern, the searched content could still be in utf-8, but we can treat it just like a byte stream and everything should work. If we force utf-8 based on locale only and pcre validates utf-8 and the file content is in non-utf8 encoding, things break.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 | grep/pcre: prepare locale-dependent tables for icase matching | Nguyễn Thái Ngọc Duy | 1 | -2/+6
The default tables are usually built with C locale and only suitable for LANG=C or similar. This should make case insensitive search work correctly for all single-byte charsets. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 | grep: rewrite an if/else condition to avoid duplicate expression | Nguyễn Thái Ngọc Duy | 1 | -4/+1
"!icase || ascii_only" is repeated twice in this if/else chain as this series evolves. Rewrite it (and basically revert the first if condition back to before the "grep: break down an "if" stmt..." commit). Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 | grep/icase: avoid kwsset when -F is specified | Nguyễn Thái Ngọc Duy | 1 | -1/+44
Similar to the previous commit, we can't use kws on icase search outside ascii range. But we can't simply pass the pattern to regcomp/pcre like the previous commit because it may contain regex special characters, so we need to quote the regex first. To avoid misquote traps that could lead to undefined behavior, we always stick to basic regex engine in this case. We don't need fancy features for grepping a literal string anyway. basic_regex_quote_buf() assumes that if the pattern is in a multibyte encoding, ascii chars must be unambiguously encoded as single bytes. This is true at least for UTF-8. For others, let's wait until people yell up. Chances are nobody uses multibyte, non utf-8 charsets anymore. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
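Quoting a literal string for the basic regex engine could be sketched as below. This is not the basic_regex_quote_buf() implementation: the function name, the fixed output buffer and the exact escaping rules are assumptions. The sketch escapes only characters that are special at their position, since a backslash before an ordinary character is undefined in POSIX basic regular expressions.

```c
#include <stddef.h>

/*
 * Write a BRE-quoted copy of the NUL-terminated string in to out.
 * Returns the quoted length, or -1 if out (outsz bytes) is too small.
 */
static int bre_quote_sketch(char *out, size_t outsz, const char *in)
{
    size_t n = 0;
    const char *p;

    if (outsz == 0)
        return -1;

    for (p = in; *p; p++) {
        size_t escape = 0;

        switch (*p) {
        case '\\':
        case '.':
        case '[':
            escape = 1; /* always special outside brackets */
            break;
        case '*':
            escape = (p != in); /* a leading '*' is already literal */
            break;
        case '^':
            escape = (p == in); /* only an anchor at the start */
            break;
        case '$':
            escape = (p[1] == '\0'); /* only an anchor at the end */
            break;
        }
        /* Need room for the optional backslash, the char, and the NUL. */
        if (n + escape + 2 > outsz)
            return -1;
        if (escape)
            out[n++] = '\\';
        out[n++] = *p;
    }
    out[n] = '\0';
    return (int)n;
}
```

Like the function described above, the sketch assumes ASCII characters are encoded as single bytes, which holds for UTF-8.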
2016-06-27 | grep/icase: avoid kwsset on literal non-ascii strings | Nguyễn Thái Ngọc Duy | 1 | -1/+6
When we detect the pattern is just a literal string, we avoid heavy regex engine and use fast substring search implemented in kwsset.c. But kws uses git-ctype which is locale-independent so it does not know how to fold case properly outside ascii range. Let regcomp or pcre take care of this case instead. Slower, but accurate. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
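The decision described above (use the fast literal search only when case folding is not needed or the pattern is pure ASCII) can be sketched as follows. Both function names are hypothetical; the condition mirrors the "!icase || ascii_only" expression quoted elsewhere in this log.

```c
#include <stddef.h>

/* Nonzero if any byte of the NUL-terminated string is outside ASCII. */
static int has_non_ascii_sketch(const char *s)
{
    const unsigned char *p;

    for (p = (const unsigned char *)s; *p; p++) {
        if (*p >= 0x80)
            return 1;
    }
    return 0;
}

/*
 * A literal pattern may use the fast substring search unless it needs
 * case folding outside the ASCII range, which that search cannot do.
 */
static int can_use_literal_search(const char *pattern, int is_literal,
                                  int icase)
{
    if (!is_literal)
        return 0;
    return !icase || !has_non_ascii_sketch(pattern);
}
```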
2016-06-27 | grep: break down an "if" stmt in preparation for next changes | Nguyễn Thái Ngọc Duy | 1 | -1/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-20 | Merge branch 'rs/xdiff-hunk-with-func-line' | Junio C Hamano | 1 | -2/+26
"git show -W" (extend hunks to cover the entire function, delimited by lines that match the "funcname" pattern) used to show the entire file when a change added an entire function at the end of the file, which has been fixed.

* rs/xdiff-hunk-with-func-line:
  xdiff: fix merging of appended hunk with -W
  grep: -W: don't extend context to trailing empty lines
  t7810: add test for grep -W and trailing empty context lines
  xdiff: don't trim common tail with -W
  xdiff: -W: don't include common trailing empty lines in context
  xdiff: ignore empty lines before added functions with -W
  xdiff: handle appended chunks better with -W
  xdiff: factor out match_func_rec()
  t4051: rewrite, add more tests
2016-05-31 | grep: -W: don't extend context to trailing empty lines | René Scharfe | 1 | -2/+26
Empty lines between functions are shown by grep -W, as it considers them to be part of the function preceding them. They are not interesting in most languages. The previous patches stopped showing them for diff -W. Stop showing empty lines trailing a function with grep -W. Grep scans the lines of a buffer from top to bottom and prints matching lines immediately. Thus we need to peek ahead in order to determine if an empty line is part of a function body and worth showing or not. Remember how far ahead we peeked in order to avoid having to do so repeatedly when handling multiple consecutive empty lines. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
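The peek-ahead with a remembered position can be sketched over an array of lines. Everything here is an illustrative assumption: the line array, the crude "function line" heuristic (a line starting with a letter or underscore), treating end of input as the end of the function, and the requirement that calls arrive with increasing line indexes, as in a top-to-bottom scan.

```c
#include <ctype.h>
#include <stddef.h>

struct peek_cache {
    size_t next;  /* first non-empty line found by the last peek */
    int trailing; /* result of that peek */
    int valid;
};

/* Crude stand-in for a funcname pattern. */
static int is_func_line_sketch(const char *line)
{
    return isalpha((unsigned char)line[0]) || line[0] == '_';
}

/*
 * lines[i] must be an empty line. Returns 1 if the run of empty lines
 * it belongs to trails the current function (not worth showing), or 0
 * if the function body continues after the run.
 */
static int empty_line_is_trailing(const char **lines, size_t nlines,
                                  size_t i, struct peek_cache *c)
{
    size_t j;

    if (c->valid && i < c->next)
        return c->trailing; /* already peeked past this line */

    for (j = i; j < nlines && lines[j][0] == '\0'; j++)
        ; /* skip the run of empty lines */

    c->next = j;
    c->trailing = (j == nlines) || is_func_line_sketch(lines[j]);
    c->valid = 1;
    return c->trailing;
}
```

Without the cache, each line in a long run of empty lines would trigger its own scan to the end of the run.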
2016-05-09 | grep.c: use error_errno() | Nguyễn Thái Ngọc Duy | 1 | -2/+2
While at it, improve the error message a bit (what operation failed?)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 | use xmallocz to avoid size arithmetic | Jeff King | 1 | -2/+1
We frequently allocate strings as xmalloc(len + 1), where the extra 1 is for the NUL terminator. This can be done more simply with xmallocz, which also checks for integer overflow. There's no case where switching xmalloc(n+1) to xmallocz(n) is wrong; the result is the same length, and malloc made no guarantees about what was in the buffer anyway. But in some cases, we can stop manually placing NUL at the end of the allocated buffer. But that's only safe if it's clear that the contents will always fill the buffer. In each case where this patch does so, I manually examined the control flow, and I tried to err on the side of caution. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
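The behaviour described above (allocate size + 1 bytes, guard the addition against overflow, and place the terminating NUL) can be sketched like this. The function name, messages and exit code are assumptions; git's xmallocz() reports errors through its own die() machinery.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Allocate size bytes plus a terminating NUL. The overflow check makes
 * "size + 1" safe; the caller only has to fill the first size bytes.
 */
static void *xmallocz_sketch(size_t size)
{
    char *ret;

    if (size == SIZE_MAX) {
        fprintf(stderr, "fatal: data too large to fit in memory\n");
        exit(128);
    }
    ret = malloc(size + 1);
    if (!ret) {
        fprintf(stderr, "fatal: out of memory\n");
        exit(128);
    }
    ret[size] = '\0';
    return ret;
}
```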
2015-10-05 | color: add color_set helper for copying raw colors | Jeff King | 1 | -16/+16
To set up default colors, we sometimes strcpy() from the default string literals into our color buffers. This isn't a bug (assuming the destination is COLOR_MAXLEN bytes), but makes it harder to audit the code for problematic strcpy calls. Let's introduce a color_set which copies under the assumption that there are COLOR_MAXLEN bytes in the destination (of course you can call it on a smaller buffer, so this isn't providing a huge amount of safety, but it's more convenient than calling xsnprintf yourself). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
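A helper along the lines described above could be sketched as a bounded copy that assumes a destination of a fixed maximum size. The constant's value and the names are assumptions for illustration; the real COLOR_MAXLEN is defined in git's color.h.

```c
#include <stdio.h>
#include <stdlib.h>

#define COLOR_MAXLEN_SKETCH 75 /* assumed buffer size */

/*
 * Copy a raw color sequence into dst, which is assumed to hold
 * COLOR_MAXLEN_SKETCH bytes. Abort rather than truncate silently.
 */
static void color_set_sketch(char *dst, const char *color_bytes)
{
    int len = snprintf(dst, COLOR_MAXLEN_SKETCH, "%s", color_bytes);

    if (len < 0 || len >= COLOR_MAXLEN_SKETCH) {
        fprintf(stderr, "BUG: color does not fit in buffer\n");
        abort();
    }
}
```

As the message notes, nothing stops a caller from passing a smaller buffer, so this is a convenience for auditing more than a safety guarantee.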
2015-09-25 | grep: use xsnprintf to format failure message | Jeff King | 1 | -2/+2
This looks at first glance like the sprintf can overflow our buffer, but it's actually fine; the p->origin string is something constant and small, like "command line" or "-e option". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-11 | Merge branch 'jk/blame-commit-label' | Junio C Hamano | 1 | -2/+2
"git blame HEAD -- missing" failed to correctly say "HEAD" when it tried to say "No such path 'missing' in HEAD".

* jk/blame-commit-label:
  blame.c: fix garbled error message
  use xstrdup_or_null to replace ternary conditionals
  builtin/commit.c: use xstrdup_or_null instead of envdup
  builtin/apply.c: use xstrdup_or_null instead of null_strdup
  git-compat-util: add xstrdup_or_null helper
2015-01-13 | use xstrdup_or_null to replace ternary conditionals | Jeff King | 1 | -2/+2
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x). The change is fairly mechanical, with the exception of resolve_refdup, which can eliminate a temporary variable. There are still a few hits grepping for "?.*xstrdup", but these are of slightly different forms and cannot be converted (e.g., "x ? xstrdup(x->foo) : NULL"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
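The helper is small enough to sketch in full. The _sketch names and the out-of-memory handling are stand-ins for git's xstrdup() and xstrdup_or_null(), which die through git's own error routines.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string, exiting on allocation failure. */
static char *xstrdup_sketch(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (!copy) {
        fprintf(stderr, "fatal: out of memory\n");
        exit(128);
    }
    memcpy(copy, s, len);
    return copy;
}

/* Equivalent of "s ? xstrdup(s) : NULL" as a single call. */
static char *xstrdup_or_null_sketch(const char *s)
{
    return s ? xstrdup_sketch(s) : NULL;
}
```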
2014-10-31  Merge branch 'rs/grep-color-words'  Junio C Hamano  1  -7/+22

Allow painting or not painting (partial) matches in context lines when showing "grep -C<num>" output in color.

* rs/grep-color-words:
  grep: add color.grep.matchcontext and color.grep.matchselected
2014-10-28  grep: add color.grep.matchcontext and color.grep.matchselected  René Scharfe  1  -7/+22

The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS.

Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-14  color_parse: do not mention variable name in error message  Jeff King  1  -1/+1

Originally the color-parsing function was used only for config variables. It made sense to pass the variable name so that the die() message could be something like:

    $ git -c color.branch.plain=bogus branch
    fatal: bad color value 'bogus' for variable 'color.branch.plain'

These days we call it in other contexts, and the resulting error messages are a little confusing:

    $ git log --pretty='%C(bogus)'
    fatal: bad color value 'bogus' for variable '--pretty format'

    $ git config --get-color foo.bar bogus
    fatal: bad color value 'bogus' for variable 'command line'

This patch teaches color_parse to complain only about the value, and then return an error code. Config callers can then propagate that up to the config parser, which mentions the variable name. Other callers can provide a custom message. After this patch these three cases now look like:

    $ git -c color.branch.plain=bogus branch
    error: invalid color value: bogus
    fatal: unable to parse 'color.branch.plain' from command-line config

    $ git log --pretty='%C(bogus)'
    error: invalid color value: bogus
    fatal: unable to parse --pretty format

    $ git config --get-color foo.bar bogus
    error: invalid color value: bogus
    fatal: unable to parse default color value

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
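The split of responsibilities described here (the parser reports only the bad value, the caller adds context) might look like the sketch below. Both functions are hypothetical and much simpler than Git's color_parse: the parser knows a single color name, and the caller returns -1 where Git's config machinery would die.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser: knows a single color name. On failure it reports
 * only the bad value and returns -1, leaving context to the caller.
 */
static int parse_color_sketch(const char *value, char *dst, size_t dstlen)
{
	assert(value && dst && dstlen > 0);
	if (!strcmp(value, "red")) {
		snprintf(dst, dstlen, "%s", "\033[31m");
		return 0;
	}
	fprintf(stderr, "error: invalid color value: %s\n", value);
	return -1;
}

/* Hypothetical config-style caller: adds the variable name on failure. */
static int config_color_sketch(const char *var, const char *value,
			       char *dst, size_t dstlen)
{
	if (parse_color_sketch(value, dst, dstlen) < 0) {
		fprintf(stderr, "fatal: unable to parse '%s'\n", var);
		return -1;
	}
	return 0;
}
```

A caller that is not reading config, such as one handling a format string, would call the parser directly and print its own message on a negative return.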
2014-06-03  Merge branch 'as/grep-fullname-config'  Junio C Hamano  1  -0/+5

Add a configuration variable to force --full-name to be default for "git grep". This may cause regressions on scripted users that do not expect this new behaviour.

* as/grep-fullname-config:
  grep: add grep.fullName config variable
2014-03-20  grep: add grep.fullName config variable  Andreas Schwab  1  -0/+5

This configuration variable sets the default for the --full-name option.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18  Merge branch 'rs/grep-h-c'  Junio C Hamano  1  -2/+5

"git grep" learns to handle combination of "-h (no header)" and "-c (counts)".

* rs/grep-h-c:
  grep: support -h (no header) with --count
  t7810: add missing variables to tests in loop
2014-03-11  grep: support -h (no header) with --count  René Scharfe  1  -2/+5

Suppress printing the header (filename) with -h even if in -c/--count mode. GNU grep and OpenBSD's grep do the same.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-06  Use hashcpy() when copying object names  Sun He  1  -1/+1

We invented hashcpy() to keep the abstraction of "object name" behind it. Use it instead of calling memcpy() with hard-coded 20-byte length when moving object names between pieces of memory.

Leave ppc/sha1.c as-is, because the function is about the SHA-1 hash algorithm whose output is and will always be 20 bytes.

Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Sun He <sunheehnus@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
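The abstraction this commit relies on can be illustrated with a small sketch. The names `RAWSZ_SKETCH` and `hashcpy_sketch` are hypothetical; the point is only that the 20-byte length lives in one place rather than at every memcpy() call site.

```c
#include <assert.h>
#include <string.h>

/* Assumed for illustration: the raw SHA-1 object name length in bytes. */
#define RAWSZ_SKETCH 20

/* Copy one raw object name; callers never spell out the length. */
static void hashcpy_sketch(unsigned char *dst, const unsigned char *src)
{
	assert(dst && src);
	memcpy(dst, src, RAWSZ_SKETCH);
}
```

With a wrapper like this, a call such as `memcpy(dst, src, 20)` becomes `hashcpy_sketch(dst, src)`, so a later change to how object names are represented touches the helper rather than each caller.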
2013-05-10  grep: allow to use textconv filters  Jeff King  1  -14/+86

Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same:

ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28)
b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28)
0508fe5 (combine-diff: respect textconv attributes, 2011-05-23)

"git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv.

Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-25  fix clang -Wtautological-compare with unsigned enum  Antoine Pelisse  1  -1/+2

Create a GREP_HEADER_FIELD_MIN so we can check that the field value is sane and silence the clang warning.

Clang warning happens because the enum is unsigned (this is implementation-defined, and there is no negative fields) and the check is then tautological.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: John Keeping <john@keeping.me.uk>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
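A range check against a named minimum, in the spirit of this commit, could be sketched as below. The enum and its members are hypothetical stand-ins rather than Git's actual definitions, and the sketch does not establish how any particular compiler version treats the comparison.

```c
#include <assert.h>

/* Hypothetical enum with explicit lower and upper sentinels. */
enum header_field_sketch {
	FIELD_MIN = 0,
	FIELD_AUTHOR = FIELD_MIN,
	FIELD_COMMITTER,
	FIELD_MAX /* must stay last */
};

/* Range check written against the named bounds rather than a literal 0. */
static int field_is_sane(enum header_field_sketch f)
{
	return FIELD_MIN <= f && f < FIELD_MAX;
}

/* Example guard a caller might place before indexing a per-field table. */
static void require_sane_field(enum header_field_sketch f)
{
	assert(field_is_sane(f));
}
```

Writing the lower bound as a named enumerator keeps the check meaningful if the first member ever stops being zero.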
2012-10-29  Merge branch 'nd/grep-true-path'  Jeff King  1  -3/+8

"git grep -e pattern <tree>" asked the attribute system to read "<tree>:.gitattributes" file in the working tree, which was nonsense.

* nd/grep-true-path:
  grep: stop looking at random places for .gitattributes
2012-10-12  grep: stop looking at random places for .gitattributes  Nguyễn Thái Ngọc Duy  1  -3/+8

grep searches for .gitattributes using "name" field in struct grep_source but that field is not real on-disk path name. For example, "grep pattern rev" fills the field with "rev:path", and Git looks for .gitattributes in the (non-existent but exploitable) path "rev:path" instead of "path".

This patch passes real paths down to grep_source_load_driver() when:

 - grep on work tree
 - grep on the index
 - grep a commit (or a tag if it points to a commit)

so that these cases look up .gitattributes at proper paths. .gitattributes lookup is disabled in all other cases.

Initial-work-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>