Age | Commit message (Collapse) | Author | Files | Lines |
|
We used to call regfree() after regcomp() failed in some codepaths,
which have been corrected.
* ma/regex-no-regfree-after-comp-fail:
regex: do not call `regfree()` if compilation fails
|
|
It is apparently undefined behavior to call `regfree()` on a regex where
`regcomp()` failed. The language in [1] is a bit muddy, at least to me,
but the clearest hint is this (`preg` is the `regex_t *`):
Upon successful completion, the regcomp() function shall return 0.
Otherwise, it shall return an integer value indicating an error as
described in <regex.h>, and the content of preg is undefined.
Funnily enough, there is also the `regerror()` function which should be
given a pointer to such a "failed" `regex_t` -- the content of which
would supposedly be undefined -- and which may investigate it to come up
with a detailed error message.
In any case, the example in that document shows how `regfree()` is not
called after `regcomp()` fails.
We have quite a few users of this API and most get this right. These
three users do not.
Several implementations can handle this just fine [2] and these code paths
supposedly have not wreaked havoc or we'd have heard about it. (These
are all in code paths where git got bad input and is just about to die
anyway.) But let's just avoid the issue altogether.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
[2] https://www.redhat.com/archives/libvir-list/2013-September/msg00262.html
Researched-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-byi Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae55
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add documentation explaining the functions in color.h.
While at it, migrate the function `color_set` into grep.c,
where the only callers are.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep" compiled with libpcre2 sometimes triggered a segfault,
which is being fixed.
* ab/pcre2-grep:
grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT)
test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites
|
|
Fix a bug in the compilation of PCRE2 patterns under JIT (the most
common runtime configuration). Any pattern with a (*NO_JIT) verb would
segfault in any currently released PCRE2 version:
$ git grep -P '(*NO_JIT)hi.*there'
Segmentation fault
That this segfaulted was a bug in PCRE2 itself, after reporting it[1]
on pcre-dev it's been fixed in a yet-to-be-released version of
PCRE (presumably released first as 10.31). Now it'll die with:
$ git grep -P '(*NO_JIT)hi.*there'
fatal: pcre2_jit_match failed with error code -45: bad JIT option
But the cause of the bug is in our own code dating back to my
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).
As explained at more length in the comment being added here, it isn't
sufficient to just check pcre2_config() to see whether the JIT should
be used, pcre2_pattern_info() also has to be asked.
This is something I discovered myself when fiddling around with PCRE2
verbs in patterns passed to git. I don't expect that any user of git
has encountered this given the obscurity of passing PCRE2 verbs
through to the library, along with the relative obscurity of (*NO_JIT)
itself.
1. "How am I supposed to use PCRE2 JIT in the face of (*NO_JIT) ?"
(<CACBZZX5mMqDuWuFmi7sRBp3wH6CFyd-ghACukd=v0NN=rBMnJg@mail.gmail.com> &
https://lists.exim.org/lurker/thread/20171123.101502.7f0d38ca.en.html)
on the pcre-dev mailing list
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Non-empty lines before a function definition are most likely comments
for that function and thus relevant. Include them in function context.
Such a non-empty line might also belong to the preceding function if
there is no separating blank line. Stop extending the context upwards
also at the next function line to make sure only one extra function body
is shown at most.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Function context can be bigger than -A/-B/-C context. To find the
beginning of the combined context we search backwards. Currently we
check at each loop iteration what we're looking for and determine the
effective upper boundary based on that.
Simplify this a bit by setting the variable "from" to the lowest unshown
line number up front if we're looking for a function line and set it
back to the required -B/-C context line number when we find one. This
prepares the ground for the next patch; no functional change intended.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Building with NO_LIBPCRE1_JIT did not disable it, which has been fixed.
* ab/pcre-v2:
grep: fix NO_LIBPCRE1_JIT to fully disable JIT
|
|
If you have a pcre1 library which is compiled with JIT enabled then
PCRE_STUDY_JIT_COMPILE will be defined whether or not the
NO_LIBPCRE1_JIT configuration is set.
This means that we enable JIT functionality when calling pcre_study
even if NO_LIBPCRE1_JIT has been explicitly set and we just use plain
pcre_exec later.
Fix this by using own macro (GIT_PCRE_STUDY_JIT_COMPILE) which we set to
PCRE_STUDY_JIT_COMPILE only if NO_LIBPCRE1_JIT is not set and define to
0 otherwise, as before.
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep -L" and "git grep --quiet -L" reported different exit
codes; this has been corrected.
* as/grep-quiet-no-match-exit-code-fix:
git-grep: correct exit code with --quiet and -L
|
|
The handling of `status_only` no longer interferes with the handling of
`unmatch_name_only`. `--quiet` no longer affects the exit code when using
`-L`/`--files-without-match`.
Signed-off-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the redundant REG_NEWLINE regcomp() flag from the code that
compiles a fixed-string regular-expression.
The REG_NEWLINE causes metacharacters such as "." to match a newline,
since the basic_regex_quote_buf() function being called here escapes
all metacharacters using REG_NEWLINE is confusing and redundant.
The use of this flag was introduced as an unintended emergent property
of 793dc676e0 ("grep/icase: avoid kwsset when -F is specified",
2016-06-25).
That change amended the existing regflags, which were initialized to
REG_NEWLINE in init_grep_defaults() assuming a subsequent non-fixed
regcomp().
Manual testing reveals that this was always redundant, since no flags
of any use were inherited from opt->regflags even back
then. 793dc676e0 passes all tests with this on top:
diff --git a/grep.c b/grep.c
index 627ae3e3e8..89e84ed7fd 100644
--- a/grep.c
+++ b/grep.c
@@ -407,3 +407,3 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
basic_regex_quote_buf(&sb, p->pattern);
- regflags = opt->regflags & ~REG_EXTENDED;
+ regflags = 0;
if (opt->ignore_case)
Since this isn't used for anything and never was, remove it to reduce
confusion when reading this code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor calls to the grep machinery to always pass opt.ignore_case &
opt.extended_regexp_option instead of setting the equivalent regflags
bits.
The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
really just plastering over the code smell which this change fixes.
The reason for adding the extensive commentary here is that I
discovered some subtle complexity in implementing this that really
should be called out explicitly to future readers.
Before this change we'd rely on the difference between
`extended_regexp_option` and `regflags` to serve as a membrane between
our preliminary parsing of grep.extendedRegexp and grep.patternType,
and what we decided to do internally.
Now that those two are the same thing, it's necessary to unset
`extended_regexp_option` just before we commit in cases where both of
those config variables are set. See 84befcd0a4 ("grep: add a
grep.patternType configuration setting", 2012-08-03) for the code and
documentation related to that.
The explanation of why the if/else branches in
grep_commit_pattern_type() are ordered the way they are exists in that
commit message, but I think it's worth calling this subtlety out
explicitly with a comment for future readers.
Even though grep_commit_pattern_type() is the only caller of
grep_set_pattern_type_option() it's simpler to reset the
extended_regexp_option flag in the latter, since 2/3 branches in the
former would otherwise need to reset it, this way we can do it in one
place.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the redundant re-assignments of the fixed/pcre1/pcre2 fields to
zero right after the entire struct has been set to zero via
memset(...).
See an earlier related cleanup commit e0b9f8ae09 ("grep: remove
redundant regflags assignments", 2017-05-25) for an explanation of why
the code was structured like this to begin with.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the redundant re-assignment of the fixed field to zero right
after the entire struct has been set to zero via memset(...).
Unlike some nearby commits this pattern doesn't date back to the
pattern described in e0b9f8ae09 ("grep: remove redundant regflags
assignments", 2017-05-25), instead it was apparently cargo-culted in
9eceddeec6 ("Use kwset in grep", 2011-08-21).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Adjust a now-redundant assignment to extended_regexp_option to make it
zero if grep.extendedRegexp is not set. This is always called right
after init_grep_defaults() which memsets the entire structure to 0, so
there's no need to set it again to zero.
However the reason for the if/else pattern is a holdover from[1] where
this was adjusted from a bitfield assignment to a boolean. Rather than
getting rid of the assignment to 0 in all cases, let's just use the
value returned by git_config_bool(), which is more idiomatic and in
sync with the rest of the boolean handling in this function.
This is a logical follow-up to my commit to remove redundant regflags
assignments[2]. This logic was originally introduced in [3], but as
explained in the former commit it's working around a pattern in our
code that no longer exists, and is now confusing as it leads the
reader to think that this needs to be flipped back & forth.
1. 84befcd0a4 ("grep: add a grep.patternType configuration setting",
2012-08-03)
2. e0b9f8ae09 ("grep: remove redundant regflags assignments",
2017-05-25)
3. b22520a37c ("grep: allow -E and -n to be turned on by default via
configuration", 2011-03-30)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop assigning 0 to the extended_regexp_option field right after we've
zeroed out the entire struct with memset() just a few lines earlier.
Unlike some of the code being refactored in subsequent commits, this
was always completely redundant. See the original code introduced in
84befcd0a4 ("grep: add a grep.patternType configuration setting",
2012-08-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
|
|
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
|
|
Hotfix for a topic already in 'master'.
* ab/pcre-v2:
grep: fix erroneously copy/pasted variable in check/assert pattern
|
|
Fix an erroneously copy/pasted check for the pcre2_jit_stack variable
to check pcre2_match_context instead. The former was already checked
in the preceding "if" statement.
This is a trivial and obvious error introduced in my commit
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).
In practice if pcre2_match_context_create() returned NULL we were
likely in a situation where malloc() was returning NULL, and were thus
screwed anyway, but if only the pcre2_match_context_create() call
returned NULL (through some transitory bug) PCRE v2 would just
allocate and supply its own context object when matching, and we'd run
normally at the trivial expense of not getting a slight speedup by
sharing the context object between successive matches.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from uchar[20] to struct object_id continues.
* bw/object-id: (33 commits)
diff: rename diff_fill_sha1_info to diff_fill_oid_info
diffcore-rename: use is_empty_blob_oid
tree-diff: convert path_appendnew to object_id
tree-diff: convert diff_tree_paths to struct object_id
tree-diff: convert try_to_follow_renames to struct object_id
builtin/diff-tree: cleanup references to sha1
diff-tree: convert diff_tree_sha1 to struct object_id
notes-merge: convert write_note_to_worktree to struct object_id
notes-merge: convert verify_notes_filepair to struct object_id
notes-merge: convert find_notes_merge_pair_ps to struct object_id
notes-merge: convert merge_from_diffs to struct object_id
notes-merge: convert notes_merge* to struct object_id
tree-diff: convert diff_root_tree_sha1 to struct object_id
combine-diff: convert find_paths_* to struct object_id
combine-diff: convert diff_tree_combined to struct object_id
diff: convert diff_flush_patch_id to struct object_id
patch-ids: convert to struct object_id
diff: finish conversion for prepare_temp_file to struct object_id
diff: convert reuse_worktree_file to struct object_id
diff: convert fill_filespec to struct object_id
...
|
|
Replace occurrences of `free(ptr); ptr = NULL` which weren't caught by
the coccinelle rule. These fall into two categories:
- free/NULL assignments one after the other which coccinelle all put
on one line, which is functionally equivalent code, but very ugly.
- manually spotted occurrences where the NULL assignment isn't right
after the free() call.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert the remaining parts of grep to use struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].
The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.
Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.
With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.
However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.
Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:
$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~5 HEAD~ HEAD p7820-grep-engines.sh
[...]
Test HEAD~5 HEAD~ HEAD
-----------------------------------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.31(1.10+0.48) 0.21(0.35+0.56) -32.3% 0.21(0.34+0.55) -32.3%
7820.7: perl grep '^how to' 0.56(2.70+0.40) 0.24(0.64+0.52) -57.1% 0.20(0.28+0.60) -64.3%
7820.11: perl grep '[how] to' 0.56(2.66+0.38) 0.29(0.95+0.45) -48.2% 0.23(0.45+0.54) -58.9%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.02(5.77+0.42) 0.31(1.02+0.54) -69.6% 0.23(0.50+0.54) -77.5%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.38(1.57+0.42) 0.27(0.85+0.46) -28.9% 0.21(0.33+0.57) -44.7%
See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.
Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.
For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:
[...]
Test HEAD~ HEAD
----------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.21(0.42+0.52) 0.21(0.31+0.58) +0.0%
7820.7: perl grep '^how to' 0.25(0.65+0.50) 0.20(0.31+0.57) -20.0%
7820.11: perl grep '[how] to' 0.30(0.90+0.50) 0.23(0.46+0.53) -23.3%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.30(1.19+0.38) 0.23(0.51+0.51) -23.3%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.27(0.84+0.48) 0.21(0.34+0.57) -22.2%
I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.
A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.
See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.
1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.
The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).
This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.
So just take the easy way out and disable the JIT on any version older
than 8.32.
The reason this change isn't part of the initial change PCRE JIT
support is to have a cleaner history showing which parts of the
implementation are only used for ancient PCRE versions. This also
makes it easier to revert this change if we ever decide to stop
supporting those old versions.
1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
native interface for JIT. Through this interface, the
compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.
Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html
With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:
$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh
Test HEAD~ HEAD
---------------------------------------------------------------------------------------
7820.3: perl grep 'how.to' 0.35(1.11+0.43) 0.23(0.42+0.46) -34.3%
7820.7: perl grep '^how to' 0.64(2.71+0.36) 0.27(0.66+0.44) -57.8%
7820.11: perl grep '[how] to' 0.63(2.51+0.42) 0.33(0.98+0.39) -47.6%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.17(5.61+0.35) 0.34(1.08+0.46) -70.9%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.43(1.52+0.44) 0.30(0.88+0.42) -30.2%
The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.
The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.
There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.
That there's no assignment of `p->pcre1_jit_on = 0` when
PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat()
function allocates the grep_pat allocates it with calloc(), so it's
guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined.
I you're bisecting and find this change, check that your PCRE isn't
older than 8.32. This change intentionally broke really old versions
of PCRE, but that's fixed in follow-up commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the is_fixed() function which are currently only used in
compile_regexp() earlier so it can be used in the PCRE family of
functions in a later change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.
An earlier change in this series ("grep: change the internal PCRE
macro names to be PCRE1", 2017-04-07) elaborates on the motivations
behind this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.
This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.
In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.
However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.
This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Factor the test for \0 in grep patterns into a function. Since commit
9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
\0 is considered fixed as regcomp() can't handle it.
This change makes later changes that make use of either has_null() or
is_fixed() (but not both) smaller.
While I'm at it make the comment conform to the style guide, i.e. add
an opening "/*\n".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove redundant assignments to the "regflags" variable. This variable
is only used set under GREP_PATTERN_TYPE_ERE, so there's no need to
un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}.
Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon
seeing "-G" on the command line and flipped the bit on upon seeing
"-E", but I think that was perfectly sensible and it would have been a
bug if we didn't. They were part of the command line parsing that
could have seen "-E" on the command line earlier.
When cca2c172 ("git-grep: do not die upon -F/-P when
grep.extendedRegexp is set.", 2011-05-09) switched the command line
parsing to "read into a 'tentatively this is what we saw the last'
variable and then finally commit just once", we didn't touch
opt.regflags for PCRE and FIXED, but we still had to flip regflags
between BRE and ERE, because parsing of grep.extendedregexp
configuration variable directly touched opt.regflags back then, which
was done by b22520a3 ("grep: allow -E and -n to be turned on by
default via configuration", 2011-03-30).
When 84befcd0 ("grep: add a grep.patternType configuration setting",
2012-08-03) introduced extended_regexp_option field, we stopped
flipping regflags while reading the configuration, and that was when
we should have noticed and stopped dropping REG_EXTENDED bit in the
"now we can commit what type to use" helper function.
There is no reason to do this anymore, so stop doing it, more to
reduce "wait this is used under fixed/BRE/PCRE how?" confusion when
reading the code, than to to save ourselves trivial CPU cycles by
removing one assignment.
1. "built-in "git grep"", 2006-04-30.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These calls to snprintf should always succeed, because their
input is small and fixed. Let's use xsnprintf to make sure
this is the case (and to make auditing for actual truncation
easier).
These could be candidates for turning into heap buffers, but
they fall into a few broad categories that make it not worth
doing:
- formatting single numbers is simple enough that we can
see the result should fit
- the size of a sha1 is likewise well-known, and I didn't
want to cause unnecessary conflicts with the ongoing
process to convert these constants to GIT_MAX_HEXSZ
- the interface for curl_errorstr is dictated by curl
Signed-off-by: Jeff King <peff@peff.net>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.
When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd). If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region. This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.
* js/regexec-buf:
regex: use regexec_buf()
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
|
|
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.
We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).
Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3). By converting them to use regexec_buf(), the code has
become much cleaner.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.
* js/am-3-merge-recursive-direct:
merge-recursive: flush output buffer even when erroring out
merge_trees(): ensure that the callers release output buffer
merge-recursive: offer an option to retain the output in 'obuf'
merge-recursive: write the commit title in one go
merge-recursive: flush output buffer before printing error messages
am -3: use merge_recursive() directly again
merge-recursive: switch to returning errors instead of dying
merge-recursive: handle return values indicating errors
merge-recursive: allow write_tree_from_memory() to error out
merge-recursive: avoid returning a wholesale struct
merge_recursive: abort properly upon errors
prepare the builtins for a libified merge_recursive()
merge-recursive: clarify code in was_tracked()
die(_("BUG")): avoid translating bug messages
die("bug"): report bugs consistently
t5520: verify that `pull --rebase` shows the helpful advice when failing
|
|
"git -c grep.patternType=extended log --basic-regexp" misbehaved
because the internal API to access the grep machinery was not
designed well.
* jc/grep-commandline-vs-configuration:
grep: further simplify setting the pattern type
|
|
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".
As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When c5c31d33 (grep: move pattern-type bits support to top-level
grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper
function, the intention was to allow the users of grep API to having
to fiddle only with .pattern_type_option (which can be set to "fixed",
"basic", "extended", and "pcre"), and then immediately before compiling
the pattern strings for use, call grep_commit_pattern_type() to have
it prepare various bits in the grep_opt structure (like .fixed,
.regflags, etc.).
However, grep_set_pattern_type_option() helper function the grep API
internally uses were left as an external function by mistake. This
function shouldn't have been made callable by the users of the API.
Later when the grep API was used in revision traversal machinery,
the caller then mistakenly started calling the function around
34a4ae55 (log --grep: use the same helper to set -E/-F options as
"git grep", 2012-10-03), instead of setting the .pattern_type_option
field and letting the grep_commit_pattern_type() to take care of the
details.
This caused an unnecessary bug that made a configured
grep.patternType take precedence over the command line options
(e.g. --basic-regexp, --fixed-strings) in "git log" family of
commands.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).
We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.
The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|