summaryrefslogtreecommitdiff
path: root/grep.c
AgeCommit message (Collapse)AuthorFilesLines
2016-07-19Merge branch 'nd/icase'Libravatar Junio C Hamano1-7/+57
"git grep -i" has been taught to fold case in non-ascii locales correctly. * nd/icase: grep.c: reuse "icase" variable diffcore-pickaxe: support case insensitive match on non-ascii diffcore-pickaxe: Add regcomp_or_die() grep/pcre: support utf-8 gettext: add is_utf8_locale() grep/pcre: prepare locale-dependent tables for icase matching grep: rewrite an if/else condition to avoid duplicate expression grep/icase: avoid kwsset when -F is specified grep/icase: avoid kwsset on literal non-ascii strings test-regex: expose full regcomp() to the command line test-regex: isolate the bug test code grep: break down an "if" stmt in preparation for next changes
2016-07-01grep.c: reuse "icase" variableLibravatar Nguyễn Thái Ngọc Duy1-4/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01grep/pcre: support utf-8Libravatar Nguyễn Thái Ngọc Duy1-0/+2
In the previous change in this function, we add locale support for single-byte encodings only. It looks like pcre only supports utf-* as multibyte encodings, the others are left in the cold (which is fine). We need to enable PCRE_UTF8 so pcre can find character boundary correctly. It's needed for case folding (when --ignore-case is used) or '*', '+' or similar syntax is used. The "has_non_ascii()" check is to be on the conservative side. If there's non-ascii in the pattern, the searched content could still be in utf-8, but we can treat it just like a byte stream and everything should work. If we force utf-8 based on locale only and pcre validates utf-8 and the file content is in non-utf8 encoding, things break. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: Plamen Totev <plamen.totev@abv.bg> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01grep/pcre: prepare locale-dependent tables for icase matchingLibravatar Nguyễn Thái Ngọc Duy1-2/+6
The default tables are usually built with C locale and only suitable for LANG=C or similar. This should make case insensitive search work correctly for all single-byte charsets. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01grep: rewrite an if/else condition to avoid duplicate expressionLibravatar Nguyễn Thái Ngọc Duy1-4/+1
"!icase || ascii_only" is repeated twice in this if/else chain as this series evolves. Rewrite it (and basically revert the first if condition back to before the "grep: break down an "if" stmt..." commit). Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01grep/icase: avoid kwsset when -F is specifiedLibravatar Nguyễn Thái Ngọc Duy1-1/+44
Similar to the previous commit, we can't use kws on icase search outside ascii range. But we can't simply pass the pattern to regcomp/pcre like the previous commit because it may contain regex special characters, so we need to quote the regex first. To avoid misquote traps that could lead to undefined behavior, we always stick to basic regex engine in this case. We don't need fancy features for grepping a literal string anyway. basic_regex_quote_buf() assumes that if the pattern is in a multibyte encoding, ascii chars must be unambiguously encoded as single bytes. This is true at least for UTF-8. For others, let's wait until people yell up. Chances are nobody uses multibyte, non utf-8 charsets anymore. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27grep/icase: avoid kwsset on literal non-ascii stringsLibravatar Nguyễn Thái Ngọc Duy1-1/+6
When we detect the pattern is just a literal string, we avoid heavy regex engine and use fast substring search implemented in kwsset.c. But kws uses git-ctype which is locale-independent so it does not know how to fold case properly outside ascii range. Let regcomp or pcre take care of this case instead. Slower, but accurate. Noticed-by: Plamen Totev <plamen.totev@abv.bg> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27grep: break down an "if" stmt in preparation for next changesLibravatar Nguyễn Thái Ngọc Duy1-1/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-20Merge branch 'rs/xdiff-hunk-with-func-line'Libravatar Junio C Hamano1-2/+26
"git show -W" (extend hunks to cover the entire function, delimited by lines that match the "funcname" pattern) used to show the entire file when a change added an entire function at the end of the file, which has been fixed. * rs/xdiff-hunk-with-func-line: xdiff: fix merging of appended hunk with -W grep: -W: don't extend context to trailing empty lines t7810: add test for grep -W and trailing empty context lines xdiff: don't trim common tail with -W xdiff: -W: don't include common trailing empty lines in context xdiff: ignore empty lines before added functions with -W xdiff: handle appended chunks better with -W xdiff: factor out match_func_rec() t4051: rewrite, add more tests
2016-05-31grep: -W: don't extend context to trailing empty linesLibravatar René Scharfe1-2/+26
Empty lines between functions are shown by grep -W, as it considers them to be part of the function preceding them. They are not interesting in most languages. The previous patches stopped showing them for diff -W. Stop showing empty lines trailing a function with grep -W. Grep scans the lines of a buffer from top to bottom and prints matching lines immediately. Thus we need to peek ahead in order to determine if an empty line is part of a function body and worth showing or not. Remember how far ahead we peeked in order to avoid having to do so repeatedly when handling multiple consecutive empty lines. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09grep.c: use error_errno()Libravatar Nguyễn Thái Ngọc Duy1-2/+2
While at there, improve the error message a bit (what operation failed?) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22use xmallocz to avoid size arithmeticLibravatar Jeff King1-2/+1
We frequently allocate strings as xmalloc(len + 1), where the extra 1 is for the NUL terminator. This can be done more simply with xmallocz, which also checks for integer overflow. There's no case where switching xmalloc(n+1) to xmallocz(n) is wrong; the result is the same length, and malloc made no guarantees about what was in the buffer anyway. But in some cases, we can stop manually placing NUL at the end of the allocated buffer. But that's only safe if it's clear that the contents will always fill the buffer. In each case where this patch does so, I manually examined the control flow, and I tried to err on the side of caution. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05color: add color_set helper for copying raw colorsLibravatar Jeff King1-16/+16
To set up default colors, we sometimes strcpy() from the default string literals into our color buffers. This isn't a bug (assuming the destination is COLOR_MAXLEN bytes), but makes it harder to audit the code for problematic strcpy calls. Let's introduce a color_set which copies under the assumption that there are COLOR_MAXLEN bytes in the destination (of course you can call it on a smaller buffer, so this isn't providing a huge amount of safety, but it's more convenient than calling xsnprintf yourself). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25grep: use xsnprintf to format failure messageLibravatar Jeff King1-2/+2
This looks at first glance like the sprintf can overflow our buffer, but it's actually fine; the p->origin string is something constant and small, like "command line" or "-e option". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-11Merge branch 'jk/blame-commit-label'Libravatar Junio C Hamano1-2/+2
"git blame HEAD -- missing" failed to correctly say "HEAD" when it tried to say "No such path 'missing' in HEAD". * jk/blame-commit-label: blame.c: fix garbled error message use xstrdup_or_null to replace ternary conditionals builtin/commit.c: use xstrdup_or_null instead of envdup builtin/apply.c: use xstrdup_or_null instead of null_strdup git-compat-util: add xstrdup_or_null helper
2015-01-13use xstrdup_or_null to replace ternary conditionalsLibravatar Jeff King1-2/+2
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x). The change is fairly mechanical, with the exception of resolve_refdup, which can eliminate a temporary variable. There are still a few hits grepping for "?.*xstrdup", but these are of slightly different forms and cannot be converted (e.g., "x ? xstrdup(x->foo) : NULL"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-31Merge branch 'rs/grep-color-words'Libravatar Junio C Hamano1-7/+22
Allow painting or not painting (partial) matches in context lines when showing "grep -C<num>" output in color. * rs/grep-color-words: grep: add color.grep.matchcontext and color.grep.matchselected
2014-10-28grep: add color.grep.matchcontext and color.grep.matchselectedLibravatar René Scharfe1-7/+22
The config option color.grep.match can be used to specify the highlighting color for matching strings. Add the options matchContext and matchSelected to allow different colors to be specified for matching strings in the context vs. in selected lines. This is similar to the ms and mc specifiers in GNU grep's environment variable GREP_COLORS. Tests are from Zoltan Klinger's earlier attempt to solve the same issue in a different way. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-14color_parse: do not mention variable name in error messageLibravatar Jeff King1-1/+1
Originally the color-parsing function was used only for config variables. It made sense to pass the variable name so that the die() message could be something like: $ git -c color.branch.plain=bogus branch fatal: bad color value 'bogus' for variable 'color.branch.plain' These days we call it in other contexts, and the resulting error messages are a little confusing: $ git log --pretty='%C(bogus)' fatal: bad color value 'bogus' for variable '--pretty format' $ git config --get-color foo.bar bogus fatal: bad color value 'bogus' for variable 'command line' This patch teaches color_parse to complain only about the value, and then return an error code. Config callers can then propagate that up to the config parser, which mentions the variable name. Other callers can provide a custom message. After this patch these three cases now look like: $ git -c color.branch.plain=bogus branch error: invalid color value: bogus fatal: unable to parse 'color.branch.plain' from command-line config $ git log --pretty='%C(bogus)' error: invalid color value: bogus fatal: unable to parse --pretty format $ git config --get-color foo.bar bogus error: invalid color value: bogus fatal: unable to parse default color value Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-03Merge branch 'as/grep-fullname-config'Libravatar Junio C Hamano1-0/+5
Add a configuration variable to force --full-name to be default for "git grep". This may cause regressions on scripted users that do not expect this new behaviour. * as/grep-fullname-config: grep: add grep.fullName config variable
2014-03-20grep: add grep.fullName config variableLibravatar Andreas Schwab1-0/+5
This configuration variable sets the default for the --full-name option. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18Merge branch 'rs/grep-h-c'Libravatar Junio C Hamano1-2/+5
"git grep" learns to handle combination of "-h (no header)" and "-c (counts)". * rs/grep-h-c: grep: support -h (no header) with --count t7810: add missing variables to tests in loop
2014-03-11grep: support -h (no header) with --countLibravatar René Scharfe1-2/+5
Suppress printing the header (filename) with -h even if in -c/--count mode. GNU grep and OpenBSD's grep do the same. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-06Use hashcpy() when copying object namesLibravatar Sun He1-1/+1
We invented hashcpy() to keep the abstraction of "object name" behind it. Use it instead of calling memcpy() with hard-coded 20-byte length when moving object names between pieces of memory. Leave ppc/sha1.c as-is, because the function is about the SHA-1 hash algorithm whose output is and will always be 20 bytes. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-10grep: allow to use textconv filtersLibravatar Jeff King1-14/+86
Recently and not so recently, we made sure that log/grep type operations use textconv filters when a userfacing diff would do the same: ef90ab6 (pickaxe: use textconv for -S counting, 2012-10-28) b1c2f57 (diff_grep: use textconv buffers for add/deleted files, 2012-10-28) 0508fe5 (combine-diff: respect textconv attributes, 2011-05-23) "git grep" currently does not use textconv filters at all, that is neither for displaying the match and context nor for the actual grepping, even when requested by --textconv. Introduce an option "--textconv" which makes git grep use any configured textconv filters for grepping and output purposes. It is off by default. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-25fix clang -Wtautological-compare with unsigned enumLibravatar Antoine Pelisse1-1/+2
Create a GREP_HEADER_FIELD_MIN so we can check that the field value is sane and silence the clang warning. Clang warning happens because the enum is unsigned (this is implementation-defined, and there is no negative fields) and the check is then tautological. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: John Keeping <john@keeping.me.uk> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-29Merge branch 'nd/grep-true-path'Libravatar Jeff King1-3/+8
"git grep -e pattern <tree>" asked the attribute system to read "<tree>:.gitattributes" file in the working tree, which was nonsense. * nd/grep-true-path: grep: stop looking at random places for .gitattributes
2012-10-12grep: stop looking at random places for .gitattributesLibravatar Nguyễn Thái Ngọc Duy1-3/+8
grep searches for .gitattributes using "name" field in struct grep_source but that field is not real on-disk path name. For example, "grep pattern rev" fills the field with "rev:path", and Git looks for .gitattributes in the (non-existent but exploitable) path "rev:path" instead of "path". This patch passes real paths down to grep_source_load_driver() when: - grep on work tree - grep on the index - grep a commit (or a tag if it points to a commit) so that these cases look up .gitattributes at proper paths. .gitattributes lookup is disabled in all other cases. Initial-work-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09revisions: initialize revs->grep_filter using grep_init()Libravatar Junio C Hamano1-0/+5
Instead of using the hand-rolled initialization sequence, use grep_init() to populate the necessary bits. This opens the door to allow the calling commands to optionally read grep.* configuration variables via git_config() if they want to. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09grep: move pattern-type bits support to top-level grep.[ch]Libravatar Junio C Hamano1-0/+42
Switching between -E/-G/-P/-F correctly needs a lot more than just flipping opt->regflags bit these days, and we have a nice helper function buried in builtin/grep.c for the sole use of "git grep". Extract it so that "log --grep" family can also use it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-09grep: move the configuration parsing logic to grep.[ch]Libravatar Junio C Hamano1-0/+130
The configuration handling is a library-ish part of this program, that is not specific to "git grep" command. It should be reusable by "log" and others. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29log --grep-reflog: reject the option without -gLibravatar Junio C Hamano1-0/+2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29revision: add --grep-reflog to filter commits by reflog messagesLibravatar Nguyễn Thái Ngọc Duy1-0/+1
Similar to --author/--committer which filters commits by author and committer header fields. --grep-reflog adds a fake "reflog" header to commit and a grep filter to search on that line. All rules to --author/--committer apply except no timestamp stripping. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-29grep: prepare for new header field filterLibravatar Nguyễn Thái Ngọc Duy1-1/+8
grep supports only author and committer headers, which have the same special treatment that later headers may or may not have. Check for field type and only strip_timestamp() when the field is either author or committer. GREP_HEADER_FIELD_MAX is put in the grep_header_field enum to be calculated automatically, correctly, as long as it's at the end of the enum. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-20grep.c: make two symbols really file-scope static this timeLibravatar Junio C Hamano1-2/+2
Adding a declaration at the beginning is not sufficient for obvious reasons. The definition has to be made static. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15grep.c: mark private file-scope symbols as staticLibravatar Junio C Hamano1-1/+5
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14log --grep/--author: honor --all-match honored for multiple --grep patternsLibravatar Junio C Hamano1-0/+19
When we have both header expression (which has to be an OR node by construction) and a pattern expression (which could be anything), we create a new top-level OR node to bind them together, and the resulting expression structure looks like this: OR / \ / \ pattern OR / \ / \ ..... committer OR / \ author TRUE The three elements on the top-level backbone that are inspected by the "all-match" logic are "pattern", "committer" and "author". When there are more than one elements in the "pattern", the top-level node of the "pattern" part of the subtree is an OR, and that node is inspected by "all-match". The result ends up ignoring the "--all-match" given from the command line. A match on either side of the pattern is considered a match, hence: git log --grep=A --grep=B --author=C --all-match shows the same "authored by C and has either A or B" that is correct only when run without "--all-match". Fix this by turning the resulting expression around when "--all-match" is in effect, like this: OR / \ / \ / OR committer / \ author \ pattern The set of nodes on the top-level backbone in the resulting expression becomes "committer", "author", and the nodes that are on the top-level backbone of the "pattern" subexpression. This makes the "all-match" logic inspect the same nodes in "pattern" as the case without the author and/or the committer restriction, and makes the earlier "log" example to show "authored by C and has A and has B", which is what the command line expects. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-14grep: teach --debug option to dump the parse treeLibravatar Junio C Hamano1-2/+90
Our "grep" allows complex boolean expressions to be formed to match each individual line with operators like --and, '(', ')' and --not. Introduce the "--debug" option to show the parse tree to help people who want to debug and enhance it. Also "log" learns "--grep-debug" option to do the same. The command line parser to the log family is a lot more limited than the general "git grep" parser, but it has special handling for header matching (e.g. "--author"), and a parse tree is valuable when working on it. Note that "--all-match" is *not* any individual node in the parse tree. It is an instruction to the evaluator to check all the nodes in the top-level backbone have matched and reject a document as non-matching otherwise. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-01Merge branch 'rs/maint-grep-F' into maintLibravatar Junio C Hamano1-17/+57
"git grep -e '$pattern'", unlike the case where the patterns are read from a file, did not treat individual lines in the given pattern argument as separate regular expressions as it should. By René Scharfe * rs/maint-grep-F: grep: stop leaking line strings with -f grep: support newline separated pattern list grep: factor out do_append_grep_pat() grep: factor out create_grep_pat()
2012-05-20grep: support newline separated pattern listLibravatar René Scharfe1-1/+32
Currently, patterns that contain newline characters don't match anything when given to git grep. Regular grep(1) interprets patterns as lists of newline separated search strings instead. Implement this functionality by creating and inserting extra grep_pat structures for patterns consisting of multiple lines when appending to the pattern lists. For simplicity, all pattern strings are duplicated. The original pattern is truncated in place to make it contain only the first line. Requested-by: Torne (Richard Coles) <torne@google.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-20grep: factor out do_append_grep_pat()Libravatar René Scharfe1-6/+9
Add do_append_grep_pat() as a shared function for adding patterns to the header pattern list and the general pattern list. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-20grep: factor out create_grep_pat()Libravatar René Scharfe1-11/+17
Add create_grep_pat(), a shared helper for all grep pattern allocation and initialization needs. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-11Merge branch 'ah/maint-grep-double-init' into maintLibravatar Junio C Hamano1-1/+1
By Angus Hammond * ah/maint-grep-double-init: grep.c: remove redundant line of code
2012-05-07grep.c: remove redundant line of codeLibravatar Angus Hammond1-1/+1
Signed-off-by: Angus Hammond <angusgh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07Merge branch 'jc/pickaxe-ignore-case'Libravatar Junio C Hamano1-8/+3
By Junio C Hamano (2) and Ramsay Jones (1) * jc/pickaxe-ignore-case: ctype.c: Fix a sparse warning pickaxe: allow -i to search in patch case-insensitively grep: use static trans-case table
2012-02-28grep: use static trans-case tableLibravatar Junio C Hamano1-8/+3
In order to prepare the kwset machinery for a case-insensitive search, we used to use a static table of 256 elements and filled it every time before calling kwsalloc(). Because the kwset machinery will never modify this table, just allocate a single instance globally and fill it at the compile time. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-26Sync with 1.7.8.5Libravatar Junio C Hamano1-1/+1
2012-02-26grep -P: Fix matching ^ and $Libravatar Michał Kiedrowicz1-1/+1
When "git grep" is run with -P/--perl-regexp, it doesn't match ^ and $ at the beginning/end of the line. This is because PCRE normally matches ^ and $ at the beginning/end of the whole text, not for each line, and "git grep" passes a large chunk of text (possibly containing many lines) to pcre_exec() and then splits the text into lines. This makes "git grep -P" behave differently from "git grep -E" and also from "grep -P" and "pcregrep": $ cat file a b $ git grep --no-index -P '^ ' file $ git grep --no-index -E '^ ' file file: b $ grep -c -P '^ ' file b $ pcregrep -c '^ ' file b Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: load file data after checking binary-nessLibravatar Jeff King1-3/+3
Usually we load each file to grep into memory, check whether it's binary, and then either grep it (the default) or not (if "-I" was given). In the "-I" case, we can skip loading the file entirely if it is marked as binary via gitattributes. On my giant 3-gigabyte media repository, doing "git grep -I foo" went from: real 0m0.712s user 0m0.044s sys 0m4.780s to: real 0m0.026s user 0m0.016s sys 0m0.020s Obviously this is an extreme example. The repo is almost entirely binary files, and you can see that we spent all of our time asking the kernel to read() the data. However, with a cold disk cache, even avoiding a few binary files can have an impact. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02grep: respect diff attributes for binary-nessLibravatar Jeff King1-2/+14
There is currently no way for users to tell git-grep that a particular path is or is not a binary file; instead, grep always relies on its auto-detection (or the user specifying "-a" to treat all binary-looking files like text). This patch teaches git-grep to use the same attribute lookup that is used by git-diff. We could add a new "grep" flag, but that is unnecessarily complex and unlikely to be useful. Despite the name, the "-diff" attribute (or "diff=foo" and the associated diff.foo.binary config option) are really about describing the contents of the path. It's simply historical that diff was the only thing that cared about these attributes in the past. And if this simple approach turns out to be insufficient, we still have a backwards-compatible path forward: we can add a separate "grep" attribute, and fall back to respecting "diff" if it is unset. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>