summaryrefslogtreecommitdiff
path: root/t/t7814-grep-recurse-submodules.sh
AgeCommit message (Collapse)AuthorFilesLines
2022-02-09clone, submodule: pass partial clone filters to submodulesLibravatar Josh Steadmon1-0/+41
When cloning a repo with a --filter and with --recurse-submodules enabled, the partial clone filter only applies to the top-level repo. This can lead to unexpected bandwidth and disk usage for projects which include large submodules. For example, a user might wish to make a partial clone of Gerrit and would run: `git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`. However, only the superproject would be a partial clone; all the submodules would have all blobs downloaded regardless of their size. With this change, the same filter can also be applied to submodules, meaning the expected bandwidth and disk savings apply consistently. To avoid changing default behavior, add a new clone flag, `--also-filter-submodules`. When this is set along with `--filter` and `--recurse-submodules`, the filter spec is passed along to git-submodule and git-submodule--helper, such that submodule clones also have the filter applied. This applies the same filter to the superproject and all submodules. Users who need to customize the filter per-submodule would need to clone with `--no-recurse-submodules` and then manually initialize each submodule with the proper filter. Applying filters to submodules should be safe thanks to Jonathan Tan's recent work [1, 2, 3] eliminating the use of alternates as a method of accessing submodule objects, so any submodule object access now triggers a lazy fetch from the submodule's promisor remote if the accessed object is missing. This patch is a reworked version of [4], which was created prior to Jonathan Tan's work. [1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16) [2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate', 2021-09-20) [3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules', 2021-10-25) [4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/ Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29grep: demonstrate bug with textconv attributes and submodulesLibravatar Matheus Tavares1-0/+103
In some circumstances, "git grep --textconv --recurse-submodules" ignores the textconv attributes from the submodules and erroneously applies the attributes defined in the superproject on the submodules' files. The textconv cache is also saved on the superproject, even for submodule objects. A fix for these problems will probably require at least three changes: - Some textconv and attributes functions (as well as their callees) will have to be adjusted to work with arbitrary repositories. Note that "fill_textconv()", for example, already receives a "struct repository" but it writes the textconv cache using "write_loose_object()", which implicitly works on "the_repository". - grep.c functions will have to call textconv/userdiff routines passing the "repo" field from "struct grep_source" instead of the one from "struct grep_opt". The latter always points to "the_repository" on "git grep" executions (see its initialization in builtin/grep.c), but the former points to the correct repository that each source (an object, file, or buffer) comes from. - "userdiff_find_by_path()" might need to use a different attributes stack for each repository it works on or reset its internal static stack when the repository is changed throughout the calls. For now, let's add some tests to demonstrate these problems, and also update a NEEDSWORK comment in grep.h that mentions this bug to reference the added tests. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-08t7814: show lack of alternate ODB-addingLibravatar Jonathan Tan1-0/+3
The previous patches have made "git grep" no longer need to add submodule ODBs as alternates, at least for the code paths tested in t7814. Demonstrate this by making adding a submodule ODB as an alternate fatal in this test. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Emily Shaffer <emilyshaffer@google.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30grep: ignore --recurse-submodules if --no-index is givenLibravatar Philippe Blain1-1/+10
Since grep learned to recurse into submodules in 0281e487fd (grep: optionally recurse into submodules, 2016-12-16), using --recurse-submodules along with --no-index makes Git die(). This is unfortunate because if submodule.recurse is set in a user's ~/.gitconfig, invoking `git grep --no-index` either inside or outside a Git repository results in fatal: option not supported with --recurse-submodules Let's allow using these options together, so that setting submodule.recurse globally does not prevent using `git grep --no-index`. Using `--recurse-submodules` should not have any effect if `--no-index` is used inside a repository, as Git will recurse into the checked out submodule directories just like into regular directories. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-22Merge branch 'mt/grep-submodules-working-tree'Libravatar Junio C Hamano1-0/+21
"git grep --recurse-submodules" that looks at the working tree files looked at the contents in the index in submodules, instead of files in the working tree. * mt/grep-submodules-working-tree: grep: fix worktree case in submodules
2019-07-30grep: fix worktree case in submodulesLibravatar Matheus Tavares1-0/+21
Running git-grep with --recurse-submodules results in a cached grep for the submodules even when --cached is not used. This makes all modifications in submodules' tracked files be always ignored when grepping. Solve that making git-grep respect the cached option when invoking grep_cache() inside grep_submodule(). Also, add tests to ensure that the desired behavior is performed. Reported-by: Daniel Zaoui <jackdanielz@eyomi.org> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28t7814: do not generate same commits in different reposLibravatar Nguyễn Thái Ngọc Duy1-1/+17
t7814 has repo tree like this initial-repo submodule sub In each repo 'submodule' and 'sub', a commit is made to add the same initial file 'a' with the same message 'add a'. If tests run fast enough, the two commits are made in the same second, resulting identical commits. There is nothing wrong with that per-se. But it could make the test flaky. Currently all submodule odbs are merged back in the main one (because we can't, or couldn't, access separate submodule repos otherwise). But eventually we need to access objects from the right repo. Because the same commit could sometimes be present in both 'submodule' and 'sub', if there is a bug looking up objects in the wrong repo, sometimes it will go unnoticed because it finds the needed object in the wrong repo anyway. Fix this by changing commit time after every commit. This makes all commits unique. Of course there are still identical blobs in different repos, but because we often lookup commit first, then tree and blob, unique commits are already quite safe. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16submodule-config.c: use repo_get_oid for reading .gitmodulesLibravatar Nguyễn Thái Ngọc Duy1-5/+1
Since 76e9bdc437 (submodule: support reading .gitmodules when it's not in the working tree - 2018-10-25), every time you do git grep --recurse-submodules you are likely to see one warning line per submodule (unless all those submodules also have submodules). On a superproject with plenty of submodules (I've seen one with 67) this is really annoying. The warning was there because we could not resolve extended SHA-1 syntax on a submodule. We can now. Make use of the new API and get rid of the warning. It would be even better if config_with_options() supports multiple repositories too. But one step at a time. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31submodule: support reading .gitmodules when it's not in the working treeLibravatar Antonio Ospite1-0/+16
When the .gitmodules file is not available in the working tree, try using the content from the index and from the current branch. This covers the case when the file is part of the repository but for some reason it is not checked out, for example because of a sparse checkout. This makes it possible to use at least the 'git submodule' commands which *read* the gitmodules configuration file without fully populating the working tree. Writing to .gitmodules will still require that the file is checked out, so check for that before calling config_set_in_gitmodules_file_gently. Add a similar check also in git-submodule.sh::cmd_add() to anticipate the eventual failure of the "git submodule add" command when .gitmodules is not safely writeable; this prevents the command from leaving the repository in a spurious state (e.g. the submodule repository was cloned but .gitmodules was not updated because config_set_in_gitmodules_file_gently failed). Moreover, since config_from_gitmodules() now accesses the global object store, it is necessary to protect all code paths which call the function against concurrent access to the global object store. Currently this only happens in builtin/grep.c::grep_submodules(), so call grep_read_lock() before invoking code involving config_from_gitmodules(). Finally, add t7418-submodule-sparse-gitmodules.sh to verify that reading from .gitmodules succeeds and that writing to it fails when the file is not checked out. NOTE: there is one rare case where this new feature does not work properly yet: nested submodules without .gitmodules in their working tree. This has been documented with a warning and a test_expect_failure item in t7814, and in this case the current behavior is not altered: no config is read. Signed-off-by: Antonio Ospite <ao2@ao2.it> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-01builtin/grep.c: respect 'submodule.recurse' optionLibravatar Stefan Beller1-0/+18
In builtin/grep.c we parse the config before evaluating the command line options. This makes the task of teaching grep to respect the new config option 'submodule.recurse' very easy by just parsing that option. As an alternative I had implemented a similar structure to treat submodules as the fetch/push command have, including * aligning the meaning of the 'recurse_submodules' to possible submodule values RECURSE_SUBMODULES_* as defined in submodule.h. * having a callback to parse the value and * reacting to the RECURSE_SUBMODULES_DEFAULT state that was the initial state. However all this is not needed for a true boolean value, so let's keep it simple. However this adds another place where "submodule.recurse" is parsed. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-21grep: add tests for grep pattern types being passed to submodulesLibravatar Ævar Arnfjörð Bjarmason1-0/+49
Add testing for grep pattern types being correctly passed to submodules. The pattern "(.|.)[\d]" matches differently under fixed (not at all), and then matches different lines under basic/extended & perl regular expressions, so this change asserts that the pattern type is passed along correctly. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-21grep: amend submodule recursion test for regex engine testingLibravatar Ævar Arnfjörð Bjarmason1-83/+83
Amend the submodule recursion test to prepare it for subsequent tests of whether it passes along the grep.patternType to the submodule greps. This is the result of searching & replacing: foobar -> (1|2)d(3|4) foo -> (1|2) bar -> (3|4) Currently there's no tests for whether e.g. -P or -E is correctly passed along, tests for that will be added in a follow-up change, but first add content to the tests which will match differently under different regex engines. Reuse the pattern established in an earlier commit of mine in this series ("log: add exhaustive tests for pattern style options & config", 2017-04-07). The pattern "(.|.)[\d]" will match this content differently under fixed/basic/extended & perl. This test code was originally added in commit 0281e487fd ("grep: optionally recurse into submodules", 2016-12-16). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17grep: fix bug when recursing with relative pathspecLibravatar Brandon Williams1-0/+75
When using the --recurse-submodules flag with a relative pathspec which includes "..", an error is produced inside the child process spawned for a submodule. When creating the pathspec struct in the child, the ".." is interpreted to mean "go up a directory" which causes an error stating that the path ".." is outside of the repository. While it is true that ".." is outside the scope of the submodule, it is confusing to a user who originally invoked the command where ".." was indeed still inside the scope of the superproject. Since the child process launched for the submodule has some context that it is operating underneath a superproject, this error could be avoided. This patch fixes the bug by passing the 'prefix' to the child process. Now each child process that works on a submodule has two points of reference to the superproject: (1) the 'super_prefix' which is the path from the root of the superproject down to root of the submodule and (2) the 'prefix' which is the path from the root of the superproject down to the directory where the user invoked the git command. With these two pieces of information a child process can correctly interpret the pathspecs provided by the user as well as being able to properly format its output relative to the directory the user invoked the original command from. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22grep: search history of moved submodulesLibravatar Brandon Williams1-0/+41
If a submodule was renamed at any point since it's inception then if you were to try and grep on a commit prior to the submodule being moved, you wouldn't be able to find a working directory for the submodule since the path in the past is different from the current path. This patch teaches grep to find the .git directory for a submodule in the parents .git/modules/ directory in the event the path to the submodule in the commit that is being searched differs from the state of the currently checked out commit. If found, the child process that is spawned to grep the submodule will chdir into its gitdir instead of a working directory. In order to override the explicit setting of submodule child process's gitdir environment variable (which was introduced in '10f5c526') `GIT_DIR_ENVIORMENT` needs to be pushed onto child process's env_array. This allows the searching of history from a submodule's gitdir, rather than from a working directory. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22grep: enable recurse-submodules to work on <tree> objectsLibravatar Brandon Williams1-1/+102
Teach grep to recursively search in submodules when provided with a <tree> object. This allows grep to search a submodule based on the state of the submodule that is present in a commit of the super project. When grep is provided with a <tree> object, the name of the object is prefixed to all output. In order to provide uniformity of output between the parent and child processes the option `--parent-basename` has been added so that the child can preface all of it's output with the name of the parent's object instead of the name of the commit SHA1 of the submodule. This changes output from the command `git grep -e. -l --recurse-submodules HEAD` from: HEAD:file <commit sha1 of submodule>:sub/file to: HEAD:file HEAD:sub/file Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-22grep: optionally recurse into submodulesLibravatar Brandon Williams1-0/+99
Allow grep to recognize submodules and recursively search for patterns in each submodule. This is done by forking off a process to recursively call grep on each submodule. The top level --super-prefix option is used to pass a path to the submodule which can in turn be used to prepend to output or in pathspec matching logic. Recursion only occurs for submodules which have been initialized and checked out by the parent project. If a submodule hasn't been initialized and checked out it is simply skipped. In order to support the existing multi-threading infrastructure in grep, output from each child process is captured in a strbuf so that it can be later printed to the console in an ordered fashion. To limit the number of theads that are created, each child process has half the number of threads as its parents (minimum of 1), otherwise we potentailly have a fork-bomb. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>