summaryrefslogtreecommitdiff
path: root/builtin/sparse-checkout.c
AgeCommit message (Collapse)AuthorFilesLines
2021-09-07sparse-checkout: clear tracked sparse dirsLibravatar Derrick Stolee1-0/+94
When changing the scope of a sparse-checkout using cone mode, we might have some tracked directories go out of scope. The current logic removes the tracked files from within those directories, but leaves the ignored files within those directories. This is a bit unexpected to users who have given input to Git saying they don't need those directories anymore. This is something that is new to the cone mode pattern type: the user has explicitly said "I want these directories and _not_ those directories." The typical sparse-checkout patterns more generally apply to "I want files with with these patterns" so it is natural to leave ignored files as they are. This focus on directories in cone mode provides us an opportunity to change the behavior. Leaving these ignored files in the sparse directories makes it impossible to gain performance benefits in the sparse index. When we track into these directories, we need to know if the files are ignored or not, which might depend on the _tracked_ .gitignore file(s) within the sparse directory. This depends on the indexed version of the file, so the sparse directory must be expanded. We must take special care to look for untracked, non-ignored files in these directories before deleting them. We do not want to delete any meaningful work that the users were doing in those directories and perhaps forgot to add and commit before switching sparse-checkout definitions. Since those untracked files might be code files that generated ignored build output, also do not delete any ignored files from these directories in that case. The users can recover their state by resetting their sparse-checkout definition to include that directory and continue. Alternatively, they can see the warning that is presented and delete the directory themselves to regain the performance they expect. By deleting the sparse directories when changing scope (or running 'git sparse-checkout reapply') we regain these performance benefits as if the repository was in a clean state. Since these ignored files are frequently build output or helper files from IDEs, the users should not need the files now that the tracked files are removed. If the tracked files reappear, then they will have newer timestamps than the build artifacts, so the artifacts will need to be regenerated anyway. Use the sparse-index as a data structure in order to find the sparse directories that can be safely deleted. Re-expand the index to a full one if it was full before. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30use fspathhash() everywhereLibravatar René Scharfe1-8/+2
cf2dc1c238 (speed up alt_odb_usable() with many alternates, 2021-07-07) introduced the function fspathhash() for calculating path hashes while respecting the configuration option core.ignorecase. Call it instead of open-coding it; the resulting code is shorter and less repetitive. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Libravatar Junio C Hamano1-9/+35
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-03-30sparse-checkout: disable sparse-indexLibravatar Derrick Stolee1-1/+9
We use 'git sparse-checkout init --cone --sparse-index' to toggle the sparse-index feature. It makes sense to also disable it when running 'git sparse-checkout disable'. This is particularly important because it removes the extensions.sparseIndex config option, allowing other tools to use this Git repository again. This does mean that 'git sparse-checkout init' will not re-enable the sparse-index feature, even if it was previously enabled. While testing this feature, I noticed that the sparse-index was not being written on the first run, but by a second. This was caught by the call to 'test-tool read-cache --table'. This requires adjusting some assignments to core_apply_sparse_checkout and pl.use_cone_patterns in the sparse_checkout_init() logic. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-checkout: toggle sparse index from builtinLibravatar Derrick Stolee1-1/+16
The sparse index extension is used to signal that index writes should be in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used. It also updates the index to use the correct format, either way. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extension.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30sparse-checkout: hold pattern list in indexLibravatar Derrick Stolee1-7/+10
As we modify the sparse-checkout definition, we perform index operations on a pattern_list that only exists in-memory. This allows easy backing out in case the index update fails. However, if the index write itself cares about the sparse-checkout pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout definition to filter out directories that are outside the sparse cone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16exclude: add flags parameter to add_patterns()Libravatar Jeff King1-4/+4
There are a number of callers of add_patterns() and its sibling functions. Let's give them a "flags" parameter for adding new options without having to touch each caller. We'll use this in a future patch to add O_NOFOLLOW support. But for now each caller just passes 0. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23sparse-checkout: load sparse-checkout patternsLibravatar Derrick Stolee1-5/+0
A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30sparse-checkout: fill in some options boilerplateLibravatar Jeff King1-0/+37
The sparse-checkout passes along argv and argc to its sub-command helper functions. Many of these sub-commands do not yet take any command-line options, and ignore those parameters. Let's instead add empty option lists and make sure we call parse_options(). That will give a useful error message for something like: git sparse-checkout list --nonsense which currently just silently ignores the unknown option. As a bonus, it also silences some -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-29Merge branch 'xl/upgrade-repo-format'Libravatar Junio C Hamano1-0/+2
Allow runtime upgrade of the repository format version, which needs to be done carefully. There is a rather unpleasant backward compatibility worry with the last step of this series, but it is the right thing to do in the longer term. * xl/upgrade-repo-format: check_repository_format_gently(): refuse extensions for old repositories sparse-checkout: upgrade repository to version 1 when enabling extension fetch: allow adding a filter after initial clone repository: add a helper function to perform repository format upgrade
2020-06-05sparse-checkout: upgrade repository to version 1 when enabling extensionLibravatar Xin Li1-0/+2
The 'extensions' configuration variable gets special meaning in the new repository version, so when enabling the extension we should upgrade the repository to version 1. Signed-off-by: Xin Li <delphij@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05sparse-checkout: avoid staging deletions of all filesLibravatar Elijah Newren1-0/+4
sparse-checkout's purpose is to update the working tree to have it reflect a subset of the tracked files. As such, it shouldn't be switching branches, making commits, downloading or uploading data, or staging or unstaging changes. Other than updating the worktree, the only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the index. In particular, this sets up a nice invariant: running sparse-checkout will never change the status of any file in `git status` (reflecting the fact that we only set the SKIP_WORKTREE bit if the file is safe to delete, i.e. if the file is unmodified). Traditionally, we did a _really_ bad job with this goal. The predecessor to sparse-checkout involved manual editing of .git/info/sparse-checkout and running `git read-tree -mu HEAD`. That command would stage and unstage changes and overwrite dirty changes in the working tree. The initial implementation of the sparse-checkout command was no better; it simply invoked `git read-tree -mu HEAD` as a subprocess and had the same caveats, though this issue came up repeatedly in review comments and workarounds for the problems were put in place before the feature was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6]. [1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@mail.gmail.com/ [2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@mail.gmail.com/ [4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@mail.gmail.com/ [5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@mail.gmail.com/ [6] https://lore.kernel.org/git/20191121163706.GV23183@szeder.dev/ However, these workarounds, in addition to disabling the feature in a number of important cases, also missed one special case. I'll get back to it later. In the 2.27.0 cycle, the disabling of the feature was lifted by finally replacing the internal equivalent of `git read-tree -mu HEAD` with something that did what we wanted: the new update_sparsity() function in unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index and updates the working tree to match. This new function handles all the cases that were problematic for the old implementation, except that it breaks the same special case that avoided the workarounds of the old implementation, but broke it in a different way. So...that brings us to the special case: a git clone performed with --no-checkout. As per the meaning of the flag, --no-checkout does not check out any branch, with the implication that you aren't on one and need to switch to one after the clone. Implementationally, HEAD is still set (so in some sense you are partially on a branch), but * the index is "unborn" (non-existent) * there are no files in the working tree (other than .git/) * the next time git switch (or git checkout) is run it will run unpack_trees with `initial_checkout` flag set to true. It is not until you run, e.g. `git switch <somebranch>` that the index will be written and files in the working tree populated. With this special --no-checkout case, the traditional `read-tree -mu HEAD` behavior would have done the equivalent of acting like checkout -- switch to the default branch (HEAD), write out an index that matches HEAD, and update the working tree to match. This special case slipped through the avoid-making-changes checks in the original sparse-checkout command and thus continued there. After update_sparsity() was introduced and used (see commit f56f31af03 ("sparse-checkout: use new update_sparsity() function", 2020-03-27)), the behavior for the --no-checkout case changed: Due to git's auto-vivification of an empty in-memory index (see do_read_index() and note that `must_exist` is false), and due to sparse-checkout's update_working_directory() code to always write out the index after it was done, we got a new bug. That made it so that sparse-checkout would switch the repository from a clone with an "unborn" index (i.e. still needing an initial_checkout), to one that had a recorded index with no entries. Thus, instead of all the files appearing deleted in `git status` being known to git as a special artifact of not yet being on a branch, our recording of an empty index made it suddenly look to git as though it was definitely on a branch with ALL files staged for deletion! A subsequent checkout or switch then had to contend with the fact that it wasn't on an initial_checkout but had a bunch of staged deletions. Make sure that sparse-checkout changes nothing in the index other than the SKIP_WORKTREE bit; in particular, when the index is unborn we do not have any branch checked out so there is no sparsification or de-sparsification work to do. Simply return from update_working_directory() early. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27sparse-checkout: provide a new reapply subcommandLibravatar Elijah Newren1-1/+9
If commands like merge or rebase materialize files as part of their work, or a previous sparse-checkout command failed to update individual files due to dirty changes, users may want a command to simply 'reapply' the sparsity rules. Provide one. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27sparse-checkout: use improved unpack_trees porcelain messagesLibravatar Elijah Newren1-0/+2
setup_unpack_trees_porcelain() provides much improved error/warning messages; instead of a message that assumes that there is only one path with a given problem despite being used by code that intentionally is grouping and showing errors together, it uses a message designed to be used with groups of paths. For example, this transforms error: Entry ' folder1/a folder2/a ' not uptodate. Cannot update sparse checkout. into error: Cannot update sparse checkout: the following entries are not up to date: folder1/a folder2/a In the past the suboptimal messages were never actually triggered because we would error out if the working directory wasn't clean before we even called unpack_trees(). The previous commit changed that, though, so let's use the better error messages. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27sparse-checkout: use new update_sparsity() functionLibravatar Elijah Newren1-30/+10
Remove the equivalent of 'git read-tree -mu HEAD' in the sparse-checkout codepaths for setting the SKIP_WORKTREE bits and instead use the new update_sparsity() function. Note that when an issue is hit, the error message splits 'error' and 'Cannot update sparse checkout' on separate lines. For now, we use two greps to find both pieces of the error message but subsequent commits will clean up the messages reported to the user. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27unpack-trees: simplify pattern_list freeingLibravatar Elijah Newren1-1/+0
commit e091228e17 ("sparse-checkout: update working directory in-process", 2019-11-21) allowed passing a pre-defined set of patterns to unpack_trees(). However, if o->pl was NULL, it would still read the existing patterns and use those. If those patterns were read into a data structure that was allocated, naturally they needed to be free'd. However, despite the same function being responsible for knowing about both the allocation and the free'ing, the logic for tracking whether to free the pattern_list was hoisted to an outer function with an additional flag in unpack_trees_options. Put the logic back in the relevant function and discard the now unnecessary flag. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-05Merge branch 'ds/sparse-add'Libravatar Junio C Hamano1-32/+109
"git sparse-checkout" learned a new "add" subcommand. * ds/sparse-add: sparse-checkout: allow one-character directories in cone mode sparse-checkout: work with Windows paths sparse-checkout: create 'add' subcommand sparse-checkout: extract pattern update from 'set' subcommand sparse-checkout: extract add_patterns_from_input()
2020-02-17Merge branch 'rs/strbuf-insertstr'Libravatar Junio C Hamano1-1/+1
Code clean-up. * rs/strbuf-insertstr: mailinfo: don't insert header prefix for handle_content_type() strbuf: add and use strbuf_insertstr()
2020-02-11sparse-checkout: work with Windows pathsLibravatar Derrick Stolee1-0/+3
When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path A/B/C to their sparse-checkout patterns. Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11sparse-checkout: create 'add' subcommandLibravatar Derrick Stolee1-6/+66
When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set. Allow adding patterns using a new 'add' subcommand. This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise. When in cone mode, we are growing the cone. This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors. When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11sparse-checkout: extract pattern update from 'set' subcommandLibravatar Derrick Stolee1-18/+26
In anticipation of adding "add" and "remove" subcommands to the sparse-checkout builtin, extract a modify_pattern_list() method from the sparse_checkout_set() method. This command will read input from the command-line or stdin to construct a set of patterns, then modify the existing sparse-checkout patterns after a successful update of the working directory. Currently, the only way to modify the patterns is to replace all of the patterns. This will be extended in a later update. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11sparse-checkout: extract add_patterns_from_input()Libravatar Derrick Stolee1-29/+35
In anticipation of extending the sparse-checkout builtin with "add" and "remove" subcommands, extract the code that fills a pattern list based on the input values. The input changes depending on the presence of "--stdin" or the value of core.sparseCheckoutCone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10strbuf: add and use strbuf_insertstr()Libravatar René Scharfe1-1/+1
Add a function for inserting a C string into a strbuf. Use it throughout the source to get rid of magic string length constants and explicit strlen() calls. Like strbuf_addstr(), implement it as an inline function to avoid the implicit strlen() calls to cause runtime overhead. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sparse-checkout: escape all glob characters on writeLibravatar Derrick Stolee1-1/+1
The sparse-checkout patterns allow special globs according to fnmatch(3). When writing cone-mode patterns for paths containing these characters, they must be escaped. Use is_glob_special() to check which characters must be escaped this way, and add a path to the tests that contains all glob characters at once. Note that ']' is not special, since the initial bracket '[' is escaped. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sparse-checkout: use C-style quotes in 'list' subcommandLibravatar Derrick Stolee1-2/+4
When in cone mode, the 'git sparse-checkout list' subcommand lists the directories included in the sparse cone. When these directories contain odd characters, such as a backslash, then we need to use C-style quotes similar to 'git ls-tree'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sparse-checkout: unquote C-style strings over --stdinLibravatar Derrick Stolee1-1/+14
If a user somehow creates a directory with an asterisk (*) or backslash (\), then the "git sparse-checkout set" command will struggle to provide the correct pattern in the sparse-checkout file. When not in cone mode, the provided pattern is written directly into the sparse-checkout file. However, in cone mode we expect a list of paths to directories and then we convert those into patterns. Even more specifically, the goal is to always allow the following from the root of a repo: git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin The ls-tree command provides directory names with an unescaped asterisk. It also quotes the directories that contain an escaped backslash. We must remove these quotes, then keep the escaped backslashes. Use unquote_c_style() when parsing lines from stdin. Command-line arguments will be parsed as-is, assuming the user can do the correct level of escaping from their environment to match the exact directory names. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sparse-checkout: write escaped patterns in cone modeLibravatar Derrick Stolee1-2/+21
If a user somehow creates a directory with an asterisk (*) or backslash (\), then the "git sparse-checkout set" command will struggle to provide the correct pattern in the sparse-checkout file. When not in cone mode, the provided pattern is written directly into the sparse-checkout file. However, in cone mode we expect a list of paths to directories and then we convert those into patterns. However, there is some care needed for the timing of these escapes. The in-memory pattern list is used to update the working directory before writing the patterns to disk. Thus, we need the command to have the unescaped names in the hashsets for the cone comparisons, then escape the patterns later. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-24sparse-checkout: create leading directoriesLibravatar Derrick Stolee1-0/+4
The 'git init' command creates the ".git/info" directory and fills it with some default files. However, 'git worktree add' does not create the info directory for that worktree. This causes a problem when running "git sparse-checkout init" inside a worktree. While care was taken to allow the sparse-checkout config to be specific to a worktree, this initialization was untested. Safely create the leading directories for the sparse-checkout file. This is the safest thing to do even without worktrees, as a user could delete their ".git/info" directory and expect Git to recover safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-30sparse-checkout: list directories in cone modeLibravatar Derrick Stolee1-0/+21
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included. Listing the patterns is less user-friendly than the directories themselves. In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns. With this change, the following piped commands would not change the working directory: git sparse-checkout list | git sparse-checkout set --stdin The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-13sparse-checkout: respect core.ignoreCase in cone modeLibravatar Derrick Stolee1-2/+8
When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin. This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ..." If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match. Do the same for the sparse-checkout feature. Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case- insensitive methods. When this is enabled, there is a small performance cost in the hashing algorithm. To tease out the worst possible case, the following was run on a repo with a deep directory structure: git ls-tree -d -r --name-only HEAD | git sparse-checkout set --stdin The 'set' command was timed with core.ignoreCase disabled or enabled. For the repo with a deep history, the numbers were core.ignoreCase=false: 62s core.ignoreCase=true: 74s (+19.3%) For reproducibility, the equivalent test on the Linux kernel repository had these numbers: core.ignoreCase=false: 3.1s core.ignoreCase=true: 3.6s (+16%) Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories. In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case. Thus, we _can_ demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: check for dirty statusLibravatar Derrick Stolee1-0/+13
The index-merge performed by 'git sparse-checkout' will erase any staged changes, which can lead to data loss. Prevent these attempts by requiring a clean 'git status' output. Helped-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: update working directory in-process for 'init'Libravatar Derrick Stolee1-16/+20
The 'git sparse-checkout init' subcommand previously wrote directly to the sparse-checkout file and then updated the working directory. This may fail if there are modified files not included in the initial pattern set. However, that left a populated sparse-checkout file. Use the in-process working directory update to guarantee that the init subcommand only changes the sparse-checkout file if the working directory update succeeds. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: write using lockfileLibravatar Derrick Stolee1-4/+11
If two 'git sparse-checkout set' subcommands are launched at the same time, the behavior can be unexpected as they compete to write the sparse-checkout file and update the working directory. Take a lockfile around the writes to the sparse-checkout file. In addition, acquire this lock around the working directory update to avoid two commands updating the working directory in different ways. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: use in-process update for disable subcommandLibravatar Derrick Stolee1-13/+12
The 'git sparse-checkout disable' subcommand returns a user to a full working directory. The old process for doing this required updating the sparse-checkout file with the "/*" pattern and then updating the working directory with core.sparseCheckout enabled. Finally, the sparse-checkout file could be removed and the config setting disabled. However, it is valuable to keep a user's sparse-checkout file intact so they can re-enable the sparse-checkout they previously used with 'git sparse-checkout init'. This is now possible with the in-process mechanism for updating the working directory. Reported-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: update working directory in-processLibravatar Derrick Stolee1-12/+71
The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the skip-worktree bits in the index and to update the working directory. This extra process is overly complex, and prone to failure. It also requires that we write our changes to the sparse-checkout file before trying to update the index. Remove this extra process call by creating a direct call to unpack_trees() in the same way 'git read-tree -mu HEAD' does. In addition, provide an in-memory list of patterns so we can avoid reading from the sparse-checkout file. This allows us to test a proposed change to the file before writing to it. An earlier version of this patch included a bug when the 'set' command failed due to the "Sparse checkout leaves no entry on working directory" error. It would not rollback the index.lock file, so the replay of the old sparse-checkout specification would fail. A test in t1091 now covers that scenario. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: sanitize for nested foldersLibravatar Derrick Stolee1-4/+18
If a user provides folders A/ and A/B/ for inclusion in a cone-mode sparse-checkout file, the parsing logic will notice that A/ appears both as a "parent" type pattern and as a "recursive" type pattern. This is unexpected and hence will complain via a warning and revert to the old logic for checking sparse-checkout patterns. Prevent this from happening accidentally by sanitizing the folders for this type of inclusion in the 'git sparse-checkout' builtin. This happens in two ways: 1. Do not include any parent patterns that also appear as recursive patterns. 2. Do not include any recursive patterns deeper than other recursive patterns. In order to minimize duplicate code for scanning parents, create hashmap_contains_parent() method. It takes a strbuf buffer to avoid reallocating a buffer when calling in a tight loop. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: init and set in cone modeLibravatar Derrick Stolee1-16/+147
To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'. Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'. When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories. When testing 'git sparse-checkout set' in cone mode, check the error stream to ensure we do not see any errors. Specifically, we want to avoid the warning that the patterns do not match the cone-mode patterns. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: create 'disable' subcommandLibravatar Derrick Stolee1-1/+25
The instructions for disabling a sparse-checkout to a full working directory are complicated and non-intuitive. Add a subcommand, 'git sparse-checkout disable', to perform those steps for the user. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: add '--stdin' option to set subcommandLibravatar Derrick Stolee1-2/+32
The 'git sparse-checkout set' subcommand takes a list of patterns and places them in the sparse-checkout file. Then, it updates the working directory to match those patterns. For a large list of patterns, the command-line call can get very cumbersome. Add a '--stdin' option to instead read patterns over standard in. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: 'set' subcommandLibravatar Derrick Stolee1-2/+45
The 'git sparse-checkout set' subcommand takes a list of patterns as arguments and writes them to the sparse-checkout file. Then, it updates the working directory using 'git read-tree -mu HEAD'. The 'set' subcommand will replace the entire contents of the sparse-checkout file. The write_patterns_and_update() method is extracted from cmd_sparse_checkout() to make it easier to implement 'add' and/or 'remove' subcommands in the future. If the core.sparseCheckout config setting is disabled, then enable the config setting in the worktree config. If we set the config this way and the sparse-checkout fails, then re-disable the config setting. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22clone: add --sparse modeLibravatar Derrick Stolee1-0/+6
When someone wants to clone a large repository, but plans to work using a sparse-checkout file, they either need to do a full checkout first and then reduce the patterns they included, or clone with --no-checkout, set up their patterns, and then run a checkout manually. This requires knowing a lot about the repo shape and how sparse-checkout works. Add a new '--sparse' option to 'git clone' that initializes the sparse-checkout file to include the following patterns: /* !/*/ These patterns include every file in the root directory, but no directories. This allows a repo to include files like a README or a bootstrapping script to grow enlistments from that point. During the 'git sparse-checkout init' call, we must first look to see if HEAD is valid, since 'git clone' does not have a valid HEAD at the point where it initializes the sparse-checkout. The following checkout within the clone command will create the HEAD ref and update the working directory correctly. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: create 'init' subcommandLibravatar Derrick Stolee1-1/+74
Getting started with a sparse-checkout file can be daunting. Help users start their sparse enlistment using 'git sparse-checkout init'. This will set 'core.sparseCheckout=true' in their config, write an initial set of patterns to the sparse-checkout file, and update their working directory. Make sure to use the `extensions.worktreeConfig` setting and write the sparse checkout config to the worktree-specific config file. This avoids confusing interactions with other worktrees. The use of running another process for 'git read-tree' is sub- optimal. This will be removed in a later change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: create builtin with 'list' subcommandLibravatar Derrick Stolee1-0/+86
The sparse-checkout feature is mostly hidden to users, as its only documentation is supplementary information in the docs for 'git read-tree'. In addition, users need to know how to edit the .git/info/sparse-checkout file with the right patterns, then run the appropriate 'git read-tree -mu HEAD' command. Keeping the working directory in sync with the sparse-checkout file requires care. Begin an effort to make the sparse-checkout feature a porcelain feature by creating a new 'git sparse-checkout' builtin. This builtin will be the preferred mechanism for manipulating the sparse-checkout file and syncing the working directory. The documentation provided is adapted from the "git read-tree" documentation with a few edits for clarity in the new context. Extra sections are added to hint toward a future change to a more restricted pattern set. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>