Age | Commit message (Collapse) | Author | Files | Lines |
|
The fix is short (~30 lines), but the description is not. Sorry.
There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state. This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts. But first, we need to understand the
problems this class presents. A quick outline:
* Problems
* User facing issues
* Problem space complexity
* Maintenance and code correctness challenges
* SKIP_WORKTREE expectations in Git
* Suggested solution
* Pros/Cons of suggested solution
* Notes on testcase modifications
=== User facing issues ===
There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index. This may come from:
* various git commands not really supporting the SKIP_WORKTREE bit[1,2]
* users grabbing files from elsewhere and writing them to the worktree
(perhaps even cached in their editor)
* users attempting to "abort" a sparse-checkout operation with a
not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
working tree is not atomic)[3].
Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.
Further:
* these files will degrade performance for the sparse-index case due
to requiring the index to be expanded (see commit 55dfcf9591
("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
we try to delete entire directories outside the sparse cone).
* these files will not be updated by by standard commands
(switch/checkout/pull/merge/rebase will leave them alone unless
conflicts happen -- and even then, the conflicted file may be
written somewhere else to avoid overwriting the SKIP_WORKTREE file
that is present and in the way)
* there is nothing in Git that users can use to discover such
files (status, diff, grep, etc. all ignore it)
* there is no reasonable mechanism to "recover" from such a condition
(neither `git sparse-checkout reapply` nor `git reset --hard` will
correct it).
So, not only are users modifications ignored, but the files get
progressively more stale over time. At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit. These might include user-made local changes
from days before, but the only way to tell is to have users look through
them all closely.
If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes. Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.
[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
=== Problem space complexity ===
SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it. Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the treat needs to be aware of an
additional type of file: tracked-but-not-present. The extra type
results in lots of extra testcases and lots of extra code everywhere.
But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE). Two types is a monumental amount of
effort to support, and adding a third feels a bit like insanity[4].
[4] Some examples of which can be seen at
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
=== Maintenance and code correctness challenges ===
Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to sanely figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7]. The corner cases with things like submodules and
SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start
becoming really complex[8].
We've had to add ugly logic to merge-ort to attempt to handle
present-despite-SKIP_WORKTREE files[9], and basically just been forced
to give up in merge-recursive knowing full well that we'll sometimes
silently discard user modifications. Despite stash essentially being a
merge, it needed extra code (beyond what was in merge-ort and
merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid
a few different bugs that'd result in an early abort with a partial
stash application[10].
[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
and the dates on the thread; also Matheus and I had several
conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
and quotes like "The core functionality of sparse-checkout has always
been only partially implemented", a statement I still believe is true
today.
[8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE
handling with conflicted entries", 2021-03-20)
[10] See commit ba359fd507 ("stash: fix stash application in
sparse-checkouts", 2020-12-01)
=== SKIP_WORKTREE expectations in Git ===
A couple quotes:
* From [11] (before the "sparse-checkout" command existed):
If it needs too many special cases, hacks, and conditionals, then it
is not worth the complexity---if it is easier to write a correct code
by allowing Git to populate working tree files, it is perfectly fine
to do so.
In a sense, the sparse checkout "feature" itself is a hack by itself,
and that is why I think this part should be "best effort" as well.
* From the git-sparse-checkout manual (still present today):
THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
THE FUTURE.
[11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
=== Suggested solution ===
SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.
The suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.
Enforce this at index loading time by checking if core.sparseCheckout is
true; if so, check files in the index with the SKIP_WORKTREE bit set to
verify that they are absent from the working tree. If they are present,
unset the bit (in memory, though any commands that write to the index
will record the update).
Users can, of course, can get the SKIP_WORKTREE bit back such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).
=== Pros/Cons of suggested solution ===
Pros:
* Solves the user visible problems reported above, which I've been
complaining about for nearly a year but couldn't find a solution to.
* Helps prevent slow performance degradation with a sparse-index.
* Much easier behavior in sparse-checkouts for users to reason about
* Very simple, ~30 lines of code.
* Significantly simplifies some ugly testcases, and obviates the need
to test an entire class of potential issues.
* Reduces code complexity, reasoning, and maintenance. Avoids
disagreements about weird corner cases[12].
* It has been reported that some users might be (ab)using
SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
mechanism[13, and a few other similar references]. These users know
of multiple caveats and shortcomings in doing so; perhaps not
surprising given the "SKIP_WORKTREE expecations" section above.
However, these users use `git update-index --skip-worktree`, and not
`git sparse-checkout` or core.sparseCheckout=true. As such, these
users would be unaffected by this change and can continue abusing
the system as before.
[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree
Cons:
* When core.sparseCheckout is enabled, this adds a performance cost to
reading the index. I'll defer discussion of this cost to a subsequent
patch, since I have some optimizations to add.
=== Notes on testcase modifications ===
The good:
* t1011: Compare to two cases above it ('read-tree will not throw away
dirty changes, non-sparse'); since the file is present, it should
match the non-sparse case now
* t1092: sparse-index & sparse-checkout now match full-worktree
behavior in more cases! Yaay for consistency!
* t6428, t7012: look at how much simpler the tests become! Merge and
stash can just fail early telling the user there's a file in the
way, instead of not noticing until it's about to write a file and
then have to implement sudden crash avoidance. Hurray for sanity!
* t7817: sparse behavior better matches full tree behavior. Hurray
for sanity!
The confusing:
* t3705: These changes were ONLY needed on Windows, but they don't
hurt other platforms. Let's discuss each individually:
* core.sparseCheckout should be false by default. Nothing in this
testcase toggles that until many, many tests later. However,
early tests (#5 in particular) were testing `update-index
--skip-worktree` behavior in a non-sparse-checkout, but the
Windows tests in CI were behaving as if core.sparseCheckout=true
had been specified somewhere. I do not have access to a Windows
machine. But I just manually did what should have been a no-op
and turned the config off. And it fixed the test.
* I have no idea why the leftover .gitattributes file from this
test was causing failures for test #18 on Windows, but only with
these changes of mine. Test #18 was checking for empty stderr,
and specifically wanted to know that some error completely
unrelated to file endings did not appear. The leftover
.gitattributes file thus caused some spurious stderr unrelated to
the thing being checked. Since other tests did not intend to
test normalization, just proactively remove the .gitattributes
file. I'm certain this is cleaner and better, I'm just unsure
why/how this didn't trigger problems before.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* vd/sparse-clean-etc:
update-index: reduce scope of index expansion in do_reupdate
update-index: integrate with sparse index
update-index: add tests for sparse-checkout compatibility
checkout-index: integrate with sparse index
checkout-index: add --ignore-skip-worktree-bits option
checkout-index: expand sparse checkout compatibility tests
clean: integrate with sparse index
reset: reorder wildcard pathspec conditions
reset: fix validation in sparse index test
|
|
Replace unconditional index expansion in 'do_reupdate()' with one scoped to
only where a full index is needed. A full index is only required in
'do_reupdate()' when a sparse directory in the index differs from HEAD; in
that case, the index is expanded and the operation restarted.
Because the index should only be expanded if a sparse directory is modified,
add a test ensuring the index is not expanded when differences only exist
within the sparse cone.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Enable use of the sparse index with `update-index`. Most variations of
`update-index` work without explicitly expanding the index or making any
other updates in or outside of `update-index.c`.
The one usage requiring additional changes is `--cacheinfo`; if a file
inside a sparse directory was specified, the index would not be expanded
until after the cache tree is invalidated, leading to a mismatch between the
index and cache tree. This scenario is handled by rearranging
`add_index_entry_with_check`, allowing `index_name_stage_pos` to expand the
index *before* attempting to invalidate the relevant cache tree path,
avoiding cache tree/index corruption.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce tests for a variety of `git update-index` use cases, including
performance scenarios. Tests are intended to exercise `update-index` with
options that change the commands interaction with the index (e.g.,
`--again`) and with files/directories inside and outside a sparse checkout
cone.
Of note is that these tests clearly establish the behavior of `git
update-index --add` with untracked, outside-of-cone files. Unlike `git add`,
which fails with an error when provided with such files, `update-index`
succeeds in adding them to the index. Additionally, the `skip-worktree` flag
is *not* automatically added to the new entry. Although this is pre-existing
behavior, there are a couple of reasons to avoid changing it in favor of
consistency with e.g. `git add`:
* `update-index` is low-level command for modifying the index; while it can
perform operations similar to those of `add`, it traditionally has fewer
"guardrails" preventing a user from doing something they may not want to
do (in this case, adding an outside-of-cone, non-`skip-worktree` file to
the index)
* `update-index` typically only exits with an error code if it is incapable
of performing an operation (e.g., if an internal function call fails);
adding a new file outside the sparse checkout definition is still a valid
operation, albeit an inadvisable one
* `update-index` does not implicitly set flags (e.g., `skip-worktree`) when
creating new index entries with `--add`; if flags need to be updated,
options like `--[no-]skip-worktree` allow a user to intentionally set them
All this to say that, while there are valid reasons to consider changing the
treatment of outside-of-cone files in `update-index`, there are also
sufficient reasons for leaving it as-is.
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add repository settings to allow usage of the sparse index.
When using the `--all` option, sparse directories are ignored by default due
to the `skip-worktree` flag, so there is no need to expand the index. If
`--ignore-skip-worktree-bits` is specified, the index is expanded in order
to check out all files.
When checking out individual files, existing behavior in a full index is to
exit with an error if a directory is specified (as the directory name will
not match an index entry). However, it is possible in a sparse index to
match a directory name to a sparse directory index entry, but checking out
that sparse directory still results in an error on checkout. To reduce some
potential confusion for users, `checkout_file(...)` explicitly exits with an
informative error if provided with a sparse directory name. The test
corresponding to this scenario verifies the error message, which now differs
between sparse index and non-sparse index checkouts.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update `checkout-index` to no longer refresh files that have the
`skip-worktree` bit set, exiting with an error if `skip-worktree` filenames
are directly provided to `checkout-index`. The newly-added
`--ignore-skip-worktree-bits` option provides a mechanism to replicate the
old behavior, checking out *all* files specified (even those with
`skip-worktree` enabled).
The ability to toggle whether files should be checked-out based on
`skip-worktree` already exists in `git checkout` and `git restore` (both of
which have an `--ignore-skip-worktree-bits` option). The change to, by
default, ignore `skip-worktree` files is especially helpful for
sparse-checkout; it prevents inadvertent creation of files outside the
sparse definition on disk and eliminates the need to expand a sparse index
when using the `--all` option.
Internal usage of `checkout-index` in `git stash` and `git filter-branch` do
not make explicit use of files with `skip-worktree` enabled, so
`--ignore-skip-worktree-bits` is not added to them.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add tests to cover `checkout-index`, with a focus on cases interesting in a
sparse checkout (e.g., files specified outside sparse checkout definition).
New tests are intended to serve as a baseline for existing and/or expected
behavior and performance when integrating `checkout-index` with the sparse
index. Note that the test 'checkout-index --all' is marked as
'test_expect_failure', indicating that `update-index --all` will be modified
in a subsequent patch to behave as the test expects.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove full index requirement for `git clean` and test to ensure the index
is not expanded in `git clean`. Add to existing test for `git clean` to
verify cleanup of untracked files in sparse directories is consistent
between sparse index and non-sparse index checkouts.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update t1092 test 'reset with pathspecs outside sparse definition' to verify
index contents. The use of `rev-parse` verifies the contents of HEAD, not
the index, providing no real validation of the reset results. Conversely,
`ls-files` reports the contents of the index (OIDs, flags, filenames), which
are then compared across checkouts to ensure compatible index states.
Fixes 741a2c9ffa (reset: expand test coverage for sparse checkouts,
2021-09-27).
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that 'git ls-files --sparse' exists, we can use it to verify the
state of a sparse index instead of 'test-tool read-cache table'. Replace
these usages within t1092-sparse-checkout-compatibility.sh.
The important changes are due to the different output format. We need to
use the '--stage' output to get a file mode and OID, but it also
includes a stage value and drops the object type. This leads to some
differences in how we handle looking for specific entries.
Some places where we previously looked for no 'tree' entries, we can
instead directly compare the output across the repository with a sparse
index and the one without.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Existing callers to 'git ls-files' are expecting file names, not
directories. It is best to expand a sparse index to show all of the
contained files in this case.
However, expert users may want to inspect the contents of the index
itself including which directories are sparse. Add a --sparse option to
allow users to request this information.
During testing, I noticed that options such as --modified did not affect
the output when the files in question were outside the sparse-checkout
definition. Tests are added to document this preexisting behavior and
how it remains unchanged with the sparse index and the --sparse option.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The 'git fetch' and 'git pull' commands parse the index in order to
determine if submodules exist. Without command_requires_full_index=0,
this will expand a sparse index, causing slow performance even when
there is no new data to fetch.
The .gitmodules file will never be inside a sparse directory entry, and
even if it was, the index_name_pos() method would expand the sparse
index if needed as we search for the path by name. These commands do not
iterate over the index, which is the typical thing we are careful about
when integrating with the sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach diff and blame to work well with sparse index.
* ld/sparse-diff-blame:
blame: enable and test the sparse index
diff: enable and test the sparse index
diff: replace --staged with --cached in t1092 tests
repo-settings: prepare_repo_settings only in git repos
test-read-cache: set up repo after git directory
commit-graph: return if there is no git directory
git: ensure correct git directory setup with -h
|
|
The sparse-index/sparse-checkout feature had a bug in its use of
the matching code to determine which path is in or outside the
sparse checkout patterns.
* ds/sparse-deep-pattern-checkout-fix:
unpack-trees: use traverse_path instead of name
t1092: add deeper changes during a checkout
|
|
The default setting for trace2 event nesting was too low to cause
test failures, which is worked around by bumping it up in the test
framework.
* ds/trace2-regions-in-tests:
t/t*: remove custom GIT_TRACE2_EVENT_NESTING
test-lib.sh: set GIT_TRACE2_EVENT_NESTING
|
|
Various operating modes of "git reset" have been made to work
better with the sparse index.
* vd/sparse-reset:
unpack-trees: improve performance of next_cache_entry
reset: make --mixed sparse-aware
reset: make sparse-aware (except --mixed)
reset: integrate with sparse index
reset: expand test coverage for sparse checkouts
sparse-index: update command for expand/collapse test
reset: preserve skip-worktree bit in mixed reset
reset: rename is_missing to !is_in_reset_tree
|
|
Ensure that the sparseness of the in-core index matches the
index.sparse configuration specified by the repository immediately
after the on-disk index file is read.
* vd/sparse-sparsity-fix-on-read:
sparse-index: update do_read_index to ensure correct sparsity
sparse-index: add ensure_correct_sparsity function
sparse-index: avoid unnecessary cache tree clearing
test-read-cache.c: prepare_repo_settings after config init
|
|
Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:
1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.
2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.
The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.
Test before after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5%
We do not include paths outside the sparse checkout cone because blame
does not support blaming files that are not present in the working
directory. This is true in both sparse and full checkouts.
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code
with the 'git status' and 'git checkout' commands that were already
integrated. For more details see:
d76723ee53 (status: use sparse-index throughout, 2021-07-14)
1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)
The most interesting thing to do is to add tests that verify that 'git
diff' behaves correctly when the sparse index is enabled. These cases are:
1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
1. Path is within sparse-checkout cone
2. Path is outside sparse-checkout cone
3. A merge conflict exists for paths outside sparse-checkout cone
The `p2000` tests demonstrate a ~44% execution time reduction for 'git
diff' and a ~86% execution time reduction for 'git diff --staged' using a
sparse index:
Test before after
-------------------------------------------------------------
2000.30: git diff (full-v3) 0.33 0.34 +3.0%
2000.31: git diff (full-v4) 0.33 0.35 +6.1%
2000.32: git diff (sparse-v3) 0.53 0.31 -41.5%
2000.33: git diff (sparse-v4) 0.54 0.29 -46.3%
2000.34: git diff --cached (full-v3) 0.07 0.07 +0.0%
2000.35: git diff --cached (full-v4) 0.07 0.08 +14.3%
2000.36: git diff --cached (sparse-v3) 0.28 0.04 -85.7%
2000.37: git diff --cached (sparse-v4) 0.23 0.03 -87.0%
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Replace uses of the synonym --staged in t1092 tests with --cached (which
is the real and preferred option). This will allow consistency in the new
tests to be added with the upcoming change to enable the sparse index for
diff.
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The sparse_dir_matches_path() method compares a cache entry that is a
sparse directory entry against a 'struct traverse_info *info' and a
'struct name_entry *p' to see if the cache entry has exactly the right
name for those other inputs.
This method was introduced in 523506d (unpack-trees: unpack sparse
directory entries, 2021-07-14), but included a significant mistake. The
path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while
'info->traverse_path' stores the full path from root, this method does
not work when 'info' is in a subdirectory of a directory. Replacing the
right strings and their corresponding lengths make the method work
properly.
The previous change included a failing test that exposes this issue.
That test now passes. The critical detail is that as we go deep into
unpack_trees(), the logic for merging a sparse directory entry with a
tree entry during 'git checkout' relies on this
sparse_dir_matches_path() in order to avoid calling
traverse_trees_recursive() during unpack_callback() in this hunk:
if (!is_sparse_directory_entry(src[0], names, info) &&
traverse_trees_recursive(n, dirmask, mask & ~dirmask,
names, info) < 0) {
return -1;
}
For deep paths, the short-circuit never occurred and
traverse_trees_recursive() was being called incorrectly and that was
causing other strange issues. Specifically, the error message from the
now-passing test previously included this:
error: Your local changes to the following files would be overwritten by checkout:
deep/deeper1/deepest2/a
deep/deeper1/deepest3/a
Please commit your changes or stash them before you switch branches.
Aborting
These messages occurred because the 'current' cache entry in
twoway_merge() was showing as NULL because the index did not contain
entries for the paths contained within the sparse directory entries. We
instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
entry in the target tree. This led to reject_merge() listing these
paths.
Now that sparse_dir_matches_path() works the same for deep paths as it
does for shallow depths, the rest of the logic kicks in to properly
handle modifying the sparse directory entries as designed.
Reported-by: Gustave Granroth <gus.gran@gmail.com>
Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Extend the repository data in the setup of t1092 to include more
directories within two parent directories. This reproduces a bug found
by users of the sparse index feature with suitably-complicated
sparse-checkout definitions.
Add a failing test that fails in its first 'git checkout deepest' run in
the sparse index case with this error:
error: Your local changes to the following files would be overwritten by checkout:
deep/deeper1/deepest2/a
deep/deeper1/deepest3/a
Please commit your changes or stash them before you switch branches.
Aborting
The next change will fix this error, and that fix will make it clear why
the extra depth is necessary for revealing this bug. The assignment of
the sparse-checkout definition to include deep/deeper1/deepest as a
sibling directory is important to ensure that deep/deeper1 is not a
sparse directory entry, but deep/deeper1/deepest2 is.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* vd/sparse-reset:
unpack-trees: improve performance of next_cache_entry
reset: make --mixed sparse-aware
reset: make sparse-aware (except --mixed)
reset: integrate with sparse index
reset: expand test coverage for sparse checkouts
sparse-index: update command for expand/collapse test
reset: preserve skip-worktree bit in mixed reset
reset: rename is_missing to !is_in_reset_tree
|
|
Remove the `ensure_full_index` guard on `read_from_tree` and update `git
reset --mixed` to ensure it can use sparse directory index entries wherever
possible. Sparse directory entries are reset using `diff_tree_oid`, which
requires `change` and `add_remove` functions to process the internal
contents of the sparse directory. The `recursive` diff option handles cases
in which `reset --mixed` must diff/merge files that are nested multiple
levels deep in a sparse directory.
The use of pathspecs with `git reset --mixed` introduces scenarios in which
internal contents of sparse directories may be matched by the pathspec. In
order to reset *all* files in the repo that may match the pathspec, the
following conditions on the pathspec require index expansion before
performing the reset:
* "magic" pathspecs
* wildcard pathspecs that do not match only in-cone files or entire sparse
directories
* literal pathspecs matching something outside the sparse checkout
definition
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add new tests for `--merge` and `--keep` modes, as well as mixed reset with
pathspecs. New performance test cases exercise various execution paths for
`reset`.
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The previous change modified GIT_TRACE2_EVENT_NESTING by default within
test-lib.sh. These custom assignments throughout the test suite are no
longer necessary.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Unless `command_requires_full_index` forces index expansion, ensure in-core
index sparsity matches config settings on read by calling
`ensure_correct_sparsity`. This makes the behavior of the in-core index more
consistent between different methods of updating sparsity: manually changing
the `index.sparse` config setting vs. executing
`git sparse-checkout --[no-]sparse-index init`
Although index sparsity is normally updated with `git sparse-checkout init`,
ensuring correct sparsity after a manual `index.sparse` change has some
practical benefits:
1. It allows for command-by-command sparsity toggling with
`-c index.sparse=<true|false>`, e.g. when troubleshooting issues with the
sparse index.
2. It prevents users from experiencing abnormal slowness after setting
`index.sparse` to `true` due to use of a full index in all commands until
the on-disk index is updated.
Helped-by: Junio C Hamano <gitster@pobox.com>
Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In anticipation of `git reset --hard` being able to use the sparse index
without expanding it, replace the command in `sparse-index is expanded and
converted back` with `git reset -- folder1/a`. This command will need to
expand the index to work properly, even after integrating the rest of
`reset` with sparse index.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change `update_index_from_diff` to set `skip-worktree` when applicable for
new index entries. When `git reset --mixed <tree-ish>` is run, entries in
the index with differences between the pre-reset HEAD and reset <tree-ish>
are identified and handled with `update_index_from_diff`. For each file, a
new cache entry in inserted into the index, created from the <tree-ish> side
of the reset (without changing the working tree). However, the newly-created
entry must have `skip-worktree` explicitly set in either of the following
scenarios:
1. the file is in the current index and has `skip-worktree` set
2. the file is not in the current index but is outside of a defined sparse
checkout definition
Not setting the `skip-worktree` bit leads to likely-undesirable results for
a user. It causes `skip-worktree` settings to disappear on the
"diff"-containing files (but *only* the diff-containing files), leading to
those files now showing modifications in `git status`. For example, when
running `git reset --mixed` in a sparse checkout, some file entries outside
of sparse checkout could show up as deleted, despite the user never deleting
anything (and not wanting them on-disk anyway).
Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved
in a basic `git reset --mixed` scenario and update a failure-documenting
test from 19a0acc (t1092: test interesting sparse-checkout scenarios,
2021-01-23) with new expected behavior.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Recent sparse-index addition, namely any use of index_name_pos(),
can expand sparse index entries and breaks any code that walks
cache-tree or existing index entries. One such instance of such a
breakage has been corrected.
* pw/sparse-cache-tree-verify-fix:
t1092: run "rebase --apply" without "-q" in testing
sparse index: fix use-after-free bug in cache_tree_verify()
|
|
We run a few operations and make sure they produce identical results
with and without sparse-index; the version we merged to the "next"
branch used the "-q" option to work around a breakage caused by a
version used at Microsoft with some unreleased changes, but since
we would want to make sure the commands produce identical results,
including reports given to the output that lists which commits were
picked, use of "-q" loses too much interesting information.
Let's drop "-q" from the command invocation and revisit the issue
when the problematic changes are upstreamed.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified. Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.
==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
#0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
#1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)
0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
#0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
#2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
#6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
#7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
#8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
previously allocated by thread T0 here:
#0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
#2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
#3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
#4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
#7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
#8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We previously modified 'git add' to refuse updating index entries
outside of the sparse-checkout cone. This is justified to prevent users
from accidentally getting into a confusing state when Git removes those
files from the working tree at some later point.
Unfortunately, this caused some workflows that were previously possible
to become impossible, especially around merge conflicts outside of the
sparse-checkout cone. These were documented in tests within t1092.
We now re-enable these workflows using a new '--sparse' option to 'git
add'. This allows users to signal "Yes, I do know what I'm doing with
these files," and accept the consequences of the files leaving the
worktree later.
We delay updating the advice message until implementing a similar option
in 'git rm' and 'git mv'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When 'git add' adds a tracked file that is outside of the
sparse-checkout cone, it checks the SKIP_WORKTREE bit to see if the file
exists outside of the sparse-checkout cone. This is usually correct,
except in the case of a merge conflict outside of the cone.
Modify add_pathspec_matched_against_index() to be more careful about
paths by checking the sparse-checkout patterns in addition to the
SKIP_WORKTREE bit. This causes 'git add' to no longer allow files
outside of the cone that removed the SKIP_WORKTREE bit due to a merge
conflict.
With only this change, users will only be able to add the file after
adding the file to the sparse-checkout cone. A later change will allow
users to force adding even though the file is outside of the
sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The add_files() method in builtin/add.c takes a set of untracked files
that are being added by the input pathspec and inserts them into the
index. If these files are outside of the sparse-checkout cone, then they
gain the SKIP_WORKTREE bit at some point. However, this was not checked
before inserting into the index, so these files are added even though we
want to avoid modifying the index outside of the sparse-checkout cone.
Add a check within add_files() for these files and write the advice
about files outside of the sparse-checkout cone.
This behavior change modifies some existing tests within t1092. These
tests intended to document how a user could interact with the existing
behavior in place. Many of these tests need to be marked as expecting
failure. A future change will allow these tests to pass by adding a flag
to 'git add' that allows users to modify index entries outside of the
sparse-checkout cone.
The 'submodule handling' test is intended to document what happens to
directories that contain a submodule when the sparse index is enabled.
It is not trying to say that users should be able to add submodules
outside of the sparse-checkout cone, so that test can be modified to
avoid that operation.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add some tests to demonstrate the current behavior around adding files
outside of the sparse-checkout cone. Currently, untracked files are
handled differently from tracked files. A future change will make these
cases be handled the same way.
Further expand checking that a failed 'git add' does not stage changes
to the index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The hard work was already done with 'git merge' and the ORT strategy.
Just add extra tests to see that we get the expected results in the
non-conflict cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add tests to check that cherry-pick and rebase behave the same in the
sparse-index case as in the full index cases. These tests are agnostic
to GIT_TEST_MERGE_ALGORITHM, so a full CI test suite will check both the
'ort' and 'recursive' strategies on this test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Merge conflicts happen often enough to want to avoid expanding a sparse
index when they happen, as long as those conflicts are within the
sparse-checkout cone. If a conflict exists outside of the
sparse-checkout cone, then we still need to expand before iterating over
the index entries. This is critical to do in advance because of how the
original_cache_nr is tracked to allow inserting and replacing cache
entries.
Iterate over the conflicted files and check if any paths are outside of
the sparse-checkout cone. If so, then expand the full index.
Add a test that demonstrates that we do not expand the index, even when
we hit a conflict within the sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Allow 'git merge' to operate without expanding a sparse index, at least
not immediately. The index still will be expanded in a few cases:
1. If the merge strategy is 'recursive', then we enable
command_requires_full_index at the start of the merge_recursive()
method. We expect sparse-index users to also have the 'ort' strategy
enabled.
2. With the 'ort' strategy, if the merge results in a conflicted file,
then we expand the index before updating the working tree. The loop
that iterates over the worktree replaces index entries and tracks
'origintal_cache_nr' which can become completely wrong if the index
expands in the middle of the operation. This safety valve is
important before that loop starts. A later change will focus this
to only expand if we indeed have a conflict outside of the
sparse-checkout cone.
3. Other merge strategies are executed as a 'git merge-X' subcommand,
and those strategies are currently protected with the
'command_requires_full_index' guard.
Some test updates are required, including a mistaken 'git checkout -b'
that did not specify the base branch, causing merges to be fast-forward
merges.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since b243012 (refresh_index(): add flag to ignore SKIP_WORKTREE
entries, 2021-04-08), 'git add --refresh <path>' will output a warning
message when the path is outside the sparse-checkout definition. The
implementation of this warning happened in parallel with the
sparse-index work to add ensure_full_index() calls throughout the
codebase.
Update this loop to have the proper logic that checks to see if the
pathspec is outside the sparse-checkout definition. This avoids the need
to expand the sparse directory entry and determine if the path is
tracked, untracked, or ignored. We simply avoid updating the stat()
information because there isn't even an entry that matches the path!
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.
The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.
This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3) 0.37(0.28+0.07) 0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4) 0.33(0.26+0.06) 0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3) 0.57(0.53+0.07) 0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4) 0.57(0.53+0.07) 0.05(0.03+0.09) -91.2%
While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.
Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.
We can measure the improvement using p2000-sparse-operations.sh with
these results:
Test HEAD~1 HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4%
While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.
This modification to 'git add' creates some behavior change depending on
the use of a sparse index. We modify a test in t1092 to demonstrate
these changes which will be remedied in future changes.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conflicts can occur outside of the sparse-checkout definition, and in
that case users might try to resolve the conflicts in several ways.
Document a few of these ways in a test. Make it clear that this behavior
is not necessarily the optimal flow, since users can become confused
when Git deletes these files from the worktree in later commands.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When running unpack_trees() with a sparse index, we attempt to operate
on the index without expanding the sparse directory entries. Thus, we
operate by manipulating entire directories and passing them to the
unpack function. In the case of the 'git checkout' command, this is the
twoway_merge() function.
There are several cases in twoway_merge() that handle different
situations. One new one to add is the case of a directory/file conflict
where the directory is sparse. Before the sparse index, such a conflict
would appear as a list of file additions and deletions. Now,
twoway_merge() initializes 'current', 'oldtree', and 'newtree' from
src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is
equal to the df_conflict_entry. The way to determine that we have a
directory/file conflict is to test that 'current' and 'newtree' disagree
on being sparse directory entries.
When we are in this case, we want to resolve the situation by calling
merged_entry(). This allows replacing the 'current' entry with the
'newtree' entry. This is important for cases where we want to run 'git
checkout' across the conflict and have the new HEAD represent the new
file type at that path. The first NEEDSWORK comment dropped in t1092
demonstrates this necessary behavior.
However, we still are in a confusing state when 'current' corresponds to
a staged change within a sparse directory that is not present at HEAD.
This should be atypical, because it requires adding a change outside of
the sparse-checkout cone, but it is possible. Since we are unable to
determine that this is a staged change within twoway_merge(), we cannot
add a case to reject the merge at this point. I believe this is due to
the use of df_conflict_entry in the place of 'oldtree' instead of using
the valud at HEAD, which would provide some perspective to this
decision. Any change that would allow this differentiation for staged
entries would need to involve information further up in unpack_trees().
That work should be done, sometime, because we are further confusing the
behavior of a directory/file conflict when staging a change in the
directory. The two cases 'checkout behaves oddly with df-conflict-?' in
t1092 demonstrate that even without a sparse-checkout, Git is not
consistent in its behavior. Neither of the two options seems correct,
either. This change makes the sparse-index behave differently than the
typcial sparse-checkout case, but it does match the full checkout
behavior in the df-conflict-2 case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add new branches to the test repo that demonstrate directory/file
conflicts in different ways. Since the directory 'folder1/' has
adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes
searches for 'folder1/' to land in a different place in the index than a
search for 'folder1'. This causes a change in behavior when working with
the df-conflict-1 and df-conflict-2 branches, whose only difference is
that the first uses 'folder1' as the conflict and the other uses
'folder2' which does not have these adjacent files.
We can extend two tests that compare the behavior across different 'git
checkout' commands, and we see already that the behavior will be
different in some cases and not in others. The difference between the
two test loops is that one uses 'git reset --hard' between iterations.
Further, we isolate the behavior of creating a staged change within a
directory and then checking out a branch where that directory is
replaced with a file. A full checkout behaves differently across these
two cases, while a sparse-checkout cone behaves consistently. In both
cases, the behavior is wrong. In one case, the staged change is dropped
entirely. The other case the staged change is kept, replacing the file
at that location, but none of the other files in the directory are kept.
Likely, the correct behavior in this case is to reject the checkout and
report the conflict, leaving HEAD in its previous location. None of the
cases behave this way currently. Use comments to demonstrate that the
tested behavior is only a documentation of the current, incorrect
behavior to ensure we do not _accidentally_ change it. Instead, we would
prefer to change it on purpose with a future change.
At this point, the sparse-index does not handle these 'git checkout'
commands correctly. Or rather, it _does_ reject the 'git checkout' when
we have the staged change, but for the wrong reason. It also rejects the
'git checkout' commands when there is no staged change and we want to
replace a directory with a file. A fix for that unstaged case will
follow in the next change, but that will make the sparse-index agree
with the full checkout case in these documented incorrect behaviors.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Previous changes did the necessary improvements to unpack-trees.c and
diff-lib.c in order to modify a sparse index based on its comparision
with a tree. The only remaining work is to remove some
ensure_full_index() calls and add tests that verify that the index is
not expanded in our interesting cases. Include 'switch' and 'restore' in
these tests, as they share a base implementation with 'checkout'.
Here are the relevant performance results from
p2000-sparse-operations.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------
2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1%
2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7%
2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7%
2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3%
It is important to compare the full index case to the sparse index case,
as the previous results for the sparse index were inflated by the index
expansion. For index v4, this is an 88% improvement.
On an internal repository with over two million paths at HEAD and a
sparse-checkout definition containing ~60,000 of those paths, 'git
checkout' went from 3.5s to 297ms with this change. The theoretical
optimum where only those ~60,000 paths exist was 275ms, so the extra
sparse directory entries contribute a 22ms overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update 'git commit' to allow using the sparse-index in memory without
expanding to a full one. The only place that had an ensure_full_index()
call was in cache_tree_update(). The recursive algorithm for
update_one() was already updated in 2de37c536 (cache-tree: integrate
with sparse directory entries, 2021-03-03) to handle sparse directory
entries in the index.
Most of this change involves testing different command-line options that
allow specifying which on-disk changes should be included in the commit.
This includes no options (only take currently-staged changes), -a (take
all tracked changes), and --include (take a list of specific changes).
To simplify testing that these options do not expand the index, update
the test that previously verified that 'git status' does not expand the
index with a helper method, ensure_not_expanded().
This allows 'git commit' to operate much faster when the sparse-checkout
cone is much smaller than the full list of files at HEAD.
Here are the relevant lines from p2000-sparse-operations.sh:
Test HEAD~1 HEAD
----------------------------------------------------------------------------------
2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9%
2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3%
2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7%
2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8%
It is important to compare the full-index case to the sparse-index case,
so the improvement for index version v4 is actually an 88% improvement in
this synthetic example.
In a real repository with over two million files at HEAD and 60,000
files in the sparse-checkout definition, the time for 'git commit -a'
went from 2.61 seconds to 134ms. I compared this to the result if the
index only contained the paths in the sparse-checkout definition and
found the theoretical optimum to be 120ms, so the out-of-cone paths only
add a 12% overhead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|