Diffstat (limited to 'Documentation')
24 files changed, 523 insertions, 28 deletions
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 45465bc0c9..1ff6d8e2d2 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -498,7 +498,12 @@ Error Messages - Do not end error messages with a full stop. - - Do not capitalize ("unable to open %s", not "Unable to open %s") + - Do not capitalize the first word, only because it is the first word + in the message ("unable to open %s", not "Unable to open %s"). But + "SHA-3 not supported" is fine, because the reason the first word is + capitalized is not because it is at the beginning of the sentence, + but because the word would be spelled in capital letters even when + it appeared in the middle of the sentence. - Say what the error is first ("cannot open %s", not "%s: cannot open") diff --git a/Documentation/RelNotes/2.32.0.txt b/Documentation/RelNotes/2.32.0.txt index 7c6aabeb1f..3f73411286 100644 --- a/Documentation/RelNotes/2.32.0.txt +++ b/Documentation/RelNotes/2.32.0.txt @@ -54,6 +54,29 @@ UI, Workflows & Features with the interpret-trailers command, this will make it easier to support custom trailers. + * "git clone --reject-shallow" option fails the clone as soon as we + notice that we are cloning from a shallow repository. + + * A configuration variable has been added to force tips of certain + refs to be given a reachability bitmap. + + * "gitweb" learned "e-mail privacy" feature to redact strings that + look like e-mail addresses on various pages. + + * "git apply --3way" has always been "to fall back to 3-way merge + only when straight application fails". Swap the order of falling + back so that 3-way is always attempted first (only when the option + is given, of course) and then straight patch application is used as + a fallback when it fails. + + * "git apply" now takes "--3way" and "--cached" at the same time, and + work and record results only in the index. 
+ + * The command line completion (in contrib/) has learned that + CHERRY_PICK_HEAD is a possible pseudo-ref. + + * Userdiff patterns for "Scheme" has been added. + Performance, Internal Implementation, Development Support etc. @@ -89,6 +112,25 @@ Performance, Internal Implementation, Development Support etc. * CMake update for vsbuild. + * An on-disk reverse-index to map the in-pack location of an object + back to its object name across multiple packfiles is introduced. + + * Generate [ec]tags under $(QUIET_GEN). + + * Clean-up codepaths that implements "git send-email --validate" + option and improves the message from it. + + * The last remnant of gettext-poison has been removed. + + * The test framework has been taught to optionally turn the default + merge strategy to "ort" throughout the system where we use + three-way merges internally, like cherry-pick, rebase etc., + primarily to enhance its test coverage (the strategy has been + available as an explicit "-s ort" choice). + + * A bit of code clean-up and a lot of test clean-up around userdiff + area. + Fixes since v2.31 ----------------- @@ -156,6 +198,38 @@ Fixes since v2.31 easier to understand. (merge ddaf1f62e3 ds/clarify-hashwrite later to maint). + * "git cherry-pick/revert" with or without "--[no-]edit" did not spawn + the editor as expected (e.g. "revert --no-edit" after a conflict + still asked to edit the message), which has been corrected. + (merge 39edfd5cbc en/sequencer-edit-upon-conflict-fix later to maint). + + * "git daemon" has been tightened against systems that take backslash + as directory separator. + (merge 9a7f1ce8b7 rs/daemon-sanitize-dir-sep later to maint). + + * A NULL-dereference bug has been corrected in an error codepath in + "git for-each-ref", "git branch --list" etc. + (merge c685450880 jk/ref-filter-segfault-fix later to maint). + + * Streamline the codepath to fix the UTF-8 encoding issues in the + argv[] and the prefix on macOS. 
+ (merge c7d0e61016 tb/precompose-prefix-simplify later to maint). + + * The command-line completion script (in contrib/) had a couple of + references that would have given a warning under the "-u" (nounset) + option. + (merge c5c0548d79 vs/completion-with-set-u later to maint). + + * When "git pack-objects" makes a literal copy of a part of existing + packfile using the reachability bitmaps, its update to the progress + meter was broken. + (merge 8e118e8490 jk/pack-objects-bitmap-progress-fix later to maint). + + * The dependencies for config-list.h and command-list.h were broken + when the former was split out of the latter, which has been + corrected. + (merge 56550ea718 sg/bugreport-fixes later to maint). + * Other code cleanup, docfix, build fix, etc. (merge f451960708 dl/cat-file-doc-cleanup later to maint). (merge 12604a8d0c sv/t9801-test-path-is-file-cleanup later to maint). @@ -168,3 +242,9 @@ Fixes since v2.31 (merge 2be927f3d1 ab/diff-no-index-tests later to maint). (merge 76593c09bb ab/detox-gettext-tests later to maint). (merge 28e29ee38b jc/doc-format-patch-clarify later to maint). + (merge fc12b6fdde fm/user-manual-use-preface later to maint). + (merge dba94e3a85 cc/test-helper-bloom-usage-fix later to maint). + (merge 61a7660516 hn/reftable-tables-doc-update later to maint). + (merge 81ed96a9b2 jt/fetch-pack-request-fix later to maint). + (merge 151b6c2dd7 jc/doc-do-not-capitalize-clarification later to maint). + (merge 9160068ac6 js/access-nul-emulation-on-windows later to maint). diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 0452db2e67..55287d72e0 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -117,10 +117,13 @@ If in doubt which identifier to use, run `git log --no-merges` on the files you are modifying to see the current conventions. [[summary-section]] -It's customary to start the remainder of the first line after "area: " -with a lower-case letter. E.g. 
"doc: clarify...", not "doc: -Clarify...", or "githooks.txt: improve...", not "githooks.txt: -Improve...". +The title sentence after the "area:" prefix omits the full stop at the +end, and its first word is not capitalized unless there is a reason to +capitalize it other than because it is the first word in the sentence. +E.g. "doc: clarify...", not "doc: Clarify...", or "githooks.txt: +improve...", not "githooks.txt: Improve...". But "refs: HEAD is also +treated as a ref" is correct, as we spell `HEAD` in all caps even when +it appears in the middle of a sentence. [[meaningful-message]] The body should provide a meaningful commit message, which: diff --git a/Documentation/config/clone.txt b/Documentation/config/clone.txt index 47de36a5fe..7bcfbd18a5 100644 --- a/Documentation/config/clone.txt +++ b/Documentation/config/clone.txt @@ -2,3 +2,7 @@ clone.defaultRemoteName:: The name of the remote to create when cloning a repository. Defaults to `origin`, and can be overridden by passing the `--origin` command-line option to linkgit:git-clone[1]. + +clone.rejectShallow:: + Reject to clone a repository if it is a shallow one, can be overridden by + passing option `--reject-shallow` in command line. See linkgit:git-clone[1] diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt index 7cb50b37e9..75f3a2d105 100644 --- a/Documentation/config/index.txt +++ b/Documentation/config/index.txt @@ -14,6 +14,11 @@ index.recordOffsetTable:: Defaults to 'true' if index.threads has been explicitly enabled, 'false' otherwise. +index.sparse:: + When enabled, write the index using sparse-directory entries. This + has no effect unless `core.sparseCheckout` and + `core.sparseCheckoutCone` are both enabled. Defaults to 'false'. + index.threads:: Specifies the number of threads to spawn when loading the index. This is meant to reduce index load time on multiprocessor machines. 
diff --git a/Documentation/config/log.txt b/Documentation/config/log.txt index 208d5fdcaa..456eb07800 100644 --- a/Documentation/config/log.txt +++ b/Documentation/config/log.txt @@ -24,6 +24,11 @@ log.excludeDecoration:: the config option can be overridden by the `--decorate-refs` option. +log.diffMerges:: + Set default diff format to be used for merge commits. See + `--diff-merges` in linkgit:git-log[1] for details. + Defaults to `separate`. + log.follow:: If `true`, `git log` will act as if the `--follow` option was used when a single <path> is given. This has the same limitations as `--follow`, diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt index 3da4ea98e2..c0844d8d8e 100644 --- a/Documentation/config/pack.txt +++ b/Documentation/config/pack.txt @@ -122,6 +122,21 @@ pack.useSparse:: commits contain certain types of direct renames. Default is `true`. +pack.preferBitmapTips:: + When selecting which commits will receive bitmaps, prefer a + commit at the tip of any reference that is a suffix of any value + of this configuration over any other commits in the "selection + window". ++ +Note that setting this configuration to `refs/foo` does not mean that +the commits at the tips of `refs/foo/bar` and `refs/foo/baz` will +necessarily be selected. This is because commits are selected for +bitmaps from within a series of windows of variable length. ++ +If a commit at the tip of any reference which is a suffix of any value +of this configuration is seen in a window, it is immediately given +preference over any other commit in that window. + pack.writeBitmaps (deprecated):: This is a deprecated synonym for `repack.writeBitmaps`. 
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index aa2b5c11f2..6d968b9012 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -34,7 +34,7 @@ endif::git-diff[] endif::git-format-patch[] ifdef::git-log[] ---diff-merges=(off|none|first-parent|1|separate|m|combined|c|dense-combined|cc):: +--diff-merges=(off|none|on|first-parent|1|separate|m|combined|c|dense-combined|cc):: --no-diff-merges:: Specify diff format to be used for merge commits. Default is {diff-merges-default} unless `--first-parent` is in use, in which case @@ -45,17 +45,24 @@ ifdef::git-log[] Disable output of diffs for merge commits. Useful to override implied value. + +--diff-merges=on::: +--diff-merges=m::: +-m::: + This option makes diff output for merge commits to be shown in + the default format. `-m` will produce the output only if `-p` + is given as well. The default format could be changed using + `log.diffMerges` configuration parameter, which default value + is `separate`. ++ --diff-merges=first-parent::: --diff-merges=1::: This option makes merge commits show the full diff with respect to the first parent only. + --diff-merges=separate::: ---diff-merges=m::: --m::: This makes merge commits show the full diff with respect to each of the parents. Separate log entry and diff is generated - for each parent. `-m` doesn't produce any output without `-p`. + for each parent. + --diff-merges=combined::: --diff-merges=c::: diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt index 07783deee3..9e7b4e189c 100644 --- a/Documentation/fetch-options.txt +++ b/Documentation/fetch-options.txt @@ -110,6 +110,11 @@ ifndef::git-pull[] setting `fetch.writeCommitGraph`. endif::git-pull[] +--prefetch:: + Modify the configured refspec to place all refs into the + `refs/prefetch/` namespace. See the `prefetch` task in + linkgit:git-maintenance[1]. 
+ -p:: --prune:: Before fetching, remove any remote-tracking references that no diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt index 91d9a8601c..aa1ae56a25 100644 --- a/Documentation/git-apply.txt +++ b/Documentation/git-apply.txt @@ -84,12 +84,13 @@ OPTIONS -3:: --3way:: - When the patch does not apply cleanly, fall back on 3-way merge if - the patch records the identity of blobs it is supposed to apply to, - and we have those blobs available locally, possibly leaving the + Attempt 3-way merge if the patch records the identity of blobs it is supposed + to apply to and we have those blobs available locally, possibly leaving the conflict markers in the files in the working tree for the user to - resolve. This option implies the `--index` option, and is incompatible - with the `--reject` and the `--cached` options. + resolve. This option implies the `--index` option unless the + `--cached` option is used, and is incompatible with the `--reject` option. + When used with the `--cached` option, any conflicts are left at higher stages + in the cache. --build-fake-ancestor=<file>:: Newer 'git diff' output has embedded 'index information' diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt index 02d9c19cec..3fe3810f1c 100644 --- a/Documentation/git-clone.txt +++ b/Documentation/git-clone.txt @@ -15,7 +15,7 @@ SYNOPSIS [--dissociate] [--separate-git-dir <git dir>] [--depth <depth>] [--[no-]single-branch] [--no-tags] [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules] - [--[no-]remote-submodules] [--jobs <n>] [--sparse] + [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--[no-]reject-shallow] [--filter=<filter>] [--] <repository> [<directory>] @@ -149,6 +149,11 @@ objects from the source repository into a pack in the cloned repository. --no-checkout:: No checkout of HEAD is performed after the clone is complete. +--[no-]reject-shallow:: + Fail if the source repository is a shallow repository. 
+ The 'clone.rejectShallow' configuration variable can be used to + specify the default. + --bare:: Make a 'bare' Git repository. That is, instead of creating `<directory>` and placing the administrative diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt index 80ddd33ceb..1e738ad398 100644 --- a/Documentation/git-maintenance.txt +++ b/Documentation/git-maintenance.txt @@ -92,10 +92,8 @@ commit-graph:: prefetch:: The `prefetch` task updates the object directory with the latest objects from all registered remotes. For each remote, a `git fetch` - command is run. The refmap is custom to avoid updating local or remote - branches (those in `refs/heads` or `refs/remotes`). Instead, the - remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are - not updated. + command is run. The configured refspec is modified to place all + requested refs within `refs/prefetch/`. Also, tags are not updated. + This is done to avoid disrupting the remote-tracking branches. The end users expect these refs to stay unmoved unless they initiate a fetch. With prefetch diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index eb0caa0439..ffd601bc17 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,8 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS -------- [verse] -'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand> +'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] + [--preferred-pack=<pack>] <subcommand> DESCRIPTION ----------- @@ -30,7 +31,16 @@ OPTIONS The following subcommands are available: write:: - Write a new MIDX file. + Write a new MIDX file. The following options are available for + the `write` sub-command: ++ +-- + --preferred-pack=<pack>:: + Optionally specify the tie-breaking pack used when + multiple packs contain the same object. 
If not given, + ties are broken in favor of the pack with the lowest + mtime. +-- verify:: Verify the contents of the MIDX file. diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02e..fdcf43f87c 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the sparse directory entries index extension and may fail to +interact with your repository until it is disabled. 'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 0a60472bb5..cfcfa800c2 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -845,6 +845,8 @@ patterns are available: - `rust` suitable for source code in the Rust language. +- `scheme` suitable for source code in the Scheme language. + - `tex` suitable for source code for LaTeX documents. 
diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt index 7963a79ba9..34b1d6e224 100644 --- a/Documentation/gitweb.conf.txt +++ b/Documentation/gitweb.conf.txt @@ -751,6 +751,17 @@ default font sizes or lineheights are changed (e.g. via adding extra CSS stylesheet in `@stylesheets`), it may be appropriate to change these values. +email-privacy:: + Redact e-mail addresses from the generated HTML, etc. content. + This obscures e-mail addresses retrieved from the author/committer + and comment sections of the Git log. + It is meant to hinder web crawlers that harvest and abuse addresses. + Such crawlers may not respect robots.txt. + Note that users and user tools also see the addresses as redacted. + If Gitweb is not the final step in a workflow then subsequent steps + may misbehave because of the redacted information they receive. + Disabled by default. + highlight:: Server-side syntax highlight support in "blob" view. It requires `$highlight_bin` program to be available (see the description of diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt index ceeedd485c..8be4f4d0d6 100644 --- a/Documentation/technical/api-error-handling.txt +++ b/Documentation/technical/api-error-handling.txt @@ -1,8 +1,11 @@ Error reporting in git ====================== -`die`, `usage`, `error`, and `warning` report errors of various -kinds. +`BUG`, `die`, `usage`, `error`, and `warning` report errors of +various kinds. + +- `BUG` is for failed internal assertions that should never happen, + i.e. a bug in git itself. - `die` is for fatal application errors. It prints a message to the user and exits with status 128. @@ -20,6 +23,9 @@ kinds. without running into too many problems. Like `error`, it returns -1 after reporting the situation to the caller. +These reports will be logged via the trace2 facility. See the "error" +event in link:api-trace2.txt[trace2 API]. 
+ Customizable error handlers --------------------------- diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt index c65ffafc48..3f52f981a2 100644 --- a/Documentation/technical/api-trace2.txt +++ b/Documentation/technical/api-trace2.txt @@ -465,7 +465,7 @@ completed.) ------------ `"error"`:: - This event is emitted when one of the `error()`, `die()`, + This event is emitted when one of the `BUG()`, `error()`, `die()`, `warning()`, or `usage()` functions are called. + ------------ diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index d363a71c37..65da0daaa5 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -44,6 +44,13 @@ Git index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data @@ -385,3 +392,15 @@ The remaining data of each directory block is grouped by type: in this block of entries. - 32-bit count of cache entries in this block + +== Sparse Directory Entries + + When using sparse-checkout in cone mode, some entire directories within + the index can be summarized by pointing to a tree object instead of the + entire expanded list of paths within that tree. An index containing such + entries is a "sparse index". Index format versions 4 and less were not + implemented with such entries in mind. 
Thus, for these versions, an + index containing sparse directory entries will include this extension + with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, + tools should avoid interacting with a sparse index unless they understand + this extension. diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt index e8e377a59f..fb688976c4 100644 --- a/Documentation/technical/multi-pack-index.txt +++ b/Documentation/technical/multi-pack-index.txt @@ -43,8 +43,9 @@ Design Details a change in format. - The MIDX keeps only one record per object ID. If an object appears - in multiple packfiles, then the MIDX selects the copy in the most- - recently modified packfile. + in multiple packfiles, then the MIDX selects the copy in the + preferred packfile, otherwise selecting from the most-recently + modified packfile. - If there exist packfiles in the pack directory not registered in the MIDX, then those packfiles are loaded into the `packed_git` diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 1faa949bf6..8d2f42f29e 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -379,3 +379,86 @@ CHUNK DATA: TRAILER: Index checksum of the above contents. + +== multi-pack-index reverse indexes + +Similar to the pack-based reverse index, the multi-pack index can also +be used to generate a reverse index. + +Instead of mapping between offset, pack-, and index position, this +reverse index maps between an object's position within the MIDX, and +that object's position within a pseudo-pack that the MIDX describes +(i.e., the ith entry of the multi-pack reverse index holds the MIDX +position of ith object in pseudo-pack order). + +To clarify the difference between these orderings, consider a multi-pack +reachability bitmap (which does not yet exist, but is what we are +building towards here). 
Each bit needs to correspond to an object in the +MIDX, and so we need an efficient mapping from bit position to MIDX +position. + +One solution is to let bits occupy the same position in the oid-sorted +index stored by the MIDX. But because oids are effectively random, their +resulting reachability bitmaps would have no locality, and thus compress +poorly. (This is the reason that single-pack bitmaps use the pack +ordering, and not the .idx ordering, for the same purpose.) + +So we'd like to define an ordering for the whole MIDX based around +pack ordering, which has far better locality (and thus compresses more +efficiently). We can think of a pseudo-pack created by the concatenation +of all of the packs in the MIDX. E.g., if we had a MIDX with three packs +(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an +ordering of the objects like: + + |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19| + +where the ordering of the packs is defined by the MIDX's pack list, +and then the ordering of objects within each pack is the same as the +order in the actual packfile. + +Given the list of packs and their counts of objects, you can +naïvely reconstruct that pseudo-pack ordering (e.g., the object at +position 27 must be (c,1) because packs "a" and "b" consumed 25 of the +slots). But there's a catch. Objects may be duplicated between packs, in +which case the MIDX only stores one pointer to the object (and thus we'd +want only one slot in the bitmap). + +Callers could handle duplicates themselves by reading objects in order +of their bit-position, but that's linear in the number of objects, and +much too expensive for ordinary bitmap lookups. Building a reverse index +solves this, since it is the logical inverse of the index, and that +index has already removed duplicates. But, building a reverse index on +the fly can be expensive. Since we already have an on-disk format for +pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack, +too. 
+ +Objects from the MIDX are ordered as follows to string together the +pseudo-pack. Let `pack(o)` return the pack from which `o` was selected +by the MIDX, and define an ordering of packs based on their numeric ID +(as stored by the MIDX). Let `offset(o)` return the object offset of `o` +within `pack(o)`. Then, compare `o1` and `o2` as follows: + + - If one of `pack(o1)` and `pack(o2)` is preferred and the other + is not, then the preferred one sorts first. ++ +(This is a detail that allows the MIDX bitmap to determine which +pack should be used by the pack-reuse mechanism, since it can ask +the MIDX for the pack containing the object at bit position 0). + + - If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending + order based on the pack ID. + + - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in + pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1) + < offset(o2)`). + +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of +objects in packs stored by the MIDX, laid out in pack order, and the +packs arranged in MIDX order (with the preferred pack coming first). + +Finally, note that the MIDX's reverse index is not stored as a chunk in +the multi-pack-index itself. This is done because the reverse index +includes the checksum of the pack or MIDX to which it belongs, which +makes it impossible to write in the MIDX. To avoid races when rewriting +the MIDX, a MIDX reverse index includes the MIDX's checksum in its +filename (e.g., `multi-pack-index-xyz.rev`). diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt index 3ef169af27..d7c3b645cf 100644 --- a/Documentation/technical/reftable.txt +++ b/Documentation/technical/reftable.txt @@ -1011,8 +1011,13 @@ reftable stack, reload `tables.list`, and delete any tables no longer mentioned in `tables.list`. Irregular program exit may still leave about unused files. 
In this case, a -cleanup operation can read `tables.list`, note its modification timestamp, and -delete any unreferenced `*.ref` files that are older. +cleanup operation should proceed as follows: + +* take a lock `tables.list.lock` to prevent concurrent modifications +* refresh the reftable stack, by reading `tables.list` +* for each `*.ref` file, remove it if +** it is not mentioned in `tables.list`, and +** its max update_index is not beyond the max update_index of the stack Alternatives considered diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt new file mode 100644 index 0000000000..3b24c1a219 --- /dev/null +++ b/Documentation/technical/sparse-index.txt @@ -0,0 +1,208 @@ +Git Sparse-Index Design Document +================================ + +The sparse-checkout feature allows users to focus a working directory on +a subset of the files at HEAD. The cone mode patterns, enabled by +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + +Three important scale dimensions for a Git working directory are: + +* `HEAD`: How many files are present at `HEAD`? + +* Populated: How many files are within the sparse-checkout cone. + +* Modified: How many files has the user modified in the working directory? + +We will use big-O notation -- O(X) -- to denote how expensive certain +operations are in terms of these dimensions. + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. + +Problems occur if there is an extreme imbalance in these dimensions. For +example, if `HEAD` contains millions of paths but the populated set has +only tens of thousands, then commands like `git status` and `git add` can +be dominated by operations that require O(`HEAD`) operations instead of +O(Populated). 
Primarily, the cost is in parsing and rewriting the index, +which is filled primarily with files at `HEAD` that are marked with the +`SKIP_WORKTREE` bit. + +The sparse-index intends to take these commands that read and modify the +index from O(`HEAD`) to O(Populated). To do this, we need to modify the +index format in a significant way: add "sparse directory" entries. + +With cone mode patterns, it is possible to detect when an entire +directory will have its contents outside of the sparse-checkout definition. +Instead of listing all of the files it contains as individual entries, a +sparse-index contains an entry with the directory name, referencing the +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. +If we need to discover the details for paths within that directory, we +can parse trees to find that list. + +At time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and +see only files. In fact, these loops expect to see a reference to every +staged file. One way to handle this is to parse trees to replace a +sparse-directory entry with all of the files within that tree as the index +is loaded. However, parsing trees is slower than parsing the index format, +so that is a slower operation than if we left the index alone. The plan is +to make all of these integrations "sparse aware" so this expansion through +tree parsing is unnecessary and they use fewer resources than when using a +full index. + +The implementation plan below follows four phases to slowly integrate with +the sparse-index. The intention is to incrementally update Git commands to +interact safely with the sparse-index without significant slowdowns. This +may not always be possible, but the hope is that the primary commands that +users need in their daily work are dramatically improved. 
+ +Phase I: Format and initial speedups +------------------------------------ + +During this phase, Git learns to enable the sparse-index and safely parse +one. Protections are put in place so that every consumer of the in-memory +data structure can operate with its current assumption of every file at +`HEAD`. + +At first, every index parse will call a helper method, +`ensure_full_index()`, which scans the index for sparse-directory entries +(pointing to trees) and replaces them with the full list of paths (with +blob contents) by parsing tree objects. This will be slower in all cases. +The only noticeable change in behavior will be that the serialized index +file contains sparse-directory entries. + +To start, we use a new required index extension, `sdir`, to allow +inserting sparse-directory entries into indexes with file format +versions 2, 3, and 4. This prevents Git versions that do not understand +the sparse-index from operating on one, while allowing tools that do not +understand the sparse-index to operate on repositories as long as they do +not interact with the index. A new format, index v5, will be introduced +that includes sparse-directory entries by default. It might also +introduce other features that have been considered for improving the +index, as well. + +Next, consumers of the index will be guarded against operating on a +sparse-index by inserting calls to `ensure_full_index()` or +`expand_index_to_path()`. If a specific path is requested, then those will +be protected from within the `index_file_exists()` and `index_name_pos()` +API calls: they will call `ensure_full_index()` if necessary. The +intention here is to preserve existing behavior when interacting with a +sparse-checkout. We don't want a change to happen by accident, without +tests. Many of these locations may not need any change before removing the +guards, but we should not do so without tests to ensure the expected +behavior happens. 
+ +It may be desirable to _change_ the behavior of some commands in the +presence of a sparse index or more generally in any sparse-checkout +scenario. In such cases, these should be carefully communicated and +tested. No such behavior changes are intended during this phase. + +During a scan of the codebase, not every iteration of the cache entries +needs an `ensure_full_index()` check. The basic reasons include: + +1. The loop is scanning for entries with non-zero stage. These entries + are not collapsed into a sparse-directory entry. + +2. The loop is scanning for submodules. These entries are not collapsed + into a sparse-directory entry. + +3. The loop is part of the index API, especially around reading or + writing the format. + +4. The loop is checking for correct order of cache entries and that is + correct if and only if the sparse-directory entries are in the correct + location. + +5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is + otherwise already aware of sparse directory entries. + +6. The sparse-index is disabled at this point when using the split-index + feature, so no effort is made to protect the split-index API. + +Even after inserting these guards, we will keep expanding sparse-indexes +for most Git commands using the `command_requires_full_index` repository +setting. This setting will be on by default and disabled one builtin at a +time until we have sufficient confidence that all of the index operations +are properly guarded. + +To complete this phase, the commands `git status` and `git add` will be +integrated with the sparse-index so that they operate with O(Populated) +performance. They will be carefully tested for operations within and +outside the sparse-checkout definition. + +Phase II: Careful integrations +------------------------------ + +This phase focuses on ensuring that all index extensions and APIs work +well with a sparse-index. 
This requires significant increases to our test +coverage, especially for operations that interact with the working +directory outside of the sparse-checkout definition. Some of these +behaviors may not be the desirable ones, such as some tests already +marked for failure in `t1092-sparse-checkout-compatibility.sh`. + +The index extensions that may require special integrations are: + +* FS Monitor +* Untracked cache + +While integrating with these features, we should look for patterns that +might lead to better APIs for interacting with the index. Coalescing +common usage patterns into an API call can reduce the number of places +where sparse-directories need to be handled carefully. + +Phase III: Important command speedups +------------------------------------- + +At this point, the patterns for testing and implementing sparse-directory +logic should be relatively stable. This phase focuses on updating some of +the most common builtins that use the index to operate as O(Populated). +Here is a potential list of commands that could be valuable to integrate +at this point: + +* `git commit` +* `git checkout` +* `git merge` +* `git rebase` + +Hopefully, commands such as `git merge` and `git rebase` can benefit +instead from merge algorithms that do not use the index as a data +structure, such as the merge-ORT strategy. As these topics mature, we +may enable the ORT strategy by default for repositories using the +sparse-index feature. + +Along with `git status` and `git add`, these commands cover the majority +of users' interactions with the working directory. In addition, we can +integrate with these commands: + +* `git grep` +* `git rm` + +These have been proposed as some whose behavior could change when in a +repo with a sparse-checkout definition. It would be good to include this +behavior automatically when using a sparse-index. Some clarity is needed +to make the behavior switch clear to the user. 
+ +This phase is the first where parallel work might be possible without too +many conflicts between topics. + +Phase IV: The long tail +----------------------- + +This last phase is less a "phase" and more "the new normal" after all of +the previous work. + +To start, the `command_requires_full_index` option could be removed in +favor of expanding only when hitting an API guard. + +There are many Git commands that could use special attention to operate as +O(Populated), while some might be so rare that it is acceptable to leave +them with additional overhead when a sparse-index is present. + +Here are some commands that might be useful to update: + +* `git sparse-checkout set` +* `git am` +* `git clean` +* `git stash` diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt index fd480b8645..f9e54b8674 100644 --- a/Documentation/user-manual.txt +++ b/Documentation/user-manual.txt @@ -1,5 +1,8 @@ = Git User Manual +[preface] +== Introduction + Git is a fast distributed revision control system. This manual is designed to be readable by someone with basic UNIX |