diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-error-handling.txt | 10 | ||||
-rw-r--r-- | Documentation/technical/api-simple-ipc.txt | 105 | ||||
-rw-r--r-- | Documentation/technical/api-trace2.txt | 2 | ||||
-rw-r--r-- | Documentation/technical/chunk-format.txt | 116 | ||||
-rw-r--r-- | Documentation/technical/commit-graph-format.txt | 31 | ||||
-rw-r--r-- | Documentation/technical/commit-graph.txt | 77 | ||||
-rw-r--r-- | Documentation/technical/hash-function-transition.txt | 293 | ||||
-rw-r--r-- | Documentation/technical/index-format.txt | 65 | ||||
-rw-r--r-- | Documentation/technical/multi-pack-index.txt | 5 | ||||
-rw-r--r-- | Documentation/technical/pack-format.txt | 123 | ||||
-rw-r--r-- | Documentation/technical/packfile-uri.txt | 7 | ||||
-rw-r--r-- | Documentation/technical/parallel-checkout.txt | 270 | ||||
-rw-r--r-- | Documentation/technical/protocol-v2.txt | 54 | ||||
-rw-r--r-- | Documentation/technical/reftable.txt | 49 | ||||
-rw-r--r-- | Documentation/technical/sparse-index.txt | 208 |
15 files changed, 1211 insertions, 204 deletions
diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt index ceeedd485c..8be4f4d0d6 100644 --- a/Documentation/technical/api-error-handling.txt +++ b/Documentation/technical/api-error-handling.txt @@ -1,8 +1,11 @@ Error reporting in git ====================== -`die`, `usage`, `error`, and `warning` report errors of various -kinds. +`BUG`, `die`, `usage`, `error`, and `warning` report errors of +various kinds. + +- `BUG` is for failed internal assertions that should never happen, + i.e. a bug in git itself. - `die` is for fatal application errors. It prints a message to the user and exits with status 128. @@ -20,6 +23,9 @@ kinds. without running into too many problems. Like `error`, it returns -1 after reporting the situation to the caller. +These reports will be logged via the trace2 facility. See the "error" +event in link:api-trace2.txt[trace2 API]. + Customizable error handlers --------------------------- diff --git a/Documentation/technical/api-simple-ipc.txt b/Documentation/technical/api-simple-ipc.txt new file mode 100644 index 0000000000..d79ad323e6 --- /dev/null +++ b/Documentation/technical/api-simple-ipc.txt @@ -0,0 +1,105 @@ +Simple-IPC API +============== + +The Simple-IPC API is a collection of `ipc_` prefixed library routines +and a basic communication protocol that allow an IPC-client process to +send an application-specific IPC-request message to an IPC-server +process and receive an application-specific IPC-response message. + +Communication occurs over a named pipe on Windows and a Unix domain +socket on other platforms. IPC-clients and IPC-servers rendezvous at +a previously agreed-to application-specific pathname (which is outside +the scope of this design) that is local to the computer system. + +The IPC-server routines within the server application process create a +thread pool to listen for connections and receive request messages +from multiple concurrent IPC-clients. When received, these messages +are dispatched up to the server application callbacks for handling. +IPC-server routines then incrementally relay responses back to the +IPC-client. + +The IPC-client routines within a client application process connect +to the IPC-server and send a request message and wait for a response. +When received, the response is returned back the caller. + +For example, the `fsmonitor--daemon` feature will be built as a server +application on top of the IPC-server library routines. It will have +threads watching for file system events and a thread pool waiting for +client connections. Clients, such as `git status` will request a list +of file system events since a point in time and the server will +respond with a list of changed files and directories. The formats of +the request and response are application-specific; the IPC-client and +IPC-server routines treat them as opaque byte streams. + + +Comparison with sub-process model +--------------------------------- + +The Simple-IPC mechanism differs from the existing `sub-process.c` +model (Documentation/technical/long-running-process-protocol.txt) and +used by applications like Git-LFS. In the LFS-style sub-process model +the helper is started by the foreground process, communication happens +via a pair of file descriptors bound to the stdin/stdout of the +sub-process, the sub-process only serves the current foreground +process, and the sub-process exits when the foreground process +terminates. + +In the Simple-IPC model the server is a very long-running service. It +can service many clients at the same time and has a private socket or +named pipe connection to each active client. It might be started +(on-demand) by the current client process or it might have been +started by a previous client or by the OS at boot time. The server +process is not associated with a terminal and it persists after +clients terminate. Clients do not have access to the stdin/stdout of +the server process and therefore must communicate over sockets or +named pipes. + + +Server startup and shutdown +--------------------------- + +How an application server based upon IPC-server is started is also +outside the scope of the Simple-IPC design and is a property of the +application using it. For example, the server might be started or +restarted during routine maintenance operations, or it might be +started as a system service during the system boot-up sequence, or it +might be started on-demand by a foreground Git command when needed. + +Similarly, server shutdown is a property of the application using +the simple-ipc routines. For example, the server might decide to +shutdown when idle or only upon explicit request. + + +Simple-IPC protocol +------------------- + +The Simple-IPC protocol consists of a single request message from the +client and an optional response message from the server. Both the +client and server messages are unlimited in length and are terminated +with a flush packet. + +The pkt-line routines (Documentation/technical/protocol-common.txt) +are used to simplify buffer management during message generation, +transmission, and reception. A flush packet is used to mark the end +of the message. This allows the sender to incrementally generate and +transmit the message. It allows the receiver to incrementally receive +the message in chunks and to know when they have received the entire +message. + +The actual byte format of the client request and server response +messages are application specific. The IPC layer transmits and +receives them as opaque byte buffers without any concern for the +content within. It is the job of the calling application layer to +understand the contents of the request and response messages. + + +Summary +------- + +Conceptually, the Simple-IPC protocol is similar to an HTTP REST +request. Clients connect, make an application-specific and +stateless request, receive an application-specific +response, and disconnect. It is a one round trip facility for +querying the server. The Simple-IPC routines hide the socket, +named pipe, and thread pool details and allow the application +layer to focus on the application at hand. diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt index c65ffafc48..3f52f981a2 100644 --- a/Documentation/technical/api-trace2.txt +++ b/Documentation/technical/api-trace2.txt @@ -465,7 +465,7 @@ completed.) ------------ `"error"`:: - This event is emitted when one of the `error()`, `die()`, + This event is emitted when one of the `BUG()`, `error()`, `die()`, `warning()`, or `usage()` functions are called. + ------------ diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt new file mode 100644 index 0000000000..593614fced --- /dev/null +++ b/Documentation/technical/chunk-format.txt @@ -0,0 +1,116 @@ +Chunk-based file formats +======================== + +Some file formats in Git use a common concept of "chunks" to describe +sections of the file. This allows structured access to a large file by +scanning a small "table of contents" for the remaining data. This common +format is used by the `commit-graph` and `multi-pack-index` files. See +link:technical/pack-format.html[the `multi-pack-index` format] and +link:technical/commit-graph-format.html[the `commit-graph` format] for +how they use the chunks to describe structured data. + +A chunk-based file format begins with some header information custom to +that format. That header should include enough information to identify +the file type, format version, and number of chunks in the file. From this +information, that file can determine the start of the chunk-based region. + +The chunk-based region starts with a table of contents describing where +each chunk starts and ends. This consists of (C+1) rows of 12 bytes each, +where C is the number of chunks. Consider the following table: + + | Chunk ID (4 bytes) | Chunk Offset (8 bytes) | + |--------------------|------------------------| + | ID[0] | OFFSET[0] | + | ... | ... | + | ID[C] | OFFSET[C] | + | 0x0000 | OFFSET[C+1] | + +Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset. +Each integer is stored in network-byte order. + +The chunk identifier `ID[i]` is a label for the data stored within this +fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the +size of the `i`th chunk is equal to the difference between `OFFSET[i+1]` +and `OFFSET[i]`. This requires that the chunk data appears contiguously +in the same order as the table of contents. + +The final entry in the table of contents must be four zero bytes. This +confirms that the table of contents is ending and provides the offset for +the end of the chunk-based data. + +Note: The chunk-based format expects that the file contains _at least_ a +trailing hash after `OFFSET[C+1]`. + +Functions for working with chunk-based file formats are declared in +`chunk-format.h`. Using these methods provide extra checks that assist +developers when creating new file formats. + +Writing chunk-based file formats +-------------------------------- + +To write a chunk-based file format, create a `struct chunkfile` by +calling `init_chunkfile()` and pass a `struct hashfile` pointer. The +caller is responsible for opening the `hashfile` and writing header +information so the file format is identifiable before the chunk-based +format begins. + +Then, call `add_chunk()` for each chunk that is intended for write. This +populates the `chunkfile` with information about the order and size of +each chunk to write. Provide a `chunk_write_fn` function pointer to +perform the write of the chunk data upon request. + +Call `write_chunkfile()` to write the table of contents to the `hashfile` +followed by each of the chunks. This will verify that each chunk wrote +the expected amount of data so the table of contents is correct. + +Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The +caller is responsible for finalizing the `hashfile` by writing the trailing +hash and closing the file. + +Reading chunk-based file formats +-------------------------------- + +To read a chunk-based file format, the file must be opened as a +memory-mapped region. The chunk-format API expects that the entire file +is mapped as a contiguous memory region. + +Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`. + +After reading the header information from the beginning of the file, +including the chunk count, call `read_table_of_contents()` to populate +the `struct chunkfile` with the list of chunks, their offsets, and their +sizes. + +Extract the data information for each chunk using `pair_chunk()` or +`read_chunk()`: + +* `pair_chunk()` assigns a given pointer with the location inside the + memory-mapped file corresponding to that chunk's offset. If the chunk + does not exist, then the pointer is not modified. + +* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it + with the appropriate initial pointer and size information. The function + is not called if the chunk does not exist. Use this method to read chunks + if you need to perform immediate parsing or if you need to execute logic + based on the size of the chunk. + +After calling these methods, call `free_chunkfile()` to clear the +`struct chunkfile` data. This will not close the memory-mapped region. +Callers are expected to own that data for the timeframe the pointers into +the region are needed. + +Examples +-------- + +These file formats use the chunk-format API, and can be used as examples +for future formats: + +* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()` + in `commit-graph.c` for how the chunk-format API is used to write and + parse the commit-graph file format documented in + link:technical/commit-graph-format.html[the commit-graph file format]. + +* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()` + in `midx.c` for how the chunk-format API is used to write and + parse the multi-pack-index file format documented in + link:technical/pack-format.html[the multi-pack-index file format]. diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index b3b58880b9..87971c27dd 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -4,11 +4,7 @@ Git commit graph format The Git commit graph stores a list of commit OIDs and some associated metadata, including: -- The generation number of the commit. Commits with no parents have - generation number 1; commits with parents have generation number - one more than the maximum generation number of its parents. We - reserve zero as special, and can be used to mark a generation - number invalid or as "not computed". +- The generation number of the commit. - The root tree OID. @@ -65,6 +61,9 @@ CHUNK LOOKUP: the length using the next chunk position if necessary.) Each chunk ID appears at most once. + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified. @@ -86,13 +85,33 @@ CHUNK DATA: position. If there are more than two parents, the second value has its most-significant bit on and the other bits store an array position into the Extra Edge List chunk. - * The next 8 bytes store the generation number of the commit and + * The next 8 bytes store the topological level (generation number v1) + of the commit and the commit time in seconds since EPOCH. The generation number uses the higher 30 bits of the first 4 bytes, while the commit time uses the 32 bits of the second 4 bytes, along with the lowest 2 bits of the lowest byte, storing the 33rd and 34th bit of the commit time. + Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional] + * This list of 4-byte values store corrected commit date offsets for the + commits, arranged in the same order as commit data chunk. + * If the corrected commit date offset cannot be stored within 31 bits, + the value has its most-significant bit on and the other bits store + the position of corrected commit date into the Generation Data Overflow + chunk. + * Generation Data chunk is present only when commit-graph file is written + by compatible versions of Git and in case of split commit-graph chains, + the topmost layer also has Generation Data chunk. + + Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional] + * This list of 8-byte values stores the corrected commit date offsets + for commits with corrected commit date offsets that cannot be + stored within 31 bits. + * Generation Data Overflow chunk is present only when Generation Data + chunk is present and atleast one corrected commit date offset cannot + be stored within 31 bits. + Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] This list of 4-byte values store the second through nth parents for all octopus merges. The second parent value in the commit data stores diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index f14a7659aa..f05e7bda1a 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph: Values 1-4 satisfy the requirements of parse_commit_gently(). -Define the "generation number" of a commit recursively as follows: +There are two definitions of generation number: +1. Corrected committer dates (generation number v2) +2. Topological levels (generation nummber v1) - * A commit with no parents (a root commit) has generation number one. +Define "corrected committer date" of a commit recursively as follows: - * A commit with at least one parent has generation number one more than - the largest generation number among its parents. + * A commit with no parents (a root commit) has corrected committer date + equal to its committer date. -Equivalently, the generation number of a commit A is one more than the + * A commit with at least one parent has corrected committer date equal to + the maximum of its commiter date and one more than the largest corrected + committer date among its parents. + + * As a special case, a root commit with timestamp zero has corrected commit + date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO + (that is, an uncomputed corrected commit date). + +Define the "topological level" of a commit recursively as follows: + + * A commit with no parents (a root commit) has topological level of one. + + * A commit with at least one parent has topological level one more than + the largest topological level among its parents. + +Equivalently, the topological level of a commit A is one more than the length of a longest path from A to a root commit. The recursive definition is easier to use for computation and observing the following property: @@ -60,6 +77,9 @@ is easier to use for computation and observing the following property: generation numbers, then we always expand the boundary commit with highest generation number and can easily detect the stopping condition. +The property applies to both versions of generation number, that is both +corrected committer dates and topological levels. + This property can be used to significantly reduce the time it takes to walk commits and determine topological relationships. Without generation numbers, the general heuristic is the following: @@ -67,7 +87,9 @@ numbers, the general heuristic is the following: If A and B are commits with commit time X and Y, respectively, and X < Y, then A _probably_ cannot reach B. -This heuristic is currently used whenever the computation is allowed to +In absence of corrected commit dates (for example, old versions of Git or +mixed generation graph chains), +this heuristic is currently used whenever the computation is allowed to violate topological relationships due to clock skew (such as "git log" with default order), but is not used when the topological order is required (such as merge base calculations, "git log --graph"). @@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite" generation number and walk until reaching commits with known generation number. -We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not +We use the macro GENERATION_NUMBER_INFINITY to mark commits not in the commit-graph file. If a commit-graph file was written by a version of Git that did not compute generation numbers, then those commits will have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. @@ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in walking a few extra commits, but the simplicity in dealing with commits with generation number *_INFINITY or *_ZERO is valuable. -We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose -generation numbers are computed to be at least this value. We limit at -this value since it is the largest value that can be stored in the -commit-graph file using the 30 bits available to generation numbers. This -presents another case where a commit can have generation number equal to -that of a parent. +We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose +topological levels (generation number v1) are computed to be at least +this value. We limit at this value since it is the largest value that +can be stored in the commit-graph file using the 30 bits available +to topological levels. This presents another case where a commit can +have generation number equal to that of a parent. Design Details -------------- @@ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Handling Mixed Generation Number Chains + +With the introduction of generation number v2 and generation data chunk, the +following scenario is possible: + +1. "New" Git writes a commit-graph with the corrected commit dates. +2. "Old" Git writes a split commit-graph on top without corrected commit dates. + +A naive approach of using the newest available generation number from +each layer would lead to violated expectations: the lower layer would +use corrected commit dates which are much larger than the topological +levels of the higher layer. For this reason, Git inspects the topmost +layer to see if the layer is missing corrected commit dates. In such a case +Git only uses topological level for generation numbers. + +When writing a new layer in split commit-graph, we write corrected commit +dates if the topmost layer has corrected commit dates written. This +guarantees that if a layer has corrected commit dates, all lower layers +must have corrected commit dates as well. + +When merging layers, we do not consider whether the merged layers had corrected +commit dates. Instead, the new layer will have corrected commit dates if the +layer below the new layer has corrected commit dates. + +While writing or merging layers, if the new layer is the only layer, it will +have corrected commit dates when written by compatible versions of Git. Thus, +rewriting split commit-graph as a single file (`--split=replace`) creates a +single layer with corrected commit dates. + ## Deleting graph-{hash} files After a new tip file is written, some `graph-{hash}` files may no longer diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index 6fd20ebbc2..7c1630bf83 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -33,16 +33,9 @@ researchers. On 23 February 2017 the SHAttered attack Git v2.13.0 and later subsequently moved to a hardened SHA-1 implementation by default, which isn't vulnerable to the SHAttered -attack. +attack, but SHA-1 is still weak. -Thus Git has in effect already migrated to a new hash that isn't SHA-1 -and doesn't share its vulnerabilities, its new hash function just -happens to produce exactly the same output for all known inputs, -except two PDFs published by the SHAttered researchers, and the new -implementation (written by those researchers) claims to detect future -cryptanalytic collision attacks. - -Regardless, it's considered prudent to move past any variant of SHA-1 +Thus it's considered prudent to move past any variant of SHA-1 to a new hash. There's no guarantee that future attacks on SHA-1 won't be published in the future, and those attacks may not have viable mitigations. @@ -57,6 +50,38 @@ SHA-1 still possesses the other properties such as fast object lookup and safe error checking, but other hash functions are equally suitable that are believed to be cryptographically secure. +Choice of Hash +-------------- +The hash to replace the hardened SHA-1 should be stronger than SHA-1 +was: we would like it to be trustworthy and useful in practice for at +least 10 years. + +Some other relevant properties: + +1. A 256-bit hash (long enough to match common security practice; not + excessively long to hurt performance and disk usage). + +2. High quality implementations should be widely available (e.g., in + OpenSSL and Apple CommonCrypto). + +3. The hash function's properties should match Git's needs (e.g. Git + requires collision and 2nd preimage resistance and does not require + length extension resistance). + +4. As a tiebreaker, the hash should be fast to compute (fortunately + many contenders are faster than SHA-1). + +There were several contenders for a successor hash to SHA-1, including +SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256. + +In late 2018 the project picked SHA-256 as its successor hash. + +See 0ed8d8da374 (doc hash-function-transition: pick SHA-256 as +NewHash, 2018-08-04) and numerous mailing list threads at the time, +particularly the one starting at +https://lore.kernel.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/ +for more information. + Goals ----- 1. The transition to SHA-256 can be done one local repository at a time. @@ -94,7 +119,7 @@ Overview -------- We introduce a new repository format extension. Repositories with this extension enabled use SHA-256 instead of SHA-1 to name their objects. -This affects both object names and object content --- both the names +This affects both object names and object content -- both the names of objects and all references to other objects within an object are switched to the new hash function. @@ -107,7 +132,7 @@ mapping to allow naming objects using either their SHA-1 and SHA-256 names interchangeably. "git cat-file" and "git hash-object" gain options to display an object -in its sha1 form and write an object given its sha1 form. This +in its SHA-1 form and write an object given its SHA-1 form. This requires all objects referenced by that object to be present in the object database so that they can be named using the appropriate name (using the bidirectional hash mapping). @@ -115,7 +140,7 @@ object database so that they can be named using the appropriate name Fetches from a SHA-1 based server convert the fetched objects into SHA-256 form and record the mapping in the bidirectional mapping table (see below for details). Pushes to a SHA-1 based server convert the -objects being pushed into sha1 form so the server does not have to be +objects being pushed into SHA-1 form so the server does not have to be aware of the hash function the client is using. Detailed Design @@ -151,38 +176,38 @@ repository extensions. Object names ~~~~~~~~~~~~ -Objects can be named by their 40 hexadecimal digit sha1-name or 64 -hexadecimal digit sha256-name, plus names derived from those (see +Objects can be named by their 40 hexadecimal digit SHA-1 name or 64 +hexadecimal digit SHA-256 name, plus names derived from those (see gitrevisions(7)). -The sha1-name of an object is the SHA-1 of the concatenation of its -type, length, a nul byte, and the object's sha1-content. This is the +The SHA-1 name of an object is the SHA-1 of the concatenation of its +type, length, a nul byte, and the object's SHA-1 content. This is the traditional <sha1> used in Git to name objects. -The sha256-name of an object is the SHA-256 of the concatenation of its -type, length, a nul byte, and the object's sha256-content. +The SHA-256 name of an object is the SHA-256 of the concatenation of its +type, length, a nul byte, and the object's SHA-256 content. Object format ~~~~~~~~~~~~~ The content as a byte sequence of a tag, commit, or tree object named -by sha1 and sha256 differ because an object named by sha256-name refers to -other objects by their sha256-names and an object named by sha1-name -refers to other objects by their sha1-names. +by SHA-1 and SHA-256 differ because an object named by SHA-256 name refers to +other objects by their SHA-256 names and an object named by SHA-1 name +refers to other objects by their SHA-1 names. -The sha256-content of an object is the same as its sha1-content, except -that objects referenced by the object are named using their sha256-names -instead of sha1-names. Because a blob object does not refer to any -other object, its sha1-content and sha256-content are the same. +The SHA-256 content of an object is the same as its SHA-1 content, except +that objects referenced by the object are named using their SHA-256 names +instead of SHA-1 names. Because a blob object does not refer to any +other object, its SHA-1 content and SHA-256 content are the same. -The format allows round-trip conversion between sha256-content and -sha1-content. +The format allows round-trip conversion between SHA-256 content and +SHA-1 content. Object storage ~~~~~~~~~~~~~~ Loose objects use zlib compression and packed objects use the packed format described in Documentation/technical/pack-format.txt, just like -today. The content that is compressed and stored uses sha256-content -instead of sha1-content. +today. The content that is compressed and stored uses SHA-256 content +instead of SHA-1 content. Pack index ~~~~~~~~~~ @@ -191,21 +216,21 @@ hash functions. They have the following format (all integers are in network byte order): - A header appears at the beginning and consists of the following: - - The 4-byte pack index signature: '\377t0c' - - 4-byte version number: 3 - - 4-byte length of the header section, including the signature and + * The 4-byte pack index signature: '\377t0c' + * 4-byte version number: 3 + * 4-byte length of the header section, including the signature and version number - - 4-byte number of objects contained in the pack - - 4-byte number of object formats in this pack index: 2 - - For each object format: - - 4-byte format identifier (e.g., 'sha1' for SHA-1) - - 4-byte length in bytes of shortened object names. This is the + * 4-byte number of objects contained in the pack + * 4-byte number of object formats in this pack index: 2 + * For each object format: + ** 4-byte format identifier (e.g., 'sha1' for SHA-1) + ** 4-byte length in bytes of shortened object names. This is the shortest possible length needed to make names in the shortened object name table unambiguous. - - 4-byte integer, recording where tables relating to this format + ** 4-byte integer, recording where tables relating to this format are stored in this index file, as an offset from the beginning. - - 4-byte offset to the trailer from the beginning of this file. - - Zero or more additional key/value pairs (4-byte key, 4-byte + * 4-byte offset to the trailer from the beginning of this file. + * Zero or more additional key/value pairs (4-byte key, 4-byte value). Only one key is supported: 'PSRC'. See the "Loose objects and unreachable objects" section for supported values and how this is used. All other keys are reserved. Readers must ignore @@ -213,37 +238,36 @@ network byte order): - Zero or more NUL bytes. This can optionally be used to improve the alignment of the full object name table below. - Tables for the first object format: - - A sorted table of shortened object names. These are prefixes of + * A sorted table of shortened object names. These are prefixes of the names of all objects in this pack file, packed together without offset values to reduce the cache footprint of the binary search for a specific object name. - - A table of full object names in pack order. This allows resolving + * A table of full object names in pack order. This allows resolving a reference to "the nth object in the pack file" (from a reachability bitmap or from the next table of another object format) to its object name. - - A table of 4-byte values mapping object name order to pack order. + * A table of 4-byte values mapping object name order to pack order. For an object in the table of sorted shortened object names, the value at the corresponding index in this table is the index in the previous table for that same object. - This can be used to look up the object in reachability bitmaps or to look up its name in another object format. - - A table of 4-byte CRC32 values of the packed object data, in the + * A table of 4-byte CRC32 values of the packed object data, in the order that the objects appear in the pack file. This is to allow compressed data to be copied directly from pack to pack during repacking without undetected data corruption. - - A table of 4-byte offset values. For an object in the table of + * A table of 4-byte offset values. For an object in the table of sorted shortened object names, the value at the corresponding index in this table indicates where that object can be found in the pack file. These are usually 31-bit pack file offsets, but large offsets are encoded as an index into the next table with the most significant bit set. - - A table of 8-byte offset entries (empty for pack files less than + * A table of 8-byte offset entries (empty for pack files less than 2 GiB). Pack files are organized with heavily used objects toward the front, so most object references should not need to refer to this table. @@ -252,10 +276,10 @@ network byte order): up to and not including the table of CRC32 values. - Zero or more NUL bytes. - The trailer consists of the following: - - A copy of the 20-byte SHA-256 checksum at the end of the + * A copy of the 20-byte SHA-256 checksum at the end of the corresponding packfile. - - 20-byte SHA-256 checksum of all of the above. + * 20-byte SHA-256 checksum of all of the above. Loose object index ~~~~~~~~~~~~~~~~~~ @@ -288,18 +312,18 @@ To remove entries (e.g. in "git pack-refs" or "git-prune"): Translation table ~~~~~~~~~~~~~~~~~ -The index files support a bidirectional mapping between sha1-names -and sha256-names. The lookup proceeds similarly to ordinary object -lookups. For example, to convert a sha1-name to a sha256-name: +The index files support a bidirectional mapping between SHA-1 names +and SHA-256 names. The lookup proceeds similarly to ordinary object +lookups. For example, to convert a SHA-1 name to a SHA-256 name: 1. Look for the object in idx files. If a match is present in the - idx's sorted list of truncated sha1-names, then: - a. Read the corresponding entry in the sha1-name order to pack + idx's sorted list of truncated SHA-1 names, then: + a. Read the corresponding entry in the SHA-1 name order to pack name order mapping. - b. Read the corresponding entry in the full sha1-name table to + b. Read the corresponding entry in the full SHA-1 name table to verify we found the right object. If it is, then - c. Read the corresponding entry in the full sha256-name table. - That is the object's sha256-name. + c. Read the corresponding entry in the full SHA-256 name table. + That is the object's SHA-256 name. 2. Check for a loose object. Read lines from loose-object-idx until we find a match. @@ -313,10 +337,10 @@ Since all operations that make new objects (e.g., "git commit") add the new objects to the corresponding index, this mapping is possible for all objects in the object store. -Reading an object's sha1-content -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The sha1-content of an object can be read by converting all sha256-names -its sha256-content references to sha1-names using the translation table. +Reading an object's SHA-1 content +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The SHA-1 content of an object can be read by converting all SHA-256 names +of its SHA-256 content references to SHA-1 names using the translation table. Fetch ~~~~~ @@ -339,7 +363,7 @@ the following steps: 1. index-pack: inflate each object in the packfile and compute its SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against objects the client has locally. These objects can be looked up - using the translation table and their sha1-content read as + using the translation table and their SHA-1 content read as described above to resolve the deltas. 2. topological sort: starting at the "want"s from the negotiation phase, walk through objects in the pack and emit a list of them, @@ -348,12 +372,12 @@ the following steps: (This list only contains objects reachable from the "wants". If the pack from the server contained additional extraneous objects, then they will be discarded.) -3. convert to sha256: open a new (sha256) packfile. Read the topologically +3. convert to SHA-256: open a new SHA-256 packfile. Read the topologically sorted list just generated. For each object, inflate its - sha1-content, convert to sha256-content, and write it to the sha256 - pack. Record the new sha1<->sha256 mapping entry for use in the idx. + SHA-1 content, convert to SHA-256 content, and write it to the SHA-256 + pack. Record the new SHA-1<-->SHA-256 mapping entry for use in the idx. 4. sort: reorder entries in the new pack to match the order of objects - in the pack the server generated and include blobs. Write a sha256 idx + in the pack the server generated and include blobs. Write a SHA-256 idx file 5. clean up: remove the SHA-1 based pack file, index, and topologically sorted list obtained from the server in steps 1 @@ -378,19 +402,20 @@ experimenting to get this to perform well. Push ~~~~ Push is simpler than fetch because the objects referenced by the -pushed objects are already in the translation table. The sha1-content +pushed objects are already in the translation table. The SHA-1 content of each object being pushed can be read as described in the "Reading -an object's sha1-content" section to generate the pack written by git +an object's SHA-1 content" section to generate the pack written by git send-pack. Signed Commits ~~~~~~~~~~~~~~ We add a new field "gpgsig-sha256" to the commit object format to allow signing commits without relying on SHA-1. It is similar to the -existing "gpgsig" field. Its signed payload is the sha256-content of the +existing "gpgsig" field. Its signed payload is the SHA-256 content of the commit object with any "gpgsig" and "gpgsig-sha256" fields removed. This means commits can be signed + 1. using SHA-1 only, as in existing signed commit objects 2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig fields. @@ -404,10 +429,11 @@ Signed Tags ~~~~~~~~~~~ We add a new field "gpgsig-sha256" to the tag object format to allow signing tags without relying on SHA-1. Its signed payload is the -sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP +SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed. This means tags can be signed + 1. using SHA-1 only, as in existing signed tag objects 2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body signature. @@ -415,11 +441,11 @@ This means tags can be signed Mergetag embedding ~~~~~~~~~~~~~~~~~~ -The mergetag field in the sha1-content of a commit contains the -sha1-content of a tag that was merged by that commit. +The mergetag field in the SHA-1 content of a commit contains the +SHA-1 content of a tag that was merged by that commit. -The mergetag field in the sha256-content of the same commit contains the -sha256-content of the same tag. +The mergetag field in the SHA-256 content of the same commit contains the +SHA-256 content of the same tag. Submodules ~~~~~~~~~~ @@ -494,7 +520,7 @@ Caveats ------- Invalid objects ~~~~~~~~~~~~~~~ -The conversion from sha1-content to sha256-content retains any +The conversion from SHA-1 content to SHA-256 content retains any brokenness in the original object (e.g., tree entry modes encoded with leading 0, tree objects whose paths are not sorted correctly, and commit objects without an author or committer). This is a deliberate @@ -513,15 +539,15 @@ allow lifting this restriction. Alternates ~~~~~~~~~~ -For the same reason, a sha256 repository cannot borrow objects from a -sha1 repository using objects/info/alternates or +For the same reason, a SHA-256 repository cannot borrow objects from a +SHA-1 repository using objects/info/alternates or $GIT_ALTERNATE_OBJECT_REPOSITORIES. git notes ~~~~~~~~~ -The "git notes" tool annotates objects using their sha1-name as key. +The "git notes" tool annotates objects using their SHA-1 name as key. This design does not describe a way to migrate notes trees to use -sha256-names. That migration is expected to happen separately (for +SHA-256 names. That migration is expected to happen separately (for example using a file at the root of the notes tree to describe which hash it uses). @@ -555,7 +581,7 @@ unclear: Git 2.12 -Does this mean Git v2.12.0 is the commit with sha1-name +Does this mean Git v2.12.0 is the commit with SHA-1 name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7? @@ -598,44 +624,12 @@ The user can also explicitly specify which format to use for a particular revision specifier and for output, overriding the mode. For example: -git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256} - -Choice of Hash --------------- -In early 2005, around the time that Git was written, Xiaoyun Wang, -Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1 -collisions in 2^69 operations. In August they published details. -Luckily, no practical demonstrations of a collision in full SHA-1 were -published until 10 years later, in 2017. - -Git v2.13.0 and later subsequently moved to a hardened SHA-1 -implementation by default that mitigates the SHAttered attack, but -SHA-1 is still believed to be weak. - -The hash to replace this hardened SHA-1 should be stronger than SHA-1 -was: we would like it to be trustworthy and useful in practice for at -least 10 years. - -Some other relevant properties: - -1. A 256-bit hash (long enough to match common security practice; not - excessively long to hurt performance and disk usage). - -2. High quality implementations should be widely available (e.g., in - OpenSSL and Apple CommonCrypto). - -3. The hash function's properties should match Git's needs (e.g. Git - requires collision and 2nd preimage resistance and does not require - length extension resistance). - -4. As a tiebreaker, the hash should be fast to compute (fortunately - many contenders are faster than SHA-1). - -We choose SHA-256. + git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256} Transition plan --------------- Some initial steps can be implemented independently of one another: + - adding a hash function API (vtable) - teaching fsck to tolerate the gpgsig-sha256 field - excluding gpgsig-* from the fields copied by "git commit --amend" @@ -647,9 +641,9 @@ Some initial steps can be implemented independently of one another: - introducing index v3 - adding support for the PSRC field and safer object pruning - The first user-visible change is the introduction of the objectFormat extension (without compatObjectFormat). This requires: + - teaching fsck about this mode of operation - using the hash function API (vtable) when computing object names - signing objects and verifying signatures @@ -657,6 +651,7 @@ extension (without compatObjectFormat). This requires: repository Next comes introduction of compatObjectFormat: + - implementing the loose-object-idx - translating object names between object formats - translating object content between object formats @@ -669,10 +664,11 @@ Next comes introduction of compatObjectFormat: "Object names on the command line" above) The next step is supporting fetches and pushes to SHA-1 repositories: + - allow pushes to a repository using the compat format - generate a topologically sorted list of the SHA-1 names of fetched objects -- convert the fetched packfile to sha256 format and generate an idx +- convert the fetched packfile to SHA-256 format and generate an idx file - re-sort to match the order of objects in the fetched packfile @@ -734,6 +730,7 @@ Using hash functions in parallel Objects newly created would be addressed by the new hash, but inside such an object (e.g. commit) it is still possible to address objects using the old hash function. + * You cannot trust its history (needed for bisectability) in the future without further work * Maintenance burden as the number of supported hash functions grows @@ -743,36 +740,38 @@ using the old hash function. Signed objects with multiple hashes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Instead of introducing the gpgsig-sha256 field in commit and tag objects -for sha256-content based signatures, an earlier version of this design -added "hash sha256 <sha256-name>" fields to strengthen the existing -sha1-content based signatures. +for SHA-256 content based signatures, an earlier version of this design +added "hash sha256 <SHA-256 name>" fields to strengthen the existing +SHA-1 content based signatures. In other words, a single signature was used to attest to the object content using both hash functions. This had some advantages: + * Using one signature instead of two speeds up the signing process. * Having one signed payload with both hashes allows the signer to - attest to the sha1-name and sha256-name referring to the same object. + attest to the SHA-1 name and SHA-256 name referring to the same object. * All users consume the same signature. Broken signatures are likely to be detected quickly using current versions of git. However, it also came with disadvantages: -* Verifying a signed object requires access to the sha1-names of all + +* Verifying a signed object requires access to the SHA-1 names of all objects it references, even after the transition is complete and translation table is no longer needed for anything else. To support - this, the design added fields such as "hash sha1 tree <sha1-name>" - and "hash sha1 parent <sha1-name>" to the sha256-content of a signed + this, the design added fields such as "hash sha1 tree <SHA-1 name>" + and "hash sha1 parent <SHA-1 name>" to the SHA-256 content of a signed commit, complicating the conversion process. -* Allowing signed objects without a sha1 (for after the transition is +* Allowing signed objects without a SHA-1 (for after the transition is complete) complicated the design further, requiring a "nohash sha1" - field to suppress including "hash sha1" fields in the sha256-content + field to suppress including "hash sha1" fields in the SHA-256 content and signed payload. Lazily populated translation table ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Some of the work of building the translation table could be deferred to push time, but that would significantly complicate and slow down pushes. -Calculating the sha1-name at object creation time at the same time it is -being streamed to disk and having its sha256-name calculated should be +Calculating the SHA-1 name at object creation time at the same time it is +being streamed to disk and having its SHA-256 name calculated should be an acceptable cost. Document History @@ -782,18 +781,19 @@ Document History bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com, sbeller@google.com -Initial version sent to -http://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com +* Initial version sent to https://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com 2017-03-03 jrnieder@gmail.com Incorporated suggestions from jonathantanmy and sbeller: -* describe purpose of signed objects with each hash type -* redefine signed object verification using object content under the + +* Describe purpose of signed objects with each hash type +* Redefine signed object verification using object content under the first hash function 2017-03-06 jrnieder@gmail.com + * Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2] -* Make sha3-based signatures a separate field, avoiding the need for +* Make SHA3-based signatures a separate field, avoiding the need for "hash" and "nohash" fields (thanks to peff[3]). * Add a sorting phase to fetch (thanks to Junio for noticing the need for this). @@ -805,23 +805,26 @@ Incorporated suggestions from jonathantanmy and sbeller: especially Junio). 2017-09-27 jrnieder@gmail.com, sbeller@google.com -* use placeholder NewHash instead of SHA3-256 -* describe criteria for picking a hash function. -* include a transition plan (thanks especially to Brandon Williams + +* Use placeholder NewHash instead of SHA3-256 +* Describe criteria for picking a hash function. +* Include a transition plan (thanks especially to Brandon Williams for fleshing these ideas out) -* define the translation table (thanks, Shawn Pearce[5], Jonathan +* Define the translation table (thanks, Shawn Pearce[5], Jonathan Tan, and Masaya Suzuki) -* avoid loose object overhead by packing more aggressively in +* Avoid loose object overhead by packing more aggressively in "git gc --auto" Later history: - See the history of this file in git.git for the history of subsequent - edits. This document history is no longer being maintained as it - would now be superfluous to the commit log +* See the history of this file in git.git for the history of subsequent + edits. This document history is no longer being maintained as it + would now be superfluous to the commit log + +References: -[1] http://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/ -[2] http://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/ -[3] http://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/ -[4] http://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net -[5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/ + [1] https://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/ + [2] https://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/ + [3] https://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/ + [4] https://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net + [5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/ diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 69edf46c03..65da0daaa5 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -26,7 +26,7 @@ Git index format Extensions are identified by signature. Optional extensions can be ignored if Git does not understand them. - Git currently supports cached tree and resolve undo extensions. + Git currently supports cache tree and resolve undo extensions. 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored. @@ -44,6 +44,13 @@ Git index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data @@ -136,14 +143,35 @@ Git index format == Extensions -=== Cached tree - - Cached tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. - - When a path is updated in index, the path must be invalidated and - removed from tree cache. +=== Cache tree + + Since the index does not record entries for directories, the cache + entries cannot describe tree objects that already exist in the object + database for regions of the index that are unchanged from an existing + commit. The cache tree extension stores a recursive tree structure that + describes the trees that already exist and completely match sections of + the cache entries. This speeds up tree object generation from the index + for a new commit by only computing the trees that are "new" to that + commit. It also assists when comparing the index to another tree, such + as `HEAD^{tree}`, since sections of the index can be skipped when a tree + comparison demonstrates equality. + + The recursive tree structure uses nodes that store a number of cache + entries, a list of subnodes, and an object ID (OID). The OID references + the existing tree for that node, if it is known to exist. The subnodes + correspond to subdirectories that themselves have cache tree nodes. The + number of cache entries corresponds to the number of cache entries in + the index that describe paths within that tree's directory. + + The extension tracks the full directory structure in the cache tree + extension, but this is generally smaller than the full cache entry list. + + When a path is updated in index, Git invalidates all nodes of the + recursive cache tree corresponding to the parent directories of that + path. We store these tree nodes as being "invalid" by using "-1" as the + number of cache entries. Invalid nodes still store a span of index + entries, allowing Git to focus its efforts when reconstructing a full + cache tree. The signature for this extension is { 'T', 'R', 'E', 'E' }. @@ -174,7 +202,8 @@ Git index format first entry represents the root level of the repository, followed by the first subtree--let's call this A--of the root level (with its name relative to the root level), followed by the first subtree of A (with - its name relative to A), ... + its name relative to A), and so on. The specified number of subtrees + indicates when the current level of the recursive stack is complete. === Resolve undo @@ -251,14 +280,14 @@ Git index format - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from ctime field until "file size". - - Stat data of core.excludesfile + - Stat data of core.excludesFile - 32-bit dir_flags (see struct dir_struct) - Hash of $GIT_DIR/info/exclude. A null hash means the file does not exist. - - Hash of core.excludesfile. A null hash means the file does + - Hash of core.excludesFile. A null hash means the file does not exist. - NUL-terminated string of per-dir exclude file name. This usually @@ -363,3 +392,15 @@ The remaining data of each directory block is grouped by type: in this block of entries. - 32-bit count of cache entries in this block + +== Sparse Directory Entries + + When using sparse-checkout in cone mode, some entire directories within + the index can be summarized by pointing to a tree object instead of the + entire expanded list of paths within that tree. An index containing such + entries is a "sparse index". Index format versions 4 and less were not + implemented with such entries in mind. Thus, for these versions, an + index containing sparse directory entries will include this extension + with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, + tools should avoid interacting with a sparse index unless they understand + this extension. diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt index e8e377a59f..fb688976c4 100644 --- a/Documentation/technical/multi-pack-index.txt +++ b/Documentation/technical/multi-pack-index.txt @@ -43,8 +43,9 @@ Design Details a change in format. - The MIDX keeps only one record per object ID. If an object appears - in multiple packfiles, then the MIDX selects the copy in the most- - recently modified packfile. + in multiple packfiles, then the MIDX selects the copy in the + preferred packfile, otherwise selecting from the most-recently + modified packfile. - If there exist packfiles in the pack directory not registered in the MIDX, then those packfiles are loaded into the `packed_git` diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index f96b2e605f..8d2f42f29e 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -55,6 +55,18 @@ Valid object types are: Type 5 is reserved for future expansion. Type 0 is invalid. +=== Size encoding + +This document uses the following "size encoding" of non-negative +integers: From each byte, the seven least significant bits are +used to form the resulting integer. As long as the most significant +bit is 1, this process continues; the byte with MSB 0 provides the +last seven bits. The seven-bit chunks are concatenated. Later +values are more significant. + +This size encoding should not be confused with the "offset encoding", +which is also used in this document. + === Deltified representation Conceptually there are only four object types: commit, tree, tag and @@ -73,7 +85,10 @@ Ref-delta can also refer to an object outside the pack (i.e. the so-called "thin pack"). When stored on disk however, the pack should be self contained to avoid cyclic dependency. -The delta data is a sequence of instructions to reconstruct an object +The delta data starts with the size of the base object and the +size of the object to be reconstructed. These sizes are +encoded using the size encoding from above. The remainder of +the delta data is a sequence of instructions to reconstruct the object from the base object. If the base object is deltified, it must be converted to canonical form first. Each instruction appends more and more data to the target object until it's complete. There are two @@ -259,6 +274,26 @@ Pack file entry: <+ Index checksum of all of the above. +== pack-*.rev files have the format: + + - A 4-byte magic number '0x52494458' ('RIDX'). + + - A 4-byte version identifier (= 1). + + - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256). + + - A table of index positions (one per packed object, num_objects in + total, each a 4-byte unsigned integer in network order), sorted by + their corresponding offsets in the packfile. + + - A trailer, containing a: + + checksum of the corresponding packfile, and + + a checksum of all of the above. + +All 4-byte numbers are in network order. + == multi-pack-index (MIDX) files have the following format: The multi-pack-index files refer to multiple pack-files and loose objects. @@ -301,6 +336,9 @@ CHUNK LOOKUP: (Chunks are provided in file-order, so you can infer the length using the next chunk position if necessary.) + The CHUNK LOOKUP matches the table of contents from + link:technical/chunk-format.html[the chunk-based file format]. + The remaining data in the body is described one chunk at a time, and these chunks may be given in any order. Chunks are required unless otherwise specified. @@ -341,3 +379,86 @@ CHUNK DATA: TRAILER: Index checksum of the above contents. + +== multi-pack-index reverse indexes + +Similar to the pack-based reverse index, the multi-pack index can also +be used to generate a reverse index. + +Instead of mapping between offset, pack-, and index position, this +reverse index maps between an object's position within the MIDX, and +that object's position within a pseudo-pack that the MIDX describes +(i.e., the ith entry of the multi-pack reverse index holds the MIDX +position of ith object in pseudo-pack order). + +To clarify the difference between these orderings, consider a multi-pack +reachability bitmap (which does not yet exist, but is what we are +building towards here). Each bit needs to correspond to an object in the +MIDX, and so we need an efficient mapping from bit position to MIDX +position. + +One solution is to let bits occupy the same position in the oid-sorted +index stored by the MIDX. But because oids are effectively random, their +resulting reachability bitmaps would have no locality, and thus compress +poorly. (This is the reason that single-pack bitmaps use the pack +ordering, and not the .idx ordering, for the same purpose.) + +So we'd like to define an ordering for the whole MIDX based around +pack ordering, which has far better locality (and thus compresses more +efficiently). We can think of a pseudo-pack created by the concatenation +of all of the packs in the MIDX. E.g., if we had a MIDX with three packs +(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an +ordering of the objects like: + + |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19| + +where the ordering of the packs is defined by the MIDX's pack list, +and then the ordering of objects within each pack is the same as the +order in the actual packfile. + +Given the list of packs and their counts of objects, you can +naïvely reconstruct that pseudo-pack ordering (e.g., the object at +position 27 must be (c,1) because packs "a" and "b" consumed 25 of the +slots). But there's a catch. Objects may be duplicated between packs, in +which case the MIDX only stores one pointer to the object (and thus we'd +want only one slot in the bitmap). + +Callers could handle duplicates themselves by reading objects in order +of their bit-position, but that's linear in the number of objects, and +much too expensive for ordinary bitmap lookups. Building a reverse index +solves this, since it is the logical inverse of the index, and that +index has already removed duplicates. But, building a reverse index on +the fly can be expensive. Since we already have an on-disk format for +pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack, +too. + +Objects from the MIDX are ordered as follows to string together the +pseudo-pack. Let `pack(o)` return the pack from which `o` was selected +by the MIDX, and define an ordering of packs based on their numeric ID +(as stored by the MIDX). Let `offset(o)` return the object offset of `o` +within `pack(o)`. Then, compare `o1` and `o2` as follows: + + - If one of `pack(o1)` and `pack(o2)` is preferred and the other + is not, then the preferred one sorts first. ++ +(This is a detail that allows the MIDX bitmap to determine which +pack should be used by the pack-reuse mechanism, since it can ask +the MIDX for the pack containing the object at bit position 0). + + - If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending + order based on the pack ID. + + - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in + pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1) + < offset(o2)`). + +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of +objects in packs stored by the MIDX, laid out in pack order, and the +packs arranged in MIDX order (with the preferred pack coming first). + +Finally, note that the MIDX's reverse index is not stored as a chunk in +the multi-pack-index itself. This is done because the reverse index +includes the checksum of the pack or MIDX to which it belongs, which +makes it impossible to write in the MIDX. To avoid races when rewriting +the MIDX, a MIDX reverse index includes the MIDX's checksum in its +filename (e.g., `multi-pack-index-xyz.rev`). diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt index 318713abc3..f7eabc6c76 100644 --- a/Documentation/technical/packfile-uri.txt +++ b/Documentation/technical/packfile-uri.txt @@ -37,8 +37,11 @@ at least so that we can test the client. This is the implementation: a feature, marked experimental, that allows the server to be configured by one or more `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list of objects to be sent is assembled, all such -blobs are excluded, replaced with URIs. The client will download those URIs, -expecting them to each point to packfiles containing single blobs. +blobs are excluded, replaced with URIs. As noted in "Future work" below, the +server can evolve in the future to support excluding other objects (or other +implementations of servers could be made that support excluding other objects) +without needing a protocol change, so clients should not expect that packfiles +downloaded in this way only contain single blobs. Client design ------------- diff --git a/Documentation/technical/parallel-checkout.txt b/Documentation/technical/parallel-checkout.txt new file mode 100644 index 0000000000..e790258a1a --- /dev/null +++ b/Documentation/technical/parallel-checkout.txt @@ -0,0 +1,270 @@ +Parallel Checkout Design Notes +============================== + +The "Parallel Checkout" feature attempts to use multiple processes to +parallelize the work of uncompressing the blobs, applying in-core +filters, and writing the resulting contents to the working tree during a +checkout operation. It can be used by all checkout-related commands, +such as `clone`, `checkout`, `reset`, `sparse-checkout`, and others. + +These commands share the following basic structure: + +* Step 1: Read the current index file into memory. + +* Step 2: Modify the in-memory index based upon the command, and + temporarily mark all cache entries that need to be updated. + +* Step 3: Populate the working tree to match the new candidate index. + This includes iterating over all of the to-be-updated cache entries + and delete, create, or overwrite the associated files in the working + tree. + +* Step 4: Write the new index to disk. + +Step 3 is the focus of the "parallel checkout" effort described here. + +Sequential Implementation +------------------------- + +For the purposes of discussion here, the current sequential +implementation of Step 3 is divided in 3 parts, each one implemented in +its own function: + +* Step 3a: `unpack-trees.c:check_updates()` contains a series of + sequential loops iterating over the `cache_entry`'s array. The main + loop in this function calls the Step 3b function for each of the + to-be-updated entries. + +* Step 3b: `entry.c:checkout_entry()` examines the existing working tree + for file conflicts, collisions, and unsaved changes. It removes files + and creates leading directories as necessary. It calls the Step 3c + function for each entry to be written. + +* Step 3c: `entry.c:write_entry()` loads the blob into memory, smudges + it if necessary, creates the file in the working tree, writes the + smudged contents, calls `fstat()` or `lstat()`, and updates the + associated `cache_entry` struct with the stat information gathered. + +It wouldn't be safe to perform Step 3b in parallel, as there could be +race conditions between file creations and removals. Instead, the +parallel checkout framework lets the sequential code handle Step 3b, +and uses parallel workers to replace the sequential +`entry.c:write_entry()` calls from Step 3c. + +Rejected Multi-Threaded Solution +-------------------------------- + +The most "straightforward" implementation would be to spread the set of +to-be-updated cache entries across multiple threads. But due to the +thread-unsafe functions in the ODB code, we would have to use locks to +coordinate the parallel operation. An early prototype of this solution +showed that the multi-threaded checkout would bring performance +improvements over the sequential code, but there was still too much lock +contention. A `perf` profiling indicated that around 20% of the runtime +during a local Linux clone (on an SSD) was spent in locking functions. +For this reason this approach was rejected in favor of using multiple +child processes, which led to a better performance. + +Multi-Process Solution +---------------------- + +Parallel checkout alters the aforementioned Step 3 to use multiple +`checkout--worker` background processes to distribute the work. The +long-running worker processes are controlled by the foreground Git +command using the existing run-command API. + +Overview +~~~~~~~~ + +Step 3b is only slightly altered; for each entry to be checked out, the +main process performs the following steps: + +* M1: Check whether there is any untracked or unclean file in the + working tree which would be overwritten by this entry, and decide + whether to proceed (removing the file(s)) or not. + +* M2: Create the leading directories. + +* M3: Load the conversion attributes for the entry's path. + +* M4: Check, based on the entry's type and conversion attributes, + whether the entry is eligible for parallel checkout (more on this + later). If it is eligible, enqueue the entry and the loaded + attributes to later write the entry in parallel. If not, write the + entry right away, using the default sequential code. + +Note: we save the conversion attributes associated with each entry +because the workers don't have access to the main process' index state, +so they can't load the attributes by themselves (and the attributes are +needed to properly smudge the entry). Additionally, this has a positive +impact on performance as (1) we don't need to load the attributes twice +and (2) the attributes machinery is optimized to handle paths in +sequential order. + +After all entries have passed through the above steps, the main process +checks if the number of enqueued entries is sufficient to spread among +the workers. If not, it just writes them sequentially. Otherwise, it +spawns the workers and distributes the queued entries uniformly in +continuous chunks. This aims to minimize the chances of two workers +writing to the same directory simultaneously, which could increase lock +contention in the kernel. + +Then, for each assigned item, each worker: + +* W1: Checks if there is any non-directory file in the leading part of + the entry's path or if there already exists a file at the entry' path. + If so, mark the entry with `PC_ITEM_COLLIDED` and skip it (more on + this later). + +* W2: Creates the file (with O_CREAT and O_EXCL). + +* W3: Loads the blob into memory (inflating and delta reconstructing + it). + +* W4: Applies any required in-process filter, like end-of-line + conversion and re-encoding. + +* W5: Writes the result to the file descriptor opened at W2. + +* W6: Calls `fstat()` or lstat()` on the just-written path, and sends + the result back to the main process, together with the end status of + the operation and the item's identification number. + +Note that, when possible, steps W3 to W5 are delegated to the streaming +machinery, removing the need to keep the entire blob in memory. + +If the worker fails to read the blob or to write it to the working tree, +it removes the created file to avoid leaving empty files behind. This is +the *only* time a worker is allowed to remove a file. + +As mentioned earlier, it is the responsibility of the main process to +remove any file that blocks the checkout operation (or abort if the +removal(s) would cause data loss and the user didn't ask to `--force`). +This is crucial to avoid race conditions and also to properly detect +path collisions at Step W1. + +After the workers finish writing the items and sending back the required +information, the main process handles the results in two steps: + +- First, it updates the in-memory index with the `lstat()` information + sent by the workers. (This must be done first as this information + might me required in the following step.) + +- Then it writes the items which collided on disk (i.e. items marked + with `PC_ITEM_COLLIDED`). More on this below. + +Path Collisions +--------------- + +Path collisions happen when two different paths correspond to the same +entry in the file system. E.g. the paths 'a' and 'A' would collide in a +case-insensitive file system. + +The sequential checkout deals with collisions in the same way that it +deals with files that were already present in the working tree before +checkout. Basically, it checks if the path that it wants to write +already exists on disk, makes sure the existing file doesn't have +unsaved data, and then overwrites it. (To be more pedantic: it deletes +the existing file and creates the new one.) So, if there are multiple +colliding files to be checked out, the sequential code will write each +one of them but only the last will actually survive on disk. + +Parallel checkout aims to reproduce the same behavior. However, we +cannot let the workers racily write to the same file on disk. Instead, +the workers detect when the entry that they want to check out would +collide with an existing file, and mark it with `PC_ITEM_COLLIDED`. +Later, the main process can sequentially feed these entries back to +`checkout_entry()` without the risk of race conditions. On clone, this +also has the effect of marking the colliding entries to later emit a +warning for the user, like the classic sequential checkout does. + +The workers are able to detect both collisions among the entries being +concurrently written and collisions between a parallel-eligible entry +and an ineligible entry. The general idea for collision detection is +quite straightforward: for each parallel-eligible entry, the main +process must remove all files that prevent this entry from being written +(before enqueueing it). This includes any non-directory file in the +leading path of the entry. Later, when a worker gets assigned the entry, +it looks again for the non-directories files and for an already existing +file at the entry's path. If any of these checks finds something, the +worker knows that there was a path collision. + +Because parallel checkout can distinguish path collisions from the case +where the file was already present in the working tree before checkout, +we could alternatively choose to skip the checkout of colliding entries. +However, each entry that doesn't get written would have NULL `lstat()` +fields on the index. This could cause performance penalties for +subsequent commands that need to refresh the index, as they would have +to go to the file system to see if the entry is dirty. Thus, if we have +N entries in a colliding group and we decide to write and `lstat()` only +one of them, every subsequent `git-status` will have to read, convert, +and hash the written file N - 1 times. By checking out all colliding +entries (like the sequential code does), we only pay the overhead once, +during checkout. + +Eligible Entries for Parallel Checkout +-------------------------------------- + +As previously mentioned, not all entries passed to `checkout_entry()` +will be considered eligible for parallel checkout. More specifically, we +exclude: + +- Symbolic links; to avoid race conditions that, in combination with + path collisions, could cause workers to write files at the wrong + place. For example, if we were to concurrently check out a symlink + 'a' -> 'b' and a regular file 'A/f' in a case-insensitive file system, + we could potentially end up writing the file 'A/f' at 'a/f', due to a + race condition. + +- Regular files that require external filters (either "one shot" filters + or long-running process filters). These filters are black-boxes to Git + and may have their own internal locking or non-concurrent assumptions. + So it might not be safe to run multiple instances in parallel. ++ +Besides, long-running filters may use the delayed checkout feature to +postpone the return of some filtered blobs. The delayed checkout queue +and the parallel checkout queue are not compatible and should remain +separate. ++ +Note: regular files that only require internal filters, like end-of-line +conversion and re-encoding, are eligible for parallel checkout. + +Ineligible entries are checked out by the classic sequential codepath +*before* spawning workers. + +Note: submodules's files are also eligible for parallel checkout (as +long as they don't fall into any of the excluding categories mentioned +above). But since each submodule is checked out in its own child +process, we don't mix the superproject's and the submodules' files in +the same parallel checkout process or queue. + +The API +------- + +The parallel checkout API was designed with the goal of minimizing +changes to the current users of the checkout machinery. This means that +they don't have to call a different function for sequential or parallel +checkout. As already mentioned, `checkout_entry()` will automatically +insert the given entry in the parallel checkout queue when this feature +is enabled and the entry is eligible; otherwise, it will just write the +entry right away, using the sequential code. In general, callers of the +parallel checkout API should look similar to this: + +---------------------------------------------- +int pc_workers, pc_threshold, err = 0; +struct checkout state; + +get_parallel_checkout_configs(&pc_workers, &pc_threshold); + +/* + * This check is not strictly required, but it + * should save some time in sequential mode. + */ +if (pc_workers > 1) + init_parallel_checkout(); + +for (each cache_entry ce to-be-updated) + err |= checkout_entry(ce, &state, NULL, NULL); + +err |= run_parallel_checkout(&state, pc_workers, pc_threshold, NULL, NULL); +---------------------------------------------- diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 85daeb5d9e..a1e31367f4 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -33,8 +33,8 @@ In protocol v2 these special packets will have the following semantics: * '0000' Flush Packet (flush-pkt) - indicates the end of a message * '0001' Delimiter Packet (delim-pkt) - separates sections of a message - * '0002' Message Packet (response-end-pkt) - indicates the end of a response - for stateless connections + * '0002' Response End Packet (response-end-pkt) - indicates the end of a + response for stateless connections Initial Client Request ---------------------- @@ -192,11 +192,20 @@ ls-refs takes in the following arguments: When specified, only references having a prefix matching one of the provided prefixes are displayed. +If the 'unborn' feature is advertised the following argument can be +included in the client's request. + + unborn + The server will send information about HEAD even if it is a symref + pointing to an unborn branch in the form "unborn HEAD + symref-target:<target>". + The output of ls-refs is as follows: output = *ref flush-pkt - ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF) + obj-id-or-unborn = (obj-id | "unborn") + ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF) ref-attribute = (symref | peeled) symref = "symref-target:" symref-target peeled = "peeled:" obj-id @@ -337,6 +346,14 @@ explained below. client should download from all given URIs. Currently, the protocols supported are "http" and "https". +If the 'wait-for-done' feature is advertised, the following argument +can be included in the client's request. + + wait-for-done + Indicates to the server that it should never send "ready", but + should wait for the client to say "done" before sending the + packfile. + The response of `fetch` is broken into a number of sections separated by delimiter packets (0001), with each section beginning with its section header. Most sections are sent only when the packfile is sent. @@ -505,3 +522,34 @@ packet-line, and must not contain non-printable or whitespace characters. The current implementation uses trace2 session IDs (see link:api-trace2.html[api-trace2] for details), but this may change and users of the session ID should not rely on this fact. + +object-info +~~~~~~~~~~~ + +`object-info` is the command to retrieve information about one or more objects. +Its main purpose is to allow a client to make decisions based on this +information without having to fully fetch objects. Object size is the only +information that is currently supported. + +An `object-info` request takes the following arguments: + + size + Requests size information to be returned for each listed object id. + + oid <oid> + Indicates to the server an object which the client wants to obtain + information for. + +The response of `object-info` is a list of the the requested object ids +and associated requested information, each separated by a single space. + + output = info flush-pkt + + info = PKT-LINE(attrs) LF) + *PKT-LINE(obj-info LF) + + attrs = attr | attrs SP attrs + + attr = "size" + + obj-info = obj-id SP obj-size diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt index 2951840e9c..d7c3b645cf 100644 --- a/Documentation/technical/reftable.txt +++ b/Documentation/technical/reftable.txt @@ -446,7 +446,7 @@ especially if readers will not use the object name to ref mapping. Object blocks use unique, abbreviated 2-32 object name keys, mapping to ref blocks containing references pointing to that object directly, or as the peeled value of an annotated tag. Like ref blocks, object blocks use -the file's standard block size. The abbrevation length is available in +the file's standard block size. The abbreviation length is available in the footer as `obj_id_len`. To save space in small files, object blocks may be omitted if the ref @@ -872,17 +872,11 @@ A repository must set its `$GIT_DIR/config` to configure reftable: Layout ^^^^^^ -A collection of reftable files are stored in the `$GIT_DIR/reftable/` -directory: - -.... -00000001-00000001.log -00000002-00000002.ref -00000003-00000003.ref -.... - -where reftable files are named by a unique name such as produced by the -function `${min_update_index}-${max_update_index}.ref`. +A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory. +Their names should have a random element, such that each filename is globally +unique; this helps avoid spurious failures on Windows, where open files cannot +be removed or overwritten. It suggested to use +`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention. Log-only files use the `.log` extension, while ref-only and mixed ref and log files use `.ref`. extension. @@ -893,9 +887,9 @@ current files, one per line, in order, from oldest (base) to newest .... $ cat .git/reftable/tables.list -00000001-00000001.log -00000002-00000002.ref -00000003-00000003.ref +00000001-00000001-RANDOM1.log +00000002-00000002-RANDOM2.ref +00000003-00000003-RANDOM3.ref .... Readers must read `$GIT_DIR/reftable/tables.list` to determine which @@ -940,7 +934,7 @@ new reftable and atomically appending it to the stack: 3. Select `update_index` to be most recent file's `max_update_index + 1`. 4. Prepare temp reftable `tmp_XXXXXX`, including log entries. -5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`. +5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`. 6. Copy `tables.list` to `tables.list.lock`, appending file from (5). 7. Rename `tables.list.lock` to `tables.list`. @@ -993,7 +987,7 @@ prevents other processes from trying to compact these files. should always be the case, assuming that other processes are adhering to the locking protocol. 7. Rename `${min_update_index}-${max_update_index}_XXXXXX` to -`${min_update_index}-${max_update_index}.ref`. +`${min_update_index}-${max_update_index}-${random}.ref`. 8. Write the new stack to `tables.list.lock`, replacing `B` and `C` with the file from (4). 9. Rename `tables.list.lock` to `tables.list`. @@ -1005,6 +999,27 @@ This strategy permits compactions to proceed independently of updates. Each reftable (compacted or not) is uniquely identified by its name, so open reftables can be cached by their name. +Windows +^^^^^^^ + +On windows, and other systems that do not allow deleting or renaming to open +files, compaction may succeed, but other readers may prevent obsolete tables +from being deleted. + +On these platforms, the following strategy can be followed: on closing a +reftable stack, reload `tables.list`, and delete any tables no longer mentioned +in `tables.list`. + +Irregular program exit may still leave about unused files. In this case, a +cleanup operation should proceed as follows: + +* take a lock `tables.list.lock` to prevent concurrent modifications +* refresh the reftable stack, by reading `tables.list` +* for each `*.ref` file, remove it if +** it is not mentioned in `tables.list`, and +** its max update_index is not beyond the max update_index of the stack + + Alternatives considered ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt new file mode 100644 index 0000000000..3b24c1a219 --- /dev/null +++ b/Documentation/technical/sparse-index.txt @@ -0,0 +1,208 @@ +Git Sparse-Index Design Document +================================ + +The sparse-checkout feature allows users to focus a working directory on +a subset of the files at HEAD. The cone mode patterns, enabled by +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + +Three important scale dimensions for a Git working directory are: + +* `HEAD`: How many files are present at `HEAD`? + +* Populated: How many files are within the sparse-checkout cone. + +* Modified: How many files has the user modified in the working directory? + +We will use big-O notation -- O(X) -- to denote how expensive certain +operations are in terms of these dimensions. + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. + +Problems occur if there is an extreme imbalance in these dimensions. For +example, if `HEAD` contains millions of paths but the populated set has +only tens of thousands, then commands like `git status` and `git add` can +be dominated by operations that require O(`HEAD`) operations instead of +O(Populated). Primarily, the cost is in parsing and rewriting the index, +which is filled primarily with files at `HEAD` that are marked with the +`SKIP_WORKTREE` bit. + +The sparse-index intends to take these commands that read and modify the +index from O(`HEAD`) to O(Populated). To do this, we need to modify the +index format in a significant way: add "sparse directory" entries. + +With cone mode patterns, it is possible to detect when an entire +directory will have its contents outside of the sparse-checkout definition. +Instead of listing all of the files it contains as individual entries, a +sparse-index contains an entry with the directory name, referencing the +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. +If we need to discover the details for paths within that directory, we +can parse trees to find that list. + +At time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and +see only files. In fact, these loops expect to see a reference to every +staged file. One way to handle this is to parse trees to replace a +sparse-directory entry with all of the files within that tree as the index +is loaded. However, parsing trees is slower than parsing the index format, +so that is a slower operation than if we left the index alone. The plan is +to make all of these integrations "sparse aware" so this expansion through +tree parsing is unnecessary and they use fewer resources than when using a +full index. + +The implementation plan below follows four phases to slowly integrate with +the sparse-index. The intention is to incrementally update Git commands to +interact safely with the sparse-index without significant slowdowns. This +may not always be possible, but the hope is that the primary commands that +users need in their daily work are dramatically improved. + +Phase I: Format and initial speedups +------------------------------------ + +During this phase, Git learns to enable the sparse-index and safely parse +one. Protections are put in place so that every consumer of the in-memory +data structure can operate with its current assumption of every file at +`HEAD`. + +At first, every index parse will call a helper method, +`ensure_full_index()`, which scans the index for sparse-directory entries +(pointing to trees) and replaces them with the full list of paths (with +blob contents) by parsing tree objects. This will be slower in all cases. +The only noticeable change in behavior will be that the serialized index +file contains sparse-directory entries. + +To start, we use a new required index extension, `sdir`, to allow +inserting sparse-directory entries into indexes with file format +versions 2, 3, and 4. This prevents Git versions that do not understand +the sparse-index from operating on one, while allowing tools that do not +understand the sparse-index to operate on repositories as long as they do +not interact with the index. A new format, index v5, will be introduced +that includes sparse-directory entries by default. It might also +introduce other features that have been considered for improving the +index, as well. + +Next, consumers of the index will be guarded against operating on a +sparse-index by inserting calls to `ensure_full_index()` or +`expand_index_to_path()`. If a specific path is requested, then those will +be protected from within the `index_file_exists()` and `index_name_pos()` +API calls: they will call `ensure_full_index()` if necessary. The +intention here is to preserve existing behavior when interacting with a +sparse-checkout. We don't want a change to happen by accident, without +tests. Many of these locations may not need any change before removing the +guards, but we should not do so without tests to ensure the expected +behavior happens. + +It may be desirable to _change_ the behavior of some commands in the +presence of a sparse index or more generally in any sparse-checkout +scenario. In such cases, these should be carefully communicated and +tested. No such behavior changes are intended during this phase. + +During a scan of the codebase, not every iteration of the cache entries +needs an `ensure_full_index()` check. The basic reasons include: + +1. The loop is scanning for entries with non-zero stage. These entries + are not collapsed into a sparse-directory entry. + +2. The loop is scanning for submodules. These entries are not collapsed + into a sparse-directory entry. + +3. The loop is part of the index API, especially around reading or + writing the format. + +4. The loop is checking for correct order of cache entries and that is + correct if and only if the sparse-directory entries are in the correct + location. + +5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is + otherwise already aware of sparse directory entries. + +6. The sparse-index is disabled at this point when using the split-index + feature, so no effort is made to protect the split-index API. + +Even after inserting these guards, we will keep expanding sparse-indexes +for most Git commands using the `command_requires_full_index` repository +setting. This setting will be on by default and disabled one builtin at a +time until we have sufficient confidence that all of the index operations +are properly guarded. + +To complete this phase, the commands `git status` and `git add` will be +integrated with the sparse-index so that they operate with O(Populated) +performance. They will be carefully tested for operations within and +outside the sparse-checkout definition. + +Phase II: Careful integrations +------------------------------ + +This phase focuses on ensuring that all index extensions and APIs work +well with a sparse-index. This requires significant increases to our test +coverage, especially for operations that interact with the working +directory outside of the sparse-checkout definition. Some of these +behaviors may not be the desirable ones, such as some tests already +marked for failure in `t1092-sparse-checkout-compatibility.sh`. + +The index extensions that may require special integrations are: + +* FS Monitor +* Untracked cache + +While integrating with these features, we should look for patterns that +might lead to better APIs for interacting with the index. Coalescing +common usage patterns into an API call can reduce the number of places +where sparse-directories need to be handled carefully. + +Phase III: Important command speedups +------------------------------------- + +At this point, the patterns for testing and implementing sparse-directory +logic should be relatively stable. This phase focuses on updating some of +the most common builtins that use the index to operate as O(Populated). +Here is a potential list of commands that could be valuable to integrate +at this point: + +* `git commit` +* `git checkout` +* `git merge` +* `git rebase` + +Hopefully, commands such as `git merge` and `git rebase` can benefit +instead from merge algorithms that do not use the index as a data +structure, such as the merge-ORT strategy. As these topics mature, we +may enable the ORT strategy by default for repositories using the +sparse-index feature. + +Along with `git status` and `git add`, these commands cover the majority +of users' interactions with the working directory. In addition, we can +integrate with these commands: + +* `git grep` +* `git rm` + +These have been proposed as some whose behavior could change when in a +repo with a sparse-checkout definition. It would be good to include this +behavior automatically when using a sparse-index. Some clarity is needed +to make the behavior switch clear to the user. + +This phase is the first where parallel work might be possible without too +much conflicts between topics. + +Phase IV: The long tail +----------------------- + +This last phase is less a "phase" and more "the new normal" after all of +the previous work. + +To start, the `command_requires_full_index` option could be removed in +favor of expanding only when hitting an API guard. + +There are many Git commands that could use special attention to operate as +O(Populated), while some might be so rare that it is acceptable to leave +them with additional overhead when a sparse-index is present. + +Here are some commands that might be useful to update: + +* `git sparse-checkout set` +* `git am` +* `git clean` +* `git stash` |