diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/index-format.txt | 42 | ||||
-rw-r--r-- | Documentation/technical/pack-format.txt | 37 | ||||
-rw-r--r-- | Documentation/technical/packfile-uri.txt | 7 | ||||
-rw-r--r-- | Documentation/technical/reftable.txt | 2 |
4 files changed, 74 insertions, 14 deletions
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 69edf46c03..b633482b1b 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -26,7 +26,7 @@ Git index format Extensions are identified by signature. Optional extensions can be ignored if Git does not understand them. - Git currently supports cached tree and resolve undo extensions. + Git currently supports cache tree and resolve undo extensions. 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored. @@ -136,14 +136,35 @@ Git index format == Extensions -=== Cached tree - - Cached tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. - - When a path is updated in index, the path must be invalidated and - removed from tree cache. +=== Cache tree + + Since the index does not record entries for directories, the cache + entries cannot describe tree objects that already exist in the object + database for regions of the index that are unchanged from an existing + commit. The cache tree extension stores a recursive tree structure that + describes the trees that already exist and completely match sections of + the cache entries. This speeds up tree object generation from the index + for a new commit by only computing the trees that are "new" to that + commit. It also assists when comparing the index to another tree, such + as `HEAD^{tree}`, since sections of the index can be skipped when a tree + comparison demonstrates equality. + + The recursive tree structure uses nodes that store a number of cache + entries, a list of subnodes, and an object ID (OID). The OID references + the existing tree for that node, if it is known to exist. The subnodes + correspond to subdirectories that themselves have cache tree nodes. The + number of cache entries corresponds to the number of cache entries in + the index that describe paths within that tree's directory. + + The extension tracks the full directory structure in the cache tree + extension, but this is generally smaller than the full cache entry list. + + When a path is updated in index, Git invalidates all nodes of the + recursive cache tree corresponding to the parent directories of that + path. We store these tree nodes as being "invalid" by using "-1" as the + number of cache entries. Invalid nodes still store a span of index + entries, allowing Git to focus its efforts when reconstructing a full + cache tree. The signature for this extension is { 'T', 'R', 'E', 'E' }. @@ -174,7 +195,8 @@ Git index format first entry represents the root level of the repository, followed by the first subtree--let's call this A--of the root level (with its name relative to the root level), followed by the first subtree of A (with - its name relative to A), ... + its name relative to A), and so on. The specified number of subtrees + indicates when the current level of the recursive stack is complete. === Resolve undo diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index f96b2e605f..8833b71c8b 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -55,6 +55,18 @@ Valid object types are: Type 5 is reserved for future expansion. Type 0 is invalid. +=== Size encoding + +This document uses the following "size encoding" of non-negative +integers: From each byte, the seven least significant bits are +used to form the resulting integer. As long as the most significant +bit is 1, this process continues; the byte with MSB 0 provides the +last seven bits. The seven-bit chunks are concatenated. Later +values are more significant. + +This size encoding should not be confused with the "offset encoding", +which is also used in this document. + === Deltified representation Conceptually there are only four object types: commit, tree, tag and @@ -73,7 +85,10 @@ Ref-delta can also refer to an object outside the pack (i.e. the so-called "thin pack"). When stored on disk however, the pack should be self contained to avoid cyclic dependency. -The delta data is a sequence of instructions to reconstruct an object +The delta data starts with the size of the base object and the +size of the object to be reconstructed. These sizes are +encoded using the size encoding from above. The remainder of +the delta data is a sequence of instructions to reconstruct the object from the base object. If the base object is deltified, it must be converted to canonical form first. Each instruction appends more and more data to the target object until it's complete. There are two @@ -259,6 +274,26 @@ Pack file entry: <+ Index checksum of all of the above. +== pack-*.rev files have the format: + + - A 4-byte magic number '0x52494458' ('RIDX'). + + - A 4-byte version identifier (= 1). + + - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256). + + - A table of index positions (one per packed object, num_objects in + total, each a 4-byte unsigned integer in network order), sorted by + their corresponding offsets in the packfile. + + - A trailer, containing a: + + checksum of the corresponding packfile, and + + a checksum of all of the above. + +All 4-byte numbers are in network order. + == multi-pack-index (MIDX) files have the following format: The multi-pack-index files refer to multiple pack-files and loose objects. diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt index 318713abc3..f7eabc6c76 100644 --- a/Documentation/technical/packfile-uri.txt +++ b/Documentation/technical/packfile-uri.txt @@ -37,8 +37,11 @@ at least so that we can test the client. This is the implementation: a feature, marked experimental, that allows the server to be configured by one or more `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list of objects to be sent is assembled, all such -blobs are excluded, replaced with URIs. The client will download those URIs, -expecting them to each point to packfiles containing single blobs. +blobs are excluded, replaced with URIs. As noted in "Future work" below, the +server can evolve in the future to support excluding other objects (or other +implementations of servers could be made that support excluding other objects) +without needing a protocol change, so clients should not expect that packfiles +downloaded in this way only contain single blobs. Client design ------------- diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt index 2951840e9c..8095ab2590 100644 --- a/Documentation/technical/reftable.txt +++ b/Documentation/technical/reftable.txt @@ -446,7 +446,7 @@ especially if readers will not use the object name to ref mapping. Object blocks use unique, abbreviated 2-32 object name keys, mapping to ref blocks containing references pointing to that object directly, or as the peeled value of an annotated tag. Like ref blocks, object blocks use -the file's standard block size. The abbrevation length is available in +the file's standard block size. The abbreviation length is available in the footer as `obj_id_len`. To save space in small files, object blocks may be omitted if the ref |