diff options
Diffstat (limited to 'Documentation/technical/index-format.txt')
-rw-r--r-- | Documentation/technical/index-format.txt | 107 |
1 files changed, 78 insertions, 29 deletions
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index faa25c5c52..65da0daaa5 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -3,8 +3,11 @@ Git index format == The Git index file has the following format - All binary numbers are in network byte order. Version 2 is described - here unless stated otherwise. + All binary numbers are in network byte order. + In a repository using the traditional SHA-1, checksums and object IDs + (object names) mentioned below are all computed using SHA-1. Similarly, + in SHA-256 repositories, these values are computed using SHA-256. + Version 2 is described here unless stated otherwise. - A 12-byte header consisting of @@ -23,7 +26,7 @@ Git index format Extensions are identified by signature. Optional extensions can be ignored if Git does not understand them. - Git currently supports cached tree and resolve undo extensions. + Git currently supports cache tree and resolve undo extensions. 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored. @@ -32,8 +35,7 @@ Git index format Extension data - - 160-bit SHA-1 over the content of the index file before this - checksum. + - Hash checksum over the content of the index file before this checksum. == Index entry @@ -42,6 +44,13 @@ Git index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data @@ -80,7 +89,7 @@ Git index format 32-bit file size This is the on-disk size from stat(2), truncated to 32-bit. - 160-bit SHA-1 for the represented object + Object name for the represented object A 16-bit 'flags' field split into (high to low bits) @@ -134,14 +143,35 @@ Git index format == Extensions -=== Cached tree - - Cached tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. - - When a path is updated in index, the path must be invalidated and - removed from tree cache. +=== Cache tree + + Since the index does not record entries for directories, the cache + entries cannot describe tree objects that already exist in the object + database for regions of the index that are unchanged from an existing + commit. The cache tree extension stores a recursive tree structure that + describes the trees that already exist and completely match sections of + the cache entries. This speeds up tree object generation from the index + for a new commit by only computing the trees that are "new" to that + commit. It also assists when comparing the index to another tree, such + as `HEAD^{tree}`, since sections of the index can be skipped when a tree + comparison demonstrates equality. + + The recursive tree structure uses nodes that store a number of cache + entries, a list of subnodes, and an object ID (OID). The OID references + the existing tree for that node, if it is known to exist. The subnodes + correspond to subdirectories that themselves have cache tree nodes. The + number of cache entries corresponds to the number of cache entries in + the index that describe paths within that tree's directory. + + The extension tracks the full directory structure in the cache tree + extension, but this is generally smaller than the full cache entry list. + + When a path is updated in index, Git invalidates all nodes of the + recursive cache tree corresponding to the parent directories of that + path. We store these tree nodes as being "invalid" by using "-1" as the + number of cache entries. Invalid nodes still store a span of index + entries, allowing Git to focus its efforts when reconstructing a full + cache tree. The signature for this extension is { 'T', 'R', 'E', 'E' }. @@ -160,8 +190,8 @@ Git index format - A newline (ASCII 10); and - - 160-bit object name for the object that would result from writing - this span of index as a tree. + - Object name for the object that would result from writing this span + of index as a tree. An entry can be in an invalidated state and is represented by having a negative number in the entry_count field. In this case, there is no @@ -172,7 +202,8 @@ Git index format first entry represents the root level of the repository, followed by the first subtree--let's call this A--of the root level (with its name relative to the root level), followed by the first subtree of A (with - its name relative to A), ... + its name relative to A), and so on. The specified number of subtrees + indicates when the current level of the recursive stack is complete. === Resolve undo @@ -198,7 +229,7 @@ Git index format stage 1 to 3 (a missing stage is represented by "0" in this field); and - - At most three 160-bit object names of the entry in stages from 1 to 3 + - At most three object names of the entry in stages from 1 to 3 (nothing is written for a missing stage). === Split index @@ -211,8 +242,8 @@ Git index format The extension consists of: - - 160-bit SHA-1 of the shared index file. The shared index file path - is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the + - Hash of the shared index file. The shared index file path + is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the index does not require a shared index file. - An ewah-encoded delete bitmap, each bit represents an entry in the @@ -249,14 +280,14 @@ Git index format - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from ctime field until "file size". - - Stat data of core.excludesfile + - Stat data of core.excludesFile - 32-bit dir_flags (see struct dir_struct) - - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file + - Hash of $GIT_DIR/info/exclude. A null hash means the file does not exist. - - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does + - Hash of core.excludesFile. A null hash means the file does not exist. - NUL-terminated string of per-dir exclude file name. This usually @@ -285,13 +316,13 @@ The remaining data of each directory block is grouped by type: - An ewah bitmap, the n-th bit records "check-only" bit of read_directory_recursive() for the n-th directory. - - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data + - An ewah bitmap, the n-th bit indicates whether hash and stat data is valid for the n-th directory and exists in the next data. - An array of stat data. The n-th data corresponds with the n-th "one" bit in the previous ewah bitmap. - - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit + - An array of hashes. The n-th hash corresponds with the n-th "one" bit in the previous ewah bitmap. - One NUL. @@ -304,12 +335,18 @@ The remaining data of each directory block is grouped by type: The extension starts with - - 32-bit version number: the current supported version is 1. + - 32-bit version number: the current supported versions are 1 and 2. - - 64-bit time: the extension data reflects all changes through the given + - (Version 1) + 64-bit time: the extension data reflects all changes through the given time which is stored as the nanoseconds elapsed since midnight, January 1, 1970. + - (Version 2) + A null terminated string: an opaque token defined by the file system + monitor application. The extension data reflects all changes relative + to that token. + - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. - An ewah bitmap, the n-th bit indicates whether the n-th index entry @@ -330,12 +367,12 @@ The remaining data of each directory block is grouped by type: - 32-bit offset to the end of the index entries - - 160-bit SHA-1 over the extension types and their sizes (but not + - Hash over the extension types and their sizes (but not their contents). E.g. if we have "TREE" extension that is N-bytes long, "REUC" extension that is M-bytes long, followed by "EOIE", then the hash would be: - SHA-1("TREE" + <binary representation of N> + + Hash("TREE" + <binary representation of N> + "REUC" + <binary representation of M>) == Index Entry Offset Table @@ -355,3 +392,15 @@ The remaining data of each directory block is grouped by type: in this block of entries. - 32-bit count of cache entries in this block + +== Sparse Directory Entries + + When using sparse-checkout in cone mode, some entire directories within + the index can be summarized by pointing to a tree object instead of the + entire expanded list of paths within that tree. An index containing such + entries is a "sparse index". Index format versions 4 and less were not + implemented with such entries in mind. Thus, for these versions, an + index containing sparse directory entries will include this extension + with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, + tools should avoid interacting with a sparse index unless they understand + this extension. |