diff options
Diffstat (limited to 'Documentation/technical/index-format.txt')
-rw-r--r-- | Documentation/technical/index-format.txt | 252 |
1 files changed, 229 insertions, 23 deletions
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index dcd51b97d9..65da0daaa5 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -1,10 +1,13 @@ -GIT index format +Git index format ================ -== The git index file has the following format +== The Git index file has the following format - All binary numbers are in network byte order. Version 2 is described - here unless stated otherwise. + All binary numbers are in network byte order. + In a repository using the traditional SHA-1, checksums and object IDs + (object names) mentioned below are all computed using SHA-1. Similarly, + in SHA-256 repositories, these values are computed using SHA-256. + Version 2 is described here unless stated otherwise. - A 12-byte header consisting of @@ -21,9 +24,9 @@ GIT index format - Extensions Extensions are identified by signature. Optional extensions can - be ignored if GIT does not understand them. + be ignored if Git does not understand them. - GIT currently supports cached tree and resolve undo extensions. + Git currently supports cache tree and resolve undo extensions. 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored. @@ -32,8 +35,7 @@ GIT index format Extension data - - 160-bit SHA-1 over the content of the index file before this - checksum. + - Hash checksum over the content of the index file before this checksum. == Index entry @@ -42,6 +44,13 @@ GIT index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data @@ -80,7 +89,7 @@ GIT index format 32-bit file size This is the on-disk size from stat(2), truncated to 32-bit. - 160-bit SHA-1 for the represented object + Object name for the represented object A 16-bit 'flags' field split into (high to low bits) @@ -129,16 +138,40 @@ GIT index format (Version 4) In version 4, the padding after the pathname does not exist. -== Extensions - -=== Cached tree + Interpretation of index entries in split index mode is completely + different. See below for details. - Cached tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. +== Extensions - When a path is updated in index, the path must be invalidated and - removed from tree cache. +=== Cache tree + + Since the index does not record entries for directories, the cache + entries cannot describe tree objects that already exist in the object + database for regions of the index that are unchanged from an existing + commit. The cache tree extension stores a recursive tree structure that + describes the trees that already exist and completely match sections of + the cache entries. This speeds up tree object generation from the index + for a new commit by only computing the trees that are "new" to that + commit. It also assists when comparing the index to another tree, such + as `HEAD^{tree}`, since sections of the index can be skipped when a tree + comparison demonstrates equality. + + The recursive tree structure uses nodes that store a number of cache + entries, a list of subnodes, and an object ID (OID). The OID references + the existing tree for that node, if it is known to exist. The subnodes + correspond to subdirectories that themselves have cache tree nodes. The + number of cache entries corresponds to the number of cache entries in + the index that describe paths within that tree's directory. + + The extension tracks the full directory structure in the cache tree + extension, but this is generally smaller than the full cache entry list. + + When a path is updated in index, Git invalidates all nodes of the + recursive cache tree corresponding to the parent directories of that + path. We store these tree nodes as being "invalid" by using "-1" as the + number of cache entries. Invalid nodes still store a span of index + entries, allowing Git to focus its efforts when reconstructing a full + cache tree. The signature for this extension is { 'T', 'R', 'E', 'E' }. @@ -157,8 +190,8 @@ GIT index format - A newline (ASCII 10); and - - 160-bit object name for the object that would result from writing - this span of index as a tree. + - Object name for the object that would result from writing this span + of index as a tree. An entry can be in an invalidated state and is represented by having a negative number in the entry_count field. In this case, there is no @@ -167,15 +200,16 @@ GIT index format The entries are written out in the top-down, depth-first order. The first entry represents the root level of the repository, followed by the - first subtree---let's call this A---of the root level (with its name + first subtree--let's call this A--of the root level (with its name relative to the root level), followed by the first subtree of A (with - its name relative to A), ... + its name relative to A), and so on. The specified number of subtrees + indicates when the current level of the recursive stack is complete. === Resolve undo A conflict is represented in the index as a set of higher stage entries. When a conflict is resolved (e.g. with "git add path"), these higher - stage entries will be removed and a stage-0 entry with proper resoluton + stage entries will be removed and a stage-0 entry with proper resolution is added. When these higher stage entries are removed, they are saved in the @@ -195,6 +229,178 @@ GIT index format stage 1 to 3 (a missing stage is represented by "0" in this field); and - - At most three 160-bit object names of the entry in stages from 1 to 3 + - At most three object names of the entry in stages from 1 to 3 (nothing is written for a missing stage). +=== Split index + + In split index mode, the majority of index entries could be stored + in a separate file. This extension records the changes to be made on + top of that to produce the final index. + + The signature for this extension is { 'l', 'i', 'n', 'k' }. + + The extension consists of: + + - Hash of the shared index file. The shared index file path + is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the + index does not require a shared index file. + + - An ewah-encoded delete bitmap, each bit represents an entry in the + shared index. If a bit is set, its corresponding entry in the + shared index will be removed from the final index. Note, because + a delete operation changes index entry positions, but we do need + original positions in replace phase, it's best to just mark + entries for removal, then do a mass deletion after replacement. + + - An ewah-encoded replace bitmap, each bit represents an entry in + the shared index. If a bit is set, its corresponding entry in the + shared index will be replaced with an entry in this index + file. All replaced entries are stored in sorted order in this + index. The first "1" bit in the replace bitmap corresponds to the + first index entry, the second "1" bit to the second entry and so + on. Replaced entries may have empty path names to save space. + + The remaining index entries after replaced ones will be added to the + final index. These added entries are also sorted by entry name then + stage. + +== Untracked cache + + Untracked cache saves the untracked file list and necessary data to + verify the cache. The signature for this extension is { 'U', 'N', + 'T', 'R' }. + + The extension starts with + + - A sequence of NUL-terminated strings, preceded by the size of the + sequence in variable width encoding. Each string describes the + environment where the cache can be used. + + - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from + ctime field until "file size". + + - Stat data of core.excludesFile + + - 32-bit dir_flags (see struct dir_struct) + + - Hash of $GIT_DIR/info/exclude. A null hash means the file + does not exist. + + - Hash of core.excludesFile. A null hash means the file does + not exist. + + - NUL-terminated string of per-dir exclude file name. This usually + is ".gitignore". + + - The number of following directory blocks, variable width + encoding. If this number is zero, the extension ends here with a + following NUL. + + - A number of directory blocks in depth-first-search order, each + consists of + + - The number of untracked entries, variable width encoding. + + - The number of sub-directory blocks, variable width encoding. + + - The directory name terminated by NUL. + + - A number of untracked file/dir names terminated by NUL. + +The remaining data of each directory block is grouped by type: + + - An ewah bitmap, the n-th bit marks whether the n-th directory has + valid untracked cache entries. + + - An ewah bitmap, the n-th bit records "check-only" bit of + read_directory_recursive() for the n-th directory. + + - An ewah bitmap, the n-th bit indicates whether hash and stat data + is valid for the n-th directory and exists in the next data. + + - An array of stat data. The n-th data corresponds with the n-th + "one" bit in the previous ewah bitmap. + + - An array of hashes. The n-th hash corresponds with the n-th "one" bit + in the previous ewah bitmap. + + - One NUL. + +== File System Monitor cache + + The file system monitor cache tracks files for which the core.fsmonitor + hook has told us about changes. The signature for this extension is + { 'F', 'S', 'M', 'N' }. + + The extension starts with + + - 32-bit version number: the current supported versions are 1 and 2. + + - (Version 1) + 64-bit time: the extension data reflects all changes through the given + time which is stored as the nanoseconds elapsed since midnight, + January 1, 1970. + + - (Version 2) + A null terminated string: an opaque token defined by the file system + monitor application. The extension data reflects all changes relative + to that token. + + - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. + + - An ewah bitmap, the n-th bit indicates whether the n-th index entry + is not CE_FSMONITOR_VALID. + +== End of Index Entry + + The End of Index Entry (EOIE) is used to locate the end of the variable + length index entries and the beginning of the extensions. Code can take + advantage of this to quickly locate the index extensions without having + to parse through all of the index entries. + + Because it must be able to be loaded before the variable length cache + entries and other index extensions, this extension must be written last. + The signature for this extension is { 'E', 'O', 'I', 'E' }. + + The extension consists of: + + - 32-bit offset to the end of the index entries + + - Hash over the extension types and their sizes (but not + their contents). E.g. if we have "TREE" extension that is N-bytes + long, "REUC" extension that is M-bytes long, followed by "EOIE", + then the hash would be: + + Hash("TREE" + <binary representation of N> + + "REUC" + <binary representation of M>) + +== Index Entry Offset Table + + The Index Entry Offset Table (IEOT) is used to help address the CPU + cost of loading the index by enabling multi-threading the process of + converting cache entries from the on-disk format to the in-memory format. + The signature for this extension is { 'I', 'E', 'O', 'T' }. + + The extension consists of: + + - 32-bit version (currently 1) + + - A number of index offset entries each consisting of: + + - 32-bit offset from the beginning of the file to the first cache entry + in this block of entries. + + - 32-bit count of cache entries in this block + +== Sparse Directory Entries + + When using sparse-checkout in cone mode, some entire directories within + the index can be summarized by pointing to a tree object instead of the + entire expanded list of paths within that tree. An index containing such + entries is a "sparse index". Index format versions 4 and less were not + implemented with such entries in mind. Thus, for these versions, an + index containing sparse directory entries will include this extension + with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, + tools should avoid interacting with a sparse index unless they understand + this extension. |