summaryrefslogtreecommitdiff
path: root/Documentation/technical/index-format.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical/index-format.txt')
-rw-r--r--Documentation/technical/index-format.txt111
1 files changed, 80 insertions, 31 deletions
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index 7c4d67aa6a..65da0daaa5 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -3,8 +3,11 @@ Git index format
== The Git index file has the following format
- All binary numbers are in network byte order. Version 2 is described
- here unless stated otherwise.
+ All binary numbers are in network byte order.
+ In a repository using the traditional SHA-1, checksums and object IDs
+ (object names) mentioned below are all computed using SHA-1. Similarly,
+ in SHA-256 repositories, these values are computed using SHA-256.
+ Version 2 is described here unless stated otherwise.
- A 12-byte header consisting of
@@ -23,7 +26,7 @@ Git index format
Extensions are identified by signature. Optional extensions can
be ignored if Git does not understand them.
- Git currently supports cached tree and resolve undo extensions.
+ Git currently supports cache tree and resolve undo extensions.
4-byte extension signature. If the first byte is 'A'..'Z' the
extension is optional and can be ignored.
@@ -32,8 +35,7 @@ Git index format
Extension data
- - 160-bit SHA-1 over the content of the index file before this
- checksum.
+ - Hash checksum over the content of the index file before this checksum.
== Index entry
@@ -42,6 +44,13 @@ Git index format
localization, no special casing of directory separator '/'). Entries
with the same name are sorted by their stage field.
+ An index entry typically represents a file. However, if sparse-checkout
+ is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
+ `extensions.sparseIndex` extension is enabled, then the index may
+ contain entries for directories outside of the sparse-checkout definition.
+ These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and
+ the path ends in a directory separator.
+
32-bit ctime seconds, the last time a file's metadata changed
this is stat(2) data
@@ -80,7 +89,7 @@ Git index format
32-bit file size
This is the on-disk size from stat(2), truncated to 32-bit.
- 160-bit SHA-1 for the represented object
+ Object name for the represented object
A 16-bit 'flags' field split into (high to low bits)
@@ -134,14 +143,35 @@ Git index format
== Extensions
-=== Cached tree
-
- Cached tree extension contains pre-computed hashes for trees that can
- be derived from the index. It helps speed up tree object generation
- from index for a new commit.
-
- When a path is updated in index, the path must be invalidated and
- removed from tree cache.
+=== Cache tree
+
+ Since the index does not record entries for directories, the cache
+ entries cannot describe tree objects that already exist in the object
+ database for regions of the index that are unchanged from an existing
+ commit. The cache tree extension stores a recursive tree structure that
+ describes the trees that already exist and completely match sections of
+ the cache entries. This speeds up tree object generation from the index
+ for a new commit by only computing the trees that are "new" to that
+ commit. It also assists when comparing the index to another tree, such
+ as `HEAD^{tree}`, since sections of the index can be skipped when a tree
+ comparison demonstrates equality.
+
+ The recursive tree structure uses nodes that store a number of cache
+ entries, a list of subnodes, and an object ID (OID). The OID references
+ the existing tree for that node, if it is known to exist. The subnodes
+ correspond to subdirectories that themselves have cache tree nodes. The
+ number of cache entries corresponds to the number of cache entries in
+ the index that describe paths within that tree's directory.
+
+ The extension tracks the full directory structure in the cache tree
+ extension, but this is generally smaller than the full cache entry list.
+
+ When a path is updated in index, Git invalidates all nodes of the
+ recursive cache tree corresponding to the parent directories of that
+ path. We store these tree nodes as being "invalid" by using "-1" as the
+ number of cache entries. Invalid nodes still store a span of index
+ entries, allowing Git to focus its efforts when reconstructing a full
+ cache tree.
The signature for this extension is { 'T', 'R', 'E', 'E' }.
@@ -160,8 +190,8 @@ Git index format
- A newline (ASCII 10); and
- - 160-bit object name for the object that would result from writing
- this span of index as a tree.
+ - Object name for the object that would result from writing this span
+ of index as a tree.
An entry can be in an invalidated state and is represented by having
a negative number in the entry_count field. In this case, there is no
@@ -172,7 +202,8 @@ Git index format
first entry represents the root level of the repository, followed by the
first subtree--let's call this A--of the root level (with its name
relative to the root level), followed by the first subtree of A (with
- its name relative to A), ...
+ its name relative to A), and so on. The specified number of subtrees
+ indicates when the current level of the recursive stack is complete.
=== Resolve undo
@@ -198,7 +229,7 @@ Git index format
stage 1 to 3 (a missing stage is represented by "0" in this field);
and
- - At most three 160-bit object names of the entry in stages from 1 to 3
+ - At most three object names of the entry in stages from 1 to 3
(nothing is written for a missing stage).
=== Split index
@@ -211,8 +242,8 @@ Git index format
The extension consists of:
- - 160-bit SHA-1 of the shared index file. The shared index file path
- is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+ - Hash of the shared index file. The shared index file path
+ is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
index does not require a shared index file.
- An ewah-encoded delete bitmap, each bit represents an entry in the
@@ -249,14 +280,14 @@ Git index format
- Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
ctime field until "file size".
- - Stat data of core.excludesfile
+ - Stat data of core.excludesFile
- 32-bit dir_flags (see struct dir_struct)
- - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+ - Hash of $GIT_DIR/info/exclude. A null hash means the file
does not exist.
- - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+ - Hash of core.excludesFile. A null hash means the file does
not exist.
- NUL-terminated string of per-dir exclude file name. This usually
@@ -285,13 +316,13 @@ The remaining data of each directory block is grouped by type:
- An ewah bitmap, the n-th bit records "check-only" bit of
read_directory_recursive() for the n-th directory.
- - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+ - An ewah bitmap, the n-th bit indicates whether hash and stat data
is valid for the n-th directory and exists in the next data.
- An array of stat data. The n-th data corresponds with the n-th
"one" bit in the previous ewah bitmap.
- - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+ - An array of hashes. The n-th hash corresponds with the n-th "one" bit
in the previous ewah bitmap.
- One NUL.
@@ -304,12 +335,18 @@ The remaining data of each directory block is grouped by type:
The extension starts with
- - 32-bit version number: the current supported version is 1.
+ - 32-bit version number: the current supported versions are 1 and 2.
- - 64-bit time: the extension data reflects all changes through the given
+ - (Version 1)
+ 64-bit time: the extension data reflects all changes through the given
time which is stored as the nanoseconds elapsed since midnight,
January 1, 1970.
+ - (Version 2)
+ A null terminated string: an opaque token defined by the file system
+ monitor application. The extension data reflects all changes relative
+ to that token.
+
- 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
- An ewah bitmap, the n-th bit indicates whether the n-th index entry
@@ -318,7 +355,7 @@ The remaining data of each directory block is grouped by type:
== End of Index Entry
The End of Index Entry (EOIE) is used to locate the end of the variable
- length index entries and the begining of the extensions. Code can take
+ length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.
@@ -330,12 +367,12 @@ The remaining data of each directory block is grouped by type:
- 32-bit offset to the end of the index entries
- - 160-bit SHA-1 over the extension types and their sizes (but not
+ - Hash over the extension types and their sizes (but not
their contents). E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:
- SHA-1("TREE" + <binary representation of N> +
+ Hash("TREE" + <binary representation of N> +
"REUC" + <binary representation of M>)
== Index Entry Offset Table
@@ -351,7 +388,19 @@ The remaining data of each directory block is grouped by type:
- A number of index offset entries each consisting of:
- - 32-bit offset from the begining of the file to the first cache entry
+ - 32-bit offset from the beginning of the file to the first cache entry
in this block of entries.
- 32-bit count of cache entries in this block
+
+== Sparse Directory Entries
+
+ When using sparse-checkout in cone mode, some entire directories within
+ the index can be summarized by pointing to a tree object instead of the
+ entire expanded list of paths within that tree. An index containing such
+ entries is a "sparse index". Index format versions 4 and less were not
+ implemented with such entries in mind. Thus, for these versions, an
+ index containing sparse directory entries will include this extension
+ with signature { 's', 'd', 'i', 'r' }. Like the split-index extension,
+ tools should avoid interacting with a sparse index unless they understand
+ this extension.