summaryrefslogtreecommitdiff
path: root/Documentation/technical/commit-graph-format.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical/commit-graph-format.txt')
-rw-r--r--Documentation/technical/commit-graph-format.txt43
1 files changed, 39 insertions, 4 deletions
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index a4f17441ae..b3b58880b9 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -17,6 +17,9 @@ metadata, including:
- The parents of the commit, stored using positional references within
the graph file.
+- The Bloom filter of the commit carrying the paths that were changed between
+ the commit and its first parent, if requested.
+
These positional references are stored as unsigned 32-bit integers
corresponding to the array position within the list of commit OIDs. Due
to some special constants we use to track parents, we can store at most
@@ -29,7 +32,7 @@ the body into "chunks" and provide a binary lookup table at the beginning
of the body. The header includes certain values, such as number of chunks
and hash type.
-All 4-byte numbers are in network order.
+All multi-byte numbers are in network byte order.
HEADER:
@@ -39,8 +42,13 @@ HEADER:
1-byte version number:
Currently, the only valid version is 1.
- 1-byte Hash Version (1 = SHA-1)
- We infer the hash length (H) from this value.
+ 1-byte Hash Version
+ We infer the hash length (H) from this value:
+ 1 => SHA-1
+ 2 => SHA-256
+ If the hash type does not match the repository's hash algorithm, the
+ commit-graph file should be ignored with a warning presented to the
+ user.
1-byte number (C) of "chunks"
@@ -74,7 +82,7 @@ CHUNK DATA:
Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
* The first H bytes are for the OID of the root tree.
* The next 8 bytes are for the positions of the first two parents
- of the ith commit. Stores value 0x7000000 if no parent in that
+ of the ith commit. Stores value 0x70000000 if no parent in that
position. If there are more than two parents, the second value
has its most-significant bit on and the other bits store an array
position into the Extra Edge List chunk.
@@ -93,6 +101,33 @@ CHUNK DATA:
positions for the parents until reaching a value with the most-significant
bit on. The other bits correspond to the position of the last parent.
+ Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
+ * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
+ from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
+ filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
+ length), where BIDX[-1] is 0.
+ * The BIDX chunk is ignored if the BDAT chunk is not present.
+
+ Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
+ * It starts with header consisting of three unsigned 32-bit integers:
+ - Version of the hash algorithm being used. We currently only support
+ value 1 which corresponds to the 32-bit version of the murmur3 hash
+ implemented exactly as described in
+ https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
+ hashing technique using seed values 0x293ae76f and 0x7e646e2 as
+ described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
+ in Probabilistic Verification"
+ - The number of times a path is hashed and hence the number of bit positions
+ that cumulatively determine whether a file is present in the commit.
+ - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
+ contains 'n' entries, then the filter size is the minimum number of 64-bit
+ words that contain n*b bits.
+ * The rest of the chunk is the concatenation of all the computed Bloom
+ filters for the commits in lexicographic order.
+ * Note: Commits with no changes or more than 512 changes have Bloom filters
+ of length one, with either all bits set to zero or one respectively.
+ * The BDAT chunk is present if and only if BIDX is present.
+
Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
This list of H-byte hashes describe a set of B commit-graph files that
form a commit-graph chain. The graph position for the ith commit in this