diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-trace2.txt | 3 | ||||
-rw-r--r-- | Documentation/technical/bundle-format.txt | 48 | ||||
-rw-r--r-- | Documentation/technical/commit-graph-format.txt | 30 | ||||
-rw-r--r-- | Documentation/technical/pack-format.txt | 5 |
4 files changed, 83 insertions, 3 deletions
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt index 4f07ceadcb..6b6085585d 100644 --- a/Documentation/technical/api-trace2.txt +++ b/Documentation/technical/api-trace2.txt @@ -656,7 +656,8 @@ The "exec_id" field is a command-unique id and is only useful if the ------------ `"def_param"`:: - This event is generated to log a global parameter. + This event is generated to log a global parameter, such as a config + setting, command-line flag, or environment variable. + ------------ { diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt new file mode 100644 index 0000000000..0e828151a5 --- /dev/null +++ b/Documentation/technical/bundle-format.txt @@ -0,0 +1,48 @@ += Git bundle v2 format + +The Git bundle format is a format that represents both refs and Git objects. + +== Format + +We will use ABNF notation to define the Git bundle format. See +protocol-common.txt for the details. + +---- +bundle = signature *prerequisite *reference LF pack +signature = "# v2 git bundle" LF + +prerequisite = "-" obj-id SP comment LF +comment = *CHAR +reference = obj-id SP refname LF + +pack = ... ; packfile +---- + +== Semantics + +A Git bundle consists of three parts. + +* "Prerequisites" lists the objects that are NOT included in the bundle and the + reader of the bundle MUST already have, in order to use the data in the + bundle. The objects stored in the bundle may refer to prerequisite objects and + anything reachable from them (e.g. a tree object in the bundle can reference + a blob that is reachable from a prerequisite) and/or expressed as a delta + against prerequisite objects. + +* "References" record the tips of the history graph, iow, what the reader of the + bundle CAN "git fetch" from it. + +* "Pack" is the pack data stream "git fetch" would send, if you fetch from a + repository that has the references recorded in the "References" above into a + repository that has references pointing at the objects listed in + "Prerequisites" above. + +In the bundle format, there can be a comment following a prerequisite obj-id. +This is a comment and it has no specific meaning. The writer of the bundle MAY +put any string here. The reader of the bundle MUST ignore the comment. + +=== Note on the shallow clone and a Git bundle + +Note that the prerequisites does not represent a shallow-clone boundary. The +semantics of the prerequisites and the shallow-clone boundaries are different, +and the Git bundle v2 format cannot represent a shallow clone repository. diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index a4f17441ae..1beef17182 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -17,6 +17,9 @@ metadata, including: - The parents of the commit, stored using positional references within the graph file. +- The Bloom filter of the commit carrying the paths that were changed between + the commit and its first parent, if requested. + These positional references are stored as unsigned 32-bit integers corresponding to the array position within the list of commit OIDs. Due to some special constants we use to track parents, we can store at most @@ -93,6 +96,33 @@ CHUNK DATA: positions for the parents until reaching a value with the most-significant bit on. The other bits correspond to the position of the last parent. + Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional] + * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters + from commit 0 to commit i (inclusive) in lexicographic order. The Bloom + filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header + length), where BIDX[-1] is 0. + * The BIDX chunk is ignored if the BDAT chunk is not present. + + Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] + * It starts with header consisting of three unsigned 32-bit integers: + - Version of the hash algorithm being used. We currently only support + value 1 which corresponds to the 32-bit version of the murmur3 hash + implemented exactly as described in + https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double + hashing technique using seed values 0x293ae76f and 0x7e646e2 as + described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters + in Probabilistic Verification" + - The number of times a path is hashed and hence the number of bit positions + that cumulatively determine whether a file is present in the commit. + - The minimum number of bits 'b' per entry in the Bloom filter. If the filter + contains 'n' entries, then the filter size is the minimum number of 64-bit + words that contain n*b bits. + * The rest of the chunk is the concatenation of all the computed Bloom + filters for the commits in lexicographic order. + * Note: Commits with no changes or more than 512 changes have Bloom filters + of length zero. + * The BDAT chunk is present if and only if BIDX is present. + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] This list of H-byte hashes describe a set of B commit-graph files that form a commit-graph chain. The graph position for the ith commit in this diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index cab5bdd2ff..d3a142c652 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -315,10 +315,11 @@ CHUNK DATA: Stores two 4-byte values for every object. 1: The pack-int-id for the pack storing this object. 2: The offset within the pack. - If all offsets are less than 2^31, then the large offset chunk + If all offsets are less than 2^32, then the large offset chunk will not exist and offsets are stored as in IDX v1. If there is at least one offset value larger than 2^32-1, then - the large offset chunk must exist. If the large offset chunk + the large offset chunk must exist, and offsets larger than + 2^31-1 must be stored in it instead. If the large offset chunk exists and the 31st bit is on, then removing that bit reveals the row in the large offsets containing the 8-byte offset of this object. |