diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-config.txt | 18 | ||||
-rw-r--r-- | Documentation/technical/api-directory-listing.txt | 2 | ||||
-rw-r--r-- | Documentation/technical/api-object-access.txt | 4 | ||||
-rw-r--r-- | Documentation/technical/api-oid-array.txt | 17 | ||||
-rw-r--r-- | Documentation/technical/api-submodule-config.txt | 4 | ||||
-rw-r--r-- | Documentation/technical/commit-graph-format.txt | 97 | ||||
-rw-r--r-- | Documentation/technical/commit-graph.txt | 182 | ||||
-rw-r--r-- | Documentation/technical/hash-function-transition.txt | 40 | ||||
-rw-r--r-- | Documentation/technical/http-protocol.txt | 3 | ||||
-rw-r--r-- | Documentation/technical/long-running-process-protocol.txt | 50 | ||||
-rw-r--r-- | Documentation/technical/pack-format.txt | 92 | ||||
-rw-r--r-- | Documentation/technical/pack-protocol.txt | 8 | ||||
-rw-r--r-- | Documentation/technical/protocol-capabilities.txt | 8 | ||||
-rw-r--r-- | Documentation/technical/protocol-v2.txt | 19 | ||||
-rw-r--r-- | Documentation/technical/repository-version.txt | 12 | ||||
-rw-r--r-- | Documentation/technical/shallow.txt | 20 |
16 files changed, 540 insertions, 36 deletions
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt index 9a778b0cad..fa39ac9d71 100644 --- a/Documentation/technical/api-config.txt +++ b/Documentation/technical/api-config.txt @@ -47,21 +47,23 @@ will first feed the user-wide one to the callback, and then the repo-specific one; by overwriting, the higher-priority repo-specific value is left at the end). -The `git_config_with_options` function lets the caller examine config +The `config_with_options` function lets the caller examine config while adjusting some of the default behavior of `git_config`. It should almost never be used by "regular" Git code that is looking up configuration variables. It is intended for advanced callers like `git-config`, which are intentionally tweaking the normal config-lookup process. It takes two extra parameters: -`filename`:: -If this parameter is non-NULL, it specifies the name of a file to -parse for configuration, rather than looking in the usual files. Regular -`git_config` defaults to `NULL`. +`config_source`:: +If this parameter is non-NULL, it specifies the source to parse for +configuration, rather than looking in the usual files. See `struct +git_config_source` in `config.h` for details. Regular `git_config` defaults +to `NULL`. -`respect_includes`:: -Specify whether include directives should be followed in parsed files. -Regular `git_config` defaults to `1`. +`opts`:: +Specify options to adjust the behavior of parsing config files. See `struct +config_options` in `config.h` for details. As an example: regular `git_config` +sets `opts.respect_includes` to `1` by default. Reading Specific Files ---------------------- diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt index 7fae00f44f..4f44ca24f6 100644 --- a/Documentation/technical/api-directory-listing.txt +++ b/Documentation/technical/api-directory-listing.txt @@ -53,7 +53,7 @@ The notable options are: not be returned even if all of its contents are ignored. In this case, the contents are returned as individual entries. + -If this is set, files and directories that explicity match an ignore +If this is set, files and directories that explicitly match an ignore pattern are reported. Implicity ignored directories (directories that do not match an ignore pattern, but whose contents are all ignored) are not reported, instead all of the contents are reported. diff --git a/Documentation/technical/api-object-access.txt b/Documentation/technical/api-object-access.txt index 03bb0e950d..5b29622d00 100644 --- a/Documentation/technical/api-object-access.txt +++ b/Documentation/technical/api-object-access.txt @@ -1,13 +1,13 @@ object access API ================= -Talk about <sha1_file.c> and <object.h> family, things like +Talk about <sha1-file.c> and <object.h> family, things like * read_sha1_file() * read_object_with_reference() * has_sha1_file() * write_sha1_file() -* pretend_sha1_file() +* pretend_object_file() * lookup_{object,commit,tag,blob,tree} * parse_{object,commit,tag,blob,tree} * Use of object flags diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt index b0c11f868d..9febfb1d52 100644 --- a/Documentation/technical/api-oid-array.txt +++ b/Documentation/technical/api-oid-array.txt @@ -35,13 +35,18 @@ Functions Free all memory associated with the array and return it to the initial, empty state. +`oid_array_for_each`:: + Iterate over each element of the list, executing the callback + function for each one. Does not sort the list, so any custom + hash order is retained. If the callback returns a non-zero + value, the iteration ends immediately and the callback's + return is propagated; otherwise, 0 is returned. + `oid_array_for_each_unique`:: - Efficiently iterate over each unique element of the list, - executing the callback function for each one. If the array is - not sorted, this function has the side effect of sorting it. If - the callback returns a non-zero value, the iteration ends - immediately and the callback's return is propagated; otherwise, - 0 is returned. + Iterate over each unique element of the list in sorted order, + but otherwise behave like `oid_array_for_each`. If the array + is not sorted, this function has the side effect of sorting + it. Examples -------- diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt index 3dce003fda..fb06089393 100644 --- a/Documentation/technical/api-submodule-config.txt +++ b/Documentation/technical/api-submodule-config.txt @@ -4,7 +4,7 @@ submodule config cache API The submodule config cache API allows to read submodule configurations/information from specified revisions. Internally information is lazily read into a cache that is used to avoid -unnecessary parsing of the same .gitmodule files. Lookups can be done by +unnecessary parsing of the same .gitmodules files. Lookups can be done by submodule path or name. Usage @@ -38,7 +38,7 @@ Data Structures Functions --------- -`void submodule_free()`:: +`void submodule_free(struct repository *r)`:: Use these to free the internally cached values. diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt new file mode 100644 index 0000000000..ad6af8105c --- /dev/null +++ b/Documentation/technical/commit-graph-format.txt @@ -0,0 +1,97 @@ +Git commit graph format +======================= + +The Git commit graph stores a list of commit OIDs and some associated +metadata, including: + +- The generation number of the commit. Commits with no parents have + generation number 1; commits with parents have generation number + one more than the maximum generation number of its parents. We + reserve zero as special, and can be used to mark a generation + number invalid or as "not computed". + +- The root tree OID. + +- The commit date. + +- The parents of the commit, stored using positional references within + the graph file. + +These positional references are stored as unsigned 32-bit integers +corresponding to the array position withing the list of commit OIDs. We +use the most-significant bit for special purposes, so we can store at most +(1 << 31) - 1 (around 2 billion) commits. + +== Commit graph files have the following format: + +In order to allow extensions that add extra data to the graph, we organize +the body into "chunks" and provide a binary lookup table at the beginning +of the body. The header includes certain values, such as number of chunks +and hash type. + +All 4-byte numbers are in network order. + +HEADER: + + 4-byte signature: + The signature is: {'C', 'G', 'P', 'H'} + + 1-byte version number: + Currently, the only valid version is 1. + + 1-byte Hash Version (1 = SHA-1) + We infer the hash length (H) from this value. + + 1-byte number (C) of "chunks" + + 1-byte (reserved for later use) + Current clients should ignore this value. + +CHUNK LOOKUP: + + (C + 1) * 12 bytes listing the table of contents for the chunks: + First 4 bytes describe the chunk id. Value 0 is a terminating label. + Other 8 bytes provide the byte-offset in current file for chunk to + start. (Chunks are ordered contiguously in the file, so you can infer + the length using the next chunk position if necessary.) Each chunk + ID appears at most once. + + The remaining data in the body is described one chunk at a time, and + these chunks may be given in any order. Chunks are required unless + otherwise specified. + +CHUNK DATA: + + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) + The ith entry, F[i], stores the number of OIDs with first + byte at most i. Thus F[255] stores the total + number of commits (N). + + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) + The OIDs for all commits in the graph, sorted in ascending order. + + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) + * The first H bytes are for the OID of the root tree. + * The next 8 bytes are for the positions of the first two parents + of the ith commit. Stores value 0xffffffff if no parent in that + position. If there are more than two parents, the second value + has its most-significant bit on and the other bits store an array + position into the Large Edge List chunk. + * The next 8 bytes store the generation number of the commit and + the commit time in seconds since EPOCH. The generation number + uses the higher 30 bits of the first 4 bytes, while the commit + time uses the 32 bits of the second 4 bytes, along with the lowest + 2 bits of the lowest byte, storing the 33rd and 34th bit of the + commit time. + + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] + This list of 4-byte values store the second through nth parents for + all octopus merges. The second parent value in the commit data stores + an array position within this list along with the most-significant bit + on. Starting at that array position, iterate through this list of commit + positions for the parents until reaching a value with the most-significant + bit on. The other bits correspond to the position of the last parent. + +TRAILER: + + H-byte HASH-checksum of all of the above. diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt new file mode 100644 index 0000000000..e1a883eb46 --- /dev/null +++ b/Documentation/technical/commit-graph.txt @@ -0,0 +1,182 @@ +Git Commit Graph Design Notes +============================= + +Git walks the commit graph for many reasons, including: + +1. Listing and filtering commit history. +2. Computing merge bases. + +These operations can become slow as the commit count grows. The merge +base calculation shows up in many user-facing commands, such as 'merge-base' +or 'status' and can take minutes to compute depending on history shape. + +There are two main costs here: + +1. Decompressing and parsing commits. +2. Walking the entire graph to satisfy topological order constraints. + +The commit graph file is a supplemental data structure that accelerates +commit graph walks. If a user downgrades or disables the 'core.commitGraph' +config setting, then the existing ODB is sufficient. The file is stored +as "commit-graph" either in the .git/objects/info directory or in the info +directory of an alternate. + +The commit graph file stores the commit graph structure along with some +extra metadata to speed up graph walks. By listing commit OIDs in lexi- +cographic order, we can identify an integer position for each commit and +refer to the parents of a commit using those integer positions. We use +binary search to find initial commits and then use the integer positions +for fast lookups during the walk. + +A consumer may load the following info for a commit from the graph: + +1. The commit OID. +2. The list of parents, along with their integer position. +3. The commit date. +4. The root tree OID. +5. The generation number (see definition below). + +Values 1-4 satisfy the requirements of parse_commit_gently(). + +Define the "generation number" of a commit recursively as follows: + + * A commit with no parents (a root commit) has generation number one. + + * A commit with at least one parent has generation number one more than + the largest generation number among its parents. + +Equivalently, the generation number of a commit A is one more than the +length of a longest path from A to a root commit. The recursive definition +is easier to use for computation and observing the following property: + + If A and B are commits with generation numbers N and M, respectively, + and N <= M, then A cannot reach B. That is, we know without searching + that B is not an ancestor of A because it is further from a root commit + than A. + + Conversely, when checking if A is an ancestor of B, then we only need + to walk commits until all commits on the walk boundary have generation + number at most N. If we walk commits using a priority queue seeded by + generation numbers, then we always expand the boundary commit with highest + generation number and can easily detect the stopping condition. + +This property can be used to significantly reduce the time it takes to +walk commits and determine topological relationships. Without generation +numbers, the general heuristic is the following: + + If A and B are commits with commit time X and Y, respectively, and + X < Y, then A _probably_ cannot reach B. + +This heuristic is currently used whenever the computation is allowed to +violate topological relationships due to clock skew (such as "git log" +with default order), but is not used when the topological order is +required (such as merge base calculations, "git log --graph"). + +In practice, we expect some commits to be created recently and not stored +in the commit graph. We can treat these commits as having "infinite" +generation number and walk until reaching commits with known generation +number. + +We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not +in the commit-graph file. If a commit-graph file was written by a version +of Git that did not compute generation numbers, then those commits will +have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. + +Since the commit-graph file is closed under reachability, we can guarantee +the following weaker condition on all commits: + + If A and B are commits with generation numbers N amd M, respectively, + and N < M, then A cannot reach B. + +Note how the strict inequality differs from the inequality when we have +fully-computed generation numbers. Using strict inequality may result in +walking a few extra commits, but the simplicity in dealing with commits +with generation number *_INFINITY or *_ZERO is valuable. + +We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose +generation numbers are computed to be at least this value. We limit at +this value since it is the largest value that can be stored in the +commit-graph file using the 30 bits available to generation numbers. This +presents another case where a commit can have generation number equal to +that of a parent. + +Design Details +-------------- + +- The commit graph file is stored in a file named 'commit-graph' in the + .git/objects/info directory. This could be stored in the info directory + of an alternate. + +- The core.commitGraph config setting must be on to consume graph files. + +- The file format includes parameters for the object ID hash function, + so a future change of hash algorithm does not require a change in format. + +Future Work +----------- + +- The commit graph feature currently does not honor commit grafts. This can + be remedied by duplicating or refactoring the current graft logic. + +- The 'commit-graph' subcommand does not have a "verify" mode that is + necessary for integration with fsck. + +- After computing and storing generation numbers, we must make graph + walks aware of generation numbers to gain the performance benefits they + enable. This will mostly be accomplished by swapping a commit-date-ordered + priority queue with one ordered by generation number. The following + operations are important candidates: + + - 'log --topo-order' + - 'tag --merged' + +- Currently, parse_commit_gently() requires filling in the root tree + object for a commit. This passes through lookup_tree() and consequently + lookup_object(). Also, it calls lookup_commit() when loading the parents. + These method calls check the ODB for object existence, even if the + consumer does not need the content. For example, we do not need the + tree contents when computing merge bases. Now that commit parsing is + removed from the computation time, these lookup operations are the + slowest operations keeping graph walks from being fast. Consider + loading these objects without verifying their existence in the ODB and + only loading them fully when consumers need them. Consider a method + such as "ensure_tree_loaded(commit)" that fully loads a tree before + using commit->tree. + +- The current design uses the 'commit-graph' subcommand to generate the graph. + When this feature stabilizes enough to recommend to most users, we should + add automatic graph writes to common operations that create many commits. + For example, one could compute a graph on 'clone', 'fetch', or 'repack' + commands. + +- A server could provide a commit graph file as part of the network protocol + to avoid extra calculations by clients. This feature is only of benefit if + the user is willing to trust the file, because verifying the file is correct + is as hard as computing it from scratch. + +Related Links +------------- +[0] https://bugs.chromium.org/p/git/issues/detail?id=8 + Chromium work item for: Serialized Commit Graph + +[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/ + An abandoned patch that introduced generation numbers. + +[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/ + Discussion about generation numbers on commits and how they interact + with fsck. + +[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/ + More discussion about generation numbers and not storing them inside + commit objects. A valuable quote: + + "I think we should be moving more in the direction of keeping + repo-local caches for optimizations. Reachability bitmaps have been + a big performance win. I think we should be doing the same with our + properties of commits. Not just generation numbers, but making it + cheap to access the graph structure without zlib-inflating whole + commit objects (i.e., packv4 or something like the "metapacks" I + proposed a few years ago)." + +[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u + A patch to remove the ahead-behind calculation from 'status'. diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index 417ba491d0..4ab6cd1012 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -28,11 +28,30 @@ advantages: address stored content. Over time some flaws in SHA-1 have been discovered by security -researchers. https://shattered.io demonstrated a practical SHA-1 hash -collision. As a result, SHA-1 cannot be considered cryptographically -secure any more. This impacts the communication of hash values because -we cannot trust that a given hash value represents the known good -version of content that the speaker intended. +researchers. On 23 February 2017 the SHAttered attack +(https://shattered.io) demonstrated a practical SHA-1 hash collision. + +Git v2.13.0 and later subsequently moved to a hardened SHA-1 +implementation by default, which isn't vulnerable to the SHAttered +attack. + +Thus Git has in effect already migrated to a new hash that isn't SHA-1 +and doesn't share its vulnerabilities, its new hash function just +happens to produce exactly the same output for all known inputs, +except two PDFs published by the SHAttered researchers, and the new +implementation (written by those researchers) claims to detect future +cryptanalytic collision attacks. + +Regardless, it's considered prudent to move past any variant of SHA-1 +to a new hash. There's no guarantee that future attacks on SHA-1 won't +be published in the future, and those attacks may not have viable +mitigations. + +If SHA-1 and its variants were to be truly broken, Git's hash function +could not be considered cryptographically secure any more. This would +impact the communication of hash values because we could not trust +that a given hash value represented the known good version of content +that the speaker intended. SHA-1 still possesses the other properties such as fast object lookup and safe error checking, but other hash functions are equally suitable @@ -116,10 +135,15 @@ Documentation/technical/repository-version.txt) with extensions objectFormat = newhash compatObjectFormat = sha1 -Specifying a repository format extension ensures that versions of Git -not aware of NewHash do not try to operate on these repositories, -instead producing an error message: +The combination of setting `core.repositoryFormatVersion=1` and +populating `extensions.*` ensures that all versions of Git later than +`v0.99.9l` will die instead of trying to operate on the NewHash +repository, instead producing an error message. + # Between v0.99.9l and v2.7.0 + $ git status + fatal: Expected git repo version <= 0, found 1 + # After v2.7.0 $ git status fatal: unknown repository extensions found: objectformat diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt index a0e45f2889..64f49d0bbb 100644 --- a/Documentation/technical/http-protocol.txt +++ b/Documentation/technical/http-protocol.txt @@ -214,10 +214,12 @@ smart server reply: S: Cache-Control: no-cache S: S: 001e# service=git-upload-pack\n + S: 0000 S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n + S: 0000 The client may send Extra Parameters (see Documentation/technical/pack-protocol.txt) as a colon-separated string @@ -277,6 +279,7 @@ The returned response contains "version 1" if "version=1" was sent as an Extra Parameter. smart_reply = PKT-LINE("# service=$servicename" LF) + "0000" *1("version 1") ref_list "0000" diff --git a/Documentation/technical/long-running-process-protocol.txt b/Documentation/technical/long-running-process-protocol.txt new file mode 100644 index 0000000000..aa0aa9af1c --- /dev/null +++ b/Documentation/technical/long-running-process-protocol.txt @@ -0,0 +1,50 @@ +Long-running process protocol +============================= + +This protocol is used when Git needs to communicate with an external +process throughout the entire life of a single Git command. All +communication is in pkt-line format (see technical/protocol-common.txt) +over standard input and standard output. + +Handshake +--------- + +Git starts by sending a welcome message (for example, +"git-filter-client"), a list of supported protocol version numbers, and +a flush packet. Git expects to read the welcome message with "server" +instead of "client" (for example, "git-filter-server"), exactly one +protocol version number from the previously sent list, and a flush +packet. All further communication will be based on the selected version. +The remaining protocol description below documents "version=2". Please +note that "version=42" in the example below does not exist and is only +there to illustrate how the protocol would look like with more than one +version. + +After the version negotiation Git sends a list of all capabilities that +it supports and a flush packet. Git expects to read a list of desired +capabilities, which must be a subset of the supported capabilities list, +and a flush packet as response: +------------------------ +packet: git> git-filter-client +packet: git> version=2 +packet: git> version=42 +packet: git> 0000 +packet: git< git-filter-server +packet: git< version=2 +packet: git< 0000 +packet: git> capability=clean +packet: git> capability=smudge +packet: git> capability=not-yet-invented +packet: git> 0000 +packet: git< capability=clean +packet: git< capability=smudge +packet: git< 0000 +------------------------ + +Shutdown +-------- + +Git will close +the command pipe on exit. The filter is expected to detect EOF +and exit gracefully on its own. Git will wait until the filter +process has stopped. diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 8e5bf60be3..70a99fd142 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -36,6 +36,98 @@ Git pack format - The trailer records 20-byte SHA-1 checksum of all of the above. +=== Object types + +Valid object types are: + +- OBJ_COMMIT (1) +- OBJ_TREE (2) +- OBJ_BLOB (3) +- OBJ_TAG (4) +- OBJ_OFS_DELTA (6) +- OBJ_REF_DELTA (7) + +Type 5 is reserved for future expansion. Type 0 is invalid. + +=== Deltified representation + +Conceptually there are only four object types: commit, tree, tag and +blob. However to save space, an object could be stored as a "delta" of +another "base" object. These representations are assigned new types +ofs-delta and ref-delta, which is only valid in a pack file. + +Both ofs-delta and ref-delta store the "delta" to be applied to +another object (called 'base object') to reconstruct the object. The +difference between them is, ref-delta directly encodes 20-byte base +object name. If the base object is in the same pack, ofs-delta encodes +the offset of the base object in the pack instead. + +The base object could also be deltified if it's in the same pack. +Ref-delta can also refer to an object outside the pack (i.e. the +so-called "thin pack"). When stored on disk however, the pack should +be self contained to avoid cyclic dependency. + +The delta data is a sequence of instructions to reconstruct an object +from the base object. If the base object is deltified, it must be +converted to canonical form first. Each instruction appends more and +more data to the target object until it's complete. There are two +supported instructions so far: one for copy a byte range from the +source object and one for inserting new data embedded in the +instruction itself. + +Each instruction has variable length. Instruction type is determined +by the seventh bit of the first octet. The following diagrams follow +the convention in RFC 1951 (Deflate compressed data format). + +==== Instruction to copy from base object + + +----------+---------+---------+---------+---------+-------+-------+-------+ + | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 | + +----------+---------+---------+---------+---------+-------+-------+-------+ + +This is the instruction format to copy a byte range from the source +object. It encodes the offset to copy from and the number of bytes to +copy. Offset and size are in little-endian order. + +All offset and size bytes are optional. This is to reduce the +instruction size when encoding small offsets or sizes. The first seven +bits in the first octet determines which of the next seven octets is +present. If bit zero is set, offset1 is present. If bit one is set +offset2 is present and so on. + +Note that a more compact instruction does not change offset and size +encoding. For example, if only offset2 is omitted like below, offset3 +still contains bits 16-23. It does not become offset2 and contains +bits 8-15 even if it's right next to offset1. + + +----------+---------+---------+ + | 10000101 | offset1 | offset3 | + +----------+---------+---------+ + +In its most compact form, this instruction only takes up one byte +(0x80) with both offset and size omitted, which will have default +values zero. There is another exception: size zero is automatically +converted to 0x10000. + +==== Instruction to add new data + + +----------+============+ + | 0xxxxxxx | data | + +----------+============+ + +This is the instruction to construct target object without the base +object. The following data is appended to the target object. The first +seven bits of the first octet determines the size of data in +bytes. The size must be non-zero. + +==== Reserved instruction + + +----------+============ + | 00000000 | + +----------+============ + +This is the instruction reserved for future expansion. + == Original (version 1) pack-*.idx files have the following format: - The header consists of 256 4-byte network byte order diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index cd31edc91e..7fee6b780a 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -241,6 +241,7 @@ out of what the server said it could do with the first 'want' line. upload-request = want-list *shallow-line *1depth-request + [filter-request] flush-pkt want-list = first-want @@ -256,6 +257,8 @@ out of what the server said it could do with the first 'want' line. additional-want = PKT-LINE("want" SP obj-id) depth = 1*DIGIT + + filter-request = PKT-LINE("filter" SP filter-spec) ---- Clients MUST send all the obj-ids it wants from the reference @@ -278,6 +281,11 @@ complete those commits. Commits whose parents are not received as a result are defined as shallow and marked as such in the server. This information is sent back to the client in the next step. +The client can optionally request that pack-objects omit various +objects from the packfile using one of several filtering techniques. +These are intended for use with partial clone and partial fetch +operations. See `rev-list` for possible "filter-spec" values. + Once all the 'want's and 'shallow's (and optional 'deepen') are transferred, clients MUST send a flush-pkt, to tell the server side that it is done sending the list. diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index 26dcc6f502..332d209b58 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -309,3 +309,11 @@ to accept a signed push certificate, and asks the <nonce> to be included in the push certificate. A send-pack client MUST NOT send a push-cert packet unless the receive-pack server advertises this capability. + +filter +------ + +If the upload-pack server advertises the 'filter' capability, +fetch-pack may send "filter" commands to request a partial clone +or partial fetch and request that the server omit various objects +from the packfile. diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 11c0efbfda..f58f24b1ef 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -289,6 +289,15 @@ included in the clients request as well as the potential addition of the Cannot be used with "deepen", but can be used with "deepen-since". +If the 'filter' feature is advertised, the following argument can be +included in the client's request: + + filter <filter-spec> + Request that various objects from the packfile be omitted + using one of several filtering techniques. These are intended + for use with partial clone and partial fetch operations. See + `rev-list` for possible "filter-spec" values. + The response of `fetch` is broken into a number of sections separated by delimiter packets (0001), with each section beginning with its section header. @@ -392,3 +401,13 @@ header. 1 - pack data 2 - progress messages 3 - fatal error message just before stream aborts + + server-option +~~~~~~~~~~~~~~~ + +If advertised, indicates that any number of server specific options can be +included in a request. This is done by sending each option as a +"server-option=<option>" capability line in the capability-list section of +a request. + +The provided options must not contain a NUL or LF character. diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt index 00ad37986e..e03eaccebc 100644 --- a/Documentation/technical/repository-version.txt +++ b/Documentation/technical/repository-version.txt @@ -86,3 +86,15 @@ for testing format-1 compatibility. When the config key `extensions.preciousObjects` is set to `true`, objects in the repository MUST NOT be deleted (e.g., by `git-prune` or `git repack -d`). + +`partialclone` +~~~~~~~~~~~~~~ + +When the config key `extensions.partialclone` is set, it indicates +that the repo was created with a partial clone (or later performed +a partial fetch) and that the remote may have omitted sending +certain unwanted objects. Such a remote is called a "promisor remote" +and it promises that all such omitted objects can be fetched from it +in the future. + +The value of this key is the name of the promisor remote. diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt index 5183b15422..01dedfe9ff 100644 --- a/Documentation/technical/shallow.txt +++ b/Documentation/technical/shallow.txt @@ -8,20 +8,22 @@ repo, and therefore grafts are introduced pretending that these commits have no parents. ********************************************************* -The basic idea is to write the SHA-1s of shallow commits into -$GIT_DIR/shallow, and handle its contents like the contents -of $GIT_DIR/info/grafts (with the difference that shallow -cannot contain parent information). - -This information is stored in a new file instead of grafts, or -even the config, since the user should not touch that file -at all (even throughout development of the shallow clone, it -was never manually edited!). +$GIT_DIR/shallow lists commit object names and tells Git to +pretend as if they are root commits (e.g. "git log" traversal +stops after showing them; "git fsck" does not complain saying +the commits listed on their "parent" lines do not exist). Each line contains exactly one SHA-1. When read, a commit_graft will be constructed, which has nr_parent < 0 to make it easier to discern from user provided grafts. +Note that the shallow feature could not be changed easily to +use replace refs: a commit containing a `mergetag` is not allowed +to be replaced, not even by a root commit. Such a commit can be +made shallow, though. Also, having a `shallow` file explicitly +listing all the commits made shallow makes it a *lot* easier to +do shallow-specific things such as to deepen the history. + Since fsck-objects relies on the library to read the objects, it honours shallow commits automatically. |