summaryrefslogtreecommitdiff
path: root/Documentation/technical
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical')
-rw-r--r--Documentation/technical/api-config.txt18
-rw-r--r--Documentation/technical/api-decorate.txt6
-rw-r--r--Documentation/technical/api-directory-listing.txt27
-rw-r--r--Documentation/technical/api-object-access.txt4
-rw-r--r--Documentation/technical/api-oid-array.txt17
-rw-r--r--Documentation/technical/api-submodule-config.txt4
-rw-r--r--Documentation/technical/commit-graph-format.txt97
-rw-r--r--Documentation/technical/commit-graph.txt182
-rw-r--r--Documentation/technical/hash-function-transition.txt40
-rw-r--r--Documentation/technical/http-protocol.txt11
-rw-r--r--Documentation/technical/index-format.txt19
-rw-r--r--Documentation/technical/long-running-process-protocol.txt50
-rw-r--r--Documentation/technical/pack-format.txt92
-rw-r--r--Documentation/technical/pack-protocol.txt43
-rw-r--r--Documentation/technical/partial-clone.txt324
-rw-r--r--Documentation/technical/protocol-v2.txt414
-rw-r--r--Documentation/technical/shallow.txt20
17 files changed, 1316 insertions, 52 deletions
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index 9a778b0cad..fa39ac9d71 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -47,21 +47,23 @@ will first feed the user-wide one to the callback, and then the
repo-specific one; by overwriting, the higher-priority repo-specific
value is left at the end).
-The `git_config_with_options` function lets the caller examine config
+The `config_with_options` function lets the caller examine config
while adjusting some of the default behavior of `git_config`. It should
almost never be used by "regular" Git code that is looking up
configuration variables. It is intended for advanced callers like
`git-config`, which are intentionally tweaking the normal config-lookup
process. It takes two extra parameters:
-`filename`::
-If this parameter is non-NULL, it specifies the name of a file to
-parse for configuration, rather than looking in the usual files. Regular
-`git_config` defaults to `NULL`.
+`config_source`::
+If this parameter is non-NULL, it specifies the source to parse for
+configuration, rather than looking in the usual files. See `struct
+git_config_source` in `config.h` for details. Regular `git_config` defaults
+to `NULL`.
-`respect_includes`::
-Specify whether include directives should be followed in parsed files.
-Regular `git_config` defaults to `1`.
+`opts`::
+Specify options to adjust the behavior of parsing config files. See `struct
+config_options` in `config.h` for details. As an example: regular `git_config`
+sets `opts.respect_includes` to `1` by default.
Reading Specific Files
----------------------
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
deleted file mode 100644
index 1d52a6ce14..0000000000
--- a/Documentation/technical/api-decorate.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-decorate API
-============
-
-Talk about <decorate.h>
-
-(Linus)
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 6c77b4920c..4f44ca24f6 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -22,16 +22,20 @@ The notable options are:
`flags`::
- A bit-field of options (the `*IGNORED*` flags are mutually exclusive):
+ A bit-field of options:
`DIR_SHOW_IGNORED`:::
- Return just ignored files in `entries[]`, not untracked files.
+ Return just ignored files in `entries[]`, not untracked
+ files. This flag is mutually exclusive with
+ `DIR_SHOW_IGNORED_TOO`.
`DIR_SHOW_IGNORED_TOO`:::
- Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
- in addition to untracked files in `entries[]`.
+ Similar to `DIR_SHOW_IGNORED`, but return ignored files in
+ `ignored[]` in addition to untracked files in
+ `entries[]`. This flag is mutually exclusive with
+ `DIR_SHOW_IGNORED`.
`DIR_KEEP_UNTRACKED_CONTENTS`:::
@@ -39,6 +43,21 @@ The notable options are:
untracked contents of untracked directories are also returned in
`entries[]`.
+`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`:::
+
+ Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if
+ this is set, returns ignored files and directories that match
+ an exclude pattern. If a directory matches an exclude pattern,
+ then the directory is returned and the contained paths are
+ not. A directory that does not match an exclude pattern will
+ not be returned even if all of its contents are ignored. In
+ this case, the contents are returned as individual entries.
++
+If this is set, files and directories that explicitly match an ignore
+pattern are reported. Implicity ignored directories (directories that
+do not match an ignore pattern, but whose contents are all ignored)
+are not reported, instead all of the contents are reported.
+
`DIR_COLLECT_IGNORED`:::
Special mode for git-add. Return ignored files in `ignored[]` and
diff --git a/Documentation/technical/api-object-access.txt b/Documentation/technical/api-object-access.txt
index 03bb0e950d..5b29622d00 100644
--- a/Documentation/technical/api-object-access.txt
+++ b/Documentation/technical/api-object-access.txt
@@ -1,13 +1,13 @@
object access API
=================
-Talk about <sha1_file.c> and <object.h> family, things like
+Talk about <sha1-file.c> and <object.h> family, things like
* read_sha1_file()
* read_object_with_reference()
* has_sha1_file()
* write_sha1_file()
-* pretend_sha1_file()
+* pretend_object_file()
* lookup_{object,commit,tag,blob,tree}
* parse_{object,commit,tag,blob,tree}
* Use of object flags
diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
index b0c11f868d..9febfb1d52 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -35,13 +35,18 @@ Functions
Free all memory associated with the array and return it to the
initial, empty state.
+`oid_array_for_each`::
+ Iterate over each element of the list, executing the callback
+ function for each one. Does not sort the list, so any custom
+ hash order is retained. If the callback returns a non-zero
+ value, the iteration ends immediately and the callback's
+ return is propagated; otherwise, 0 is returned.
+
`oid_array_for_each_unique`::
- Efficiently iterate over each unique element of the list,
- executing the callback function for each one. If the array is
- not sorted, this function has the side effect of sorting it. If
- the callback returns a non-zero value, the iteration ends
- immediately and the callback's return is propagated; otherwise,
- 0 is returned.
+ Iterate over each unique element of the list in sorted order,
+ but otherwise behave like `oid_array_for_each`. If the array
+ is not sorted, this function has the side effect of sorting
+ it.
Examples
--------
diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
index 3dce003fda..fb06089393 100644
--- a/Documentation/technical/api-submodule-config.txt
+++ b/Documentation/technical/api-submodule-config.txt
@@ -4,7 +4,7 @@ submodule config cache API
The submodule config cache API allows to read submodule
configurations/information from specified revisions. Internally
information is lazily read into a cache that is used to avoid
-unnecessary parsing of the same .gitmodule files. Lookups can be done by
+unnecessary parsing of the same .gitmodules files. Lookups can be done by
submodule path or name.
Usage
@@ -38,7 +38,7 @@ Data Structures
Functions
---------
-`void submodule_free()`::
+`void submodule_free(struct repository *r)`::
Use these to free the internally cached values.
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..ad6af8105c
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,97 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+ generation number 1; commits with parents have generation number
+ one more than the maximum generation number of its parents. We
+ reserve zero as special, and can be used to mark a generation
+ number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+ the graph file.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position withing the list of commit OIDs. We
+use the most-significant bit for special purposes, so we can store at most
+(1 << 31) - 1 (around 2 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+ 4-byte signature:
+ The signature is: {'C', 'G', 'P', 'H'}
+
+ 1-byte version number:
+ Currently, the only valid version is 1.
+
+ 1-byte Hash Version (1 = SHA-1)
+ We infer the hash length (H) from this value.
+
+ 1-byte number (C) of "chunks"
+
+ 1-byte (reserved for later use)
+ Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+ (C + 1) * 12 bytes listing the table of contents for the chunks:
+ First 4 bytes describe the chunk id. Value 0 is a terminating label.
+ Other 8 bytes provide the byte-offset in current file for chunk to
+ start. (Chunks are ordered contiguously in the file, so you can infer
+ the length using the next chunk position if necessary.) Each chunk
+ ID appears at most once.
+
+ The remaining data in the body is described one chunk at a time, and
+ these chunks may be given in any order. Chunks are required unless
+ otherwise specified.
+
+CHUNK DATA:
+
+ OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+ The ith entry, F[i], stores the number of OIDs with first
+ byte at most i. Thus F[255] stores the total
+ number of commits (N).
+
+ OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+ The OIDs for all commits in the graph, sorted in ascending order.
+
+ Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+ * The first H bytes are for the OID of the root tree.
+ * The next 8 bytes are for the positions of the first two parents
+ of the ith commit. Stores value 0xffffffff if no parent in that
+ position. If there are more than two parents, the second value
+ has its most-significant bit on and the other bits store an array
+ position into the Large Edge List chunk.
+ * The next 8 bytes store the generation number of the commit and
+ the commit time in seconds since EPOCH. The generation number
+ uses the higher 30 bits of the first 4 bytes, while the commit
+ time uses the 32 bits of the second 4 bytes, along with the lowest
+ 2 bits of the lowest byte, storing the 33rd and 34th bit of the
+ commit time.
+
+ Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+ This list of 4-byte values store the second through nth parents for
+ all octopus merges. The second parent value in the commit data stores
+ an array position within this list along with the most-significant bit
+ on. Starting at that array position, iterate through this list of commit
+ positions for the parents until reaching a value with the most-significant
+ bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+ H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..e1a883eb46
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,182 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+ the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+ If A and B are commits with generation numbers N and M, respectively,
+ and N <= M, then A cannot reach B. That is, we know without searching
+ that B is not an ancestor of A because it is further from a root commit
+ than A.
+
+ Conversely, when checking if A is an ancestor of B, then we only need
+ to walk commits until all commits on the walk boundary have generation
+ number at most N. If we walk commits using a priority queue seeded by
+ generation numbers, then we always expand the boundary commit with highest
+ generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+ If A and B are commits with commit time X and Y, respectively, and
+ X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
+in the commit-graph file. If a commit-graph file was written by a version
+of Git that did not compute generation numbers, then those commits will
+have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
+
+Since the commit-graph file is closed under reachability, we can guarantee
+the following weaker condition on all commits:
+
+ If A and B are commits with generation numbers N amd M, respectively,
+ and N < M, then A cannot reach B.
+
+Note how the strict inequality differs from the inequality when we have
+fully-computed generation numbers. Using strict inequality may result in
+walking a few extra commits, but the simplicity in dealing with commits
+with generation number *_INFINITY or *_ZERO is valuable.
+
+We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
+generation numbers are computed to be at least this value. We limit at
+this value since it is the largest value that can be stored in the
+commit-graph file using the 30 bits available to generation numbers. This
+presents another case where a commit can have generation number equal to
+that of a parent.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+ .git/objects/info directory. This could be stored in the info directory
+ of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+ so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+ be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+ necessary for integration with fsck.
+
+- After computing and storing generation numbers, we must make graph
+ walks aware of generation numbers to gain the performance benefits they
+ enable. This will mostly be accomplished by swapping a commit-date-ordered
+ priority queue with one ordered by generation number. The following
+ operations are important candidates:
+
+ - 'log --topo-order'
+ - 'tag --merged'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+ object for a commit. This passes through lookup_tree() and consequently
+ lookup_object(). Also, it calls lookup_commit() when loading the parents.
+ These method calls check the ODB for object existence, even if the
+ consumer does not need the content. For example, we do not need the
+ tree contents when computing merge bases. Now that commit parsing is
+ removed from the computation time, these lookup operations are the
+ slowest operations keeping graph walks from being fast. Consider
+ loading these objects without verifying their existence in the ODB and
+ only loading them fully when consumers need them. Consider a method
+ such as "ensure_tree_loaded(commit)" that fully loads a tree before
+ using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+ When this feature stabilizes enough to recommend to most users, we should
+ add automatic graph writes to common operations that create many commits.
+ For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+ commands.
+
+- A server could provide a commit graph file as part of the network protocol
+ to avoid extra calculations by clients. This feature is only of benefit if
+ the user is willing to trust the file, because verifying the file is correct
+ is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+ Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+ An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+ Discussion about generation numbers on commits and how they interact
+ with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+ More discussion about generation numbers and not storing them inside
+ commit objects. A valuable quote:
+
+ "I think we should be moving more in the direction of keeping
+ repo-local caches for optimizations. Reachability bitmaps have been
+ a big performance win. I think we should be doing the same with our
+ properties of commits. Not just generation numbers, but making it
+ cheap to access the graph structure without zlib-inflating whole
+ commit objects (i.e., packv4 or something like the "metapacks" I
+ proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+ A patch to remove the ahead-behind calculation from 'status'.
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 417ba491d0..4ab6cd1012 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -28,11 +28,30 @@ advantages:
address stored content.
Over time some flaws in SHA-1 have been discovered by security
-researchers. https://shattered.io demonstrated a practical SHA-1 hash
-collision. As a result, SHA-1 cannot be considered cryptographically
-secure any more. This impacts the communication of hash values because
-we cannot trust that a given hash value represents the known good
-version of content that the speaker intended.
+researchers. On 23 February 2017 the SHAttered attack
+(https://shattered.io) demonstrated a practical SHA-1 hash collision.
+
+Git v2.13.0 and later subsequently moved to a hardened SHA-1
+implementation by default, which isn't vulnerable to the SHAttered
+attack.
+
+Thus Git has in effect already migrated to a new hash that isn't SHA-1
+and doesn't share its vulnerabilities, its new hash function just
+happens to produce exactly the same output for all known inputs,
+except two PDFs published by the SHAttered researchers, and the new
+implementation (written by those researchers) claims to detect future
+cryptanalytic collision attacks.
+
+Regardless, it's considered prudent to move past any variant of SHA-1
+to a new hash. There's no guarantee that future attacks on SHA-1 won't
+be published in the future, and those attacks may not have viable
+mitigations.
+
+If SHA-1 and its variants were to be truly broken, Git's hash function
+could not be considered cryptographically secure any more. This would
+impact the communication of hash values because we could not trust
+that a given hash value represented the known good version of content
+that the speaker intended.
SHA-1 still possesses the other properties such as fast object lookup
and safe error checking, but other hash functions are equally suitable
@@ -116,10 +135,15 @@ Documentation/technical/repository-version.txt) with extensions
objectFormat = newhash
compatObjectFormat = sha1
-Specifying a repository format extension ensures that versions of Git
-not aware of NewHash do not try to operate on these repositories,
-instead producing an error message:
+The combination of setting `core.repositoryFormatVersion=1` and
+populating `extensions.*` ensures that all versions of Git later than
+`v0.99.9l` will die instead of trying to operate on the NewHash
+repository, instead producing an error message.
+ # Between v0.99.9l and v2.7.0
+ $ git status
+ fatal: Expected git repo version <= 0, found 1
+ # After v2.7.0
$ git status
fatal: unknown repository extensions found:
objectformat
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 1c561bdd92..64f49d0bbb 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -214,10 +214,16 @@ smart server reply:
S: Cache-Control: no-cache
S:
S: 001e# service=git-upload-pack\n
+ S: 0000
S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
+ S: 0000
+
+The client may send Extra Parameters (see
+Documentation/technical/pack-protocol.txt) as a colon-separated string
+in the Git-Protocol HTTP header.
Dumb Server Response
^^^^^^^^^^^^^^^^^^^^
@@ -269,7 +275,12 @@ the C locale ordering. The stream SHOULD include the default ref
named `HEAD` as the first ref. The stream MUST include capability
declarations behind a NUL on the first ref.
+The returned response contains "version 1" if "version=1" was sent as an
+Extra Parameter.
+
smart_reply = PKT-LINE("# service=$servicename" LF)
+ "0000"
+ *1("version 1")
ref_list
"0000"
ref_list = empty_list / non_empty_list
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c445..db3572626b 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,22 @@ The remaining data of each directory block is grouped by type:
in the previous ewah bitmap.
- One NUL.
+
+== File System Monitor cache
+
+ The file system monitor cache tracks files for which the core.fsmonitor
+ hook has told us about changes. The signature for this extension is
+ { 'F', 'S', 'M', 'N' }.
+
+ The extension starts with
+
+ - 32-bit version number: the current supported version is 1.
+
+ - 64-bit time: the extension data reflects all changes through the given
+ time which is stored as the nanoseconds elapsed since midnight,
+ January 1, 1970.
+
+ - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
+
+ - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+ is not CE_FSMONITOR_VALID.
diff --git a/Documentation/technical/long-running-process-protocol.txt b/Documentation/technical/long-running-process-protocol.txt
new file mode 100644
index 0000000000..aa0aa9af1c
--- /dev/null
+++ b/Documentation/technical/long-running-process-protocol.txt
@@ -0,0 +1,50 @@
+Long-running process protocol
+=============================
+
+This protocol is used when Git needs to communicate with an external
+process throughout the entire life of a single Git command. All
+communication is in pkt-line format (see technical/protocol-common.txt)
+over standard input and standard output.
+
+Handshake
+---------
+
+Git starts by sending a welcome message (for example,
+"git-filter-client"), a list of supported protocol version numbers, and
+a flush packet. Git expects to read the welcome message with "server"
+instead of "client" (for example, "git-filter-server"), exactly one
+protocol version number from the previously sent list, and a flush
+packet. All further communication will be based on the selected version.
+The remaining protocol description below documents "version=2". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look like with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet: git> git-filter-client
+packet: git> version=2
+packet: git> version=42
+packet: git> 0000
+packet: git< git-filter-server
+packet: git< version=2
+packet: git< 0000
+packet: git> capability=clean
+packet: git> capability=smudge
+packet: git> capability=not-yet-invented
+packet: git> 0000
+packet: git< capability=clean
+packet: git< capability=smudge
+packet: git< 0000
+------------------------
+
+Shutdown
+--------
+
+Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own. Git will wait until the filter
+process has stopped.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8e5bf60be3..70a99fd142 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -36,6 +36,98 @@ Git pack format
- The trailer records 20-byte SHA-1 checksum of all of the above.
+=== Object types
+
+Valid object types are:
+
+- OBJ_COMMIT (1)
+- OBJ_TREE (2)
+- OBJ_BLOB (3)
+- OBJ_TAG (4)
+- OBJ_OFS_DELTA (6)
+- OBJ_REF_DELTA (7)
+
+Type 5 is reserved for future expansion. Type 0 is invalid.
+
+=== Deltified representation
+
+Conceptually there are only four object types: commit, tree, tag and
+blob. However to save space, an object could be stored as a "delta" of
+another "base" object. These representations are assigned new types
+ofs-delta and ref-delta, which is only valid in a pack file.
+
+Both ofs-delta and ref-delta store the "delta" to be applied to
+another object (called 'base object') to reconstruct the object. The
+difference between them is, ref-delta directly encodes 20-byte base
+object name. If the base object is in the same pack, ofs-delta encodes
+the offset of the base object in the pack instead.
+
+The base object could also be deltified if it's in the same pack.
+Ref-delta can also refer to an object outside the pack (i.e. the
+so-called "thin pack"). When stored on disk however, the pack should
+be self contained to avoid cyclic dependency.
+
+The delta data is a sequence of instructions to reconstruct an object
+from the base object. If the base object is deltified, it must be
+converted to canonical form first. Each instruction appends more and
+more data to the target object until it's complete. There are two
+supported instructions so far: one for copy a byte range from the
+source object and one for inserting new data embedded in the
+instruction itself.
+
+Each instruction has variable length. Instruction type is determined
+by the seventh bit of the first octet. The following diagrams follow
+the convention in RFC 1951 (Deflate compressed data format).
+
+==== Instruction to copy from base object
+
+ +----------+---------+---------+---------+---------+-------+-------+-------+
+ | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
+ +----------+---------+---------+---------+---------+-------+-------+-------+
+
+This is the instruction format to copy a byte range from the source
+object. It encodes the offset to copy from and the number of bytes to
+copy. Offset and size are in little-endian order.
+
+All offset and size bytes are optional. This is to reduce the
+instruction size when encoding small offsets or sizes. The first seven
+bits in the first octet determines which of the next seven octets is
+present. If bit zero is set, offset1 is present. If bit one is set
+offset2 is present and so on.
+
+Note that a more compact instruction does not change offset and size
+encoding. For example, if only offset2 is omitted like below, offset3
+still contains bits 16-23. It does not become offset2 and contains
+bits 8-15 even if it's right next to offset1.
+
+ +----------+---------+---------+
+ | 10000101 | offset1 | offset3 |
+ +----------+---------+---------+
+
+In its most compact form, this instruction only takes up one byte
+(0x80) with both offset and size omitted, which will have default
+values zero. There is another exception: size zero is automatically
+converted to 0x10000.
+
+==== Instruction to add new data
+
+ +----------+============+
+ | 0xxxxxxx | data |
+ +----------+============+
+
+This is the instruction to construct target object without the base
+object. The following data is appended to the target object. The first
+seven bits of the first octet determines the size of data in
+bytes. The size must be non-zero.
+
+==== Reserved instruction
+
+ +----------+============
+ | 00000000 |
+ +----------+============
+
+This is the instruction reserved for future expansion.
+
== Original (version 1) pack-*.idx files have the following format:
- The header consists of 256 4-byte network byte order
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index a43a113e44..7fee6b780a 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -39,6 +39,19 @@ communicates with that invoked process over the SSH connection.
The file:// transport runs the 'upload-pack' or 'receive-pack'
process locally and communicates with it over a pipe.
+Extra Parameters
+----------------
+
+The protocol provides a mechanism in which clients can send additional
+information in its first message to the server. These are called "Extra
+Parameters", and are supported by the Git, SSH, and HTTP protocols.
+
+Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
+
+Servers that receive any such Extra Parameters MUST ignore all
+unrecognized keys. Currently, the only Extra Parameter recognized is
+"version=1".
+
Git Transport
-------------
@@ -46,18 +59,25 @@ The Git transport starts off by sending the command and repository
on the wire using the pkt-line format, followed by a NUL byte and a
hostname parameter, terminated by a NUL byte.
- 0032git-upload-pack /project.git\0host=myserver.com\0
+ 0033git-upload-pack /project.git\0host=myserver.com\0
+
+The transport may send Extra Parameters by adding an additional NUL
+byte, and then adding one or more NUL-terminated strings:
+
+ 003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
--
- git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+ git-proto-request = request-command SP pathname NUL
+ [ host-parameter NUL ] [ NUL extra-parameters ]
request-command = "git-upload-pack" / "git-receive-pack" /
"git-upload-archive" ; case sensitive
pathname = *( %x01-ff ) ; exclude NUL
host-parameter = "host=" hostname [ ":" port ]
+ extra-parameters = 1*extra-parameter
+ extra-parameter = 1*( %x01-ff ) NUL
--
-Only host-parameter is allowed in the git-proto-request. Clients
-MUST NOT attempt to send additional parameters. It is used for the
+host-parameter is used for the
git-daemon name based virtual hosting. See --interpolated-path
option to git daemon, with the %H/%CH format characters.
@@ -117,6 +137,12 @@ we execute it without the leading '/'.
v
ssh user@example.com "git-upload-pack '~alice/project.git'"
+Depending on the value of the `protocol.version` configuration variable,
+Git may attempt to send Extra Parameters as a colon-separated string in
+the GIT_PROTOCOL environment variable. This is done only if
+the `ssh.variant` configuration variable indicates that the ssh command
+supports passing environment variables as an argument.
+
A few things to remember here:
- The "command name" is spelled with dash (e.g. git-upload-pack), but
@@ -137,11 +163,13 @@ Reference Discovery
-------------------
When the client initially connects the server will immediately respond
-with a listing of each reference it has (all branches and tags) along
+with a version number (if "version=1" is sent as an Extra Parameter),
+and a listing of each reference it has (all branches and tags) along
with the object name that each reference currently points to.
- $ echo -e -n "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+ $ echo -e -n "0044git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" |
nc -v example.com 9418
+ 000aversion 1
00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
side-band side-band-64k ofs-delta shallow no-progress include-tag
00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
@@ -165,7 +193,8 @@ immediately after the ref itself, if presented. A conforming server
MUST peel the ref if it's an annotated tag.
----
- advertised-refs = (no-refs / list-of-refs)
+ advertised-refs = *1("version 1")
+ (no-refs / list-of-refs)
*shallow
flush-pkt
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
new file mode 100644
index 0000000000..0bed2472c8
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,324 @@
+Partial Clone Design Notes
+==========================
+
+The "Partial Clone" feature is a performance optimization for Git that
+allows Git to function without having a complete copy of the repository.
+The goal of this work is to allow Git better handle extremely large
+repositories.
+
+During clone and fetch operations, Git downloads the complete contents
+and history of the repository. This includes all commits, trees, and
+blobs for the complete life of the repository. For extremely large
+repositories, clones can take hours (or days) and consume 100+GiB of disk
+space.
+
+Often in these repositories there are many blobs and trees that the user
+does not need such as:
+
+ 1. files outside of the user's work area in the tree. For example, in
+ a repository with 500K directories and 3.5M files in every commit,
+ we can avoid downloading many objects if the user only needs a
+ narrow "cone" of the source tree.
+
+ 2. large binary assets. For example, in a repository where large build
+ artifacts are checked into the tree, we can avoid downloading all
+ previous versions of these non-mergeable binary assets and only
+ download versions that are actually referenced.
+
+Partial clone allows us to avoid downloading such unneeded objects *in
+advance* during clone and fetch operations and thereby reduce download
+times and disk usage. Missing objects can later be "demand fetched"
+if/when needed.
+
+Use of partial clone requires that the user be online and the origin
+remote be available for on-demand fetching of missing objects. This may
+or may not be problematic for the user. For example, if the user can
+stay within the pre-selected subset of the source tree, they may not
+encounter any missing objects. Alternatively, the user could try to
+pre-fetch various objects if they know that they are going offline.
+
+
+Non-Goals
+---------
+
+Partial clone is a mechanism to limit the number of blobs and trees downloaded
+*within* a given range of commits -- and is therefore independent of and not
+intended to conflict with existing DAG-level mechanisms to limit the set of
+requested commits (i.e. shallow clone, single branch, or fetch '<refspec>').
+
+
+Design Overview
+---------------
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+ the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+ sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+ were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--------------
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+ upload-pack negotiation.
+
+ This uses the existing capability discovery mechanism.
+ See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+ server to request filtering during packfile construction.
+
+ There are various filters available to accommodate different situations.
+ See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+ creates "filtered" packfiles for the client.
+
+ These filtered packfiles are *incomplete* in the traditional sense because
+ they may contain objects that reference objects not contained in the
+ packfile and that the client doesn't already have. For example, the
+ filtered packfile may contain trees or tags that reference missing blobs
+ or commits that reference missing trees.
+
+- On the client these incomplete packfiles are marked as "promisor packfiles"
+ and treated differently by various commands.
+
+- On the client a repository extension is added to the local config to
+ prevent older versions of git from failing mid-operation because of
+ missing objects that they cannot handle.
+ See "extensions.partialClone" in Documentation/technical/repository-version.txt"
+
+
+Handling Missing Objects
+------------------------
+
+- An object may be missing due to a partial clone or fetch, or missing due
+ to repository corruption. To differentiate these cases, the local
+ repository specially indicates such filtered packfiles obtained from the
+ promisor remote as "promisor packfiles".
+
+ These promisor packfiles consist of a "<name>.promisor" file with
+ arbitrary contents (like the "<name>.keep" files), in addition to
+ their "<name>.pack" and "<name>.idx" files.
+
+- The local repository considers a "promisor object" to be an object that
+ it knows (to the best of its ability) that the promisor remote has promised
+ that it has, either because the local repository has that object in one of
+ its promisor packfiles, or because another promisor object refers to it.
+
+ When Git encounters a missing object, Git can see if it a promisor object
+ and handle it appropriately. If not, Git can report a corruption.
+
+ This means that there is no need for the client to explicitly maintain an
+ expensive-to-modify list of missing objects.[a]
+
+- Since almost all Git code currently expects any referenced object to be
+ present locally and because we do not want to force every command to do
+ a dry-run first, a fallback mechanism is added to allow Git to attempt
+ to dynamically fetch missing objects from the promisor remote.
+
+ When the normal object lookup fails to find an object, Git invokes
+ fetch-object to try to get the object from the server and then retry
+ the object lookup. This allows objects to be "faulted in" without
+ complicated prediction algorithms.
+
+ For efficiency reasons, no check as to whether the missing object is
+ actually a promisor object is performed.
+
+ Dynamic object fetching tends to be slow as objects are fetched one at
+ a time.
+
+- `checkout` (and any other command using `unpack-trees`) has been taught
+ to bulk pre-fetch all required missing blobs in a single batch.
+
+- `rev-list` has been taught to print missing objects.
+
+ This can be used by other commands to bulk prefetch objects.
+ For example, a "git log -p A..B" may internally want to first do
+ something like "git rev-list --objects --quiet --missing=print A..B"
+ and prefetch those objects in bulk.
+
+- `fsck` has been updated to be fully aware of promisor objects.
+
+- `repack` in GC has been updated to not touch promisor packfiles at all,
+ and to only repack other objects.
+
+- The global variable "fetch_if_missing" is used to control whether an
+ object lookup will attempt to dynamically fetch a missing object or
+ report an error.
+
+ We are not happy with this global variable and would like to remove it,
+ but that requires significant refactoring of the object code to pass an
+ additional flag. We hope that concurrent efforts to add an ODB API can
+ encompass this.
+
+
+Fetching Missing Objects
+------------------------
+
+- Fetching of objects is done using the existing transport mechanism using
+ transport_fetch_refs(), setting a new transport option
+ TRANS_OPT_NO_DEPENDENTS to indicate that only the objects themselves are
+ desired, not any object that they refer to.
+
+ Because some transports invoke fetch_pack() in the same process, fetch_pack()
+ has been updated to not use any object flags when the corresponding argument
+ (no_dependents) is set.
+
+- The local repository sends a request with the hashes of all requested
+ objects as "want" lines, and does not perform any packfile negotiation.
+ It then receives a packfile.
+
+- Because we are reusing the existing fetch-pack mechanism, fetching
+ currently fetches all objects referred to by the requested objects, even
+ though they are not necessary.
+
+
+Current Limitations
+-------------------
+
+- The remote used for a partial clone (or the first partial fetch
+ following a regular clone) is marked as the "promisor remote".
+
+ We are currently limited to a single promisor remote and only that
+ remote may be used for subsequent partial fetches.
+
+ We accept this limitation because we believe initial users of this
+ feature will be using it on repositories with a strong single central
+ server.
+
+- Dynamic object fetching will only ask the promisor remote for missing
+ objects. We assume that the promisor remote has a complete view of the
+ repository and can satisfy all such requests.
+
+- Repack essentially treats promisor and non-promisor packfiles as 2
+ distinct partitions and does not mix them. Repack currently only works
+ on non-promisor packfiles and loose objects.
+
+- Dynamic object fetching invokes fetch-pack once *for each item*
+ because most algorithms stumble upon a missing object and need to have
+ it resolved before continuing their work. This may incur significant
+ overhead -- and multiple authentication requests -- if many objects are
+ needed.
+
+- Dynamic object fetching currently uses the existing pack protocol V0
+ which means that each object is requested via fetch-pack. The server
+ will send a full set of info/refs when the connection is established.
+ If there are large number of refs, this may incur significant overhead.
+
+
+Future Work
+-----------
+
+- Allow more than one promisor remote and define a strategy for fetching
+ missing objects from specific promisor remotes or of iterating over the
+ set of promisor remotes until a missing object is found.
+
+ A user might want to have multiple geographically-close cache servers
+ for fetching missing blobs while continuing to do filtered `git-fetch`
+ commands from the central server, for example.
+
+ Or the user might want to work in a triangular work flow with multiple
+ promisor remotes that each have an incomplete view of the repository.
+
+- Allow repack to work on promisor packfiles (while keeping them distinct
+ from non-promisor packfiles).
+
+- Allow non-pathname-based filters to make use of packfile bitmaps (when
+ present). This was just an omission during the initial implementation.
+
+- Investigate use of a long-running process to dynamically fetch a series
+ of objects, such as proposed in [5,6] to reduce process startup and
+ overhead costs.
+
+ It would be nice if pack protocol V2 could allow that long-running
+ process to make a series of requests over a single long-running
+ connection.
+
+- Investigate pack protocol V2 to avoid the info/refs broadcast on
+ each connection with the server to dynamically fetch missing objects.
+
+- Investigate the need to handle loose promisor objects.
+
+ Objects in promisor packfiles are allowed to reference missing objects
+ that can be dynamically fetched from the server. An assumption was
+ made that loose objects are only created locally and therefore should
+ not reference a missing object. We may need to revisit that assumption
+ if, for example, we dynamically fetch a missing tree and store it as a
+ loose object rather than a single object packfile.
+
+ This does not necessarily mean we need to mark loose objects as promisor;
+ it may be sufficient to relax the object lookup or is-promisor functions.
+
+
+Non-Tasks
+---------
+
+- Every time the subject of "demand loading blobs" comes up it seems
+ that someone suggests that the server be allowed to "guess" and send
+ additional objects that may be related to the requested objects.
+
+ No work has gone into actually doing that; we're just documenting that
+ it is a common suggestion. We're not sure how it would work and have
+ no plans to work on it.
+
+ It is valid for the server to send more objects than requested (even
+ for a dynamic object fetch), but we are not building on that.
+
+
+Footnotes
+---------
+
+[a] expensive-to-modify list of missing objects: Earlier in the design of
+ partial clone we discussed the need for a single list of missing objects.
+ This would essentially be a sorted linear list of OIDs that the were
+ omitted by the server during a clone or subsequent fetches.
+
+ This file would need to be loaded into memory on every object lookup.
+ It would need to be read, updated, and re-written (like the .git/index)
+ on every explicit "git fetch" command *and* on any dynamic object fetch.
+
+ The cost to read, update, and write this file could add significant
+ overhead to every command if there are many missing objects. For example,
+ if there are 100M missing blobs, this file would be at least 2GiB on disk.
+
+ With the "promisor" concept, we *infer* a missing object based upon the
+ type of packfile that references it.
+
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=2
+ Chromium work item for: Partial Clone
+
+[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
+ Subject: [RFC] Add support for downloading blobs on demand
+ Date: Fri, 13 Jan 2017 10:52:53 -0500
+
+[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/
+ Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches)
+ Date: Fri, 29 Sep 2017 13:11:36 -0700
+
+[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/
+ Subject: Proposal for missing blob support in Git repos
+ Date: Wed, 26 Apr 2017 15:13:46 -0700
+
+[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/
+ Subject: [PATCH 00/10] RFC Partial Clone and Fetch
+ Date: Wed, 8 Mar 2017 18:50:29 +0000
+
+[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/
+ Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module
+ Date: Fri, 5 May 2017 11:27:52 -0400
+
+[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/
+ Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand
+ Date: Fri, 14 Jul 2017 09:26:50 -0400
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 0000000000..49bda76d23
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,414 @@
+ Git Wire Protocol, Version 2
+==============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol. Protocol v2 will improve upon v1 in the following ways:
+
+ * Instead of multiple service names, multiple commands will be
+ supported by a single service
+ * Easily extendable as capabilities are moved into their own section
+ of the protocol, no longer being hidden behind a NUL byte and
+ limited by the size of a pkt-line
+ * Separate out other information hidden behind NUL bytes (e.g. agent
+ string as a capability and symrefs can be requested using 'ls-refs')
+ * Reference advertisement will be omitted unless explicitly requested
+ * ls-refs command to explicitly request some refs
+ * Designed with http and stateless-rpc in mind. With clear flush
+ semantics the http remote helper can simply act as a proxy
+
+In protocol v2 communication is command oriented. When first contacting a
+server a list of capabilities will advertised. Some of these capabilities
+will be commands which a client can request be executed. Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Packet-Line Framing
+---------------------
+
+All communication is done using packet-line framing, just as in v1. See
+`Documentation/technical/pack-protocol.txt` and
+`Documentation/technical/protocol-common.txt` for more information.
+
+In protocol v2 these special packets will have the following semantics:
+
+ * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+ * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Initial Client Request
+------------------------
+
+In general a client can request to speak protocol v2 by sending
+`version=2` through the respective side-channel for the transport being
+used which inevitably sets `GIT_PROTOCOL`. More information can be
+found in `pack-protocol.txt` and `http-protocol.txt`. In all cases the
+response from the server is the capability advertisement.
+
+ Git Transport
+~~~~~~~~~~~~~~~
+
+When using the git:// transport, you can request to use protocol v2 by
+sending "version=2" as an extra parameter:
+
+ 003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
+
+ SSH and File Transport
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using either the ssh:// or file:// transport, the GIT_PROTOCOL
+environment variable must be set explicitly to include "version=2".
+
+ HTTP Transport
+~~~~~~~~~~~~~~~~
+
+When using the http:// or https:// transport a client makes a "smart"
+info/refs request as described in `http-protocol.txt` and requests that
+v2 be used by supplying "version=2" in the `Git-Protocol` header.
+
+ C: Git-Protocol: version=2
+ C:
+ C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+
+A v2 server would reply:
+
+ S: 200 OK
+ S: <Some headers>
+ S: ...
+ S:
+ S: 000eversion 2\n
+ S: <capability-advertisement>
+
+Subsequent requests are then made directly to the service
+`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
+
+ Capability Advertisement
+--------------------------
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value. Clients must ignore all
+unknown keys. Semantics of unknown values are left to the definition of
+each key. Some capabilities will describe commands which can be requested
+to be executed by the client.
+
+ capability-advertisement = protocol-version
+ capability-list
+ flush-pkt
+
+ protocol-version = PKT-LINE("version 2" LF)
+ capability-list = *capability
+ capability = PKT-LINE(key[=value] LF)
+
+ key = 1*(ALPHA | DIGIT | "-_")
+ value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;")
+
+ Command Request
+-----------------
+
+After receiving the capability advertisement, a client can then issue a
+request to select the command it wants with any particular capabilities
+or arguments. There is then an optional section where the client can
+provide any command specific parameters or queries. Only a single
+command can be requested at a time.
+
+ request = empty-request | command-request
+ empty-request = flush-pkt
+ command-request = command
+ capability-list
+ [command-args]
+ flush-pkt
+ command = PKT-LINE("command=" key LF)
+ command-args = delim-pkt
+ *command-specific-arg
+
+ command-specific-args are packet line framed arguments defined by
+ each individual command.
+
+The server will then check to ensure that the client's request is
+comprised of a valid command as well as valid capabilities which were
+advertised. If the request is valid the server will then execute the
+command. A server MUST wait till it has received the client's entire
+request before issuing a response. The format of the response is
+determined by the command being executed, but in all cases a flush-pkt
+indicates the end of the response.
+
+When a command has finished, and the client has received the entire
+response from the server, a client can either request that another
+command be executed or can terminate the connection. A client may
+optionally send an empty request consisting of just a flush-pkt to
+indicate that no more requests will be made.
+
+ Capabilities
+--------------
+
+There are two different types of capabilities: normal capabilities,
+which can be used to to convey information or alter the behavior of a
+request, and commands, which are the core actions that a client wants to
+perform (fetch, push, etc).
+
+Protocol version 2 is stateless by default. This means that all commands
+must only last a single round and be stateless from the perspective of the
+server side, unless the client has requested a capability indicating that
+state should be maintained by the server. Clients MUST NOT require state
+management on the server side in order to function correctly. This
+permits simple round-robin load-balancing on the server side, without
+needing to worry about state management.
+
+ agent
+~~~~~~~
+
+The server can advertise the `agent` capability with a value `X` (in the
+form `agent=X`) to notify the client that the server is running version
+`X`. The client may optionally send its own agent string by including
+the `agent` capability with a value `Y` (in the form `agent=Y`) in its
+request to the server (but it MUST NOT do so if the server did not
+advertise the agent capability). The `X` and `Y` strings may contain any
+printable ASCII characters except space (i.e., the byte range 32 < x <
+127), and are typically of the form "package/version" (e.g.,
+"git/1.8.3.1"). The agent strings are purely informative for statistics
+and debugging purposes, and MUST NOT be used to programmatically assume
+the presence or absence of particular features.
+
+ ls-refs
+~~~~~~~~~
+
+`ls-refs` is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in arguments
+which can be used to limit the refs sent from the server.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+ls-refs takes in the following arguments:
+
+ symrefs
+ In addition to the object pointed by it, show the underlying ref
+ pointed by it when showing a symbolic ref.
+ peel
+ Show peeled tags.
+ ref-prefix <prefix>
+ When specified, only references having a prefix matching one of
+ the provided prefixes are displayed.
+
+The output of ls-refs is as follows:
+
+ output = *ref
+ flush-pkt
+ ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+ ref-attribute = (symref | peeled)
+ symref = "symref-target:" symref-target
+ peeled = "peeled:" obj-id
+
+ fetch
+~~~~~~~
+
+`fetch` is the command used to fetch a packfile in v2. It can be looked
+at as a modified version of the v1 fetch where the ref-advertisement is
+stripped out (since the `ls-refs` command fills that role) and the
+message format is tweaked to eliminate redundancies and permit easy
+addition of future extensions.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+A `fetch` request can take the following arguments:
+
+ want <oid>
+ Indicates to the server an object which the client wants to
+ retrieve. Wants can be anything and are not limited to
+ advertised objects.
+
+ have <oid>
+ Indicates to the server an object which the client has locally.
+ This allows the server to make a packfile which only contains
+ the objects that the client needs. Multiple 'have' lines can be
+ supplied.
+
+ done
+ Indicates to the server that negotiation should terminate (or
+ not even begin if performing a clone) and that the server should
+ use the information supplied in the request to construct the
+ packfile.
+
+ thin-pack
+ Request that a thin pack be sent, which is a pack with deltas
+ which reference base objects not contained within the pack (but
+ are known to exist at the receiving end). This can reduce the
+ network traffic significantly, but it requires the receiving end
+ to know how to "thicken" these packs by adding the missing bases
+ to the pack.
+
+ no-progress
+ Request that progress information that would normally be sent on
+ side-band channel 2, during the packfile transfer, should not be
+ sent. However, the side-band channel 3 is still used for error
+ responses.
+
+ include-tag
+ Request that annotated tags should be sent if the objects they
+ point to are being sent.
+
+ ofs-delta
+ Indicate that the client understands PACKv2 with delta referring
+ to its base by position in pack rather than by an oid. That is,
+ they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
+
+If the 'shallow' feature is advertised the following arguments can be
+included in the clients request as well as the potential addition of the
+'shallow-info' section in the server's response as explained below.
+
+ shallow <oid>
+ A client must notify the server of all commits for which it only
+ has shallow copies (meaning that it doesn't have the parents of
+ a commit) by supplying a 'shallow <oid>' line for each such
+ object so that the server is aware of the limitations of the
+ client's history. This is so that the server is aware that the
+ client may not have all objects reachable from such commits.
+
+ deepen <depth>
+ Requests that the fetch/clone should be shallow having a commit
+ depth of <depth> relative to the remote side.
+
+ deepen-relative
+ Requests that the semantics of the "deepen" command be changed
+ to indicate that the depth requested is relative to the client's
+ current shallow boundary, instead of relative to the requested
+ commits.
+
+ deepen-since <timestamp>
+ Requests that the shallow clone/fetch should be cut at a
+ specific time, instead of depth. Internally it's equivalent to
+ doing "git rev-list --max-age=<timestamp>". Cannot be used with
+ "deepen".
+
+ deepen-not <rev>
+ Requests that the shallow clone/fetch should be cut at a
+ specific revision specified by '<rev>', instead of a depth.
+ Internally it's equivalent of doing "git rev-list --not <rev>".
+ Cannot be used with "deepen", but can be used with
+ "deepen-since".
+
+If the 'filter' feature is advertised, the following argument can be
+included in the client's request:
+
+ filter <filter-spec>
+ Request that various objects from the packfile be omitted
+ using one of several filtering techniques. These are intended
+ for use with partial clone and partial fetch operations. See
+ `rev-list` for possible "filter-spec" values.
+
+The response of `fetch` is broken into a number of sections separated by
+delimiter packets (0001), with each section beginning with its section
+header.
+
+ output = *section
+ section = (acknowledgments | shallow-info | packfile)
+ (flush-pkt | delim-pkt)
+
+ acknowledgments = PKT-LINE("acknowledgments" LF)
+ (nak | *ack)
+ (ready)
+ ready = PKT-LINE("ready" LF)
+ nak = PKT-LINE("NAK" LF)
+ ack = PKT-LINE("ACK" SP obj-id LF)
+
+ shallow-info = PKT-LINE("shallow-info" LF)
+ *PKT-LINE((shallow | unshallow) LF)
+ shallow = "shallow" SP obj-id
+ unshallow = "unshallow" SP obj-id
+
+ packfile = PKT-LINE("packfile" LF)
+ *PKT-LINE(%x01-03 *%x00-ff)
+
+ acknowledgments section
+ * If the client determines that it is finished with negotiations
+ by sending a "done" line, the acknowledgments sections MUST be
+ omitted from the server's response.
+
+ * Always begins with the section header "acknowledgments"
+
+ * The server will respond with "NAK" if none of the object ids sent
+ as have lines were common.
+
+ * The server will respond with "ACK obj-id" for all of the
+ object ids sent as have lines which are common.
+
+ * A response cannot have both "ACK" lines as well as a "NAK"
+ line.
+
+ * The server will respond with a "ready" line indicating that
+ the server has found an acceptable common base and is ready to
+ make and send a packfile (which will be found in the packfile
+ section of the same response)
+
+ * If the server has found a suitable cut point and has decided
+ to send a "ready" line, then the server can decide to (as an
+ optimization) omit any "ACK" lines it would have sent during
+ its response. This is because the server will have already
+ determined the objects it plans to send to the client and no
+ further negotiation is needed.
+
+ shallow-info section
+ * If the client has requested a shallow fetch/clone, a shallow
+ client requests a fetch or the server is shallow then the
+ server's response may include a shallow-info section. The
+ shallow-info section will be included if (due to one of the
+ above conditions) the server needs to inform the client of any
+ shallow boundaries or adjustments to the clients already
+ existing shallow boundaries.
+
+ * Always begins with the section header "shallow-info"
+
+ * If a positive depth is requested, the server will compute the
+ set of commits which are no deeper than the desired depth.
+
+ * The server sends a "shallow obj-id" line for each commit whose
+ parents will not be sent in the following packfile.
+
+ * The server sends an "unshallow obj-id" line for each commit
+ which the client has indicated is shallow, but is no longer
+ shallow as a result of the fetch (due to its parents being
+ sent in the following packfile).
+
+ * The server MUST NOT send any "unshallow" lines for anything
+ which the client has not indicated was shallow as a part of
+ its request.
+
+ * This section is only included if a packfile section is also
+ included in the response.
+
+ packfile section
+ * This section is only included if the client has sent 'want'
+ lines in its request and either requested that no more
+ negotiation be done by sending 'done' or if the server has
+ decided it has found a sufficient cut point to produce a
+ packfile.
+
+ * Always begins with the section header "packfile"
+
+ * The transmission of the packfile begins immediately after the
+ section header
+
+ * The data transfer of the packfile is always multiplexed, using
+ the same semantics of the 'side-band-64k' capability from
+ protocol version 1. This means that each packet, during the
+ packfile data stream, is made up of a leading 4-byte pkt-line
+ length (typical of the pkt-line format), followed by a 1-byte
+ stream code, followed by the actual data.
+
+ The stream code can be one of:
+ 1 - pack data
+ 2 - progress messages
+ 3 - fatal error message just before stream aborts
+
+ server-option
+~~~~~~~~~~~~~~~
+
+If advertised, indicates that any number of server specific options can be
+included in a request. This is done by sending each option as a
+"server-option=<option>" capability line in the capability-list section of
+a request.
+
+The provided options must not contain a NUL or LF character.
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 5183b15422..01dedfe9ff 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -8,20 +8,22 @@ repo, and therefore grafts are introduced pretending that
these commits have no parents.
*********************************************************
-The basic idea is to write the SHA-1s of shallow commits into
-$GIT_DIR/shallow, and handle its contents like the contents
-of $GIT_DIR/info/grafts (with the difference that shallow
-cannot contain parent information).
-
-This information is stored in a new file instead of grafts, or
-even the config, since the user should not touch that file
-at all (even throughout development of the shallow clone, it
-was never manually edited!).
+$GIT_DIR/shallow lists commit object names and tells Git to
+pretend as if they are root commits (e.g. "git log" traversal
+stops after showing them; "git fsck" does not complain saying
+the commits listed on their "parent" lines do not exist).
Each line contains exactly one SHA-1. When read, a commit_graft
will be constructed, which has nr_parent < 0 to make it easier
to discern from user provided grafts.
+Note that the shallow feature could not be changed easily to
+use replace refs: a commit containing a `mergetag` is not allowed
+to be replaced, not even by a root commit. Such a commit can be
+made shallow, though. Also, having a `shallow` file explicitly
+listing all the commits made shallow makes it a *lot* easier to
+do shallow-specific things such as to deepen the history.
+
Since fsck-objects relies on the library to read the objects,
it honours shallow commits automatically.