diff options
author | Junio C Hamano <gitster@pobox.com> | 2020-06-25 12:27:47 -0700 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2020-06-25 12:27:47 -0700 |
commit | 34e849b05a454a2c6487f8fbfa68c39932d22730 (patch) | |
tree | 7a71a357e9f7e2c977849224ca18b2cb9a4461ea /Documentation | |
parent | Merge branch 'ss/submodule-set-branch-in-c' (diff) | |
parent | upload-pack: fix a sparse '0 as NULL pointer' warning (diff) | |
download | tgif-34e849b05a454a2c6487f8fbfa68c39932d22730.tar.xz |
Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.
* jt/cdn-offload:
upload-pack: fix a sparse '0 as NULL pointer' warning
upload-pack: send part of packfile response as uri
fetch-pack: support more than one pack lockfile
upload-pack: refactor reading of pack-objects out
Documentation: add Packfile URIs design doc
Documentation: order protocol v2 sections
http-fetch: support fetching packfiles by URL
http-fetch: refactor into function
http: refactor finish_http_pack_request()
http: use --stdin when indexing dumb HTTP pack
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/git-http-fetch.txt | 9 | ||||
-rw-r--r-- | Documentation/technical/packfile-uri.txt | 78 | ||||
-rw-r--r-- | Documentation/technical/protocol-v2.txt | 48 |
3 files changed, 124 insertions, 11 deletions
diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt index 666b042679..4deb4893f5 100644 --- a/Documentation/git-http-fetch.txt +++ b/Documentation/git-http-fetch.txt @@ -9,7 +9,7 @@ git-http-fetch - Download from a remote Git repository via HTTP SYNOPSIS -------- [verse] -'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin] <commit> <url> +'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile=<hash> | <commit>] <url> DESCRIPTION ----------- @@ -40,6 +40,13 @@ commit-id:: <commit-id>['\t'<filename-as-in--w>] +--packfile=<hash>:: + Instead of a commit id on the command line (which is not expected in + this case), 'git http-fetch' fetches the packfile directly at the given + URL and uses index-pack to generate corresponding .idx and .keep files. + The hash is used to determine the name of the temporary file and is + arbitrary. The output of index-pack is printed to stdout. + --recover:: Verify that everything reachable from target is fetched. Used after an earlier fetch is interrupted. diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt new file mode 100644 index 0000000000..318713abc3 --- /dev/null +++ b/Documentation/technical/packfile-uri.txt @@ -0,0 +1,78 @@ +Packfile URIs +============= + +This feature allows servers to serve part of their packfile response as URIs. +This allows server designs that improve scalability in bandwidth and CPU usage +(for example, by serving some data through a CDN), and (in the future) provides +some measure of resumability to clients. + +This feature is available only in protocol version 2. + +Protocol +-------- + +The server advertises the `packfile-uris` capability. + +If the client then communicates which protocols (HTTPS, etc.) it supports with +a `packfile-uris` argument, the server MAY send a `packfile-uris` section +directly before the `packfile` section (right after `wanted-refs` if it is +sent) containing URIs of any of the given protocols. The URIs point to +packfiles that use only features that the client has declared that it supports +(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of +this section. + +Clients should then download and index all the given URIs (in addition to +downloading and indexing the packfile given in the `packfile` section of the +response) before performing the connectivity check. + +Server design +------------- + +The server can be trivially made compatible with the proposed protocol by +having it advertise `packfile-uris`, tolerating the client sending +`packfile-uris`, and never sending any `packfile-uris` section. But we should +include some sort of non-trivial implementation in the Minimum Viable Product, +at least so that we can test the client. + +This is the implementation: a feature, marked experimental, that allows the +server to be configured by one or more `uploadpack.blobPackfileUri=<sha1> +<uri>` entries. Whenever the list of objects to be sent is assembled, all such +blobs are excluded, replaced with URIs. The client will download those URIs, +expecting them to each point to packfiles containing single blobs. + +Client design +------------- + +The client has a config variable `fetch.uriprotocols` that determines which +protocols the end user is willing to use. By default, this is empty. + +When the client downloads the given URIs, it should store them with "keep" +files, just like it does with the packfile in the `packfile` section. These +additional "keep" files can only be removed after the refs have been updated - +just like the "keep" file for the packfile in the `packfile` section. + +The division of work (initial fetch + additional URIs) introduces convenient +points for resumption of an interrupted clone - such resumption can be done +after the Minimum Viable Product (see "Future work"). + +Future work +----------- + +The protocol design allows some evolution of the server and client without any +need for protocol changes, so only a small-scoped design is included here to +form the MVP. For example, the following can be done: + + * On the server, more sophisticated means of excluding objects (e.g. by + specifying a commit to represent that commit and all objects that it + references). + * On the client, resumption of clone. If a clone is interrupted, information + could be recorded in the repository's config and a "clone-resume" command + can resume the clone in progress. (Resumption of subsequent fetches is more + difficult because that must deal with the user wanting to use the repository + even after the fetch was interrupted.) + +There are some possible features that will require a change in protocol: + + * Additional HTTP headers (e.g. authentication) + * Byte range support + * Different file formats referenced by URIs (e.g. raw object) diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 3996d70891..5852f499a6 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -325,13 +325,26 @@ included in the client's request: indicating its sideband (1, 2, or 3), and the server may send "0005\2" (a PKT-LINE of sideband 2 with no payload) as a keepalive packet. +If the 'packfile-uris' feature is advertised, the following argument +can be included in the client's request as well as the potential +addition of the 'packfile-uris' section in the server's response as +explained below. + + packfile-uris <comma-separated list of protocols> + Indicates to the server that the client is willing to receive + URIs of any of the given protocols in place of objects in the + sent packfile. Before performing the connectivity check, the + client should download from all given URIs. Currently, the + protocols supported are "http" and "https". + The response of `fetch` is broken into a number of sections separated by delimiter packets (0001), with each section beginning with its section -header. +header. Most sections are sent only when the packfile is sent. - output = *section - section = (acknowledgments | shallow-info | wanted-refs | packfile) - (flush-pkt | delim-pkt) + output = acknowledgements flush-pkt | + [acknowledgments delim-pkt] [shallow-info delim-pkt] + [wanted-refs delim-pkt] [packfile-uris delim-pkt] + packfile flush-pkt acknowledgments = PKT-LINE("acknowledgments" LF) (nak | *ack) @@ -349,13 +362,17 @@ header. *PKT-LINE(wanted-ref LF) wanted-ref = obj-id SP refname + packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri + packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF) + packfile = PKT-LINE("packfile" LF) *PKT-LINE(%x01-03 *%x00-ff) acknowledgments section - * If the client determines that it is finished with negotiations - by sending a "done" line, the acknowledgments sections MUST be - omitted from the server's response. + * If the client determines that it is finished with negotiations by + sending a "done" line (thus requiring the server to send a packfile), + the acknowledgments sections MUST be omitted from the server's + response. * Always begins with the section header "acknowledgments" @@ -406,9 +423,6 @@ header. which the client has not indicated was shallow as a part of its request. - * This section is only included if a packfile section is also - included in the response. - wanted-refs section * This section is only included if the client has requested a ref using a 'want-ref' line and if a packfile section is also @@ -422,6 +436,20 @@ header. * The server MUST NOT send any refs which were not requested using 'want-ref' lines. + packfile-uris section + * This section is only included if the client sent + 'packfile-uris' and the server has at least one such URI to + send. + + * Always begins with the section header "packfile-uris". + + * For each URI the server sends, it sends a hash of the pack's + contents (as output by git index-pack) followed by the URI. + + * The hashes are 40 hex characters long. When Git upgrades to a new + hash algorithm, this might need to be updated. (It should match + whatever index-pack outputs after "pack\t" or "keep\t". + packfile section * This section is only included if the client has sent 'want' lines in its request and either requested that no more |