summaryrefslogtreecommitdiff
path: root/Documentation/technical
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical')
-rw-r--r--Documentation/technical/api-decorate.txt6
-rw-r--r--Documentation/technical/api-directory-listing.txt27
-rw-r--r--Documentation/technical/api-submodule-config.txt2
-rw-r--r--Documentation/technical/http-protocol.txt8
-rw-r--r--Documentation/technical/index-format.txt19
-rw-r--r--Documentation/technical/pack-protocol.txt43
-rw-r--r--Documentation/technical/partial-clone.txt324
7 files changed, 411 insertions, 18 deletions
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
deleted file mode 100644
index 1d52a6ce14..0000000000
--- a/Documentation/technical/api-decorate.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-decorate API
-============
-
-Talk about <decorate.h>
-
-(Linus)
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 6c77b4920c..7fae00f44f 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -22,16 +22,20 @@ The notable options are:
`flags`::
- A bit-field of options (the `*IGNORED*` flags are mutually exclusive):
+ A bit-field of options:
`DIR_SHOW_IGNORED`:::
- Return just ignored files in `entries[]`, not untracked files.
+ Return just ignored files in `entries[]`, not untracked
+ files. This flag is mutually exclusive with
+ `DIR_SHOW_IGNORED_TOO`.
`DIR_SHOW_IGNORED_TOO`:::
- Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
- in addition to untracked files in `entries[]`.
+ Similar to `DIR_SHOW_IGNORED`, but return ignored files in
+ `ignored[]` in addition to untracked files in
+ `entries[]`. This flag is mutually exclusive with
+ `DIR_SHOW_IGNORED`.
`DIR_KEEP_UNTRACKED_CONTENTS`:::
@@ -39,6 +43,21 @@ The notable options are:
untracked contents of untracked directories are also returned in
`entries[]`.
+`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`:::
+
+ Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if
+ this is set, returns ignored files and directories that match
+ an exclude pattern. If a directory matches an exclude pattern,
+ then the directory is returned and the contained paths are
+ not. A directory that does not match an exclude pattern will
+ not be returned even if all of its contents are ignored. In
+ this case, the contents are returned as individual entries.
++
+If this is set, files and directories that explicity match an ignore
+pattern are reported. Implicity ignored directories (directories that
+do not match an ignore pattern, but whose contents are all ignored)
+are not reported, instead all of the contents are reported.
+
`DIR_COLLECT_IGNORED`:::
Special mode for git-add. Return ignored files in `ignored[]` and
diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
index 3dce003fda..ee907c4a82 100644
--- a/Documentation/technical/api-submodule-config.txt
+++ b/Documentation/technical/api-submodule-config.txt
@@ -4,7 +4,7 @@ submodule config cache API
The submodule config cache API allows to read submodule
configurations/information from specified revisions. Internally
information is lazily read into a cache that is used to avoid
-unnecessary parsing of the same .gitmodule files. Lookups can be done by
+unnecessary parsing of the same .gitmodules files. Lookups can be done by
submodule path or name.
Usage
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 1c561bdd92..a0e45f2889 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -219,6 +219,10 @@ smart server reply:
S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
+The client may send Extra Parameters (see
+Documentation/technical/pack-protocol.txt) as a colon-separated string
+in the Git-Protocol HTTP header.
+
Dumb Server Response
^^^^^^^^^^^^^^^^^^^^
Dumb servers MUST respond with the dumb server reply format.
@@ -269,7 +273,11 @@ the C locale ordering. The stream SHOULD include the default ref
named `HEAD` as the first ref. The stream MUST include capability
declarations behind a NUL on the first ref.
+The returned response contains "version 1" if "version=1" was sent as an
+Extra Parameter.
+
smart_reply = PKT-LINE("# service=$servicename" LF)
+ *1("version 1")
ref_list
"0000"
ref_list = empty_list / non_empty_list
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c445..db3572626b 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,22 @@ The remaining data of each directory block is grouped by type:
in the previous ewah bitmap.
- One NUL.
+
+== File System Monitor cache
+
+ The file system monitor cache tracks files for which the core.fsmonitor
+ hook has told us about changes. The signature for this extension is
+ { 'F', 'S', 'M', 'N' }.
+
+ The extension starts with
+
+ - 32-bit version number: the current supported version is 1.
+
+ - 64-bit time: the extension data reflects all changes through the given
+ time which is stored as the nanoseconds elapsed since midnight,
+ January 1, 1970.
+
+ - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
+
+ - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+ is not CE_FSMONITOR_VALID.
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index ed1eae8b83..cd31edc91e 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -39,6 +39,19 @@ communicates with that invoked process over the SSH connection.
The file:// transport runs the 'upload-pack' or 'receive-pack'
process locally and communicates with it over a pipe.
+Extra Parameters
+----------------
+
+The protocol provides a mechanism in which clients can send additional
+information in its first message to the server. These are called "Extra
+Parameters", and are supported by the Git, SSH, and HTTP protocols.
+
+Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
+
+Servers that receive any such Extra Parameters MUST ignore all
+unrecognized keys. Currently, the only Extra Parameter recognized is
+"version=1".
+
Git Transport
-------------
@@ -46,18 +59,25 @@ The Git transport starts off by sending the command and repository
on the wire using the pkt-line format, followed by a NUL byte and a
hostname parameter, terminated by a NUL byte.
- 0032git-upload-pack /project.git\0host=myserver.com\0
+ 0033git-upload-pack /project.git\0host=myserver.com\0
+
+The transport may send Extra Parameters by adding an additional NUL
+byte, and then adding one or more NUL-terminated strings:
+
+ 003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
--
- git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+ git-proto-request = request-command SP pathname NUL
+ [ host-parameter NUL ] [ NUL extra-parameters ]
request-command = "git-upload-pack" / "git-receive-pack" /
"git-upload-archive" ; case sensitive
pathname = *( %x01-ff ) ; exclude NUL
host-parameter = "host=" hostname [ ":" port ]
+ extra-parameters = 1*extra-parameter
+ extra-parameter = 1*( %x01-ff ) NUL
--
-Only host-parameter is allowed in the git-proto-request. Clients
-MUST NOT attempt to send additional parameters. It is used for the
+host-parameter is used for the
git-daemon name based virtual hosting. See --interpolated-path
option to git daemon, with the %H/%CH format characters.
@@ -117,6 +137,12 @@ we execute it without the leading '/'.
v
ssh user@example.com "git-upload-pack '~alice/project.git'"
+Depending on the value of the `protocol.version` configuration variable,
+Git may attempt to send Extra Parameters as a colon-separated string in
+the GIT_PROTOCOL environment variable. This is done only if
+the `ssh.variant` configuration variable indicates that the ssh command
+supports passing environment variables as an argument.
+
A few things to remember here:
- The "command name" is spelled with dash (e.g. git-upload-pack), but
@@ -137,11 +163,13 @@ Reference Discovery
-------------------
When the client initially connects the server will immediately respond
-with a listing of each reference it has (all branches and tags) along
+with a version number (if "version=1" is sent as an Extra Parameter),
+and a listing of each reference it has (all branches and tags) along
with the object name that each reference currently points to.
- $ echo -e -n "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+ $ echo -e -n "0044git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" |
nc -v example.com 9418
+ 000aversion 1
00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
side-band side-band-64k ofs-delta shallow no-progress include-tag
00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
@@ -165,7 +193,8 @@ immediately after the ref itself, if presented. A conforming server
MUST peel the ref if it's an annotated tag.
----
- advertised-refs = (no-refs / list-of-refs)
+ advertised-refs = *1("version 1")
+ (no-refs / list-of-refs)
*shallow
flush-pkt
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
new file mode 100644
index 0000000000..0bed2472c8
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,324 @@
+Partial Clone Design Notes
+==========================
+
+The "Partial Clone" feature is a performance optimization for Git that
+allows Git to function without having a complete copy of the repository.
+The goal of this work is to allow Git better handle extremely large
+repositories.
+
+During clone and fetch operations, Git downloads the complete contents
+and history of the repository. This includes all commits, trees, and
+blobs for the complete life of the repository. For extremely large
+repositories, clones can take hours (or days) and consume 100+GiB of disk
+space.
+
+Often in these repositories there are many blobs and trees that the user
+does not need such as:
+
+ 1. files outside of the user's work area in the tree. For example, in
+ a repository with 500K directories and 3.5M files in every commit,
+ we can avoid downloading many objects if the user only needs a
+ narrow "cone" of the source tree.
+
+ 2. large binary assets. For example, in a repository where large build
+ artifacts are checked into the tree, we can avoid downloading all
+ previous versions of these non-mergeable binary assets and only
+ download versions that are actually referenced.
+
+Partial clone allows us to avoid downloading such unneeded objects *in
+advance* during clone and fetch operations and thereby reduce download
+times and disk usage. Missing objects can later be "demand fetched"
+if/when needed.
+
+Use of partial clone requires that the user be online and the origin
+remote be available for on-demand fetching of missing objects. This may
+or may not be problematic for the user. For example, if the user can
+stay within the pre-selected subset of the source tree, they may not
+encounter any missing objects. Alternatively, the user could try to
+pre-fetch various objects if they know that they are going offline.
+
+
+Non-Goals
+---------
+
+Partial clone is a mechanism to limit the number of blobs and trees downloaded
+*within* a given range of commits -- and is therefore independent of and not
+intended to conflict with existing DAG-level mechanisms to limit the set of
+requested commits (i.e. shallow clone, single branch, or fetch '<refspec>').
+
+
+Design Overview
+---------------
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+ the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+ sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+ were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--------------
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+ upload-pack negotiation.
+
+ This uses the existing capability discovery mechanism.
+ See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+ server to request filtering during packfile construction.
+
+ There are various filters available to accommodate different situations.
+ See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+ creates "filtered" packfiles for the client.
+
+ These filtered packfiles are *incomplete* in the traditional sense because
+ they may contain objects that reference objects not contained in the
+ packfile and that the client doesn't already have. For example, the
+ filtered packfile may contain trees or tags that reference missing blobs
+ or commits that reference missing trees.
+
+- On the client these incomplete packfiles are marked as "promisor packfiles"
+ and treated differently by various commands.
+
+- On the client a repository extension is added to the local config to
+ prevent older versions of git from failing mid-operation because of
+ missing objects that they cannot handle.
+ See "extensions.partialClone" in Documentation/technical/repository-version.txt"
+
+
+Handling Missing Objects
+------------------------
+
+- An object may be missing due to a partial clone or fetch, or missing due
+ to repository corruption. To differentiate these cases, the local
+ repository specially indicates such filtered packfiles obtained from the
+ promisor remote as "promisor packfiles".
+
+ These promisor packfiles consist of a "<name>.promisor" file with
+ arbitrary contents (like the "<name>.keep" files), in addition to
+ their "<name>.pack" and "<name>.idx" files.
+
+- The local repository considers a "promisor object" to be an object that
+ it knows (to the best of its ability) that the promisor remote has promised
+ that it has, either because the local repository has that object in one of
+ its promisor packfiles, or because another promisor object refers to it.
+
+ When Git encounters a missing object, Git can see if it a promisor object
+ and handle it appropriately. If not, Git can report a corruption.
+
+ This means that there is no need for the client to explicitly maintain an
+ expensive-to-modify list of missing objects.[a]
+
+- Since almost all Git code currently expects any referenced object to be
+ present locally and because we do not want to force every command to do
+ a dry-run first, a fallback mechanism is added to allow Git to attempt
+ to dynamically fetch missing objects from the promisor remote.
+
+ When the normal object lookup fails to find an object, Git invokes
+ fetch-object to try to get the object from the server and then retry
+ the object lookup. This allows objects to be "faulted in" without
+ complicated prediction algorithms.
+
+ For efficiency reasons, no check as to whether the missing object is
+ actually a promisor object is performed.
+
+ Dynamic object fetching tends to be slow as objects are fetched one at
+ a time.
+
+- `checkout` (and any other command using `unpack-trees`) has been taught
+ to bulk pre-fetch all required missing blobs in a single batch.
+
+- `rev-list` has been taught to print missing objects.
+
+ This can be used by other commands to bulk prefetch objects.
+ For example, a "git log -p A..B" may internally want to first do
+ something like "git rev-list --objects --quiet --missing=print A..B"
+ and prefetch those objects in bulk.
+
+- `fsck` has been updated to be fully aware of promisor objects.
+
+- `repack` in GC has been updated to not touch promisor packfiles at all,
+ and to only repack other objects.
+
+- The global variable "fetch_if_missing" is used to control whether an
+ object lookup will attempt to dynamically fetch a missing object or
+ report an error.
+
+ We are not happy with this global variable and would like to remove it,
+ but that requires significant refactoring of the object code to pass an
+ additional flag. We hope that concurrent efforts to add an ODB API can
+ encompass this.
+
+
+Fetching Missing Objects
+------------------------
+
+- Fetching of objects is done using the existing transport mechanism using
+ transport_fetch_refs(), setting a new transport option
+ TRANS_OPT_NO_DEPENDENTS to indicate that only the objects themselves are
+ desired, not any object that they refer to.
+
+ Because some transports invoke fetch_pack() in the same process, fetch_pack()
+ has been updated to not use any object flags when the corresponding argument
+ (no_dependents) is set.
+
+- The local repository sends a request with the hashes of all requested
+ objects as "want" lines, and does not perform any packfile negotiation.
+ It then receives a packfile.
+
+- Because we are reusing the existing fetch-pack mechanism, fetching
+ currently fetches all objects referred to by the requested objects, even
+ though they are not necessary.
+
+
+Current Limitations
+-------------------
+
+- The remote used for a partial clone (or the first partial fetch
+ following a regular clone) is marked as the "promisor remote".
+
+ We are currently limited to a single promisor remote and only that
+ remote may be used for subsequent partial fetches.
+
+ We accept this limitation because we believe initial users of this
+ feature will be using it on repositories with a strong single central
+ server.
+
+- Dynamic object fetching will only ask the promisor remote for missing
+ objects. We assume that the promisor remote has a complete view of the
+ repository and can satisfy all such requests.
+
+- Repack essentially treats promisor and non-promisor packfiles as 2
+ distinct partitions and does not mix them. Repack currently only works
+ on non-promisor packfiles and loose objects.
+
+- Dynamic object fetching invokes fetch-pack once *for each item*
+ because most algorithms stumble upon a missing object and need to have
+ it resolved before continuing their work. This may incur significant
+ overhead -- and multiple authentication requests -- if many objects are
+ needed.
+
+- Dynamic object fetching currently uses the existing pack protocol V0
+ which means that each object is requested via fetch-pack. The server
+ will send a full set of info/refs when the connection is established.
+ If there are large number of refs, this may incur significant overhead.
+
+
+Future Work
+-----------
+
+- Allow more than one promisor remote and define a strategy for fetching
+ missing objects from specific promisor remotes or of iterating over the
+ set of promisor remotes until a missing object is found.
+
+ A user might want to have multiple geographically-close cache servers
+ for fetching missing blobs while continuing to do filtered `git-fetch`
+ commands from the central server, for example.
+
+ Or the user might want to work in a triangular work flow with multiple
+ promisor remotes that each have an incomplete view of the repository.
+
+- Allow repack to work on promisor packfiles (while keeping them distinct
+ from non-promisor packfiles).
+
+- Allow non-pathname-based filters to make use of packfile bitmaps (when
+ present). This was just an omission during the initial implementation.
+
+- Investigate use of a long-running process to dynamically fetch a series
+ of objects, such as proposed in [5,6] to reduce process startup and
+ overhead costs.
+
+ It would be nice if pack protocol V2 could allow that long-running
+ process to make a series of requests over a single long-running
+ connection.
+
+- Investigate pack protocol V2 to avoid the info/refs broadcast on
+ each connection with the server to dynamically fetch missing objects.
+
+- Investigate the need to handle loose promisor objects.
+
+ Objects in promisor packfiles are allowed to reference missing objects
+ that can be dynamically fetched from the server. An assumption was
+ made that loose objects are only created locally and therefore should
+ not reference a missing object. We may need to revisit that assumption
+ if, for example, we dynamically fetch a missing tree and store it as a
+ loose object rather than a single object packfile.
+
+ This does not necessarily mean we need to mark loose objects as promisor;
+ it may be sufficient to relax the object lookup or is-promisor functions.
+
+
+Non-Tasks
+---------
+
+- Every time the subject of "demand loading blobs" comes up it seems
+ that someone suggests that the server be allowed to "guess" and send
+ additional objects that may be related to the requested objects.
+
+ No work has gone into actually doing that; we're just documenting that
+ it is a common suggestion. We're not sure how it would work and have
+ no plans to work on it.
+
+ It is valid for the server to send more objects than requested (even
+ for a dynamic object fetch), but we are not building on that.
+
+
+Footnotes
+---------
+
+[a] expensive-to-modify list of missing objects: Earlier in the design of
+ partial clone we discussed the need for a single list of missing objects.
+ This would essentially be a sorted linear list of OIDs that the were
+ omitted by the server during a clone or subsequent fetches.
+
+ This file would need to be loaded into memory on every object lookup.
+ It would need to be read, updated, and re-written (like the .git/index)
+ on every explicit "git fetch" command *and* on any dynamic object fetch.
+
+ The cost to read, update, and write this file could add significant
+ overhead to every command if there are many missing objects. For example,
+ if there are 100M missing blobs, this file would be at least 2GiB on disk.
+
+ With the "promisor" concept, we *infer* a missing object based upon the
+ type of packfile that references it.
+
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=2
+ Chromium work item for: Partial Clone
+
+[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
+ Subject: [RFC] Add support for downloading blobs on demand
+ Date: Fri, 13 Jan 2017 10:52:53 -0500
+
+[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/
+ Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches)
+ Date: Fri, 29 Sep 2017 13:11:36 -0700
+
+[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/
+ Subject: Proposal for missing blob support in Git repos
+ Date: Wed, 26 Apr 2017 15:13:46 -0700
+
+[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/
+ Subject: [PATCH 00/10] RFC Partial Clone and Fetch
+ Date: Wed, 8 Mar 2017 18:50:29 +0000
+
+[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/
+ Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module
+ Date: Fri, 5 May 2017 11:27:52 -0400
+
+[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/
+ Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand
+ Date: Fri, 14 Jul 2017 09:26:50 -0400