41 files changed, 3046 insertions, 1028 deletions
diff --git a/Documentation/technical/api-argv-array.txt b/Documentation/technical/api-argv-array.txt
index 1a797812fb..870c8edbfb 100644
--- a/Documentation/technical/api-argv-array.txt
+++ b/Documentation/technical/api-argv-array.txt
@@ -8,7 +8,7 @@ always NULL-terminated at the element pointed to by `argv[argc]`. This
 makes the result suitable for passing to functions expecting to receive
 argv from main(), or the link:api-run-command.html[run-command API].
 
-The link:api-string-list.html[string-list API] is similar, but cannot be
+The string-list API (documented in string-list.h) is similar, but cannot be
 used for these purposes; instead of storing a straight string pointer,
 it contains an item structure with a `util` field that is not compatible
 with the traditional argv interface.
@@ -46,6 +46,9 @@ Functions
 	Format a string and push it onto the end of the array. This is a
 	convenience wrapper combining `strbuf_addf` and `argv_array_push`.
 
+`argv_array_pushv`::
+	Push a null-terminated array of strings onto the end of the array.
+
 `argv_array_pop`::
 	Remove the final element from the array. If there are no
 	elements in the array, do nothing.
@@ -53,3 +56,10 @@ Functions
 `argv_array_clear`::
 	Free all memory associated with the array and return it to the
 	initial, empty state.
+
+`argv_array_detach`::
+	Disconnect the `argv` member from the `argv_array` struct and
+	return it. The caller is responsible for freeing the memory used
+	by the array, and by the strings it references. After detaching,
+	the `argv_array` is in a reinitialized state and can be pushed
+	into again.
diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt
deleted file mode 100644
index 22a39b9299..0000000000
--- a/Documentation/technical/api-builtin.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-builtin API
-===========
-
-Adding a new built-in
----------------------
-
-There are 4 things to do to add a built-in command implementation to
-Git:
-
-. Define the implementation of the built-in command `foo` with
-  signature:
-
-	int cmd_foo(int argc, const char **argv, const char *prefix);
-
-. Add the external declaration for the function to `builtin.h`.
-
-. Add the command to the `commands[]` table defined in `git.c`.
-  The entry should look like:
-
-	{ "foo", cmd_foo, <options> },
-+
-where options is the bitwise-or of:
-
-`RUN_SETUP`::
-	If there is not a Git directory to work on, abort.  If there
-	is a work tree, chdir to the top of it if the command was
-	invoked in a subdirectory.  If there is no work tree, no
-	chdir() is done.
-
-`RUN_SETUP_GENTLY`::
-	If there is a Git directory, chdir as per RUN_SETUP, otherwise,
-	don't chdir anywhere.
-
-`USE_PAGER`::
-
-	If the standard output is connected to a tty, spawn a pager and
-	feed our output to it.
-
-`NEED_WORK_TREE`::
-
-	Make sure there is a work tree, i.e. the command cannot act
-	on bare repositories.
-	This only makes sense when `RUN_SETUP` is also set.
-
-. Add `builtin/foo.o` to `BUILTIN_OBJS` in `Makefile`.
-
-Additionally, if `foo` is a new command, there are 3 more things to do:
-
-. Add tests to `t/` directory.
-
-. Write documentation in `Documentation/git-foo.txt`.
-
-. Add an entry for `git-foo` to `command-list.txt`.
-
-. Add an entry for `/git-foo` to `.gitignore`.
-
-
-How a built-in is called
-------------------------
-
-The implementation `cmd_foo()` takes three parameters, `argc`, `argv,
-and `prefix`.  The first two are similar to what `main()` of a
-standalone command would be called with.
-
-When `RUN_SETUP` is specified in the `commands[]` table, and when you
-were started from a subdirectory of the work tree, `cmd_foo()` is called
-after chdir(2) to the top of the work tree, and `prefix` gets the path
-to the subdirectory the command started from.  This allows you to
-convert a user-supplied pathname (typically relative to that directory)
-to a pathname relative to the top of the work tree.
-
-The return value from `cmd_foo()` becomes the exit status of the
-command.
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index 0d8b99b368..fa39ac9d71 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -47,28 +47,23 @@ will first feed the user-wide one to the callback, and then the
 repo-specific one; by overwriting, the higher-priority repo-specific
 value is left at the end).
 
-The `git_config_with_options` function lets the caller examine config
+The `config_with_options` function lets the caller examine config
 while adjusting some of the default behavior of `git_config`. It should
 almost never be used by "regular" Git code that is looking up
 configuration variables. It is intended for advanced callers like
 `git-config`, which are intentionally tweaking the normal config-lookup
 process. It takes two extra parameters:
 
-`filename`::
-If this parameter is non-NULL, it specifies the name of a file to
-parse for configuration, rather than looking in the usual files. Regular
-`git_config` defaults to `NULL`.
+`config_source`::
+If this parameter is non-NULL, it specifies the source to parse for
+configuration, rather than looking in the usual files. See `struct
+git_config_source` in `config.h` for details. Regular `git_config` defaults
+to `NULL`.
 
-`respect_includes`::
-Specify whether include directives should be followed in parsed files.
-Regular `git_config` defaults to `1`.
-
-There is a special version of `git_config` called `git_config_early`.
-This version takes an additional parameter to specify the repository
-config, instead of having it looked up via `git_path`. This is useful
-early in a Git program before the repository has been found. Unless
-you're working with early setup code, you probably don't want to use
-this.
+`opts`::
+Specify options to adjust the behavior of parsing config files. See `struct
+config_options` in `config.h` for details. As an example: regular `git_config`
+sets `opts.respect_includes` to `1` by default.
 
 Reading Specific Files
 ----------------------
@@ -193,7 +188,7 @@ parsing is successful, the return value is the result.
 Same as `git_config_bool`, except that integers are returned as-is, and
 an `is_bool` flag is unset.
 
-`git_config_maybe_bool`::
+`git_parse_maybe_bool`::
 Same as `git_config_bool`, except that it returns -1 on error rather
 than dying.
 
diff --git a/Documentation/technical/api-credentials.txt b/Documentation/technical/api-credentials.txt
index e44426dd04..75368f26ca 100644
--- a/Documentation/technical/api-credentials.txt
+++ b/Documentation/technical/api-credentials.txt
@@ -243,7 +243,7 @@ appended to its command line, which is one of:
 The details of the credential will be provided on the helper's stdin
 stream. The exact format is the same as the input/output format of the
 `git credential` plumbing command (see the section `INPUT/OUTPUT
-FORMAT` in linkgit:git-credential[7] for a detailed specification).
+FORMAT` in linkgit:git-credential[1] for a detailed specification).
 
 For a `get` operation, the helper should produce a list of attributes
 on stdout in the same format. A helper is free to produce a subset, or
@@ -268,4 +268,4 @@ See also
 
 linkgit:gitcredentials[7]
 
-linkgit:git-config[5] (See configuration variables `credential.*`)
+linkgit:git-config[1] (See configuration variables `credential.*`)
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
deleted file mode 100644
index 1d52a6ce14..0000000000
--- a/Documentation/technical/api-decorate.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-decorate API
-============
-
-Talk about <decorate.h>
-
-(Linus)
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 7f8e78d916..5abb8e8b1f 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -22,16 +22,41 @@ The notable options are:
 
 `flags`::
 
-	A bit-field of options (the `*IGNORED*` flags are mutually exclusive):
+	A bit-field of options:
 
 `DIR_SHOW_IGNORED`:::
 
-	Return just ignored files in `entries[]`, not untracked files.
+	Return just ignored files in `entries[]`, not untracked
+	files. This flag is mutually exclusive with
+	`DIR_SHOW_IGNORED_TOO`.
 
 `DIR_SHOW_IGNORED_TOO`:::
 
-	Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
-	in addition to untracked files in `entries[]`.
+	Similar to `DIR_SHOW_IGNORED`, but return ignored files in
+	`ignored[]` in addition to untracked files in
+	`entries[]`. This flag is mutually exclusive with
+	`DIR_SHOW_IGNORED`.
+
+`DIR_KEEP_UNTRACKED_CONTENTS`:::
+
+	Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, the
+	untracked contents of untracked directories are also returned in
+	`entries[]`.
+
+`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`:::
+
+	Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if
+	this is set, returns ignored files and directories that match
+	an exclude pattern. If a directory matches an exclude pattern,
+	then the directory is returned and the contained paths are
+	not. A directory that does not match an exclude pattern will
+	not be returned even if all of its contents are ignored. In
+	this case, the contents are returned as individual entries.
++
+If this is set, files and directories that explicitly match an ignore
+pattern are reported. Implicitly ignored directories (directories that
+do not match an ignore pattern, but whose contents are all ignored)
+are not reported, instead all of the contents are reported.
 
 `DIR_COLLECT_IGNORED`:::
 
diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index fc68db126e..ceeedd485c 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -58,7 +58,7 @@ to `die` or `error` as-is.  For example:
 	if (ref_transaction_commit(transaction, &err))
 		die("%s", err.buf);
 
-The 'err' parameter will be untouched if no error occured, so multiple
+The 'err' parameter will be untouched if no error occurred, so multiple
 function calls can be chained:
 
 	t = ref_transaction_begin(&err);
diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
index 2602668677..45f0df600f 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,10 +16,15 @@ Data Structure
 	of no interest to the calling programs.  The name of the
 	attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check`::
+`struct attr_check_item`::
 
-	This structure represents a set of attributes to check in a call
-	to `git_check_attr()` function, and receives the results.
+	This structure represents one attribute and its value.
+
+`struct attr_check`::
+
+	This structure represents a collection of `attr_check_item`.
+	It is passed to `git_check_attr()` function, specifying the
+	attributes to check, and receives their values.
 
 
 Attribute Values
@@ -27,7 +32,7 @@ Attribute Values
 
 An attribute for a path can be in one of four states: Set, Unset,
 Unspecified or set to a string, and `.value` member of `struct
-git_attr_check` records it.  There are three macros to check these:
+attr_check_item` records it.  There are three macros to check these:
 
 `ATTR_TRUE()`::
 
@@ -48,49 +53,51 @@ value of the attribute for the path.
 Querying Specific Attributes
 ----------------------------
 
-* Prepare an array of `struct git_attr_check` to define the list of
-  attributes you would want to check.  To populate this array, you would
-  need to define necessary attributes by calling `git_attr()` function.
+* Prepare `struct attr_check` using attr_check_initl()
+  function, enumerating the names of attributes whose values you are
+  interested in, terminated with a NULL pointer.  Alternatively, an
+  empty `struct attr_check` can be prepared by calling
+  `attr_check_alloc()` function and then attributes you want to
+  ask about can be added to it with `attr_check_append()`
+  function.
 
 * Call `git_check_attr()` to check the attributes for the path.
 
-* Inspect `git_attr_check` structure to see how each of the attribute in
-  the array is defined for the path.
+* Inspect `attr_check` structure to see how each of the
+  attribute in the array is defined for the path.
 
 
 Example
 -------
 
-To see how attributes "crlf" and "indent" are set for different paths.
+To see how attributes "crlf" and "ident" are set for different paths.
 
-. Prepare an array of `struct git_attr_check` with two elements (because
-  we are checking two attributes).  Initialize their `attr` member with
-  pointers to `struct git_attr` obtained by calling `git_attr()`:
+. Prepare a `struct attr_check` with two elements (because
+  we are checking two attributes):
 
 ------------
-static struct git_attr_check check[2];
+static struct attr_check *check;
 static void setup_check(void)
 {
-	if (check[0].attr)
+	if (check)
 		return; /* already done */
-	check[0].attr = git_attr("crlf");
-	check[1].attr = git_attr("ident");
+	check = attr_check_initl("crlf", "ident", NULL);
 }
 ------------
 
-. Call `git_check_attr()` with the prepared array of `struct git_attr_check`:
+. Call `git_check_attr()` with the prepared `struct attr_check`:
 
 ------------
 	const char *path;
 
 	setup_check();
-	git_check_attr(path, ARRAY_SIZE(check), check);
+	git_check_attr(path, check);
 ------------
 
-. Act on `.value` member of the result, left in `check[]`:
+. Act on `.value` member of the result, left in `check->items[]`:
 
 ------------
-	const char *value = check[0].value;
+	const char *value = check->items[0].value;
 
 	if (ATTR_TRUE(value)) {
 		The attribute is Set, by listing only the name of the
@@ -109,20 +116,39 @@ static void setup_check(void)
 	}
 ------------
 
+To see how attributes in argv[] are set for different paths, only
+the first step in the above would be different.
+
+------------
+static struct attr_check *check;
+static void setup_check(const char **argv)
+{
+	check = attr_check_alloc();
+	while (*argv) {
+		struct git_attr *attr = git_attr(*argv);
+		attr_check_append(check, attr);
+		argv++;
+	}
+}
+------------
+
 
 Querying All Attributes
 -----------------------
 
 To get the values of all attributes associated with a file:
 
-* Call `git_all_attrs()`, which returns an array of `git_attr_check`
-  structures.
+* Prepare an empty `attr_check` structure by calling
+  `attr_check_alloc()`.
+
+* Call `git_all_attrs()`, which populates the `attr_check`
+  with the attributes attached to the path.
 
-* Iterate over the `git_attr_check` array to examine the attribute
-  names and values.  The name of the attribute described by a
-  `git_attr_check` object can be retrieved via
-  `git_attr_name(check[i].attr)`.  (Please note that no items will be
-  returned for unset attributes, so `ATTR_UNSET()` will return false
-  for all returned `git_array_check` objects.)
+* Iterate over the `attr_check.items[]` array to examine
+  the attribute names and values.  The name of the attribute
+  described by an `attr_check.items[]` object can be retrieved via
+  `git_attr_name(check->items[i].attr)`.  (Please note that no items
+  will be returned for unset attributes, so `ATTR_UNSET()` will return
+  false for all returned `attr_check.items[]` objects.)
 
-* Free the `git_array_check` array.
+* Free the `attr_check` struct by calling `attr_check_free()`.
diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
deleted file mode 100644
index ad7a5bddd2..0000000000
--- a/Documentation/technical/api-hashmap.txt
+++ /dev/null
@@ -1,280 +0,0 @@
-hashmap API
-===========
-
-The hashmap API is a generic implementation of hash-based key-value mappings.
-
-Data Structures
----------------
-
-`struct hashmap`::
-
-	The hash table structure. Members can be used as follows, but should
-	not be modified directly:
-+
-The `size` member keeps track of the total number of entries (0 means the
-hashmap is empty).
-+
-`tablesize` is the allocated size of the hash table. A non-0 value indicates
-that the hashmap is initialized. It may also be useful for statistical purposes
-(i.e. `size / tablesize` is the current load factor).
-+
-`cmpfn` stores the comparison function specified in `hashmap_init()`. In
-advanced scenarios, it may be useful to change this, e.g. to switch between
-case-sensitive and case-insensitive lookup.
-
-`struct hashmap_entry`::
-
-	An opaque structure representing an entry in the hash table, which must
-	be used as first member of user data structures. Ideally it should be
-	followed by an int-sized member to prevent unused memory on 64-bit
-	systems due to alignment.
-+
-The `hash` member is the entry's hash code and the `next` member points to the
-next entry in case of collisions (i.e. if multiple entries map to the same
-bucket).
-
-`struct hashmap_iter`::
-
-	An iterator structure, to be used with hashmap_iter_* functions.
-
-Types
------
-
-`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
-
-	User-supplied function to test two hashmap entries for equality. Shall
-	return 0 if the entries are equal.
-+
-This function is always called with non-NULL `entry` / `entry_or_key`
-parameters that have the same hash code. When looking up an entry, the `key`
-and `keydata` parameters to hashmap_get and hashmap_remove are always passed
-as second and third argument, respectively. Otherwise, `keydata` is NULL.
-
-Functions
----------
-
-`unsigned int strhash(const char *buf)`::
-`unsigned int strihash(const char *buf)`::
-`unsigned int memhash(const void *buf, size_t len)`::
-`unsigned int memihash(const void *buf, size_t len)`::
-
-	Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
-	http://www.isthe.com/chongo/tech/comp/fnv).
-+
-`strhash` and `strihash` take 0-terminated strings, while `memhash` and
-`memihash` operate on arbitrary-length memory.
-+
-`strihash` and `memihash` are case insensitive versions.
-
-`unsigned int sha1hash(const unsigned char *sha1)`::
-
-	Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
-	for use in hash tables. Cryptographic hashes are supposed to have
-	uniform distribution, so in contrast to `memhash()`, this just copies
-	the first `sizeof(int)` bytes without shuffling any bits. Note that
-	the results will be different on big-endian and little-endian
-	platforms, so they should not be stored or transferred over the net.
-
-`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
-
-	Initializes a hashmap structure.
-+
-`map` is the hashmap to initialize.
-+
-The `equals_function` can be specified to compare two entries for equality.
-If NULL, entries are considered equal if their hash codes are equal.
-+
-If the total number of entries is known in advance, the `initial_size`
-parameter may be used to preallocate a sufficiently large table and thus
-prevent expensive resizing. If 0, the table is dynamically resized.
-
-`void hashmap_free(struct hashmap *map, int free_entries)`::
-
-	Frees a hashmap structure and allocated memory.
-+
-`map` is the hashmap to free.
-+
-If `free_entries` is true, each hashmap_entry in the map is freed as well
-(using stdlib's free()).
-
-`void hashmap_entry_init(void *entry, unsigned int hash)`::
-
-	Initializes a hashmap_entry structure.
-+
-`entry` points to the entry to initialize.
-+
-`hash` is the hash code of the entry.
-
-`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
-
-	Returns the hashmap entry for the specified key, or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are passed
-to `hashmap_cmp_fn` to decide whether the entry matches the key.
-
-`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
-
-	Returns the hashmap entry for the specified hash code and key data,
-	or NULL if not found.
-+
-`map` is the hashmap structure.
-+
-`hash` is the hash code of the entry to look up.
-+
-If an entry with matching hash code is found, `keydata` is passed to
-`hashmap_cmp_fn` to decide whether the entry matches the key. The
-`entry_or_key` parameter points to a bogus hashmap_entry structure that
-should not be used in the comparison.
-
-`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
-
-	Returns the next equal hashmap entry, or NULL if not found. This can be
-	used to iterate over duplicate entries (see `hashmap_add`).
-+
-`map` is the hashmap structure.
-+
-`entry` is the hashmap_entry to start the search from, obtained via a previous
-call to `hashmap_get` or `hashmap_get_next`.
-
-`void hashmap_add(struct hashmap *map, void *entry)`::
-
-	Adds a hashmap entry. This allows to add duplicate entries (i.e.
-	separate values with the same key according to hashmap_cmp_fn).
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add.
-
-`void *hashmap_put(struct hashmap *map, void *entry)`::
-
-	Adds or replaces a hashmap entry. If the hashmap contains duplicate
-	entries equal to the specified entry, only one of them will be replaced.
-+
-`map` is the hashmap structure.
-+
-`entry` is the entry to add or replace.
-+
-Returns the replaced entry, or NULL if not found (i.e. the entry was added).
-
-`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
-
-	Removes a hashmap entry matching the specified key. If the hashmap
-	contains duplicate entries equal to the specified key, only one of
-	them will be removed.
-+
-`map` is the hashmap structure.
-+
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-+
-If an entry with matching hash code is found, `key` and `keydata` are
-passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
-+
-Returns the removed entry, or NULL if not found.
-
-`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
-`void *hashmap_iter_next(struct hashmap_iter *iter)`::
-`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
-
-	Used to iterate over all entries of a hashmap.
-+
-`hashmap_iter_init` initializes a `hashmap_iter` structure.
-+
-`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
-more entries.
-+
-`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
-and returns the first entry, if any).
-
-`const char *strintern(const char *string)`::
-`const void *memintern(const void *data, size_t len)`::
-
-	Returns the unique, interned version of the specified string or data,
-	similar to the `String.intern` API in Java and .NET, respectively.
-	Interned strings remain valid for the entire lifetime of the process.
-+
-Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
-strings / data must not be modified or freed.
-+
-Interned strings are best used for short strings with high probability of
-duplicates.
-+
-Uses a hashmap to store the pool of interned strings.
-
-Usage example
--------------
-
-Here's a simple usage example that maps long keys to double values.
-------------
-struct hashmap map;
-
-struct long2double {
-	struct hashmap_entry ent; /* must be the first member! */
-	long key;
-	double value;
-};
-
-static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
-{
-	return !(e1->key == e2->key);
-}
-
-void long2double_init(void)
-{
-	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
-}
-
-void long2double_free(void)
-{
-	hashmap_free(&map, 1);
-}
-
-static struct long2double *find_entry(long key)
-{
-	struct long2double k;
-	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
-	k.key = key;
-	return hashmap_get(&map, &k, NULL);
-}
-
-double get_value(long key)
-{
-	struct long2double *e = find_entry(key);
-	return e ? e->value : 0;
-}
-
-void set_value(long key, double value)
-{
-	struct long2double *e = find_entry(key);
-	if (!e) {
-		e = malloc(sizeof(struct long2double));
-		hashmap_entry_init(e, memhash(&key, sizeof(long)));
-		e->key = key;
-		hashmap_add(&map, e);
-	}
-	e->value = value;
-}
-------------
-
-Using variable-sized keys
--------------------------
-
-The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
-`hashmap_entry` structure as key to find the correct entry. If the key data is
-variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
-to create a full-fledged entry structure on the heap and copy all the key data
-into the structure.
-
-In this case, the `keydata` parameter can be used to pass
-variable-sized key data directly to the comparison function, and the `key`
-parameter can be a stripped-down, fixed size entry structure allocated on the
-stack.
-
-See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
deleted file mode 100644
index adbdbf5d75..0000000000
--- a/Documentation/technical/api-in-core-index.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-in-core index API
-=================
-
-Talk about <read-cache.c> and <cache-tree.c>, things like:
-
-* cache -> the_index macros
-* read_index()
-* write_index()
-* ie_match_stat() and ie_modified(); how they are different and when to
-  use which.
-* index_name_pos()
-* remove_index_entry_at()
-* remove_file_from_index()
-* add_file_to_index()
-* add_index_entry()
-* refresh_index()
-* discard_index()
-* cache_tree_invalidate_path()
-* cache_tree_update()
-
-(JC, Linus)
diff --git a/Documentation/technical/api-lockfile.txt b/Documentation/technical/api-lockfile.txt
deleted file mode 100644
index 93b5f23e4c..0000000000
--- a/Documentation/technical/api-lockfile.txt
+++ /dev/null
@@ -1,220 +0,0 @@
-lockfile API
-============
-
-The lockfile API serves two purposes:
-
-* Mutual exclusion and atomic file updates. When we want to change a
-  file, we create a lockfile `<filename>.lock`, write the new file
-  contents into it, and then rename the lockfile to its final
-  destination `<filename>`. We create the `<filename>.lock` file with
-  `O_CREAT|O_EXCL` so that we can notice and fail if somebody else has
-  already locked the file, then atomically rename the lockfile to its
-  final destination to commit the changes and unlock the file.
-
-* Automatic cruft removal. If the program exits after we lock a file
-  but before the changes have been committed, we want to make sure
-  that we remove the lockfile. This is done by remembering the
-  lockfiles we have created in a linked list and setting up an
-  `atexit(3)` handler and a signal handler that clean up the
-  lockfiles. This mechanism ensures that outstanding lockfiles are
-  cleaned up if the program exits (including when `die()` is called)
-  or if the program dies on a signal.
-
-Please note that lockfiles only block other writers. Readers do not
-block, but they are guaranteed to see either the old contents of the
-file or the new contents of the file (assuming that the filesystem
-implements `rename(2)` atomically).
-
-
-Calling sequence
-----------------
-
-The caller:
-
-* Allocates a `struct lock_file` either as a static variable or on the
-  heap, initialized to zeros. Once you use the structure to call the
-  `hold_lock_file_*` family of functions, it belongs to the lockfile
-  subsystem and its storage must remain valid throughout the life of
-  the program (i.e. you cannot use an on-stack variable to hold this
-  structure).
-
-* Attempts to create a lockfile by passing that variable and the path
-  of the final destination (e.g. `$GIT_DIR/index`) to
-  `hold_lock_file_for_update` or `hold_lock_file_for_append`.
-
-* Writes new content for the destination file by either:
-
-  * writing to the file descriptor returned by the `hold_lock_file_*`
-    functions (also available via `lock->fd`).
-
-  * calling `fdopen_lock_file` to get a `FILE` pointer for the open
-    file and writing to the file using stdio.
-
-When finished writing, the caller can:
-
-* Close the file descriptor and rename the lockfile to its final
-  destination by calling `commit_lock_file` or `commit_lock_file_to`.
-
-* Close the file descriptor and remove the lockfile by calling
-  `rollback_lock_file`.
-
-* Close the file descriptor without removing or renaming the lockfile
-  by calling `close_lock_file`, and later call `commit_lock_file`,
-  `commit_lock_file_to`, `rollback_lock_file`, or `reopen_lock_file`.
-
-Even after the lockfile is committed or rolled back, the `lock_file`
-object must not be freed or altered by the caller. However, it may be
-reused; just pass it to another call of `hold_lock_file_for_update` or
-`hold_lock_file_for_append`.
-
-If the program exits before you have called one of `commit_lock_file`,
-`commit_lock_file_to`, `rollback_lock_file`, or `close_lock_file`, an
-`atexit(3)` handler will close and remove the lockfile, rolling back
-any uncommitted changes.
-
-If you need to close the file descriptor you obtained from a
-`hold_lock_file_*` function yourself, do so by calling
-`close_lock_file`. You should never call `close(2)` or `fclose(3)`
-yourself! Otherwise the `struct lock_file` structure would still think
-that the file descriptor needs to be closed, and a commit or rollback
-would result in duplicate calls to `close(2)`. Worse yet, if you close
-and then later open another file descriptor for a completely different
-purpose, then a commit or rollback might close that unrelated file
-descriptor.
-
-
-Error handling
---------------
-
-The `hold_lock_file_*` functions return a file descriptor on success
-or -1 on failure (unless `LOCK_DIE_ON_ERROR` is used; see below). On
-errors, `errno` describes the reason for failure. Errors can be
-reported by passing `errno` to one of the following helper functions:
-
-unable_to_lock_message::
-
-	Append an appropriate error message to a `strbuf`.
-
-unable_to_lock_error::
-
-	Emit an appropriate error message using `error()`.
-
-unable_to_lock_die::
-
-	Emit an appropriate error message and `die()`.
-
-Similarly, `commit_lock_file`, `commit_lock_file_to`, and
-`close_lock_file` return 0 on success. On failure they set `errno`
-appropriately, do their best to roll back the lockfile, and return -1.
-
-
-Flags
------
-
-The following flags can be passed to `hold_lock_file_for_update` or
-`hold_lock_file_for_append`:
-
-LOCK_NO_DEREF::
-
-	Usually symbolic links in the destination path are resolved
-	and the lockfile is created by adding ".lock" to the resolved
-	path. If `LOCK_NO_DEREF` is set, then the lockfile is created
-	by adding ".lock" to the path argument itself. This option is
-	used, for example, when locking a symbolic reference, which
-	for backwards-compatibility reasons can be a symbolic link
-	containing the name of the referred-to-reference.
-
-LOCK_DIE_ON_ERROR::
-
-	If a lock is already taken for the file, `die()` with an error
-	message. If this option is not specified, trying to lock a
-	file that is already locked returns -1 to the caller.
-
-
-The functions
--------------
-
-hold_lock_file_for_update::
-
-	Take a pointer to `struct lock_file`, the path of the file to
-	be locked (e.g. `$GIT_DIR/index`) and a flags argument (see
-	above). Attempt to create a lockfile for the destination and
-	return the file descriptor for writing to the file.
-
-hold_lock_file_for_append::
-
-	Like `hold_lock_file_for_update`, but before returning copy
-	the existing contents of the file (if any) to the lockfile and
-	position its write pointer at the end of the file.
-
-fdopen_lock_file::
-
-	Associate a stdio stream with the lockfile. Return NULL
-	(*without* rolling back the lockfile) on error. The stream is
-	closed automatically when `close_lock_file` is called or when
-	the file is committed or rolled back.
-
-get_locked_file_path::
-
-	Return the path of the file that is locked by the specified
-	lock_file object. The caller must free the memory.
-
-commit_lock_file::
-
-	Take a pointer to the `struct lock_file` initialized with an
-	earlier call to `hold_lock_file_for_update` or
-	`hold_lock_file_for_append`, close the file descriptor, and
-	rename the lockfile to its final destination. Return 0 upon
-	success. On failure, roll back the lock file and return -1,
-	with `errno` set to the value from the failing call to
-	`close(2)` or `rename(2)`. It is a bug to call
-	`commit_lock_file` for a `lock_file` object that is not
-	currently locked.
-
-commit_lock_file_to::
-
-	Like `commit_lock_file()`, except that it takes an explicit
-	`path` argument to which the lockfile should be renamed. The
-	`path` must be on the same filesystem as the lock file.
-
-rollback_lock_file::
-
-	Take a pointer to the `struct lock_file` initialized with an
-	earlier call to `hold_lock_file_for_update` or
-	`hold_lock_file_for_append`, close the file descriptor and
-	remove the lockfile. It is a NOOP to call
-	`rollback_lock_file()` for a `lock_file` object that has
-	already been committed or rolled back.
-
-close_lock_file::
-
-	Take a pointer to the `struct lock_file` initialized with an
-	earlier call to `hold_lock_file_for_update` or
-	`hold_lock_file_for_append`. Close the file descriptor (and
-	the file pointer if it has been opened using
-	`fdopen_lock_file`). Return 0 upon success. On failure to
-	`close(2)`, return a negative value and roll back the lock
-	file. Usually `commit_lock_file`, `commit_lock_file_to`, or
-	`rollback_lock_file` should eventually be called if
-	`close_lock_file` succeeds.
-
-reopen_lock_file::
-
-	Re-open a lockfile that has been closed (using
-	`close_lock_file`) but not yet committed or rolled back. This
-	can be used to implement a sequence of operations like the
-	following:
-
-	* Lock file.
-
-	* Write new contents to lockfile, then `close_lock_file` to
-	  cause the contents to be written to disk.
-
-	* Pass the name of the lockfile to another program to allow it
-	  (and nobody else) to inspect the contents you wrote, while
-	  still holding the lock yourself.
-
-	* `reopen_lock_file` to reopen the lockfile. Make further
-	  updates to the contents.
-
-	* `commit_lock_file` to make the final version permanent.
diff --git a/Documentation/technical/api-object-access.txt b/Documentation/technical/api-object-access.txt
index 03bb0e950d..5b29622d00 100644
--- a/Documentation/technical/api-object-access.txt
+++ b/Documentation/technical/api-object-access.txt
@@ -1,13 +1,13 @@
 object access API
 =================
 
-Talk about <sha1_file.c> and <object.h> family, things like
+Talk about <sha1-file.c> and <object.h> family, things like
 
 * read_sha1_file()
 * read_object_with_reference()
 * has_sha1_file()
 * write_sha1_file()
-* pretend_sha1_file()
+* pretend_object_file()
 * lookup_{object,commit,tag,blob,tree}
 * parse_{object,commit,tag,blob,tree}
 * Use of object flags
diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
new file mode 100644
index 0000000000..9febfb1d52
--- /dev/null
+++ b/Documentation/technical/api-oid-array.txt
@@ -0,0 +1,85 @@
+oid-array API
+==============
+
+The oid-array API provides storage and manipulation of sets of object
+identifiers. The emphasis is on storage and processing efficiency,
+making them suitable for large lists. Note that the ordering of items is
+not preserved over some operations.
+
+Data Structures
+---------------
+
+`struct oid_array`::
+
+	A single array of object IDs. This should be initialized by
+	assignment from `OID_ARRAY_INIT`.  The `oid` member contains
+	the actual data. The `nr` member contains the number of items in
+	the set.  The `alloc` and `sorted` members are used internally,
+	and should not be needed by API callers.
+
+Functions
+---------
+
+`oid_array_append`::
+	Add an item to the set. The object ID will be placed at the end of
+	the array (but note that some operations below may lose this
+	ordering).
+
+`oid_array_lookup`::
+	Perform a binary search of the array for a specific object ID.
+	If found, returns the offset (in number of elements) of the
+	object ID. If not found, returns a negative integer. If the array
+	is not sorted, this function has the side effect of sorting it.
+
+`oid_array_clear`::
+	Free all memory associated with the array and return it to the
+	initial, empty state.
+
+`oid_array_for_each`::
+	Iterate over each element of the list, executing the callback
+	function for each one. Does not sort the list, so any custom
+	hash order is retained. If the callback returns a non-zero
+	value, the iteration ends immediately and the callback's
+	return is propagated; otherwise, 0 is returned.
+
+`oid_array_for_each_unique`::
+	Iterate over each unique element of the list in sorted order,
+	but otherwise behave like `oid_array_for_each`. If the array
+	is not sorted, this function has the side effect of sorting
+	it.
+
+Examples
+--------
+
+-----------------------------------------
+int print_callback(const struct object_id *oid,
+		    void *data)
+{
+	printf("%s\n", oid_to_hex(oid));
+	return 0; /* always continue */
+}
+
+void some_func(void)
+{
+	struct sha1_array hashes = OID_ARRAY_INIT;
+	struct object_id oid;
+
+	/* Read objects into our set */
+	while (read_object_from_stdin(oid.hash))
+		oid_array_append(&hashes, &oid);
+
+	/* Check if some objects are in our set */
+	while (read_object_from_stdin(oid.hash)) {
+		if (oid_array_lookup(&hashes, &oid) >= 0)
+			printf("it's in there!\n");
+
+	/*
+	 * Print the unique set of objects. We could also have
+	 * avoided adding duplicate objects in the first place,
+	 * but we would end up re-sorting the array repeatedly.
+	 * Instead, this will sort once and then skip duplicates
+	 * in linear time.
+	 */
+	oid_array_for_each_unique(&hashes, print_callback, NULL);
+}
+-----------------------------------------
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
index 1f2db31312..829b558110 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -144,8 +144,12 @@ There are some macros to easily define options:
 
 `OPT_COUNTUP(short, long, &int_var, description)`::
 	Introduce a count-up option.
-	`int_var` is incremented on each use of `--option`, and
-	reset to zero with `--no-option`.
+	Each use of `--option` increments `int_var`, starting from zero
+	(even if initially negative), and `--no-option` resets it to
+	zero. To determine if `--option` or `--no-option` was encountered at
+	all, initialize `int_var` to a negative value, and if it is still
+	negative after parse_options(), then neither `--option` nor
+	`--no-option` was seen.
 
 `OPT_BIT(short, long, &int_var, description, mask)`::
 	Introduce a boolean option.
@@ -164,17 +168,28 @@ There are some macros to easily define options:
 	Introduce an option with string argument.
 	The string argument is put into `str_var`.
 
+`OPT_STRING_LIST(short, long, &struct string_list, arg_str, description)`::
+	Introduce an option with string argument.
+	The string argument is stored as an element in `string_list`.
+	Use of `--no-option` will clear the list of preceding values.
+
 `OPT_INTEGER(short, long, &int_var, description)`::
 	Introduce an option with integer argument.
 	The integer is put into `int_var`.
 
-`OPT_DATE(short, long, &int_var, description)`::
+`OPT_MAGNITUDE(short, long, &unsigned_long_var, description)`::
+	Introduce an option with a size argument. The argument must be a
+	non-negative integer and may include a suffix of 'k', 'm' or 'g' to
+	scale the provided value by 1024, 1024^2 or 1024^3 respectively.
+	The scaled value is put into `unsigned_long_var`.
+
+`OPT_DATE(short, long, &timestamp_t_var, description)`::
 	Introduce an option with date argument, see `approxidate()`.
-	The timestamp is put into `int_var`.
+	The timestamp is put into `timestamp_t_var`.
 
-`OPT_EXPIRY_DATE(short, long, &int_var, description)`::
+`OPT_EXPIRY_DATE(short, long, &timestamp_t_var, description)`::
 	Introduce an option with expiry date argument, see `parse_expiry_date()`.
-	The timestamp is put into `int_var`.
+	The timestamp is put into `timestamp_t_var`.
 
 `OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`::
 	Introduce an option with argument.
@@ -212,6 +227,26 @@ There are some macros to easily define options:
 	Use it to hide deprecated options that are still to be recognized
 	and ignored silently.
 
+`OPT_PASSTHRU(short, long, &char_var, arg_str, description, flags)`::
+	Introduce an option that will be reconstructed into a char* string,
+	which must be initialized to NULL. This is useful when you need to
+	pass the command-line option to another command. Any previous value
+	will be overwritten, so this should only be used for options where
+	the last one specified on the command line wins.
+
+`OPT_PASSTHRU_ARGV(short, long, &argv_array_var, arg_str, description, flags)`::
+	Introduce an option where all instances of it on the command-line will
+	be reconstructed into an argv_array. This is useful when you need to
+	pass the command-line option, which can be specified multiple times,
+	to another command.
+
+`OPT_CMDMODE(short, long, &int_var, description, enum_val)`::
+	Define an "operation mode" option, only one of which in the same
+	group of "operating mode" options that share the same `int_var`
+	can be given by the user. `enum_val` is set to `int_var` when the
+	option is used, but an error is reported if other "operating mode"
+	option has already set its value to the same `int_var`.
+
 
 The last element of the array must be `OPT_END()`.
 
diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
index 02adfd45d3..46c3d5c355 100644
--- a/Documentation/technical/api-ref-iteration.txt
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -6,7 +6,7 @@ Iteration of refs is done by using an iterate function which will call a
 callback function for every ref. The callback function has this
 signature:
 
-	int handle_one_ref(const char *refname, const unsigned char *sha1,
+	int handle_one_ref(const char *refname, const struct object_id *oid,
 			   int flags, void *cb_data);
 
 There are different kinds of iterate functions which all take a
@@ -32,11 +32,8 @@ Iteration functions
 
 * `for_each_glob_ref_in()` the previous and `for_each_ref_in()` combined.
 
-* `head_ref_submodule()`, `for_each_ref_submodule()`,
-  `for_each_ref_in_submodule()`, `for_each_tag_ref_submodule()`,
-  `for_each_branch_ref_submodule()`, `for_each_remote_ref_submodule()`
-  do the same as the functions described above but for a specified
-  submodule.
+* Use `refs_` API for accessing submodules. The submodule ref store could
+  be obtained with `get_submodule_ref_store()`.
 
 * `for_each_rawref()` can be used to learn about broken ref and symref.
 
diff --git a/Documentation/technical/api-remote.txt b/Documentation/technical/api-remote.txt
index 5d245aa9d1..f10941b2e8 100644
--- a/Documentation/technical/api-remote.txt
+++ b/Documentation/technical/api-remote.txt
@@ -51,6 +51,10 @@ struct remote
 
 	The proxy to use for curl (http, https, ftp, etc.) URLs.
 
+`http_proxy_authmethod`::
+
+	The method used for authenticating against `http_proxy`.
+
 struct remotes can be found by name with remote_get(), and iterated
 through with for_each_remote(). remote_get(NULL) will return the
 default remote, given the current branch and configuration.
@@ -97,10 +101,6 @@ It contains:
 
 	The name of the remote listed in the configuration.
 
-`remote`::
-
-	The struct remote for that remote.
-
 `merge_name`::
 
 	An array of the "merge" lines in the configuration.
diff --git a/Documentation/technical/api-run-command.txt b/Documentation/technical/api-run-command.txt
index a9fdb45b93..8bf3e37f53 100644
--- a/Documentation/technical/api-run-command.txt
+++ b/Documentation/technical/api-run-command.txt
@@ -46,6 +46,13 @@ Functions
 	The argument dir corresponds the member .dir. The argument env
 	corresponds to the member .env.
 
+`child_process_clear`::
+
+	Release the memory associated with the struct child_process.
+	Most users of the run-command API don't need to call this
+	function explicitly because `start_command` invokes it on
+	failure and `finish_command` calls it automatically already.
+
 The functions above do the following:
 
 . If a system call failed, errno is set and -1 is returned. A diagnostic
diff --git a/Documentation/technical/api-setup.txt b/Documentation/technical/api-setup.txt
index 540e455689..eb1fa9853e 100644
--- a/Documentation/technical/api-setup.txt
+++ b/Documentation/technical/api-setup.txt
@@ -27,8 +27,6 @@ parse_pathspec(). This function takes several arguments:
 
 - prefix and args come from cmd_* functions
 
-get_pathspec() is obsolete and should never be used in new code.
-
 parse_pathspec() helps catch unsupported features and reject them
 politely. At a lower level, different pathspec-related functions may
 not support the same set of features. Such pathspec-sensitive
diff --git a/Documentation/technical/api-sha1-array.txt b/Documentation/technical/api-sha1-array.txt
deleted file mode 100644
index 3e75497a37..0000000000
--- a/Documentation/technical/api-sha1-array.txt
+++ /dev/null
@@ -1,76 +0,0 @@
-sha1-array API
-==============
-
-The sha1-array API provides storage and manipulation of sets of SHA-1
-identifiers. The emphasis is on storage and processing efficiency,
-making them suitable for large lists. Note that the ordering of items is
-not preserved over some operations.
-
-Data Structures
----------------
-
-`struct sha1_array`::
-
-	A single array of SHA-1 hashes. This should be initialized by
-	assignment from `SHA1_ARRAY_INIT`.  The `sha1` member contains
-	the actual data. The `nr` member contains the number of items in
-	the set.  The `alloc` and `sorted` members are used internally,
-	and should not be needed by API callers.
-
-Functions
----------
-
-`sha1_array_append`::
-	Add an item to the set. The sha1 will be placed at the end of
-	the array (but note that some operations below may lose this
-	ordering).
-
-`sha1_array_lookup`::
-	Perform a binary search of the array for a specific sha1.
-	If found, returns the offset (in number of elements) of the
-	sha1. If not found, returns a negative integer. If the array is
-	not sorted, this function has the side effect of sorting it.
-
-`sha1_array_clear`::
-	Free all memory associated with the array and return it to the
-	initial, empty state.
-
-`sha1_array_for_each_unique`::
-	Efficiently iterate over each unique element of the list,
-	executing the callback function for each one. If the array is
-	not sorted, this function has the side effect of sorting it.
-
-Examples
---------
-
------------------------------------------
-void print_callback(const unsigned char sha1[20],
-		    void *data)
-{
-	printf("%s\n", sha1_to_hex(sha1));
-}
-
-void some_func(void)
-{
-	struct sha1_array hashes = SHA1_ARRAY_INIT;
-	unsigned char sha1[20];
-
-	/* Read objects into our set */
-	while (read_object_from_stdin(sha1))
-		sha1_array_append(&hashes, sha1);
-
-	/* Check if some objects are in our set */
-	while (read_object_from_stdin(sha1)) {
-		if (sha1_array_lookup(&hashes, sha1) >= 0)
-			printf("it's in there!\n");
-
-	/*
-	 * Print the unique set of objects. We could also have
-	 * avoided adding duplicate objects in the first place,
-	 * but we would end up re-sorting the array repeatedly.
-	 * Instead, this will sort once and then skip duplicates
-	 * in linear time.
-	 */
-	sha1_array_for_each_unique(&hashes, print_callback, NULL);
-}
------------------------------------------
diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt
deleted file mode 100644
index c08402b12e..0000000000
--- a/Documentation/technical/api-string-list.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-string-list API
-===============
-
-The string_list API offers a data structure and functions to handle
-sorted and unsorted string lists.  A "sorted" list is one whose
-entries are sorted by string value in `strcmp()` order.
-
-The 'string_list' struct used to be called 'path_list', but was renamed
-because it is not specific to paths.
-
-The caller:
-
-. Allocates and clears a `struct string_list` variable.
-
-. Initializes the members. You might want to set the flag `strdup_strings`
-  if the strings should be strdup()ed. For example, this is necessary
-  when you add something like git_path("..."), since that function returns
-  a static buffer that will change with the next call to git_path().
-+
-If you need something advanced, you can manually malloc() the `items`
-member (you need this if you add things later) and you should set the
-`nr` and `alloc` members in that case, too.
-
-. Adds new items to the list, using `string_list_append`,
-  `string_list_append_nodup`, `string_list_insert`,
-  `string_list_split`, and/or `string_list_split_in_place`.
-
-. Can check if a string is in the list using `string_list_has_string` or
-  `unsorted_string_list_has_string` and get it from the list using
-  `string_list_lookup` for sorted lists.
-
-. Can sort an unsorted list using `string_list_sort`.
-
-. Can remove duplicate items from a sorted list using
-  `string_list_remove_duplicates`.
-
-. Can remove individual items of an unsorted list using
-  `unsorted_string_list_delete_item`.
-
-. Can remove items not matching a criterion from a sorted or unsorted
-  list using `filter_string_list`, or remove empty strings using
-  `string_list_remove_empty_items`.
-
-. Finally it should free the list using `string_list_clear`.
-
-Example:
-
-----
-struct string_list list = STRING_LIST_INIT_NODUP;
-int i;
-
-string_list_append(&list, "foo");
-string_list_append(&list, "bar");
-for (i = 0; i < list.nr; i++)
-	printf("%s\n", list.items[i].string)
-----
-
-NOTE: It is more efficient to build an unsorted list and sort it
-afterwards, instead of building a sorted list (`O(n log n)` instead of
-`O(n^2)`).
-+
-However, if you use the list to check if a certain string was added
-already, you should not do that (using unsorted_string_list_has_string()),
-because the complexity would be quadratic again (but with a worse factor).
-
-Functions
----------
-
-* General ones (works with sorted and unsorted lists as well)
-
-`string_list_init`::
-
-	Initialize the members of the string_list, set `strdup_strings`
-	member according to the value of the second parameter.
-
-`filter_string_list`::
-
-	Apply a function to each item in a list, retaining only the
-	items for which the function returns true.  If free_util is
-	true, call free() on the util members of any items that have
-	to be deleted.  Preserve the order of the items that are
-	retained.
-
-`string_list_remove_empty_items`::
-
-	Remove any empty strings from the list.  If free_util is true,
-	call free() on the util members of any items that have to be
-	deleted.  Preserve the order of the items that are retained.
-
-`print_string_list`::
-
-	Dump a string_list to stdout, useful mainly for debugging purposes. It
-	can take an optional header argument and it writes out the
-	string-pointer pairs of the string_list, each one in its own line.
-
-`string_list_clear`::
-
-	Free a string_list. The `string` pointer of the items will be freed in
-	case the `strdup_strings` member of the string_list is set. The second
-	parameter controls if the `util` pointer of the items should be freed
-	or not.
-
-* Functions for sorted lists only
-
-`string_list_has_string`::
-
-	Determine if the string_list has a given string or not.
-
-`string_list_insert`::
-
-	Insert a new element to the string_list. The returned pointer can be
-	handy if you want to write something to the `util` pointer of the
-	string_list_item containing the just added string. If the given
-	string already exists the insertion will be skipped and the
-	pointer to the existing item returned.
-+
-Since this function uses xrealloc() (which die()s if it fails) if the
-list needs to grow, it is safe not to check the pointer. I.e. you may
-write `string_list_insert(...)->util = ...;`.
-
-`string_list_lookup`::
-
-	Look up a given string in the string_list, returning the containing
-	string_list_item. If the string is not found, NULL is returned.
-
-`string_list_remove_duplicates`::
-
-	Remove all but the first of consecutive entries that have the
-	same string value.  If free_util is true, call free() on the
-	util members of any items that have to be deleted.
-
-* Functions for unsorted lists only
-
-`string_list_append`::
-
-	Append a new string to the end of the string_list.  If
-	`strdup_string` is set, then the string argument is copied;
-	otherwise the new `string_list_entry` refers to the input
-	string.
-
-`string_list_append_nodup`::
-
-	Append a new string to the end of the string_list.  The new
-	`string_list_entry` always refers to the input string, even if
-	`strdup_string` is set.  This function can be used to hand
-	ownership of a malloc()ed string to a `string_list` that has
-	`strdup_string` set.
-
-`string_list_sort`::
-
-	Sort the list's entries by string value in `strcmp()` order.
-
-`unsorted_string_list_has_string`::
-
-	It's like `string_list_has_string()` but for unsorted lists.
-
-`unsorted_string_list_lookup`::
-
-	It's like `string_list_lookup()` but for unsorted lists.
-+
-The above two functions need to look through all items, as opposed to their
-counterpart for sorted lists, which performs a binary search.
-
-`unsorted_string_list_delete_item`::
-
-	Remove an item from a string_list. The `string` pointer of the items
-	will be freed in case the `strdup_strings` member of the string_list
-	is set. The third parameter controls if the `util` pointer of the
-	items should be freed or not.
-
-`string_list_split`::
-`string_list_split_in_place`::
-
-	Split a string into substrings on a delimiter character and
-	append the substrings to a `string_list`.  If `maxsplit` is
-	non-negative, then split at most `maxsplit` times.  Return the
-	number of substrings appended to the list.
-+
-`string_list_split` requires a `string_list` that has `strdup_strings`
-set to true; it leaves the input string untouched and makes copies of
-the substrings in newly-allocated memory.
-`string_list_split_in_place` requires a `string_list` that has
-`strdup_strings` set to false; it splits the input string in place,
-overwriting the delimiter characters with NULs and creating new
-string_list_items that point into the original string (the original
-string must therefore not be modified or freed while the `string_list`
-is in use).
-
-
-Data structures
----------------
-
-* `struct string_list_item`
-
-Represents an item of the list. The `string` member is a pointer to the
-string, and you may use the `util` member for any purpose, if you want.
-
-* `struct string_list`
-
-Represents the list itself.
-
-. The array of items are available via the `items` member.
-. The `nr` member contains the number of items stored in the list.
-. The `alloc` member is used to avoid reallocating at every insertion.
-  You should not tamper with it.
-. Setting the `strdup_strings` member to 1 will strdup() the strings
-  before adding them, see above.
-. The `compare_strings_fn` member is used to specify a custom compare
-  function, otherwise `strcmp()` is used as the default function.
diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
new file mode 100644
index 0000000000..fb06089393
--- /dev/null
+++ b/Documentation/technical/api-submodule-config.txt
@@ -0,0 +1,66 @@
+submodule config cache API
+==========================
+
+The submodule config cache API allows to read submodule
+configurations/information from specified revisions. Internally
+information is lazily read into a cache that is used to avoid
+unnecessary parsing of the same .gitmodules files. Lookups can be done by
+submodule path or name.
+
+Usage
+-----
+
+To initialize the cache with configurations from the worktree the caller
+typically first calls `gitmodules_config()` to read values from the
+worktree .gitmodules and then to overlay the local git config values
+`parse_submodule_config_option()` from the config parsing
+infrastructure.
+
+The caller can look up information about submodules by using the
+`submodule_from_path()` or `submodule_from_name()` functions. They return
+a `struct submodule` which contains the values. The API automatically
+initializes and allocates the needed infrastructure on-demand. If the
+caller does only want to lookup values from revisions the initialization
+can be skipped.
+
+If the internal cache might grow too big or when the caller is done with
+the API, all internally cached values can be freed with submodule_free().
+
+Data Structures
+---------------
+
+`struct submodule`::
+
+	This structure is used to return the information about one
+	submodule for a certain revision. It is returned by the lookup
+	functions.
+
+Functions
+---------
+
+`void submodule_free(struct repository *r)`::
+
+	Use these to free the internally cached values.
+
+`int parse_submodule_config_option(const char *var, const char *value)`::
+
+	Can be passed to the config parsing infrastructure to parse
+	local (worktree) submodule configurations.
+
+`const struct submodule *submodule_from_path(const unsigned char *treeish_name, const char *path)`::
+
+	Given a tree-ish in the superproject and a path, return the
+	submodule that is bound at the path in the named tree.
+
+`const struct submodule *submodule_from_name(const unsigned char *treeish_name, const char *name)`::
+
+	The same as above but lookup by name.
+
+Whenever a submodule configuration is parsed in `parse_submodule_config_option`
+via e.g. `gitmodules_config()`, it will overwrite the null_sha1 entry.
+So in the normal case, when HEAD:.gitmodules is parsed first and then overlayed
+with the repository configuration, the null_sha1 entry contains the local
+configuration of a submodule (e.g. consolidated values from local git
+configuration and the .gitmodules file in the worktree).
+
+For an example usage see test-submodule-config.c.
diff --git a/Documentation/technical/api-trace.txt b/Documentation/technical/api-trace.txt
index 097a651d96..fadb5979c4 100644
--- a/Documentation/technical/api-trace.txt
+++ b/Documentation/technical/api-trace.txt
@@ -28,7 +28,7 @@ static struct trace_key trace_foo = TRACE_KEY_INIT(FOO);
 
 static void trace_print_foo(const char *message)
 {
-	trace_print_key(&trace_foo, message);
+	trace_printf_key(&trace_foo, "%s", message);
 }
 ------------
 +
@@ -95,3 +95,46 @@ for (;;) {
 }
 trace_performance(t, "frotz");
 ------------
+
+Bugs & Caveats
+--------------
+
+GIT_TRACE_* environment variables can be used to tell Git to show
+trace output to its standard error stream. Git can often spawn a pager
+internally to run its subcommand and send its standard output and
+standard error to it.
+
+Because GIT_TRACE_PERFORMANCE trace is generated only at the very end
+of the program with atexit(), which happens after the pager exits, it
+would not work well if you send its log to the standard error output
+and let Git spawn the pager at the same time.
+
+As a work around, you can for example use '--no-pager', or set
+GIT_TRACE_PERFORMANCE to another file descriptor which is redirected
+to stderr, or set GIT_TRACE_PERFORMANCE to a file specified by its
+absolute path.
+
+For example instead of the following command which by default may not
+print any performance information:
+
+------------
+GIT_TRACE_PERFORMANCE=2 git log -1
+------------
+
+you may want to use:
+
+------------
+GIT_TRACE_PERFORMANCE=2 git --no-pager log -1
+------------
+
+or:
+
+------------
+GIT_TRACE_PERFORMANCE=3 3>&2 git log -1
+------------
+
+or:
+
+------------
+GIT_TRACE_PERFORMANCE=/path/to/log/file git log -1
+------------
diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt
index 14af37c3f1..bde18622a8 100644
--- a/Documentation/technical/api-tree-walking.txt
+++ b/Documentation/technical/api-tree-walking.txt
@@ -55,9 +55,9 @@ Initializing
 
 `fill_tree_descriptor`::
 
-	Initialize a `tree_desc` and decode its first entry given the sha1 of
-	a tree. Returns the `buffer` member if the sha1 is a valid tree
-	identifier and NULL otherwise.
+	Initialize a `tree_desc` and decode its first entry given the
+	object ID of a tree. Returns the `buffer` member if the latter
+	is a valid tree identifier and NULL otherwise.
 
 `setup_traverse_info`::
 
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..cc0474ba3e
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,97 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of its parents. We
+  reserve zero as special, and can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position within the list of commit OIDs. Due
+to some special constants we use to track parents, we can store at most
+(1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+      We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+     Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      First 4 bytes describe the chunk id. Value 0 is a terminating label.
+      Other 8 bytes provide the byte-offset in current file for chunk to
+      start. (Chunks are ordered contiguously in the file, so you can infer
+      the length using the next chunk position if necessary.) Each chunk
+      ID appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0x7000000 if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the lowest byte, storing the 33rd and 34th bit of the
+      commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values store the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..c664acbd76
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,160 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
+in the commit-graph file. If a commit-graph file was written by a version
+of Git that did not compute generation numbers, then those commits will
+have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
+
+Since the commit-graph file is closed under reachability, we can guarantee
+the following weaker condition on all commits:
+
+    If A and B are commits with generation numbers N amd M, respectively,
+    and N < M, then A cannot reach B.
+
+Note how the strict inequality differs from the inequality when we have
+fully-computed generation numbers. Using strict inequality may result in
+walking a few extra commits, but the simplicity in dealing with commits
+with generation number *_INFINITY or *_ZERO is valuable.
+
+We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
+generation numbers are computed to be at least this value. We limit at
+this value since it is the largest value that can be stored in the
+commit-graph file using the 30 bits available to generation numbers. This
+presents another case where a commit can have generation number equal to
+that of a parent.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain the performance benefits they
+  enable. This will mostly be accomplished by swapping a commit-date-ordered
+  priority queue with one ordered by generation number. The following
+  operations are important candidates:
+
+    - 'log --topo-order'
+    - 'tag --merged'
+
+- A server could provide a commit graph file as part of the network protocol
+  to avoid extra calculations by clients. This feature is only of benefit if
+  the user is willing to trust the file, because verifying the file is correct
+  is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
diff --git a/Documentation/technical/directory-rename-detection.txt b/Documentation/technical/directory-rename-detection.txt
new file mode 100644
index 0000000000..1c0086e287
--- /dev/null
+++ b/Documentation/technical/directory-rename-detection.txt
@@ -0,0 +1,115 @@
+Directory rename detection
+==========================
+
+Rename detection logic in diffcore-rename that checks for renames of
+individual files is aggregated and analyzed in merge-recursive for cases
+where combinations of renames indicate that a full directory has been
+renamed.
+
+Scope of abilities
+------------------
+
+It is perhaps easiest to start with an example:
+
+  * When all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is
+    likely that x/d added in the meantime would also want to move to z/d by
+    taking the hint that the entire directory 'x' moved to 'z'.
+
+More interesting possibilities exist, though, such as:
+
+  * one side of history renames x -> z, and the other renames some file to
+    x/e, causing the need for the merge to do a transitive rename.
+
+  * one side of history renames x -> z, but also renames all files within
+    x.  For example, x/a -> z/alpha, x/b -> z/bravo, etc.
+
+  * both 'x' and 'y' being merged into a single directory 'z', with a
+    directory rename being detected for both x->z and y->z.
+
+  * not all files in a directory being renamed to the same location;
+    i.e. perhaps most the files in 'x' are now found under 'z', but a few
+    are found under 'w'.
+
+  * a directory being renamed, which also contained a subdirectory that was
+    renamed to some entirely different location.  (And perhaps the inner
+    directory itself contained inner directories that were renamed to yet
+    other locations).
+
+  * combinations of the above; see t/t6043-merge-rename-directories.sh for
+    various interesting cases.
+
+Limitations -- applicability of directory renames
+-------------------------------------------------
+
+In order to prevent edge and corner cases resulting in either conflicts
+that cannot be represented in the index or which might be too complex for
+users to try to understand and resolve, a couple basic rules limit when
+directory rename detection applies:
+
+  1) If a given directory still exists on both sides of a merge, we do
+     not consider it to have been renamed.
+
+  2) If a subset of to-be-renamed files have a file or directory in the
+     way (or would be in the way of each other), "turn off" the directory
+     rename for those specific sub-paths and report the conflict to the
+     user.
+
+  3) If the other side of history did a directory rename to a path that
+     your side of history renamed away, then ignore that particular
+     rename from the other side of history for any implicit directory
+     renames (but warn the user).
+
+Limitations -- detailed rules and testcases
+-------------------------------------------
+
+t/t6043-merge-rename-directories.sh contains extensive tests and commentary
+which generate and explore the rules listed above.  It also lists a few
+additional rules:
+
+  a) If renames split a directory into two or more others, the directory
+     with the most renames, "wins".
+
+  b) Avoid directory-rename-detection for a path, if that path is the
+     source of a rename on either side of a merge.
+
+  c) Only apply implicit directory renames to directories if the other side
+     of history is the one doing the renaming.
+
+Limitations -- support in different commands
+--------------------------------------------
+
+Directory rename detection is supported by 'merge' and 'cherry-pick'.
+Other git commands which users might be surprised to see limited or no
+directory rename detection support in:
+
+  * diff
+
+    Folks have requested in the past that `git diff` detect directory
+    renames and somehow simplify its output.  It is not clear whether this
+    would be desirable or how the output should be simplified, so this was
+    simply not implemented.  Further, to implement this, directory rename
+    detection logic would need to move from merge-recursive to
+    diffcore-rename.
+
+  * am
+
+    git-am tries to avoid a full three way merge, instead calling
+    git-apply.  That prevents us from detecting renames at all, which may
+    defeat the directory rename detection.  There is a fallback, though; if
+    the initial git-apply fails and the user has specified the -3 option,
+    git-am will fall back to a three way merge.  However, git-am lacks the
+    necessary information to do a "real" three way merge.  Instead, it has
+    to use build_fake_ancestor() to get a merge base that is missing files
+    whose rename may have been important to detect for directory rename
+    detection to function.
+
+  * rebase
+
+    Since am-based rebases work by first generating a bunch of patches
+    (which no longer record what the original commits were and thus don't
+    have the necessary info from which we can find a real merge-base), and
+    then calling git-am, this implies that am-based rebases will not always
+    successfully detect directory renames either (see the 'am' section
+    above).  merged-based rebases (rebase -m) and cherry-pick-based rebases
+    (rebase -i) are not affected by this shortcoming, and fully support
+    directory rename detection.
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 0000000000..bc2ace2a6e
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,827 @@
+Git hash function transition
+============================
+
+Objective
+---------
+Migrate Git from SHA-1 to a stronger hash function.
+
+Background
+----------
+At its core, the Git version control system is a content addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+
+* Integrity checking is easy. Bit flips, for example, are easily
+  detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+
+Using a cryptographically secure hash function brings additional
+advantages:
+
+* Object names can be signed and third parties can trust the hash to
+  address the signed object and all objects it references.
+* Communication using Git protocol and out of band communication
+  methods have a short reliable string that can be used to reliably
+  address stored content.
+
+Over time some flaws in SHA-1 have been discovered by security
+researchers. On 23 February 2017 the SHAttered attack
+(https://shattered.io) demonstrated a practical SHA-1 hash collision.
+
+Git v2.13.0 and later subsequently moved to a hardened SHA-1
+implementation by default, which isn't vulnerable to the SHAttered
+attack.
+
+Thus Git has in effect already migrated to a new hash that isn't SHA-1
+and doesn't share its vulnerabilities, its new hash function just
+happens to produce exactly the same output for all known inputs,
+except two PDFs published by the SHAttered researchers, and the new
+implementation (written by those researchers) claims to detect future
+cryptanalytic collision attacks.
+
+Regardless, it's considered prudent to move past any variant of SHA-1
+to a new hash. There's no guarantee that future attacks on SHA-1 won't
+be published in the future, and those attacks may not have viable
+mitigations.
+
+If SHA-1 and its variants were to be truly broken, Git's hash function
+could not be considered cryptographically secure any more. This would
+impact the communication of hash values because we could not trust
+that a given hash value represented the known good version of content
+that the speaker intended.
+
+SHA-1 still possesses the other properties such as fast object lookup
+and safe error checking, but other hash functions are equally suitable
+that are believed to be cryptographically secure.
+
+Goals
+-----
+1. The transition to SHA-256 can be done one local repository at a time.
+   a. Requiring no action by any other party.
+   b. A SHA-256 repository can communicate with SHA-1 Git servers
+      (push/fetch).
+   c. Users can use SHA-1 and SHA-256 identifiers for objects
+      interchangeably (see "Object names on the command line", below).
+   d. New signed objects make use of a stronger hash function than
+      SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+   a. Local metadata for SHA-1 compatibility can be removed from a
+      repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+   a. The object format is kept simple and consistent.
+   b. Creation of a generalized repository conversion tool.
+
+Non-Goals
+---------
+1. Add SHA-256 support to Git protocol. This is valuable and the
+   logical next step but it is out of scope for this initial design.
+2. Transparently improving the security of existing SHA-1 signed
+   objects.
+3. Intermixing objects using multiple hash functions in a single
+   repository.
+4. Taking the opportunity to fix other bugs in Git's formats and
+   protocols.
+5. Shallow clones and fetches into a SHA-256 repository. (This will
+   change when we add SHA-256 support to Git protocol.)
+6. Skip fetching some submodules of a project into a SHA-256
+   repository. (This also depends on SHA-256 support in Git
+   protocol.)
+
+Overview
+--------
+We introduce a new repository format extension. Repositories with this
+extension enabled use SHA-256 instead of SHA-1 to name their objects.
+This affects both object names and object content --- both the names
+of objects and all references to other objects within an object are
+switched to the new hash function.
+
+SHA-256 repositories cannot be read by older versions of Git.
+
+Alongside the packfile, a SHA-256 repository stores a bidirectional
+mapping between SHA-256 and SHA-1 object names. The mapping is generated
+locally and can be verified using "git fsck". Object lookups use this
+mapping to allow naming objects using either their SHA-1 and SHA-256 names
+interchangeably.
+
+"git cat-file" and "git hash-object" gain options to display an object
+in its sha1 form and write an object given its sha1 form. This
+requires all objects referenced by that object to be present in the
+object database so that they can be named using the appropriate name
+(using the bidirectional hash mapping).
+
+Fetches from a SHA-1 based server convert the fetched objects into
+SHA-256 form and record the mapping in the bidirectional mapping table
+(see below for details). Pushes to a SHA-1 based server convert the
+objects being pushed into sha1 form so the server does not have to be
+aware of the hash function the client is using.
+
+Detailed Design
+---------------
+Repository format extension
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A SHA-256 repository uses repository format version `1` (see
+Documentation/technical/repository-version.txt) with extensions
+`objectFormat` and `compatObjectFormat`:
+
+	[core]
+		repositoryFormatVersion = 1
+	[extensions]
+		objectFormat = sha256
+		compatObjectFormat = sha1
+
+The combination of setting `core.repositoryFormatVersion=1` and
+populating `extensions.*` ensures that all versions of Git later than
+`v0.99.9l` will die instead of trying to operate on the SHA-256
+repository, instead producing an error message.
+
+	# Between v0.99.9l and v2.7.0
+	$ git status
+	fatal: Expected git repo version <= 0, found 1
+	# After v2.7.0
+	$ git status
+	fatal: unknown repository extensions found:
+		objectformat
+		compatobjectformat
+
+See the "Transition plan" section below for more details on these
+repository extensions.
+
+Object names
+~~~~~~~~~~~~
+Objects can be named by their 40 hexadecimal digit sha1-name or 64
+hexadecimal digit sha256-name, plus names derived from those (see
+gitrevisions(7)).
+
+The sha1-name of an object is the SHA-1 of the concatenation of its
+type, length, a nul byte, and the object's sha1-content. This is the
+traditional <sha1> used in Git to name objects.
+
+The sha256-name of an object is the SHA-256 of the concatenation of its
+type, length, a nul byte, and the object's sha256-content.
+
+Object format
+~~~~~~~~~~~~~
+The content as a byte sequence of a tag, commit, or tree object named
+by sha1 and sha256 differ because an object named by sha256-name refers to
+other objects by their sha256-names and an object named by sha1-name
+refers to other objects by their sha1-names.
+
+The sha256-content of an object is the same as its sha1-content, except
+that objects referenced by the object are named using their sha256-names
+instead of sha1-names. Because a blob object does not refer to any
+other object, its sha1-content and sha256-content are the same.
+
+The format allows round-trip conversion between sha256-content and
+sha1-content.
+
+Object storage
+~~~~~~~~~~~~~~
+Loose objects use zlib compression and packed objects use the packed
+format described in Documentation/technical/pack-format.txt, just like
+today. The content that is compressed and stored uses sha256-content
+instead of sha1-content.
+
+Pack index
+~~~~~~~~~~
+Pack index (.idx) files use a new v3 format that supports multiple
+hash functions. They have the following format (all integers are in
+network byte order):
+
+- A header appears at the beginning and consists of the following:
+  - The 4-byte pack index signature: '\377t0c'
+  - 4-byte version number: 3
+  - 4-byte length of the header section, including the signature and
+    version number
+  - 4-byte number of objects contained in the pack
+  - 4-byte number of object formats in this pack index: 2
+  - For each object format:
+    - 4-byte format identifier (e.g., 'sha1' for SHA-1)
+    - 4-byte length in bytes of shortened object names. This is the
+      shortest possible length needed to make names in the shortened
+      object name table unambiguous.
+    - 4-byte integer, recording where tables relating to this format
+      are stored in this index file, as an offset from the beginning.
+  - 4-byte offset to the trailer from the beginning of this file.
+  - Zero or more additional key/value pairs (4-byte key, 4-byte
+    value). Only one key is supported: 'PSRC'. See the "Loose objects
+    and unreachable objects" section for supported values and how this
+    is used.  All other keys are reserved. Readers must ignore
+    unrecognized keys.
+- Zero or more NUL bytes. This can optionally be used to improve the
+  alignment of the full object name table below.
+- Tables for the first object format:
+  - A sorted table of shortened object names.  These are prefixes of
+    the names of all objects in this pack file, packed together
+    without offset values to reduce the cache footprint of the binary
+    search for a specific object name.
+
+  - A table of full object names in pack order. This allows resolving
+    a reference to "the nth object in the pack file" (from a
+    reachability bitmap or from the next table of another object
+    format) to its object name.
+
+  - A table of 4-byte values mapping object name order to pack order.
+    For an object in the table of sorted shortened object names, the
+    value at the corresponding index in this table is the index in the
+    previous table for that same object.
+
+    This can be used to look up the object in reachability bitmaps or
+    to look up its name in another object format.
+
+  - A table of 4-byte CRC32 values of the packed object data, in the
+    order that the objects appear in the pack file. This is to allow
+    compressed data to be copied directly from pack to pack during
+    repacking without undetected data corruption.
+
+  - A table of 4-byte offset values. For an object in the table of
+    sorted shortened object names, the value at the corresponding
+    index in this table indicates where that object can be found in
+    the pack file. These are usually 31-bit pack file offsets, but
+    large offsets are encoded as an index into the next table with the
+    most significant bit set.
+
+  - A table of 8-byte offset entries (empty for pack files less than
+    2 GiB). Pack files are organized with heavily used objects toward
+    the front, so most object references should not need to refer to
+    this table.
+- Zero or more NUL bytes.
+- Tables for the second object format, with the same layout as above,
+  up to and not including the table of CRC32 values.
+- Zero or more NUL bytes.
+- The trailer consists of the following:
+  - A copy of the 20-byte SHA-256 checksum at the end of the
+    corresponding packfile.
+
+  - 20-byte SHA-256 checksum of all of the above.
+
+Loose object index
+~~~~~~~~~~~~~~~~~~
+A new file $GIT_OBJECT_DIR/loose-object-idx contains information about
+all loose objects. Its format is
+
+  # loose-object-idx
+  (sha256-name SP sha1-name LF)*
+
+where the object names are in hexadecimal format. The file is not
+sorted.
+
+The loose object index is protected against concurrent writes by a
+lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
+object:
+
+1. Write the loose object to a temporary file, like today.
+2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
+3. Rename the loose object into place.
+4. Open loose-object-idx with O_APPEND and write the new object
+5. Unlink loose-object-idx.lock to release the lock.
+
+To remove entries (e.g. in "git pack-refs" or "git-prune"):
+
+1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
+   lock.
+2. Write the new content to loose-object-idx.lock.
+3. Unlink any loose objects being removed.
+4. Rename to replace loose-object-idx, releasing the lock.
+
+Translation table
+~~~~~~~~~~~~~~~~~
+The index files support a bidirectional mapping between sha1-names
+and sha256-names. The lookup proceeds similarly to ordinary object
+lookups. For example, to convert a sha1-name to a sha256-name:
+
+ 1. Look for the object in idx files. If a match is present in the
+    idx's sorted list of truncated sha1-names, then:
+    a. Read the corresponding entry in the sha1-name order to pack
+       name order mapping.
+    b. Read the corresponding entry in the full sha1-name table to
+       verify we found the right object. If it is, then
+    c. Read the corresponding entry in the full sha256-name table.
+       That is the object's sha256-name.
+ 2. Check for a loose object. Read lines from loose-object-idx until
+    we find a match.
+
+Step (1) takes the same amount of time as an ordinary object lookup:
+O(number of packs * log(objects per pack)). Step (2) takes O(number of
+loose objects) time. To maintain good performance it will be necessary
+to keep the number of loose objects low. See the "Loose objects and
+unreachable objects" section below for more details.
+
+Since all operations that make new objects (e.g., "git commit") add
+the new objects to the corresponding index, this mapping is possible
+for all objects in the object store.
+
+Reading an object's sha1-content
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The sha1-content of an object can be read by converting all sha256-names
+its sha256-content references to sha1-names using the translation table.
+
+Fetch
+~~~~~
+Fetching from a SHA-1 based server requires translating between SHA-1
+and SHA-256 based representations on the fly.
+
+SHA-1s named in the ref advertisement that are present on the client
+can be translated to SHA-256 and looked up as local objects using the
+translation table.
+
+Negotiation proceeds as today. Any "have"s generated locally are
+converted to SHA-1 before being sent to the server, and SHA-1s
+mentioned by the server are converted to SHA-256 when looking them up
+locally.
+
+After negotiation, the server sends a packfile containing the
+requested objects. We convert the packfile to SHA-256 format using
+the following steps:
+
+1. index-pack: inflate each object in the packfile and compute its
+   SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
+   objects the client has locally. These objects can be looked up
+   using the translation table and their sha1-content read as
+   described above to resolve the deltas.
+2. topological sort: starting at the "want"s from the negotiation
+   phase, walk through objects in the pack and emit a list of them,
+   excluding blobs, in reverse topologically sorted order, with each
+   object coming later in the list than all objects it references.
+   (This list only contains objects reachable from the "wants". If the
+   pack from the server contained additional extraneous objects, then
+   they will be discarded.)
+3. convert to sha256: open a new (sha256) packfile. Read the topologically
+   sorted list just generated. For each object, inflate its
+   sha1-content, convert to sha256-content, and write it to the sha256
+   pack. Record the new sha1<->sha256 mapping entry for use in the idx.
+4. sort: reorder entries in the new pack to match the order of objects
+   in the pack the server generated and include blobs. Write a sha256 idx
+   file
+5. clean up: remove the SHA-1 based pack file, index, and
+   topologically sorted list obtained from the server in steps 1
+   and 2.
+
+Step 3 requires every object referenced by the new object to be in the
+translation table. This is why the topological sort step is necessary.
+
+As an optimization, step 1 could write a file describing what non-blob
+objects each object it has inflated from the packfile references. This
+makes the topological sort in step 2 possible without inflating the
+objects in the packfile for a second time. The objects need to be
+inflated again in step 3, for a total of two inflations.
+
+Step 4 is probably necessary for good read-time performance. "git
+pack-objects" on the server optimizes the pack file for good data
+locality (see Documentation/technical/pack-heuristics.txt).
+
+Details of this process are likely to change. It will take some
+experimenting to get this to perform well.
+
+Push
+~~~~
+Push is simpler than fetch because the objects referenced by the
+pushed objects are already in the translation table. The sha1-content
+of each object being pushed can be read as described in the "Reading
+an object's sha1-content" section to generate the pack written by git
+send-pack.
+
+Signed Commits
+~~~~~~~~~~~~~~
+We add a new field "gpgsig-sha256" to the commit object format to allow
+signing commits without relying on SHA-1. It is similar to the
+existing "gpgsig" field. Its signed payload is the sha256-content of the
+commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
+
+This means commits can be signed
+1. using SHA-1 only, as in existing signed commit objects
+2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
+   fields.
+3. using only SHA-256, by only using the gpgsig-sha256 field.
+
+Old versions of "git verify-commit" can verify the gpgsig signature in
+cases (1) and (2) without modifications and view case (3) as an
+ordinary unsigned commit.
+
+Signed Tags
+~~~~~~~~~~~
+We add a new field "gpgsig-sha256" to the tag object format to allow
+signing tags without relying on SHA-1. Its signed payload is the
+sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
+SIGNATURE-----" delimited in-body signature removed.
+
+This means tags can be signed
+1. using SHA-1 only, as in existing signed tag objects
+2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
+   signature.
+3. using only SHA-256, by only using the gpgsig-sha256 field.
+
+Mergetag embedding
+~~~~~~~~~~~~~~~~~~
+The mergetag field in the sha1-content of a commit contains the
+sha1-content of a tag that was merged by that commit.
+
+The mergetag field in the sha256-content of the same commit contains the
+sha256-content of the same tag.
+
+Submodules
+~~~~~~~~~~
+To convert recorded submodule pointers, you need to have the converted
+submodule repository in place. The translation table of the submodule
+can be used to look up the new hash.
+
+Loose objects and unreachable objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Fast lookups in the loose-object-idx require that the number of loose
+objects not grow too high.
+
+"git gc --auto" currently waits for there to be 6700 loose objects
+present before consolidating them into a packfile. We will need to
+measure to find a more appropriate threshold for it to use.
+
+"git gc --auto" currently waits for there to be 50 packs present
+before combining packfiles. Packing loose objects more aggressively
+may cause the number of pack files to grow too quickly. This can be
+mitigated by using a strategy similar to Martin Fick's exponential
+rolling garbage collection script:
+https://gerrit-review.googlesource.com/c/gerrit/+/35215
+
+"git gc" currently expels any unreachable objects it encounters in
+pack files to loose objects in an attempt to prevent a race when
+pruning them (in case another process is simultaneously writing a new
+object that refers to the about-to-be-deleted object). This leads to
+an explosion in the number of loose objects present and disk space
+usage due to the objects in delta form being replaced with independent
+loose objects.  Worse, the race is still present for loose objects.
+
+Instead, "git gc" will need to move unreachable objects to a new
+packfile marked as UNREACHABLE_GARBAGE (using the PSRC field; see
+below). To avoid the race when writing new objects referring to an
+about-to-be-deleted object, code paths that write new objects will
+need to copy any objects from UNREACHABLE_GARBAGE packs that they
+refer to to new, non-UNREACHABLE_GARBAGE packs (or loose objects).
+UNREACHABLE_GARBAGE are then safe to delete if their creation time (as
+indicated by the file's mtime) is long enough ago.
+
+To avoid a proliferation of UNREACHABLE_GARBAGE packs, they can be
+combined under certain circumstances. If "gc.garbageTtl" is set to
+greater than one day, then packs created within a single calendar day,
+UTC, can be coalesced together. The resulting packfile would have an
+mtime before midnight on that day, so this makes the effective maximum
+ttl the garbageTtl + 1 day. If "gc.garbageTtl" is less than one day,
+then we divide the calendar day into intervals one-third of that ttl
+in duration. Packs created within the same interval can be coalesced
+together. The resulting packfile would have an mtime before the end of
+the interval, so this makes the effective maximum ttl equal to the
+garbageTtl * 4/3.
+
+This rule comes from Thirumala Reddy Mutchukota's JGit change
+https://git.eclipse.org/r/90465.
+
+The UNREACHABLE_GARBAGE setting goes in the PSRC field of the pack
+index. More generally, that field indicates where a pack came from:
+
+ - 1 (PACK_SOURCE_RECEIVE) for a pack received over the network
+ - 2 (PACK_SOURCE_AUTO) for a pack created by a lightweight
+   "gc --auto" operation
+ - 3 (PACK_SOURCE_GC) for a pack created by a full gc
+ - 4 (PACK_SOURCE_UNREACHABLE_GARBAGE) for potential garbage
+   discovered by gc
+ - 5 (PACK_SOURCE_INSERT) for locally created objects that were
+   written directly to a pack file, e.g. from "git add ."
+
+This information can be useful for debugging and for "gc --auto" to
+make appropriate choices about which packs to coalesce.
+
+Caveats
+-------
+Invalid objects
+~~~~~~~~~~~~~~~
+The conversion from sha1-content to sha256-content retains any
+brokenness in the original object (e.g., tree entry modes encoded with
+leading 0, tree objects whose paths are not sorted correctly, and
+commit objects without an author or committer). This is a deliberate
+feature of the design to allow the conversion to round-trip.
+
+More profoundly broken objects (e.g., a commit with a truncated "tree"
+header line) cannot be converted but were not usable by current Git
+anyway.
+
+Shallow clone and submodules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Because it requires all referenced objects to be available in the
+locally generated translation table, this design does not support
+shallow clone or unfetched submodules. Protocol improvements might
+allow lifting this restriction.
+
+Alternates
+~~~~~~~~~~
+For the same reason, a sha256 repository cannot borrow objects from a
+sha1 repository using objects/info/alternates or
+$GIT_ALTERNATE_OBJECT_REPOSITORIES.
+
+git notes
+~~~~~~~~~
+The "git notes" tool annotates objects using their sha1-name as key.
+This design does not describe a way to migrate notes trees to use
+sha256-names. That migration is expected to happen separately (for
+example using a file at the root of the notes tree to describe which
+hash it uses).
+
+Server-side cost
+~~~~~~~~~~~~~~~~
+Until Git protocol gains SHA-256 support, using SHA-256 based storage
+on public-facing Git servers is strongly discouraged. Once Git
+protocol gains SHA-256 support, SHA-256 based servers are likely not
+to support SHA-1 compatibility, to avoid what may be a very expensive
+hash reencode during clone and to encourage peers to modernize.
+
+The design described here allows fetches by SHA-1 clients of a
+personal SHA-256 repository because it's not much more difficult than
+allowing pushes from that repository. This support needs to be guarded
+by a configuration option --- servers like git.kernel.org that serve a
+large number of clients would not be expected to bear that cost.
+
+Meaning of signatures
+~~~~~~~~~~~~~~~~~~~~~
+The signed payload for signed commits and tags does not explicitly
+name the hash used to identify objects. If some day Git adopts a new
+hash function with the same length as the current SHA-1 (40
+hexadecimal digit) or SHA-256 (64 hexadecimal digit) objects then the
+intent behind the PGP signed payload in an object signature is
+unclear:
+
+	object e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7
+	type commit
+	tag v2.12.0
+	tagger Junio C Hamano <gitster@pobox.com> 1487962205 -0800
+
+	Git 2.12
+
+Does this mean Git v2.12.0 is the commit with sha1-name
+e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
+new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
+
+Fortunately SHA-256 and SHA-1 have different lengths. If Git starts
+using another hash with the same length to name objects, then it will
+need to change the format of signed payloads using that hash to
+address this issue.
+
+Object names on the command line
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To support the transition (see Transition plan below), this design
+supports four different modes of operation:
+
+ 1. ("dark launch") Treat object names input by the user as SHA-1 and
+    convert any object names written to output to SHA-1, but store
+    objects using SHA-256.  This allows users to test the code with no
+    visible behavior change except for performance.  This allows
+    allows running even tests that assume the SHA-1 hash function, to
+    sanity-check the behavior of the new mode.
+
+ 2. ("early transition") Allow both SHA-1 and SHA-256 object names in
+    input. Any object names written to output use SHA-1. This allows
+    users to continue to make use of SHA-1 to communicate with peers
+    (e.g. by email) that have not migrated yet and prepares for mode 3.
+
+ 3. ("late transition") Allow both SHA-1 and SHA-256 object names in
+    input. Any object names written to output use SHA-256. In this
+    mode, users are using a more secure object naming method by
+    default.  The disruption is minimal as long as most of their peers
+    are in mode 2 or mode 3.
+
+ 4. ("post-transition") Treat object names input by the user as
+    SHA-256 and write output using SHA-256. This is safer than mode 3
+    because there is less risk that input is incorrectly interpreted
+    using the wrong hash function.
+
+The mode is specified in configuration.
+
+The user can also explicitly specify which format to use for a
+particular revision specifier and for output, overriding the mode. For
+example:
+
+git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
+
+Choice of Hash
+--------------
+In early 2005, around the time that Git was written,  Xiaoyun Wang,
+Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1
+collisions in 2^69 operations. In August they published details.
+Luckily, no practical demonstrations of a collision in full SHA-1 were
+published until 10 years later, in 2017.
+
+Git v2.13.0 and later subsequently moved to a hardened SHA-1
+implementation by default that mitigates the SHAttered attack, but
+SHA-1 is still believed to be weak.
+
+The hash to replace this hardened SHA-1 should be stronger than SHA-1
+was: we would like it to be trustworthy and useful in practice for at
+least 10 years.
+
+Some other relevant properties:
+
+1. A 256-bit hash (long enough to match common security practice; not
+   excessively long to hurt performance and disk usage).
+
+2. High quality implementations should be widely available (e.g., in
+   OpenSSL and Apple CommonCrypto).
+
+3. The hash function's properties should match Git's needs (e.g. Git
+   requires collision and 2nd preimage resistance and does not require
+   length extension resistance).
+
+4. As a tiebreaker, the hash should be fast to compute (fortunately
+   many contenders are faster than SHA-1).
+
+We choose SHA-256.
+
+Transition plan
+---------------
+Some initial steps can be implemented independently of one another:
+- adding a hash function API (vtable)
+- teaching fsck to tolerate the gpgsig-sha256 field
+- excluding gpgsig-* from the fields copied by "git commit --amend"
+- annotating tests that depend on SHA-1 values with a SHA1 test
+  prerequisite
+- using "struct object_id", GIT_MAX_RAWSZ, and GIT_MAX_HEXSZ
+  consistently instead of "unsigned char *" and the hardcoded
+  constants 20 and 40.
+- introducing index v3
+- adding support for the PSRC field and safer object pruning
+
+
+The first user-visible change is the introduction of the objectFormat
+extension (without compatObjectFormat). This requires:
+- implementing the loose-object-idx
+- teaching fsck about this mode of operation
+- using the hash function API (vtable) when computing object names
+- signing objects and verifying signatures
+- rejecting attempts to fetch from or push to an incompatible
+  repository
+
+Next comes introduction of compatObjectFormat:
+- translating object names between object formats
+- translating object content between object formats
+- generating and verifying signatures in the compat format
+- adding appropriate index entries when adding a new object to the
+  object store
+- --output-format option
+- ^{sha1} and ^{sha256} revision notation
+- configuration to specify default input and output format (see
+  "Object names on the command line" above)
+
+The next step is supporting fetches and pushes to SHA-1 repositories:
+- allow pushes to a repository using the compat format
+- generate a topologically sorted list of the SHA-1 names of fetched
+  objects
+- convert the fetched packfile to sha256 format and generate an idx
+  file
+- re-sort to match the order of objects in the fetched packfile
+
+The infrastructure supporting fetch also allows converting an existing
+repository. In converted repositories and new clones, end users can
+gain support for the new hash function without any visible change in
+behavior (see "dark launch" in the "Object names on the command line"
+section). In particular this allows users to verify SHA-256 signatures
+on objects in the repository, and it should ensure the transition code
+is stable in production in preparation for using it more widely.
+
+Over time projects would encourage their users to adopt the "early
+transition" and then "late transition" modes to take advantage of the
+new, more futureproof SHA-256 object names.
+
+When objectFormat and compatObjectFormat are both set, commands
+generating signatures would generate both SHA-1 and SHA-256 signatures
+by default to support both new and old users.
+
+In projects using SHA-256 heavily, users could be encouraged to adopt
+the "post-transition" mode to avoid accidentally making implicit use
+of SHA-1 object names.
+
+Once a critical mass of users have upgraded to a version of Git that
+can verify SHA-256 signatures and have converted their existing
+repositories to support verifying them, we can add support for a
+setting to generate only SHA-256 signatures. This is expected to be at
+least a year later.
+
+That is also a good moment to advertise the ability to convert
+repositories to use SHA-256 only, stripping out all SHA-1 related
+metadata. This improves performance by eliminating translation
+overhead and security by avoiding the possibility of accidentally
+relying on the safety of SHA-1.
+
+Updating Git's protocols to allow a server to specify which hash
+functions it supports is also an important part of this transition. It
+is not discussed in detail in this document but this transition plan
+assumes it happens. :)
+
+Alternatives considered
+-----------------------
+Upgrading everyone working on a particular project on a flag day
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Projects like the Linux kernel are large and complex enough that
+flipping the switch for all projects based on the repository at once
+is infeasible.
+
+Not only would all developers and server operators supporting
+developers have to switch on the same flag day, but supporting tooling
+(continuous integration, code review, bug trackers, etc) would have to
+be adapted as well. This also makes it difficult to get early feedback
+from some project participants testing before it is time for mass
+adoption.
+
+Using hash functions in parallel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+(e.g. https://public-inbox.org/git/22708.8913.864049.452252@chiark.greenend.org.uk/ )
+Objects newly created would be addressed by the new hash, but inside
+such an object (e.g. commit) it is still possible to address objects
+using the old hash function.
+* You cannot trust its history (needed for bisectability) in the
+  future without further work
+* Maintenance burden as the number of supported hash functions grows
+  (they will never go away, so they accumulate). In this proposal, by
+  comparison, converted objects lose all references to SHA-1.
+
+Signed objects with multiple hashes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Instead of introducing the gpgsig-sha256 field in commit and tag objects
+for sha256-content based signatures, an earlier version of this design
+added "hash sha256 <sha256-name>" fields to strengthen the existing
+sha1-content based signatures.
+
+In other words, a single signature was used to attest to the object
+content using both hash functions. This had some advantages:
+* Using one signature instead of two speeds up the signing process.
+* Having one signed payload with both hashes allows the signer to
+  attest to the sha1-name and sha256-name referring to the same object.
+* All users consume the same signature. Broken signatures are likely
+  to be detected quickly using current versions of git.
+
+However, it also came with disadvantages:
+* Verifying a signed object requires access to the sha1-names of all
+  objects it references, even after the transition is complete and
+  translation table is no longer needed for anything else. To support
+  this, the design added fields such as "hash sha1 tree <sha1-name>"
+  and "hash sha1 parent <sha1-name>" to the sha256-content of a signed
+  commit, complicating the conversion process.
+* Allowing signed objects without a sha1 (for after the transition is
+  complete) complicated the design further, requiring a "nohash sha1"
+  field to suppress including "hash sha1" fields in the sha256-content
+  and signed payload.
+
+Lazily populated translation table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Some of the work of building the translation table could be deferred to
+push time, but that would significantly complicate and slow down pushes.
+Calculating the sha1-name at object creation time at the same time it is
+being streamed to disk and having its sha256-name calculated should be
+an acceptable cost.
+
+Document History
+----------------
+
+2017-03-03
+bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com,
+sbeller@google.com
+
+Initial version sent to
+http://public-inbox.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
+
+2017-03-03 jrnieder@gmail.com
+Incorporated suggestions from jonathantanmy and sbeller:
+* describe purpose of signed objects with each hash type
+* redefine signed object verification using object content under the
+  first hash function
+
+2017-03-06 jrnieder@gmail.com
+* Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
+* Make sha3-based signatures a separate field, avoiding the need for
+  "hash" and "nohash" fields (thanks to peff[3]).
+* Add a sorting phase to fetch (thanks to Junio for noticing the need
+  for this).
+* Omit blobs from the topological sort during fetch (thanks to peff).
+* Discuss alternates, git notes, and git servers in the caveats
+  section (thanks to Junio Hamano, brian m. carlson[4], and Shawn
+  Pearce).
+* Clarify language throughout (thanks to various commenters,
+  especially Junio).
+
+2017-09-27 jrnieder@gmail.com, sbeller@google.com
+* use placeholder NewHash instead of SHA3-256
+* describe criteria for picking a hash function.
+* include a transition plan (thanks especially to Brandon Williams
+  for fleshing these ideas out)
+* define the translation table (thanks, Shawn Pearce[5], Jonathan
+  Tan, and Masaya Suzuki)
+* avoid loose object overhead by packing more aggressively in
+  "git gc --auto"
+
+Later history:
+
+ See the history of this file in git.git for the history of subsequent
+ edits. This document history is no longer being maintained as it
+ would now be superfluous to the commit log
+
+[1] http://public-inbox.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
+[2] http://public-inbox.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
+[3] http://public-inbox.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
+[4] http://public-inbox.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
+[5] https://public-inbox.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 229f845dfa..9c5b6f0fac 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -214,10 +214,16 @@ smart server reply:
    S: Cache-Control: no-cache
    S:
    S: 001e# service=git-upload-pack\n
+   S: 0000
    S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
    S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
    S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
    S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
+   S: 0000
+
+The client may send Extra Parameters (see
+Documentation/technical/pack-protocol.txt) as a colon-separated string
+in the Git-Protocol HTTP header.
 
 Dumb Server Response
 ^^^^^^^^^^^^^^^^^^^^
@@ -269,7 +275,12 @@ the C locale ordering.  The stream SHOULD include the default ref
 named `HEAD` as the first ref.  The stream MUST include capability
 declarations behind a NUL on the first ref.
 
+The returned response contains "version 1" if "version=1" was sent as an
+Extra Parameter.
+
   smart_reply     =  PKT-LINE("# service=$servicename" LF)
+		     "0000"
+		     *1("version 1")
 		     ref_list
 		     "0000"
   ref_list        =  empty_list / non_empty_list
@@ -319,18 +330,19 @@ Servers SHOULD support all capabilities defined here.
 Clients MUST send at least one "want" command in the request body.
 Clients MUST NOT reference an id in a "want" command which did not
 appear in the response obtained through ref discovery unless the
-server advertises capability `allow-tip-sha1-in-want`.
+server advertises capability `allow-tip-sha1-in-want` or
+`allow-reachable-sha1-in-want`.
 
   compute_request   =  want_list
 		       have_list
 		       request_end
   request_end       =  "0000" / "done"
 
-  want_list         =  PKT-LINE(want NUL cap_list LF)
+  want_list         =  PKT-LINE(want SP cap_list LF)
 		       *(want_pkt)
   want_pkt          =  PKT-LINE(want LF)
   want              =  "want" SP id
-  cap_list          =  *(SP capability) SP
+  cap_list          =  capability *(SP capability)
 
   have_list         =  *PKT-LINE("have" SP id LF)
 
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index 35112e4966..db3572626b 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -170,7 +170,7 @@ Git index format
 
   The entries are written out in the top-down, depth-first order.  The
   first entry represents the root level of the repository, followed by the
-  first subtree---let's call this A---of the root level (with its name
+  first subtree--let's call this A--of the root level (with its name
   relative to the root level), followed by the first subtree of A (with
   its name relative to A), ...
 
@@ -233,3 +233,84 @@ Git index format
   The remaining index entries after replaced ones will be added to the
   final index. These added entries are also sorted by entry name then
   stage.
+
+== Untracked cache
+
+  Untracked cache saves the untracked file list and necessary data to
+  verify the cache. The signature for this extension is { 'U', 'N',
+  'T', 'R' }.
+
+  The extension starts with
+
+  - A sequence of NUL-terminated strings, preceded by the size of the
+    sequence in variable width encoding. Each string describes the
+    environment where the cache can be used.
+
+  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
+    ctime field until "file size".
+
+  - Stat data of core.excludesfile
+
+  - 32-bit dir_flags (see struct dir_struct)
+
+  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+    does not exist.
+
+  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+    not exist.
+
+  - NUL-terminated string of per-dir exclude file name. This usually
+    is ".gitignore".
+
+  - The number of following directory blocks, variable width
+    encoding. If this number is zero, the extension ends here with a
+    following NUL.
+
+  - A number of directory blocks in depth-first-search order, each
+    consists of
+
+    - The number of untracked entries, variable width encoding.
+
+    - The number of sub-directory blocks, variable width encoding.
+
+    - The directory name terminated by NUL.
+
+    - A number of untracked file/dir names terminated by NUL.
+
+The remaining data of each directory block is grouped by type:
+
+  - An ewah bitmap, the n-th bit marks whether the n-th directory has
+    valid untracked cache entries.
+
+  - An ewah bitmap, the n-th bit records "check-only" bit of
+    read_directory_recursive() for the n-th directory.
+
+  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+    is valid for the n-th directory and exists in the next data.
+
+  - An array of stat data. The n-th data corresponds with the n-th
+    "one" bit in the previous ewah bitmap.
+
+  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+    in the previous ewah bitmap.
+
+  - One NUL.
+
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the core.fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+	time which is stored as the nanoseconds elapsed since midnight,
+	January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is not CE_FSMONITOR_VALID.
diff --git a/Documentation/technical/long-running-process-protocol.txt b/Documentation/technical/long-running-process-protocol.txt
new file mode 100644
index 0000000000..aa0aa9af1c
--- /dev/null
+++ b/Documentation/technical/long-running-process-protocol.txt
@@ -0,0 +1,50 @@
+Long-running process protocol
+=============================
+
+This protocol is used when Git needs to communicate with an external
+process throughout the entire life of a single Git command. All
+communication is in pkt-line format (see technical/protocol-common.txt)
+over standard input and standard output.
+
+Handshake
+---------
+
+Git starts by sending a welcome message (for example,
+"git-filter-client"), a list of supported protocol version numbers, and
+a flush packet. Git expects to read the welcome message with "server"
+instead of "client" (for example, "git-filter-server"), exactly one
+protocol version number from the previously sent list, and a flush
+packet. All further communication will be based on the selected version.
+The remaining protocol description below documents "version=2". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look like with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet:          git> git-filter-client
+packet:          git> version=2
+packet:          git> version=42
+packet:          git> 0000
+packet:          git< git-filter-server
+packet:          git< version=2
+packet:          git< 0000
+packet:          git> capability=clean
+packet:          git> capability=smudge
+packet:          git> capability=not-yet-invented
+packet:          git> 0000
+packet:          git< capability=clean
+packet:          git< capability=smudge
+packet:          git< 0000
+------------------------
+
+Shutdown
+--------
+
+Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own. Git will wait until the filter
+process has stopped.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8e5bf60be3..70a99fd142 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -36,6 +36,98 @@ Git pack format
 
   - The trailer records 20-byte SHA-1 checksum of all of the above.
 
+=== Object types
+
+Valid object types are:
+
+- OBJ_COMMIT (1)
+- OBJ_TREE (2)
+- OBJ_BLOB (3)
+- OBJ_TAG (4)
+- OBJ_OFS_DELTA (6)
+- OBJ_REF_DELTA (7)
+
+Type 5 is reserved for future expansion. Type 0 is invalid.
+
+=== Deltified representation
+
+Conceptually there are only four object types: commit, tree, tag and
+blob. However to save space, an object could be stored as a "delta" of
+another "base" object. These representations are assigned new types
+ofs-delta and ref-delta, which is only valid in a pack file.
+
+Both ofs-delta and ref-delta store the "delta" to be applied to
+another object (called 'base object') to reconstruct the object. The
+difference between them is, ref-delta directly encodes 20-byte base
+object name. If the base object is in the same pack, ofs-delta encodes
+the offset of the base object in the pack instead.
+
+The base object could also be deltified if it's in the same pack.
+Ref-delta can also refer to an object outside the pack (i.e. the
+so-called "thin pack"). When stored on disk however, the pack should
+be self contained to avoid cyclic dependency.
+
+The delta data is a sequence of instructions to reconstruct an object
+from the base object. If the base object is deltified, it must be
+converted to canonical form first. Each instruction appends more and
+more data to the target object until it's complete. There are two
+supported instructions so far: one for copy a byte range from the
+source object and one for inserting new data embedded in the
+instruction itself.
+
+Each instruction has variable length. Instruction type is determined
+by the seventh bit of the first octet. The following diagrams follow
+the convention in RFC 1951 (Deflate compressed data format).
+
+==== Instruction to copy from base object
+
+  +----------+---------+---------+---------+---------+-------+-------+-------+
+  | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
+  +----------+---------+---------+---------+---------+-------+-------+-------+
+
+This is the instruction format to copy a byte range from the source
+object. It encodes the offset to copy from and the number of bytes to
+copy. Offset and size are in little-endian order.
+
+All offset and size bytes are optional. This is to reduce the
+instruction size when encoding small offsets or sizes. The first seven
+bits in the first octet determines which of the next seven octets is
+present. If bit zero is set, offset1 is present. If bit one is set
+offset2 is present and so on.
+
+Note that a more compact instruction does not change offset and size
+encoding. For example, if only offset2 is omitted like below, offset3
+still contains bits 16-23. It does not become offset2 and contains
+bits 8-15 even if it's right next to offset1.
+
+  +----------+---------+---------+
+  | 10000101 | offset1 | offset3 |
+  +----------+---------+---------+
+
+In its most compact form, this instruction only takes up one byte
+(0x80) with both offset and size omitted, which will have default
+values zero. There is another exception: size zero is automatically
+converted to 0x10000.
+
+==== Instruction to add new data
+
+  +----------+============+
+  | 0xxxxxxx |    data    |
+  +----------+============+
+
+This is the instruction to construct target object without the base
+object. The following data is appended to the target object. The first
+seven bits of the first octet determines the size of data in
+bytes. The size must be non-zero.
+
+==== Reserved instruction
+
+  +----------+============
+  | 00000000 |
+  +----------+============
+
+This is the instruction reserved for future expansion.
+
 == Original (version 1) pack-*.idx files have the following format:
 
   - The header consists of 256 4-byte network byte order
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 462e20645f..6ac774d5f6 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -1,11 +1,11 @@
 Packfile transfer protocols
 ===========================
 
-Git supports transferring data in packfiles over the ssh://, git:// and
+Git supports transferring data in packfiles over the ssh://, git://, http:// and
 file:// transports.  There exist two sets of protocols, one for pushing
 data from a client to a server and another for fetching data from a
-server to a client.  All three transports (ssh, git, file) use the same
-protocol to transfer data.
+server to a client.  The three transports (ssh, git, file) use the same
+protocol to transfer data. http is documented in http-protocol.txt.
 
 The processes invoked in the canonical Git implementation are 'upload-pack'
 on the server side and 'fetch-pack' on the client side for fetching data;
@@ -14,6 +14,14 @@ data.  The protocol functions to have a server tell a client what is
 currently on the server, then for the two to negotiate the smallest amount
 of data to send in order to fully update one or the other.
 
+pkt-line Format
+---------------
+
+The descriptions below build on the pkt-line format described in
+protocol-common.txt. When the grammar indicate `PKT-LINE(...)`, unless
+otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
+include a LF, but the receiver MUST NOT complain if it is not present.
+
 Transports
 ----------
 There are three transports over which the packfile protocol is
@@ -31,6 +39,20 @@ communicates with that invoked process over the SSH connection.
 The file:// transport runs the 'upload-pack' or 'receive-pack'
 process locally and communicates with it over a pipe.
 
+Extra Parameters
+----------------
+
+The protocol provides a mechanism in which clients can send additional
+information in its first message to the server. These are called "Extra
+Parameters", and are supported by the Git, SSH, and HTTP protocols.
+
+Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
+
+Servers that receive any such Extra Parameters MUST ignore all
+unrecognized keys. Currently, the only Extra Parameter recognized is
+"version" with a value of '1' or '2'.  See protocol-v2.txt for more
+information on protocol version 2.
+
 Git Transport
 -------------
 
@@ -38,18 +60,25 @@ The Git transport starts off by sending the command and repository
 on the wire using the pkt-line format, followed by a NUL byte and a
 hostname parameter, terminated by a NUL byte.
 
-   0032git-upload-pack /project.git\0host=myserver.com\0
+   0033git-upload-pack /project.git\0host=myserver.com\0
+
+The transport may send Extra Parameters by adding an additional NUL
+byte, and then adding one or more NUL-terminated strings:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
 
 --
-   git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+   git-proto-request = request-command SP pathname NUL
+		       [ host-parameter NUL ] [ NUL extra-parameters ]
    request-command   = "git-upload-pack" / "git-receive-pack" /
 		       "git-upload-archive"   ; case sensitive
    pathname          = *( %x01-ff ) ; exclude NUL
    host-parameter    = "host=" hostname [ ":" port ]
+   extra-parameters  = 1*extra-parameter
+   extra-parameter   = 1*( %x01-ff ) NUL
 --
 
-Only host-parameter is allowed in the git-proto-request. Clients
-MUST NOT attempt to send additional parameters. It is used for the
+host-parameter is used for the
 git-daemon name based virtual hosting.  See --interpolated-path
 option to git daemon, with the %H/%CH format characters.
 
@@ -109,6 +138,12 @@ we execute it without the leading '/'.
 		     v
    ssh user@example.com "git-upload-pack '~alice/project.git'"
 
+Depending on the value of the `protocol.version` configuration variable,
+Git may attempt to send Extra Parameters as a colon-separated string in
+the GIT_PROTOCOL environment variable. This is done only if
+the `ssh.variant` configuration variable indicates that the ssh command
+supports passing environment variables as an argument.
+
 A few things to remember here:
 
 - The "command name" is spelled with dash (e.g. git-upload-pack), but
@@ -129,11 +164,13 @@ Reference Discovery
 -------------------
 
 When the client initially connects the server will immediately respond
-with a listing of each reference it has (all branches and tags) along
+with a version number (if "version=1" is sent as an Extra Parameter),
+and a listing of each reference it has (all branches and tags) along
 with the object name that each reference currently points to.
 
-   $ echo -e -n "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+   $ echo -e -n "0044git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" |
       nc -v example.com 9418
+   000aversion 1
    00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
 		side-band side-band-64k ofs-delta shallow no-progress include-tag
    00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
@@ -143,9 +180,6 @@ with the object name that each reference currently points to.
    003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
    0000
 
-Server SHOULD terminate each non-flush line using LF ("\n") terminator;
-client MUST NOT complain if there is no terminator.
-
 The returned response is a pkt-line stream describing each ref and
 its current value.  The stream MUST be sorted by name according to
 the C locale ordering.
@@ -160,20 +194,21 @@ immediately after the ref itself, if presented. A conforming server
 MUST peel the ref if it's an annotated tag.
 
 ----
-  advertised-refs  =  (no-refs / list-of-refs)
+  advertised-refs  =  *1("version 1")
+		      (no-refs / list-of-refs)
 		      *shallow
 		      flush-pkt
 
   no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
-		      NUL capability-list LF)
+		      NUL capability-list)
 
   list-of-refs     =  first-ref *other-ref
   first-ref        =  PKT-LINE(obj-id SP refname
-		      NUL capability-list LF)
+		      NUL capability-list)
 
   other-ref        =  PKT-LINE(other-tip / other-peeled)
-  other-tip        =  obj-id SP refname LF
-  other-peeled     =  obj-id SP refname "^{}" LF
+  other-tip        =  obj-id SP refname
+  other-peeled     =  obj-id SP refname "^{}"
 
   shallow          =  PKT-LINE("shallow" SP obj-id)
 
@@ -194,7 +229,7 @@ After reference and capabilities discovery, the client can decide to
 terminate the connection by sending a flush-pkt, telling the server it can
 now gracefully terminate, and disconnect, when it does not need any pack
 data. This can happen with the ls-remote command, and also can happen when
-the client already is up-to-date.
+the client already is up to date.
 
 Otherwise, it enters the negotiation phase, where the client and
 server determine what the minimal packfile necessary for transport is,
@@ -207,6 +242,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       [filter-request]
 		       flush-pkt
 
   want-list         =  first-want
@@ -214,12 +250,16 @@ out of what the server said it could do with the first 'want' line.
 
   shallow-line      =  PKT-LINE("shallow" SP obj-id)
 
-  depth-request     =  PKT-LINE("deepen" SP depth)
+  depth-request     =  PKT-LINE("deepen" SP depth) /
+		       PKT-LINE("deepen-since" SP timestamp) /
+		       PKT-LINE("deepen-not" SP ref)
 
-  first-want        =  PKT-LINE("want" SP obj-id SP capability-list LF)
-  additional-want   =  PKT-LINE("want" SP obj-id LF)
+  first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
+  additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth             =  1*DIGIT
+
+  filter-request    =  PKT-LINE("filter" SP filter-spec)
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -242,6 +282,13 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial clone and partial fetch
+operations. An object that does not meet a filter-spec value is
+omitted unless explicitly requested in a 'want' line. See `rev-list`
+for possible filter-spec values.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
@@ -284,7 +331,7 @@ so that there is always a block of 32 "in-flight on the wire" at a time.
 		       compute-end
 
   have-list         =  *have-line
-  have-line         =  PKT-LINE("have" SP obj-id LF)
+  have-line         =  PKT-LINE("have" SP obj-id)
   compute-end       =  flush-pkt / PKT-LINE("done")
 ----
 
@@ -302,7 +349,7 @@ In multi_ack mode:
     ready to make a packfile, it will blindly ACK all 'have' obj-ids
     back to the client.
 
-  * the server will then send a 'NACK' and then wait for another response
+  * the server will then send a 'NAK' and then wait for another response
     from the client - either a 'done' or another list of 'have' lines.
 
 In multi_ack_detailed mode:
@@ -344,14 +391,19 @@ ACK after 'done' if there is at least one common base and multi_ack or
 multi_ack_detailed is enabled. The server always sends NAK after 'done'
 if there is no common base found.
 
+Instead of 'ACK' or 'NAK', the server may send an error message (for
+example, if it does not recognize an object in a 'want' line received
+from the client).
+
 Then the server will start sending its packfile data.
 
 ----
-  server-response = *ack_multi ack / nak
-  ack_multi       = PKT-LINE("ACK" SP obj-id ack_status LF)
+  server-response = *ack_multi ack / nak / error-line
+  ack_multi       = PKT-LINE("ACK" SP obj-id ack_status)
   ack_status      = "continue" / "common" / "ready"
-  ack             = PKT-LINE("ACK SP obj-id LF)
-  nak             = PKT-LINE("NAK" LF)
+  ack             = PKT-LINE("ACK" SP obj-id)
+  nak             = PKT-LINE("NAK")
+  error-line     =  PKT-LINE("ERR" SP explanation-text)
 ----
 
 A simple clone may look like this (with no 'have' lines):
@@ -449,7 +501,8 @@ The reference discovery phase is done nearly the same way as it is in the
 fetching protocol. Each reference obj-id and name on the server is sent
 in packet-line format to the client, followed by a flush-pkt.  The only
 real difference is that the capability listing is different - the only
-possible values are 'report-status', 'delete-refs' and 'ofs-delta'.
+possible values are 'report-status', 'delete-refs', 'ofs-delta' and
+'push-options'.
 
 Reference Update Request and Packfile Transfer
 ----------------------------------------------
@@ -460,17 +513,15 @@ that it wants to update, it sends a line listing the obj-id currently on
 the server, the obj-id the client would like to update it to and the name
 of the reference.
 
-This list is followed by a flush-pkt and then the packfile that should
-contain all the objects that the server will need to complete the new
-references.
+This list is followed by a flush-pkt.
 
 ----
-  update-request    =  *shallow ( command-list | push-cert ) [pack-file]
+  update-requests   =  *shallow ( command-list | push-cert )
 
-  shallow           =  PKT-LINE("shallow" SP obj-id LF)
+  shallow           =  PKT-LINE("shallow" SP obj-id)
 
-  command-list      =  PKT-LINE(command NUL capability-list LF)
-		       *PKT-LINE(command LF)
+  command-list      =  PKT-LINE(command NUL capability-list)
+		       *PKT-LINE(command)
 		       flush-pkt
 
   command           =  create / delete / update
@@ -486,12 +537,35 @@ references.
 		      PKT-LINE("pusher" SP ident LF)
 		      PKT-LINE("pushee" SP url LF)
 		      PKT-LINE("nonce" SP nonce LF)
+		      *PKT-LINE("push-option" SP push-option LF)
 		      PKT-LINE(LF)
 		      *PKT-LINE(command LF)
 		      *PKT-LINE(gpg-signature-lines LF)
 		      PKT-LINE("push-cert-end" LF)
 
-  pack-file         = "PACK" 28*(OCTET)
+  push-option       =  1*( VCHAR | SP )
+----
+
+If the server has advertised the 'push-options' capability and the client has
+specified 'push-options' as part of the capability list above, the client then
+sends its push options followed by a flush-pkt.
+
+----
+  push-options      =  *PKT-LINE(push-option) flush-pkt
+----
+
+For backwards compatibility with older Git servers, if the client sends a push
+cert and push options, it MUST send its push options both embedded within the
+push cert and after the push cert. (Note that the push options within the cert
+are prefixed, but the push options after the cert are not.) Both these lists
+MUST be the same, modulo the prefix.
+
+After that the packfile that
+should contain all the objects that the server will need to complete the new
+references will be sent.
+
+----
+  packfile          =  "PACK" 28*(OCTET)
 ----
 
 If the receiving end does not support delete-refs, the sending end MUST
@@ -502,11 +576,11 @@ MUST NOT send a push-cert command.  When a push-cert command is
 sent, command-list MUST NOT be sent; the commands recorded in the
 push certificate is used instead.
 
-The pack-file MUST NOT be sent if the only command used is 'delete'.
+The packfile MUST NOT be sent if the only command used is 'delete'.
 
-A pack-file MUST be sent if either create or update command is used,
+A packfile MUST be sent if either create or update command is used,
 even if the server already has all the necessary objects.  In this
-case the client MUST send an empty pack-file.   The only time this
+case the client MUST send an empty packfile.   The only time this
 is likely to happen is if the client is creating
 a new branch or a tag that points to an existing obj-id.
 
@@ -521,7 +595,8 @@ Push Certificate
 
 A push certificate begins with a set of header lines.  After the
 header and an empty line, the protocol commands follow, one per
-line.
+line. Note that the trailing LF in push-cert PKT-LINEs is _not_
+optional; it must be present.
 
 Currently, the following header fields are defined:
 
@@ -560,12 +635,12 @@ update was successful, or 'ng [refname] [error]' if the update was not.
 		      1*(command-status)
 		      flush-pkt
 
-  unpack-status     = PKT-LINE("unpack" SP unpack-result LF)
+  unpack-status     = PKT-LINE("unpack" SP unpack-result)
   unpack-result     = "ok" / error-msg
 
   command-status    = command-ok / command-fail
-  command-ok        = PKT-LINE("ok" SP refname LF)
-  command-fail      = PKT-LINE("ng" SP refname SP error-msg LF)
+  command-ok        = PKT-LINE("ok" SP refname)
+  command-fail      = PKT-LINE("ng" SP refname SP error-msg)
 
   error-msg         = 1*(OCTECT) ; where not "ok"
 ----
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
new file mode 100644
index 0000000000..1ef66bd788
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,324 @@
+Partial Clone Design Notes
+==========================
+
+The "Partial Clone" feature is a performance optimization for Git that
+allows Git to function without having a complete copy of the repository.
+The goal of this work is to allow Git better handle extremely large
+repositories.
+
+During clone and fetch operations, Git downloads the complete contents
+and history of the repository.  This includes all commits, trees, and
+blobs for the complete life of the repository.  For extremely large
+repositories, clones can take hours (or days) and consume 100+GiB of disk
+space.
+
+Often in these repositories there are many blobs and trees that the user
+does not need such as:
+
+  1. files outside of the user's work area in the tree.  For example, in
+     a repository with 500K directories and 3.5M files in every commit,
+     we can avoid downloading many objects if the user only needs a
+     narrow "cone" of the source tree.
+
+  2. large binary assets.  For example, in a repository where large build
+     artifacts are checked into the tree, we can avoid downloading all
+     previous versions of these non-mergeable binary assets and only
+     download versions that are actually referenced.
+
+Partial clone allows us to avoid downloading such unneeded objects *in
+advance* during clone and fetch operations and thereby reduce download
+times and disk usage.  Missing objects can later be "demand fetched"
+if/when needed.
+
+Use of partial clone requires that the user be online and the origin
+remote be available for on-demand fetching of missing objects.  This may
+or may not be problematic for the user.  For example, if the user can
+stay within the pre-selected subset of the source tree, they may not
+encounter any missing objects.  Alternatively, the user could try to
+pre-fetch various objects if they know that they are going offline.
+
+
+Non-Goals
+---------
+
+Partial clone is a mechanism to limit the number of blobs and trees downloaded
+*within* a given range of commits -- and is therefore independent of and not
+intended to conflict with existing DAG-level mechanisms to limit the set of
+requested commits (i.e. shallow clone, single branch, or fetch '<refspec>').
+
+
+Design Overview
+---------------
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+  the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+  sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+  were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--------------
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+  upload-pack negotiation.
++
+This uses the existing capability discovery mechanism.
+See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+  server to request filtering during packfile construction.
++
+There are various filters available to accommodate different situations.
+See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+  creates "filtered" packfiles for the client.
++
+These filtered packfiles are *incomplete* in the traditional sense because
+they may contain objects that reference objects not contained in the
+packfile and that the client doesn't already have.  For example, the
+filtered packfile may contain trees or tags that reference missing blobs
+or commits that reference missing trees.
+
+- On the client these incomplete packfiles are marked as "promisor packfiles"
+  and treated differently by various commands.
+
+- On the client a repository extension is added to the local config to
+  prevent older versions of git from failing mid-operation because of
+  missing objects that they cannot handle.
+  See "extensions.partialClone" in Documentation/technical/repository-version.txt"
+
+
+Handling Missing Objects
+------------------------
+
+- An object may be missing due to a partial clone or fetch, or missing due
+  to repository corruption.  To differentiate these cases, the local
+  repository specially indicates such filtered packfiles obtained from the
+  promisor remote as "promisor packfiles".
++
+These promisor packfiles consist of a "<name>.promisor" file with
+arbitrary contents (like the "<name>.keep" files), in addition to
+their "<name>.pack" and "<name>.idx" files.
+
+- The local repository considers a "promisor object" to be an object that
+  it knows (to the best of its ability) that the promisor remote has promised
+  that it has, either because the local repository has that object in one of
+  its promisor packfiles, or because another promisor object refers to it.
++
+When Git encounters a missing object, Git can see if it a promisor object
+and handle it appropriately.  If not, Git can report a corruption.
++
+This means that there is no need for the client to explicitly maintain an
+expensive-to-modify list of missing objects.[a]
+
+- Since almost all Git code currently expects any referenced object to be
+  present locally and because we do not want to force every command to do
+  a dry-run first, a fallback mechanism is added to allow Git to attempt
+  to dynamically fetch missing objects from the promisor remote.
++
+When the normal object lookup fails to find an object, Git invokes
+fetch-object to try to get the object from the server and then retry
+the object lookup.  This allows objects to be "faulted in" without
+complicated prediction algorithms.
++
+For efficiency reasons, no check as to whether the missing object is
+actually a promisor object is performed.
++
+Dynamic object fetching tends to be slow as objects are fetched one at
+a time.
+
+- `checkout` (and any other command using `unpack-trees`) has been taught
+  to bulk pre-fetch all required missing blobs in a single batch.
+
+- `rev-list` has been taught to print missing objects.
++
+This can be used by other commands to bulk prefetch objects.
+For example, a "git log -p A..B" may internally want to first do
+something like "git rev-list --objects --quiet --missing=print A..B"
+and prefetch those objects in bulk.
+
+- `fsck` has been updated to be fully aware of promisor objects.
+
+- `repack` in GC has been updated to not touch promisor packfiles at all,
+  and to only repack other objects.
+
+- The global variable "fetch_if_missing" is used to control whether an
+  object lookup will attempt to dynamically fetch a missing object or
+  report an error.
++
+We are not happy with this global variable and would like to remove it,
+but that requires significant refactoring of the object code to pass an
+additional flag.  We hope that concurrent efforts to add an ODB API can
+encompass this.
+
+
+Fetching Missing Objects
+------------------------
+
+- Fetching of objects is done using the existing transport mechanism using
+  transport_fetch_refs(), setting a new transport option
+  TRANS_OPT_NO_DEPENDENTS to indicate that only the objects themselves are
+  desired, not any object that they refer to.
++
+Because some transports invoke fetch_pack() in the same process, fetch_pack()
+has been updated to not use any object flags when the corresponding argument
+(no_dependents) is set.
+
+- The local repository sends a request with the hashes of all requested
+  objects as "want" lines, and does not perform any packfile negotiation.
+  It then receives a packfile.
+
+- Because we are reusing the existing fetch-pack mechanism, fetching
+  currently fetches all objects referred to by the requested objects, even
+  though they are not necessary.
+
+
+Current Limitations
+-------------------
+
+- The remote used for a partial clone (or the first partial fetch
+  following a regular clone) is marked as the "promisor remote".
++
+We are currently limited to a single promisor remote and only that
+remote may be used for subsequent partial fetches.
++
+We accept this limitation because we believe initial users of this
+feature will be using it on repositories with a strong single central
+server.
+
+- Dynamic object fetching will only ask the promisor remote for missing
+  objects.  We assume that the promisor remote has a complete view of the
+  repository and can satisfy all such requests.
+
+- Repack essentially treats promisor and non-promisor packfiles as 2
+  distinct partitions and does not mix them.  Repack currently only works
+  on non-promisor packfiles and loose objects.
+
+- Dynamic object fetching invokes fetch-pack once *for each item*
+  because most algorithms stumble upon a missing object and need to have
+  it resolved before continuing their work.  This may incur significant
+  overhead -- and multiple authentication requests -- if many objects are
+  needed.
+
+- Dynamic object fetching currently uses the existing pack protocol V0
+  which means that each object is requested via fetch-pack.  The server
+  will send a full set of info/refs when the connection is established.
+  If there are large number of refs, this may incur significant overhead.
+
+
+Future Work
+-----------
+
+- Allow more than one promisor remote and define a strategy for fetching
+  missing objects from specific promisor remotes or of iterating over the
+  set of promisor remotes until a missing object is found.
++
+A user might want to have multiple geographically-close cache servers
+for fetching missing blobs while continuing to do filtered `git-fetch`
+commands from the central server, for example.
++
+Or the user might want to work in a triangular work flow with multiple
+promisor remotes that each have an incomplete view of the repository.
+
+- Allow repack to work on promisor packfiles (while keeping them distinct
+  from non-promisor packfiles).
+
+- Allow non-pathname-based filters to make use of packfile bitmaps (when
+  present).  This was just an omission during the initial implementation.
+
+- Investigate use of a long-running process to dynamically fetch a series
+  of objects, such as proposed in [5,6] to reduce process startup and
+  overhead costs.
++
+It would be nice if pack protocol V2 could allow that long-running
+process to make a series of requests over a single long-running
+connection.
+
+- Investigate pack protocol V2 to avoid the info/refs broadcast on
+  each connection with the server to dynamically fetch missing objects.
+
+- Investigate the need to handle loose promisor objects.
++
+Objects in promisor packfiles are allowed to reference missing objects
+that can be dynamically fetched from the server.  An assumption was
+made that loose objects are only created locally and therefore should
+not reference a missing object.  We may need to revisit that assumption
+if, for example, we dynamically fetch a missing tree and store it as a
+loose object rather than a single object packfile.
++
+This does not necessarily mean we need to mark loose objects as promisor;
+it may be sufficient to relax the object lookup or is-promisor functions.
+
+
+Non-Tasks
+---------
+
+- Every time the subject of "demand loading blobs" comes up it seems
+  that someone suggests that the server be allowed to "guess" and send
+  additional objects that may be related to the requested objects.
++
+No work has gone into actually doing that; we're just documenting that
+it is a common suggestion.  We're not sure how it would work and have
+no plans to work on it.
++
+It is valid for the server to send more objects than requested (even
+for a dynamic object fetch), but we are not building on that.
+
+
+Footnotes
+---------
+
+[a] expensive-to-modify list of missing objects:  Earlier in the design of
+    partial clone we discussed the need for a single list of missing objects.
+    This would essentially be a sorted linear list of OIDs that the were
+    omitted by the server during a clone or subsequent fetches.
+
+This file would need to be loaded into memory on every object lookup.
+It would need to be read, updated, and re-written (like the .git/index)
+on every explicit "git fetch" command *and* on any dynamic object fetch.
+
+The cost to read, update, and write this file could add significant
+overhead to every command if there are many missing objects.  For example,
+if there are 100M missing blobs, this file would be at least 2GiB on disk.
+
+With the "promisor" concept, we *infer* a missing object based upon the
+type of packfile that references it.
+
+
+Related Links
+-------------
+[0] https://crbug.com/git/2
+    Bug#2: Partial Clone
+
+[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/ +
+    Subject: [RFC] Add support for downloading blobs on demand +
+    Date: Fri, 13 Jan 2017 10:52:53 -0500
+
+[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/ +
+    Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) +
+    Date: Fri, 29 Sep 2017 13:11:36 -0700
+
+[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/ +
+    Subject: Proposal for missing blob support in Git repos +
+    Date: Wed, 26 Apr 2017 15:13:46 -0700
+
+[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/ +
+    Subject: [PATCH 00/10] RFC Partial Clone and Fetch +
+    Date: Wed,  8 Mar 2017 18:50:29 +0000
+
+[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/ +
+    Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module +
+    Date: Fri,  5 May 2017 11:27:52 -0400
+
+[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/ +
+    Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand +
+    Date: Fri, 14 Jul 2017 09:26:50 -0400
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 4f8a7bfb4c..332d209b58 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -179,6 +179,31 @@ This capability adds "deepen", "shallow" and "unshallow" commands to
 the  fetch-pack/upload-pack protocol so clients can request shallow
 clones.
 
+deepen-since
+------------
+
+This capability adds "deepen-since" command to fetch-pack/upload-pack
+protocol so the client can request shallow clones that are cut at a
+specific time, instead of depth. Internally it's equivalent of doing
+"rev-list --max-age=<timestamp>" on the server side. "deepen-since"
+cannot be used with "deepen".
+
+deepen-not
+----------
+
+This capability adds "deepen-not" command to fetch-pack/upload-pack
+protocol so the client can request shallow clones that are cut at a
+specific revision, instead of depth. Internally it's equivalent of
+doing "rev-list --not <rev>" on the server side. "deepen-not"
+cannot be used with "deepen", but can be used with "deepen-since".
+
+deepen-relative
+---------------
+
+If this capability is requested by the client, the semantics of
+"deepen" command is changed. The "depth" argument is the depth from
+the current shallow boundary, instead of the depth from remote refs.
+
 no-progress
 -----------
 
@@ -253,6 +278,15 @@ atomic pushes. If the pushing client requests this capability, the server
 will update the refs in one atomic transaction. Either all refs are
 updated or none.
 
+push-options
+------------
+
+If the server sends the 'push-options' capability it is able to accept
+push options after the update commands have been sent, but before the
+packfile is streamed. If the pushing client requests this capability,
+the server will pass the options to the pre- and post- receive hooks
+that process this push request.
+
 allow-tip-sha1-in-want
 ----------------------
 
@@ -260,6 +294,13 @@ If the upload-pack server advertises this capability, fetch-pack may
 send "want" lines with SHA-1s that exist at the server but are not
 advertised by upload-pack.
 
+allow-reachable-sha1-in-want
+----------------------------
+
+If the upload-pack server advertises this capability, fetch-pack may
+send "want" lines with SHA-1s that exist at the server but are not
+advertised by upload-pack.
+
 push-cert=<nonce>
 -----------------
 
@@ -268,3 +309,11 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter
+------
+
+If the upload-pack server advertises the 'filter' capability,
+fetch-pack may send "filter" commands to request a partial clone
+or partial fetch and request that the server omit various objects
+from the packfile.
diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt
index 889985f707..ecedb34bba 100644
--- a/Documentation/technical/protocol-common.txt
+++ b/Documentation/technical/protocol-common.txt
@@ -62,11 +62,14 @@ A pkt-line MAY contain binary data, so implementors MUST ensure
 pkt-line parsing/formatting routines are 8-bit clean.
 
 A non-binary line SHOULD BE terminated by an LF, which if present
-MUST be included in the total length.
-
-The maximum length of a pkt-line's data component is 65520 bytes.
-Implementations MUST NOT send pkt-line whose length exceeds 65524
-(65520 bytes of payload + 4 bytes of length data).
+MUST be included in the total length. Receivers MUST treat pkt-lines
+with non-binary data the same whether or not they contain the trailing
+LF (stripping the LF if present, and not complaining when it is
+missing).
+
+The maximum length of a pkt-line's data component is 65516 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65520
+(65516 bytes of payload + 4 bytes of length data).
 
 Implementations SHOULD NOT send an empty pkt-line ("0004").
 
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 0000000000..09e4e0273f
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,439 @@
+ Git Wire Protocol, Version 2
+==============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+    supported by a single service
+  * Easily extendable as capabilities are moved into their own section
+    of the protocol, no longer being hidden behind a NUL byte and
+    limited by the size of a pkt-line
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+    string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+  * Designed with http and stateless-rpc in mind.  With clear flush
+    semantics the http remote helper can simply act as a proxy
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Packet-Line Framing
+---------------------
+
+All communication is done using packet-line framing, just as in v1.  See
+`Documentation/technical/pack-protocol.txt` and
+`Documentation/technical/protocol-common.txt` for more information.
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Initial Client Request
+------------------------
+
+In general a client can request to speak protocol v2 by sending
+`version=2` through the respective side-channel for the transport being
+used which inevitably sets `GIT_PROTOCOL`.  More information can be
+found in `pack-protocol.txt` and `http-protocol.txt`.  In all cases the
+response from the server is the capability advertisement.
+
+ Git Transport
+~~~~~~~~~~~~~~~
+
+When using the git:// transport, you can request to use protocol v2 by
+sending "version=2" as an extra parameter:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
+
+ SSH and File Transport
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using either the ssh:// or file:// transport, the GIT_PROTOCOL
+environment variable must be set explicitly to include "version=2".
+
+ HTTP Transport
+~~~~~~~~~~~~~~~~
+
+When using the http:// or https:// transport a client makes a "smart"
+info/refs request as described in `http-protocol.txt` and requests that
+v2 be used by supplying "version=2" in the `Git-Protocol` header.
+
+   C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+   C: Git-Protocol: version=2
+
+A v2 server would reply:
+
+   S: 200 OK
+   S: <Some headers>
+   S: ...
+   S:
+   S: 000eversion 2\n
+   S: <capability-advertisement>
+
+Subsequent requests are then made directly to the service
+`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
+
+ Capability Advertisement
+--------------------------
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value.  Clients must ignore all
+unknown keys.  Semantics of unknown values are left to the definition of
+each key.  Some capabilities will describe commands which can be requested
+to be executed by the client.
+
+    capability-advertisement = protocol-version
+			       capability-list
+			       flush-pkt
+
+    protocol-version = PKT-LINE("version 2" LF)
+    capability-list = *capability
+    capability = PKT-LINE(key[=value] LF)
+
+    key = 1*(ALPHA | DIGIT | "-_")
+    value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;")
+
+ Command Request
+-----------------
+
+After receiving the capability advertisement, a client can then issue a
+request to select the command it wants with any particular capabilities
+or arguments.  There is then an optional section where the client can
+provide any command specific parameters or queries.  Only a single
+command can be requested at a time.
+
+    request = empty-request | command-request
+    empty-request = flush-pkt
+    command-request = command
+		      capability-list
+		      [command-args]
+		      flush-pkt
+    command = PKT-LINE("command=" key LF)
+    command-args = delim-pkt
+		   *command-specific-arg
+
+    command-specific-args are packet line framed arguments defined by
+    each individual command.
+
+The server will then check to ensure that the client's request is
+comprised of a valid command as well as valid capabilities which were
+advertised.  If the request is valid the server will then execute the
+command.  A server MUST wait till it has received the client's entire
+request before issuing a response.  The format of the response is
+determined by the command being executed, but in all cases a flush-pkt
+indicates the end of the response.
+
+When a command has finished, and the client has received the entire
+response from the server, a client can either request that another
+command be executed or can terminate the connection.  A client may
+optionally send an empty request consisting of just a flush-pkt to
+indicate that no more requests will be made.
+
+ Capabilities
+--------------
+
+There are two different types of capabilities: normal capabilities,
+which can be used to to convey information or alter the behavior of a
+request, and commands, which are the core actions that a client wants to
+perform (fetch, push, etc).
+
+Protocol version 2 is stateless by default.  This means that all commands
+must only last a single round and be stateless from the perspective of the
+server side, unless the client has requested a capability indicating that
+state should be maintained by the server.  Clients MUST NOT require state
+management on the server side in order to function correctly.  This
+permits simple round-robin load-balancing on the server side, without
+needing to worry about state management.
+
+ agent
+~~~~~~~
+
+The server can advertise the `agent` capability with a value `X` (in the
+form `agent=X`) to notify the client that the server is running version
+`X`.  The client may optionally send its own agent string by including
+the `agent` capability with a value `Y` (in the form `agent=Y`) in its
+request to the server (but it MUST NOT do so if the server did not
+advertise the agent capability). The `X` and `Y` strings may contain any
+printable ASCII characters except space (i.e., the byte range 32 < x <
+127), and are typically of the form "package/version" (e.g.,
+"git/1.8.3.1"). The agent strings are purely informative for statistics
+and debugging purposes, and MUST NOT be used to programmatically assume
+the presence or absence of particular features.
+
+ ls-refs
+~~~~~~~~~
+
+`ls-refs` is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in arguments
+which can be used to limit the refs sent from the server.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+ls-refs takes in the following arguments:
+
+    symrefs
+	In addition to the object pointed by it, show the underlying ref
+	pointed by it when showing a symbolic ref.
+    peel
+	Show peeled tags.
+    ref-prefix <prefix>
+	When specified, only references having a prefix matching one of
+	the provided prefixes are displayed.
+
+The output of ls-refs is as follows:
+
+    output = *ref
+	     flush-pkt
+    ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+    ref-attribute = (symref | peeled)
+    symref = "symref-target:" symref-target
+    peeled = "peeled:" obj-id
+
+ fetch
+~~~~~~~
+
+`fetch` is the command used to fetch a packfile in v2.  It can be looked
+at as a modified version of the v1 fetch where the ref-advertisement is
+stripped out (since the `ls-refs` command fills that role) and the
+message format is tweaked to eliminate redundancies and permit easy
+addition of future extensions.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+A `fetch` request can take the following arguments:
+
+    want <oid>
+	Indicates to the server an object which the client wants to
+	retrieve.  Wants can be anything and are not limited to
+	advertised objects.
+
+    have <oid>
+	Indicates to the server an object which the client has locally.
+	This allows the server to make a packfile which only contains
+	the objects that the client needs. Multiple 'have' lines can be
+	supplied.
+
+    done
+	Indicates to the server that negotiation should terminate (or
+	not even begin if performing a clone) and that the server should
+	use the information supplied in the request to construct the
+	packfile.
+
+    thin-pack
+	Request that a thin pack be sent, which is a pack with deltas
+	which reference base objects not contained within the pack (but
+	are known to exist at the receiving end). This can reduce the
+	network traffic significantly, but it requires the receiving end
+	to know how to "thicken" these packs by adding the missing bases
+	to the pack.
+
+    no-progress
+	Request that progress information that would normally be sent on
+	side-band channel 2, during the packfile transfer, should not be
+	sent.  However, the side-band channel 3 is still used for error
+	responses.
+
+    include-tag
+	Request that annotated tags should be sent if the objects they
+	point to are being sent.
+
+    ofs-delta
+	Indicate that the client understands PACKv2 with delta referring
+	to its base by position in pack rather than by an oid.  That is,
+	they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
+
+If the 'shallow' feature is advertised the following arguments can be
+included in the clients request as well as the potential addition of the
+'shallow-info' section in the server's response as explained below.
+
+    shallow <oid>
+	A client must notify the server of all commits for which it only
+	has shallow copies (meaning that it doesn't have the parents of
+	a commit) by supplying a 'shallow <oid>' line for each such
+	object so that the server is aware of the limitations of the
+	client's history.  This is so that the server is aware that the
+	client may not have all objects reachable from such commits.
+
+    deepen <depth>
+	Requests that the fetch/clone should be shallow having a commit
+	depth of <depth> relative to the remote side.
+
+    deepen-relative
+	Requests that the semantics of the "deepen" command be changed
+	to indicate that the depth requested is relative to the client's
+	current shallow boundary, instead of relative to the requested
+	commits.
+
+    deepen-since <timestamp>
+	Requests that the shallow clone/fetch should be cut at a
+	specific time, instead of depth.  Internally it's equivalent to
+	doing "git rev-list --max-age=<timestamp>". Cannot be used with
+	"deepen".
+
+    deepen-not <rev>
+	Requests that the shallow clone/fetch should be cut at a
+	specific revision specified by '<rev>', instead of a depth.
+	Internally it's equivalent of doing "git rev-list --not <rev>".
+	Cannot be used with "deepen", but can be used with
+	"deepen-since".
+
+If the 'filter' feature is advertised, the following argument can be
+included in the client's request:
+
+    filter <filter-spec>
+	Request that various objects from the packfile be omitted
+	using one of several filtering techniques. These are intended
+	for use with partial clone and partial fetch operations. See
+	`rev-list` for possible "filter-spec" values.
+
+If the 'ref-in-want' feature is advertised, the following argument can
+be included in the client's request as well as the potential addition of
+the 'wanted-refs' section in the server's response as explained below.
+
+    want-ref <ref>
+	Indicates to the server that the client wants to retrieve a
+	particular ref, where <ref> is the full name of a ref on the
+	server.
+
+The response of `fetch` is broken into a number of sections separated by
+delimiter packets (0001), with each section beginning with its section
+header.
+
+    output = *section
+    section = (acknowledgments | shallow-info | wanted-refs | packfile)
+	      (flush-pkt | delim-pkt)
+
+    acknowledgments = PKT-LINE("acknowledgments" LF)
+		      (nak | *ack)
+		      (ready)
+    ready = PKT-LINE("ready" LF)
+    nak = PKT-LINE("NAK" LF)
+    ack = PKT-LINE("ACK" SP obj-id LF)
+
+    shallow-info = PKT-LINE("shallow-info" LF)
+		   *PKT-LINE((shallow | unshallow) LF)
+    shallow = "shallow" SP obj-id
+    unshallow = "unshallow" SP obj-id
+
+    wanted-refs = PKT-LINE("wanted-refs" LF)
+		  *PKT-LINE(wanted-ref LF)
+    wanted-ref = obj-id SP refname
+
+    packfile = PKT-LINE("packfile" LF)
+	       *PKT-LINE(%x01-03 *%x00-ff)
+
+    acknowledgments section
+	* If the client determines that it is finished with negotiations
+	  by sending a "done" line, the acknowledgments sections MUST be
+	  omitted from the server's response.
+
+	* Always begins with the section header "acknowledgments"
+
+	* The server will respond with "NAK" if none of the object ids sent
+	  as have lines were common.
+
+	* The server will respond with "ACK obj-id" for all of the
+	  object ids sent as have lines which are common.
+
+	* A response cannot have both "ACK" lines as well as a "NAK"
+	  line.
+
+	* The server will respond with a "ready" line indicating that
+	  the server has found an acceptable common base and is ready to
+	  make and send a packfile (which will be found in the packfile
+	  section of the same response)
+
+	* If the server has found a suitable cut point and has decided
+	  to send a "ready" line, then the server can decide to (as an
+	  optimization) omit any "ACK" lines it would have sent during
+	  its response.  This is because the server will have already
+	  determined the objects it plans to send to the client and no
+	  further negotiation is needed.
+
+    shallow-info section
+	* If the client has requested a shallow fetch/clone, a shallow
+	  client requests a fetch or the server is shallow then the
+	  server's response may include a shallow-info section.  The
+	  shallow-info section will be included if (due to one of the
+	  above conditions) the server needs to inform the client of any
+	  shallow boundaries or adjustments to the clients already
+	  existing shallow boundaries.
+
+	* Always begins with the section header "shallow-info"
+
+	* If a positive depth is requested, the server will compute the
+	  set of commits which are no deeper than the desired depth.
+
+	* The server sends a "shallow obj-id" line for each commit whose
+	  parents will not be sent in the following packfile.
+
+	* The server sends an "unshallow obj-id" line for each commit
+	  which the client has indicated is shallow, but is no longer
+	  shallow as a result of the fetch (due to its parents being
+	  sent in the following packfile).
+
+	* The server MUST NOT send any "unshallow" lines for anything
+	  which the client has not indicated was shallow as a part of
+	  its request.
+
+	* This section is only included if a packfile section is also
+	  included in the response.
+
+    wanted-refs section
+	* This section is only included if the client has requested a
+	  ref using a 'want-ref' line and if a packfile section is also
+	  included in the response.
+
+	* Always begins with the section header "wanted-refs".
+
+	* The server will send a ref listing ("<oid> <refname>") for
+	  each reference requested using 'want-ref' lines.
+
+	* The server MUST NOT send any refs which were not requested
+	  using 'want-ref' lines.
+
+    packfile section
+	* This section is only included if the client has sent 'want'
+	  lines in its request and either requested that no more
+	  negotiation be done by sending 'done' or if the server has
+	  decided it has found a sufficient cut point to produce a
+	  packfile.
+
+	* Always begins with the section header "packfile"
+
+	* The transmission of the packfile begins immediately after the
+	  section header
+
+	* The data transfer of the packfile is always multiplexed, using
+	  the same semantics of the 'side-band-64k' capability from
+	  protocol version 1.  This means that each packet, during the
+	  packfile data stream, is made up of a leading 4-byte pkt-line
+	  length (typical of the pkt-line format), followed by a 1-byte
+	  stream code, followed by the actual data.
+
+	  The stream code can be one of:
+		1 - pack data
+		2 - progress messages
+		3 - fatal error message just before stream aborts
+
+ server-option
+~~~~~~~~~~~~~~~
+
+If advertised, indicates that any number of server specific options can be
+included in a request.  This is done by sending each option as a
+"server-option=<option>" capability line in the capability-list section of
+a request.
+
+The provided options must not contain a NUL or LF character.
diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt
index 242a044db9..4a8be4d144 100644
--- a/Documentation/technical/racy-git.txt
+++ b/Documentation/technical/racy-git.txt
@@ -41,13 +41,17 @@ With a `USE_STDEV` compile-time option, `st_dev` is also
 compared, but this is not enabled by default because this member
 is not stable on network filesystems.  With `USE_NSEC`
 compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
-members are also compared, but this is not enabled by default
+members are also compared. On Linux, this is not enabled by default
 because in-core timestamps can have finer granularity than
 on-disk timestamps, resulting in meaningless changes when an
 inode is evicted from the inode cache.  See commit 8ce13b0
 of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
 ([PATCH] Sync in core time granularity with filesystems,
-2005-01-04).
+2005-01-04). This patch is included in kernel 2.6.11 and newer, but
+only fixes the issue for file systems with exactly 1 ns or 1 s
+resolution. Other file systems are still broken in current Linux
+kernels (e.g. CEPH, CIFS, NTFS, UDF), see
+https://lkml.org/lkml/2015/6/9/714
 
 Racy Git
 --------
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
new file mode 100644
index 0000000000..e03eaccebc
--- /dev/null
+++ b/Documentation/technical/repository-version.txt
@@ -0,0 +1,100 @@
+Git Repository Format Versions
+==============================
+
+Every git repository is marked with a numeric version in the
+`core.repositoryformatversion` key of its `config` file. This version
+specifies the rules for operating on the on-disk repository data. An
+implementation of git which does not understand a particular version
+advertised by an on-disk repository MUST NOT operate on that repository;
+doing so risks not only producing wrong results, but actually losing
+data.
+
+Because of this rule, version bumps should be kept to an absolute
+minimum. Instead, we generally prefer these strategies:
+
+  - bumping format version numbers of individual data files (e.g.,
+    index, packfiles, etc). This restricts the incompatibilities only to
+    those files.
+
+  - introducing new data that gracefully degrades when used by older
+    clients (e.g., pack bitmap files are ignored by older clients, which
+    simply do not take advantage of the optimization they provide).
+
+A whole-repository format version bump should only be part of a change
+that cannot be independently versioned. For instance, if one were to
+change the reachability rules for objects, or the rules for locking
+refs, that would require a bump of the repository format version.
+
+Note that this applies only to accessing the repository's disk contents
+directly. An older client which understands only format `0` may still
+connect via `git://` to a repository using format `1`, as long as the
+server process understands format `1`.
+
+The preferred strategy for rolling out a version bump (whether whole
+repository or for a single file) is to teach git to read the new format,
+and allow writing the new format with a config switch or command line
+option (for experimentation or for those who do not care about backwards
+compatibility with older gits). Then after a long period to allow the
+reading capability to become common, we may switch to writing the new
+format by default.
+
+The currently defined format versions are:
+
+Version `0`
+-----------
+
+This is the format defined by the initial version of git, including but
+not limited to the format of the repository directory, the repository
+configuration file, and the object and ref storage. Specifying the
+complete behavior of git is beyond the scope of this document.
+
+Version `1`
+-----------
+
+This format is identical to version `0`, with the following exceptions:
+
+  1. When reading the `core.repositoryformatversion` variable, a git
+     implementation which supports version 1 MUST also read any
+     configuration keys found in the `extensions` section of the
+     configuration file.
+
+  2. If a version-1 repository specifies any `extensions.*` keys that
+     the running git has not implemented, the operation MUST NOT
+     proceed. Similarly, if the value of any known key is not understood
+     by the implementation, the operation MUST NOT proceed.
+
+Note that if no extensions are specified in the config file, then
+`core.repositoryformatversion` SHOULD be set to `0` (setting it to `1`
+provides no benefit, and makes the repository incompatible with older
+implementations of git).
+
+This document will serve as the master list for extensions. Any
+implementation wishing to define a new extension should make a note of
+it here, in order to claim the name.
+
+The defined extensions are:
+
+`noop`
+~~~~~~
+
+This extension does not change git's behavior at all. It is useful only
+for testing format-1 compatibility.
+
+`preciousObjects`
+~~~~~~~~~~~~~~~~~
+
+When the config key `extensions.preciousObjects` is set to `true`,
+objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
+`git repack -d`).
+
+`partialclone`
+~~~~~~~~~~~~~~
+
+When the config key `extensions.partialclone` is set, it indicates
+that the repo was created with a partial clone (or later performed
+a partial fetch) and that the remote may have omitted sending
+certain unwanted objects.  Such a remote is called a "promisor remote"
+and it promises that all such omitted objects can be fetched from it
+in the future.
+
+The value of this key is the name of the promisor remote.
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 5183b15422..01dedfe9ff 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -8,20 +8,22 @@ repo, and therefore grafts are introduced pretending that
 these commits have no parents.
 *********************************************************
 
-The basic idea is to write the SHA-1s of shallow commits into
-$GIT_DIR/shallow, and handle its contents like the contents
-of $GIT_DIR/info/grafts (with the difference that shallow
-cannot contain parent information).
-
-This information is stored in a new file instead of grafts, or
-even the config, since the user should not touch that file
-at all (even throughout development of the shallow clone, it
-was never manually edited!).
+$GIT_DIR/shallow lists commit object names and tells Git to
+pretend as if they are root commits (e.g. "git log" traversal
+stops after showing them; "git fsck" does not complain saying
+the commits listed on their "parent" lines do not exist).
 
 Each line contains exactly one SHA-1. When read, a commit_graft
 will be constructed, which has nr_parent < 0 to make it easier
 to discern from user provided grafts.
 
+Note that the shallow feature could not be changed easily to
+use replace refs: a commit containing a `mergetag` is not allowed
+to be replaced, not even by a root commit. Such a commit can be
+made shallow, though. Also, having a `shallow` file explicitly
+listing all the commits made shallow makes it a *lot* easier to
+do shallow-specific things such as to deepen the history.
+
 Since fsck-objects relies on the library to read the objects,
 it honours shallow commits automatically.
 
diff --git a/Documentation/technical/signature-format.txt b/Documentation/technical/signature-format.txt
new file mode 100644
index 0000000000..2c9406a56a
--- /dev/null
+++ b/Documentation/technical/signature-format.txt
@@ -0,0 +1,186 @@
+Git signature format
+====================
+
+== Overview
+
+Git uses cryptographic signatures in various places, currently objects (tags,
+commits, mergetags) and transactions (pushes). In every case, the command which
+is about to create an object or transaction determines a payload from that,
+calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and
+embeds the signature into the object or transaction.
+
+Signatures always begin with `-----BEGIN PGP SIGNATURE-----`
+and end with `-----END PGP SIGNATURE-----`, unless gpg is told to
+produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`.
+
+The signed payload and the way the signature is embedded depends
+on the type of the object resp. transaction.
+
+== Tag signatures
+
+- created by: `git tag -s`
+- payload: annotated tag object
+- embedding: append the signature to the unsigned tag object
+- example: tag `signedtag` with subject `signed tag`
+
+----
+object 04b871796dc0420f8e7561a895b52484b701d51a
+type commit
+tag signedtag
+tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+signed tag
+
+signed tag message body
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
+rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
+8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
+q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
+rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
+lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
+=jpXa
+-----END PGP SIGNATURE-----
+----
+
+- verify with: `git verify-tag [-v]` or `git tag -v`
+
+----
+gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+object 04b871796dc0420f8e7561a895b52484b701d51a
+type commit
+tag signedtag
+tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+signed tag
+
+signed tag message body
+----
+
+== Commit signatures
+
+- created by: `git commit -S`
+- payload: commit object
+- embedding: header entry `gpgsig`
+  (content is preceded by a space)
+- example: commit with subject `signed commit`
+
+----
+tree eebfed94e75e7760540d1485c740902590a00332
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465981137 +0000
+committer C O Mitter <committer@example.com> 1465981137 +0000
+gpgsig -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1
+
+ iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
+ HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
+ DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
+ zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
+ HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
+ EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
+ =jKHM
+ -----END PGP SIGNATURE-----
+
+signed commit
+
+signed commit message body
+----
+
+- verify with: `git verify-commit [-v]` (or `git show --show-signature`)
+
+----
+gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+tree eebfed94e75e7760540d1485c740902590a00332
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465981137 +0000
+committer C O Mitter <committer@example.com> 1465981137 +0000
+
+signed commit
+
+signed commit message body
+----
+
+== Mergetag signatures
+
+- created by: `git merge` on signed tag
+- payload/embedding: the whole signed tag object is embedded into
+  the (merge) commit object as header entry `mergetag`
+- example: merge of the signed tag `signedtag` as above
+
+----
+tree c7b1cff039a93f3600a1d18b82d26688668c7dea
+parent c33429be94b5f2d3ee9b0adad223f877f174b05d
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465982009 +0000
+committer C O Mitter <committer@example.com> 1465982009 +0000
+mergetag object 04b871796dc0420f8e7561a895b52484b701d51a
+ type commit
+ tag signedtag
+ tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+ signed tag
+
+ signed tag message body
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1
+
+ iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
+ rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
+ 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
+ q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
+ rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
+ lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
+ =jpXa
+ -----END PGP SIGNATURE-----
+
+Merge tag 'signedtag' into downstream
+
+signed tag
+
+signed tag message body
+
+# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
+# gpg: Good signature from "Eris Discordia <discord@example.net>"
+# gpg: WARNING: This key is not certified with a trusted signature!
+# gpg:          There is no indication that the signature belongs to the owner.
+# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+----
+
+- verify with: verification is embedded in merge commit message by default,
+  alternatively with `git show --show-signature`:
+
+----
+commit 9863f0c76ff78712b6800e199a46aa56afbcbd49
+merged tag 'signedtag'
+gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+Merge: c33429b 04b8717
+Author: A U Thor <author@example.com>
+Date:   Wed Jun 15 09:13:29 2016 +0000
+
+    Merge tag 'signedtag' into downstream
+
+    signed tag
+
+    signed tag message body
+
+    # gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
+    # gpg: Good signature from "Eris Discordia <discord@example.net>"
+    # gpg: WARNING: This key is not certified with a trusted signature!
+    # gpg:          There is no indication that the signature belongs to the owner.
+    # Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+----
diff --git a/Documentation/technical/trivial-merge.txt b/Documentation/technical/trivial-merge.txt
index c79d4a7c47..1f1c33d0da 100644
--- a/Documentation/technical/trivial-merge.txt
+++ b/Documentation/technical/trivial-merge.txt
@@ -32,7 +32,7 @@ or the result.
 If multiple cases apply, the one used is listed first.
 
 A result which changes the index is an error if the index is not empty
-and not up-to-date.
+and not up to date.
 
 Entries marked '+' have stat information. Spaces marked '*' don't
 affect the result.
@@ -65,7 +65,7 @@ empty, no entry is left for that stage). Otherwise, the given entry is
 left in stage 0, and there are no other entries.
 
 A result of "no merge" is an error if the index is not empty and not
-up-to-date.
+up to date.
 
 *empty* means that the tree must not have a directory-file conflict
  with the entry.