34 files changed, 9565 insertions, 0 deletions
diff --git a/Documentation/technical/.gitignore b/Documentation/technical/.gitignore
new file mode 100644
index 0000000000..8aa891daee
--- /dev/null
+++ b/Documentation/technical/.gitignore
@@ -0,0 +1 @@
+api-index.txt
diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
new file mode 100644
index 0000000000..ceeedd485c
--- /dev/null
+++ b/Documentation/technical/api-error-handling.txt
@@ -0,0 +1,75 @@
+Error reporting in git
+======================
+
+`die`, `usage`, `error`, and `warning` report errors of various
+kinds.
+
+- `die` is for fatal application errors.  It prints a message to
+  the user and exits with status 128.
+
+- `usage` is for errors in command line usage.  After printing its
+  message, it exits with status 129.  (See also `usage_with_options`
+  in the link:api-parse-options.html[parse-options API].)
+
+- `error` is for non-fatal library errors.  It prints a message
+  to the user and returns -1 for convenience in signaling the error
+  to the caller.
+
+- `warning` is for reporting situations that probably should not
+  occur but which the user (and Git) can continue to work around
+  without running into too many problems.  Like `error`, it
+  returns -1 after reporting the situation to the caller.
+
+Customizable error handlers
+---------------------------
+
+The default behavior of `die` and `error` is to write a message to
+stderr and then exit or return as appropriate.  This behavior can be
+overridden using `set_die_routine` and `set_error_routine`.  For
+example, "git daemon" uses set_die_routine to write the reason `die`
+was called to syslog before exiting.
+
+Library errors
+--------------
+
+Functions return a negative integer on error.  Details beyond that
+vary from function to function:
+
+- Some functions return -1 for all errors.  Others return a more
+  specific value depending on how the caller might want to react
+  to the error.
+
+- Some functions report the error to stderr with `error`,
+  while others leave that for the caller to do.
+
+- errno is not meaningful on return from most functions (except
+  for thin wrappers for system calls).
+
+Check the function's API documentation to be sure.
+
+Caller-handled errors
+---------------------
+
+An increasing number of functions take a parameter 'struct strbuf *err'.
+On error, such functions append a message about what went wrong to the
+'err' strbuf.  The message is meant to be complete enough to be passed
+to `die` or `error` as-is.  For example:
+
+	if (ref_transaction_commit(transaction, &err))
+		die("%s", err.buf);
+
+The 'err' parameter will be untouched if no error occurred, so multiple
+function calls can be chained:
+
+	t = ref_transaction_begin(&err);
+	if (!t ||
+	    ref_transaction_update(t, "HEAD", ..., &err) ||
+	    ret_transaction_commit(t, &err))
+		die("%s", err.buf);
+
+The 'err' parameter must be a pointer to a valid strbuf.  To silence
+a message, pass a strbuf that is explicitly ignored:
+
+	if (thing_that_can_fail_in_an_ignorable_way(..., &err))
+		/* This failure is okay. */
+		strbuf_reset(&err);
diff --git a/Documentation/technical/api-index-skel.txt b/Documentation/technical/api-index-skel.txt
new file mode 100644
index 0000000000..eda8c195c1
--- /dev/null
+++ b/Documentation/technical/api-index-skel.txt
@@ -0,0 +1,13 @@
+Git API Documents
+=================
+
+Git has grown a set of internal API over time.  This collection
+documents them.
+
+////////////////////////////////////////////////////////////////
+// table of contents begin
+////////////////////////////////////////////////////////////////
+
+////////////////////////////////////////////////////////////////
+// table of contents end
+////////////////////////////////////////////////////////////////
diff --git a/Documentation/technical/api-index.sh b/Documentation/technical/api-index.sh
new file mode 100755
index 0000000000..9c3f4131b8
--- /dev/null
+++ b/Documentation/technical/api-index.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+(
+	c=////////////////////////////////////////////////////////////////
+	skel=api-index-skel.txt
+	sed -e '/^\/\/ table of contents begin/q' "$skel"
+	echo "$c"
+
+	ls api-*.txt |
+	while read filename
+	do
+		case "$filename" in
+		api-index-skel.txt | api-index.txt) continue ;;
+		esac
+		title=$(sed -e 1q "$filename")
+		html=${filename%.txt}.html
+		echo "* link:$html[$title]"
+	done
+	echo "$c"
+	sed -n -e '/^\/\/ table of contents end/,$p' "$skel"
+) >api-index.txt+
+
+if test -f api-index.txt && cmp api-index.txt api-index.txt+ >/dev/null
+then
+	rm -f api-index.txt+
+else
+	mv api-index.txt+ api-index.txt
+fi
diff --git a/Documentation/technical/api-merge.txt b/Documentation/technical/api-merge.txt
new file mode 100644
index 0000000000..487d4d83ff
--- /dev/null
+++ b/Documentation/technical/api-merge.txt
@@ -0,0 +1,36 @@
+merge API
+=========
+
+The merge API helps a program to reconcile two competing sets of
+improvements to some files (e.g., unregistered changes from the work
+tree versus changes involved in switching to a new branch), reporting
+conflicts if found.  The library called through this API is
+responsible for a few things.
+
+ * determining which trees to merge (recursive ancestor consolidation);
+
+ * lining up corresponding files in the trees to be merged (rename
+   detection, subtree shifting), reporting edge cases like add/add
+   and rename/rename conflicts to the user;
+
+ * performing a three-way merge of corresponding files, taking
+   path-specific merge drivers (specified in `.gitattributes`)
+   into account.
+
+Data structures
+---------------
+
+* `mmbuffer_t`, `mmfile_t`
+
+These store data usable for use by the xdiff backend, for writing and
+for reading, respectively.  See `xdiff/xdiff.h` for the definitions
+and `diff.c` for examples.
+
+* `struct ll_merge_options`
+
+Check ll-merge.h for details.
+
+Low-level (single file) merge
+-----------------------------
+
+Check ll-merge.h for details.
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
new file mode 100644
index 0000000000..5a60bbfa7f
--- /dev/null
+++ b/Documentation/technical/api-parse-options.txt
@@ -0,0 +1,313 @@
+parse-options API
+=================
+
+The parse-options API is used to parse and massage options in Git
+and to provide a usage help with consistent look.
+
+Basics
+------
+
+The argument vector `argv[]` may usually contain mandatory or optional
+'non-option arguments', e.g. a filename or a branch, and 'options'.
+Options are optional arguments that start with a dash and
+that allow to change the behavior of a command.
+
+* There are basically three types of options:
+  'boolean' options,
+  options with (mandatory) 'arguments' and
+  options with 'optional arguments'
+  (i.e. a boolean option that can be adjusted).
+
+* There are basically two forms of options:
+  'Short options' consist of one dash (`-`) and one alphanumeric
+  character.
+  'Long options' begin with two dashes (`--`) and some
+  alphanumeric characters.
+
+* Options are case-sensitive.
+  Please define 'lower-case long options' only.
+
+The parse-options API allows:
+
+* 'stuck' and 'separate form' of options with arguments.
+  `-oArg` is stuck, `-o Arg` is separate form.
+  `--option=Arg` is stuck, `--option Arg` is separate form.
+
+* Long options may be 'abbreviated', as long as the abbreviation
+  is unambiguous.
+
+* Short options may be bundled, e.g. `-a -b` can be specified as `-ab`.
+
+* Boolean long options can be 'negated' (or 'unset') by prepending
+  `no-`, e.g. `--no-abbrev` instead of `--abbrev`. Conversely,
+  options that begin with `no-` can be 'negated' by removing it.
+  Other long options can be unset (e.g., set string to NULL, set
+  integer to 0) by prepending `no-`.
+
+* Options and non-option arguments can clearly be separated using the `--`
+  option, e.g. `-a -b --option -- --this-is-a-file` indicates that
+  `--this-is-a-file` must not be processed as an option.
+
+Steps to parse options
+----------------------
+
+. `#include "parse-options.h"`
+
+. define a NULL-terminated
+  `static const char * const builtin_foo_usage[]` array
+  containing alternative usage strings
+
+. define `builtin_foo_options` array as described below
+  in section 'Data Structure'.
+
+. in `cmd_foo(int argc, const char **argv, const char *prefix)`
+  call
+
+	argc = parse_options(argc, argv, prefix, builtin_foo_options, builtin_foo_usage, flags);
++
+`parse_options()` will filter out the processed options of `argv[]` and leave the
+non-option arguments in `argv[]`.
+`argc` is updated appropriately because of the assignment.
++
+You can also pass NULL instead of a usage array as the fifth parameter of
+parse_options(), to avoid displaying a help screen with usage info and
+option list.  This should only be done if necessary, e.g. to implement
+a limited parser for only a subset of the options that needs to be run
+before the full parser, which in turn shows the full help message.
++
+Flags are the bitwise-or of:
+
+`PARSE_OPT_KEEP_DASHDASH`::
+	Keep the `--` that usually separates options from
+	non-option arguments.
+
+`PARSE_OPT_STOP_AT_NON_OPTION`::
+	Usually the whole argument vector is massaged and reordered.
+	Using this flag, processing is stopped at the first non-option
+	argument.
+
+`PARSE_OPT_KEEP_ARGV0`::
+	Keep the first argument, which contains the program name.  It's
+	removed from argv[] by default.
+
+`PARSE_OPT_KEEP_UNKNOWN`::
+	Keep unknown arguments instead of erroring out.  This doesn't
+	work for all combinations of arguments as users might expect
+	it to do.  E.g. if the first argument in `--unknown --known`
+	takes a value (which we can't know), the second one is
+	mistakenly interpreted as a known option.  Similarly, if
+	`PARSE_OPT_STOP_AT_NON_OPTION` is set, the second argument in
+	`--unknown value` will be mistakenly interpreted as a
+	non-option, not as a value belonging to the unknown option,
+	the parser early.  That's why parse_options() errors out if
+	both options are set.
+
+`PARSE_OPT_NO_INTERNAL_HELP`::
+	By default, parse_options() handles `-h`, `--help` and
+	`--help-all` internally, by showing a help screen.  This option
+	turns it off and allows one to add custom handlers for these
+	options, or to just leave them unknown.
+
+Data Structure
+--------------
+
+The main data structure is an array of the `option` struct,
+say `static struct option builtin_add_options[]`.
+There are some macros to easily define options:
+
+`OPT__ABBREV(&int_var)`::
+	Add `--abbrev[=<n>]`.
+
+`OPT__COLOR(&int_var, description)`::
+	Add `--color[=<when>]` and `--no-color`.
+
+`OPT__DRY_RUN(&int_var, description)`::
+	Add `-n, --dry-run`.
+
+`OPT__FORCE(&int_var, description)`::
+	Add `-f, --force`.
+
+`OPT__QUIET(&int_var, description)`::
+	Add `-q, --quiet`.
+
+`OPT__VERBOSE(&int_var, description)`::
+	Add `-v, --verbose`.
+
+`OPT_GROUP(description)`::
+	Start an option group. `description` is a short string that
+	describes the group or an empty string.
+	Start the description with an upper-case letter.
+
+`OPT_BOOL(short, long, &int_var, description)`::
+	Introduce a boolean option. `int_var` is set to one with
+	`--option` and set to zero with `--no-option`.
+
+`OPT_COUNTUP(short, long, &int_var, description)`::
+	Introduce a count-up option.
+	Each use of `--option` increments `int_var`, starting from zero
+	(even if initially negative), and `--no-option` resets it to
+	zero. To determine if `--option` or `--no-option` was encountered at
+	all, initialize `int_var` to a negative value, and if it is still
+	negative after parse_options(), then neither `--option` nor
+	`--no-option` was seen.
+
+`OPT_BIT(short, long, &int_var, description, mask)`::
+	Introduce a boolean option.
+	If used, `int_var` is bitwise-ored with `mask`.
+
+`OPT_NEGBIT(short, long, &int_var, description, mask)`::
+	Introduce a boolean option.
+	If used, `int_var` is bitwise-anded with the inverted `mask`.
+
+`OPT_SET_INT(short, long, &int_var, description, integer)`::
+	Introduce an integer option.
+	`int_var` is set to `integer` with `--option`, and
+	reset to zero with `--no-option`.
+
+`OPT_STRING(short, long, &str_var, arg_str, description)`::
+	Introduce an option with string argument.
+	The string argument is put into `str_var`.
+
+`OPT_STRING_LIST(short, long, &struct string_list, arg_str, description)`::
+	Introduce an option with string argument.
+	The string argument is stored as an element in `string_list`.
+	Use of `--no-option` will clear the list of preceding values.
+
+`OPT_INTEGER(short, long, &int_var, description)`::
+	Introduce an option with integer argument.
+	The integer is put into `int_var`.
+
+`OPT_MAGNITUDE(short, long, &unsigned_long_var, description)`::
+	Introduce an option with a size argument. The argument must be a
+	non-negative integer and may include a suffix of 'k', 'm' or 'g' to
+	scale the provided value by 1024, 1024^2 or 1024^3 respectively.
+	The scaled value is put into `unsigned_long_var`.
+
+`OPT_EXPIRY_DATE(short, long, &timestamp_t_var, description)`::
+	Introduce an option with expiry date argument, see `parse_expiry_date()`.
+	The timestamp is put into `timestamp_t_var`.
+
+`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`::
+	Introduce an option with argument.
+	The argument will be fed into the function given by `func_ptr`
+	and the result will be put into `var`.
+	See 'Option Callbacks' below for a more elaborate description.
+
+`OPT_FILENAME(short, long, &var, description)`::
+	Introduce an option with a filename argument.
+	The filename will be prefixed by passing the filename along with
+	the prefix argument of `parse_options()` to `prefix_filename()`.
+
+`OPT_ARGUMENT(long, &int_var, description)`::
+	Introduce a long-option argument that will be kept in `argv[]`.
+	If this option was seen, `int_var` will be set to one (except
+	if a `NULL` pointer was passed).
+
+`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`::
+	Recognize numerical options like -123 and feed the integer as
+	if it was an argument to the function given by `func_ptr`.
+	The result will be put into `var`.  There can be only one such
+	option definition.  It cannot be negated and it takes no
+	arguments.  Short options that happen to be digits take
+	precedence over it.
+
+`OPT_COLOR_FLAG(short, long, &int_var, description)`::
+	Introduce an option that takes an optional argument that can
+	have one of three values: "always", "never", or "auto".  If the
+	argument is not given, it defaults to "always".  The `--no-` form
+	works like `--long=never`; it cannot take an argument.  If
+	"always", set `int_var` to 1; if "never", set `int_var` to 0; if
+	"auto", set `int_var` to 1 if stdout is a tty or a pager,
+	0 otherwise.
+
+`OPT_NOOP_NOARG(short, long)`::
+	Introduce an option that has no effect and takes no arguments.
+	Use it to hide deprecated options that are still to be recognized
+	and ignored silently.
+
+`OPT_PASSTHRU(short, long, &char_var, arg_str, description, flags)`::
+	Introduce an option that will be reconstructed into a char* string,
+	which must be initialized to NULL. This is useful when you need to
+	pass the command-line option to another command. Any previous value
+	will be overwritten, so this should only be used for options where
+	the last one specified on the command line wins.
+
+`OPT_PASSTHRU_ARGV(short, long, &strvec_var, arg_str, description, flags)`::
+	Introduce an option where all instances of it on the command-line will
+	be reconstructed into a strvec. This is useful when you need to
+	pass the command-line option, which can be specified multiple times,
+	to another command.
+
+`OPT_CMDMODE(short, long, &int_var, description, enum_val)`::
+	Define an "operation mode" option, only one of which in the same
+	group of "operating mode" options that share the same `int_var`
+	can be given by the user. `enum_val` is set to `int_var` when the
+	option is used, but an error is reported if other "operating mode"
+	option has already set its value to the same `int_var`.
+
+
+The last element of the array must be `OPT_END()`.
+
+If not stated otherwise, interpret the arguments as follows:
+
+* `short` is a character for the short option
+  (e.g. `'e'` for `-e`, use `0` to omit),
+
+* `long` is a string for the long option
+  (e.g. `"example"` for `--example`, use `NULL` to omit),
+
+* `int_var` is an integer variable,
+
+* `str_var` is a string variable (`char *`),
+
+* `arg_str` is the string that is shown as argument
+  (e.g. `"branch"` will result in `<branch>`).
+  If set to `NULL`, three dots (`...`) will be displayed.
+
+* `description` is a short string to describe the effect of the option.
+  It shall begin with a lower-case letter and a full stop (`.`) shall be
+  omitted at the end.
+
+Option Callbacks
+----------------
+
+The function must be defined in this form:
+
+	int func(const struct option *opt, const char *arg, int unset)
+
+The callback mechanism is as follows:
+
+* Inside `func`, the only interesting member of the structure
+  given by `opt` is the void pointer `opt->value`.
+  `*opt->value` will be the value that is saved into `var`, if you
+  use `OPT_CALLBACK()`.
+  For example, do `*(unsigned long *)opt->value = 42;` to get 42
+  into an `unsigned long` variable.
+
+* Return value `0` indicates success and non-zero return
+  value will invoke `usage_with_options()` and, thus, die.
+
+* If the user negates the option, `arg` is `NULL` and `unset` is 1.
+
+Sophisticated option parsing
+----------------------------
+
+If you need, for example, option callbacks with optional arguments
+or without arguments at all, or if you need other special cases,
+that are not handled by the macros above, you need to specify the
+members of the `option` structure manually.
+
+This is not covered in this document, but well documented
+in `parse-options.h` itself.
+
+Examples
+--------
+
+See `test-parse-options.c` and
+`builtin/add.c`,
+`builtin/clone.c`,
+`builtin/commit.c`,
+`builtin/fetch.c`,
+`builtin/fsck.c`,
+`builtin/rm.c`
+for real-world examples.
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
new file mode 100644
index 0000000000..c65ffafc48
--- /dev/null
+++ b/Documentation/technical/api-trace2.txt
@@ -0,0 +1,1171 @@
+= Trace2 API
+
+The Trace2 API can be used to print debug, performance, and telemetry
+information to stderr or a file.  The Trace2 feature is inactive unless
+explicitly enabled by enabling one or more Trace2 Targets.
+
+The Trace2 API is intended to replace the existing (Trace1)
+printf-style tracing provided by the existing `GIT_TRACE` and
+`GIT_TRACE_PERFORMANCE` facilities.  During initial implementation,
+Trace2 and Trace1 may operate in parallel.
+
+The Trace2 API defines a set of high-level messages with known fields,
+such as (`start`: `argv`) and (`exit`: {`exit-code`, `elapsed-time`}).
+
+Trace2 instrumentation throughout the Git code base sends Trace2
+messages to the enabled Trace2 Targets.  Targets transform these
+messages content into purpose-specific formats and write events to
+their data streams.  In this manner, the Trace2 API can drive
+many different types of analysis.
+
+Targets are defined using a VTable allowing easy extension to other
+formats in the future.  This might be used to define a binary format,
+for example.
+
+Trace2 is controlled using `trace2.*` config values in the system and
+global config files and `GIT_TRACE2*` environment variables.  Trace2 does
+not read from repo local or worktree config files or respect `-c`
+command line config settings.
+
+== Trace2 Targets
+
+Trace2 defines the following set of Trace2 Targets.
+Format details are given in a later section.
+
+=== The Normal Format Target
+
+The normal format target is a tradition printf format and similar
+to GIT_TRACE format.  This format is enabled with the `GIT_TRACE2`
+environment variable or the `trace2.normalTarget` system or global
+config setting.
+
+For example
+
+------------
+$ export GIT_TRACE2=~/log.normal
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+or
+
+------------
+$ git config --global trace2.normalTarget ~/log.normal
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+yields
+
+------------
+$ cat ~/log.normal
+12:28:42.620009 common-main.c:38                  version 2.20.1.155.g426c96fcdb
+12:28:42.620989 common-main.c:39                  start git version
+12:28:42.621101 git.c:432                         cmd_name version (version)
+12:28:42.621215 git.c:662                         exit elapsed:0.001227 code:0
+12:28:42.621250 trace2/tr2_tgt_normal.c:124       atexit elapsed:0.001265 code:0
+------------
+
+=== The Performance Format Target
+
+The performance format target (PERF) is a column-based format to
+replace GIT_TRACE_PERFORMANCE and is suitable for development and
+testing, possibly to complement tools like gprof.  This format is
+enabled with the `GIT_TRACE2_PERF` environment variable or the
+`trace2.perfTarget` system or global config setting.
+
+For example
+
+------------
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+or
+
+------------
+$ git config --global trace2.perfTarget ~/log.perf
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+yields
+
+------------
+$ cat ~/log.perf
+12:28:42.620675 common-main.c:38                  | d0 | main                     | version      |     |           |           |            | 2.20.1.155.g426c96fcdb
+12:28:42.621001 common-main.c:39                  | d0 | main                     | start        |     |  0.001173 |           |            | git version
+12:28:42.621111 git.c:432                         | d0 | main                     | cmd_name     |     |           |           |            | version (version)
+12:28:42.621225 git.c:662                         | d0 | main                     | exit         |     |  0.001227 |           |            | code:0
+12:28:42.621259 trace2/tr2_tgt_perf.c:211         | d0 | main                     | atexit       |     |  0.001265 |           |            | code:0
+------------
+
+=== The Event Format Target
+
+The event format target is a JSON-based format of event data suitable
+for telemetry analysis.  This format is enabled with the `GIT_TRACE2_EVENT`
+environment variable or the `trace2.eventTarget` system or global config
+setting.
+
+For example
+
+------------
+$ export GIT_TRACE2_EVENT=~/log.event
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+or
+
+------------
+$ git config --global trace2.eventTarget ~/log.event
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+yields
+
+------------
+$ cat ~/log.event
+{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"2","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
+{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
+{"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
+{"event":"atexit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621268Z","file":"trace2/tr2_tgt_event.c","line":163,"t_abs":0.001265,"code":0}
+------------
+
+=== Enabling a Target
+
+To enable a target, set the corresponding environment variable or
+system or global config value to one of the following:
+
+include::../trace2-target-values.txt[]
+
+When trace files are written to a target directory, they will be named according
+to the last component of the SID (optionally followed by a counter to avoid
+filename collisions).
+
+== Trace2 API
+
+All public Trace2 functions and macros are defined in `trace2.h` and
+`trace2.c`.  All public symbols are prefixed with `trace2_`.
+
+There are no public Trace2 data structures.
+
+The Trace2 code also defines a set of private functions and data types
+in the `trace2/` directory.  These symbols are prefixed with `tr2_`
+and should only be used by functions in `trace2.c`.
+
+== Conventions for Public Functions and Macros
+
+The functions defined by the Trace2 API are declared and documented
+in `trace2.h`.  It defines the API functions and wrapper macros for
+Trace2.
+
+Some functions have a `_fl()` suffix to indicate that they take `file`
+and `line-number` arguments.
+
+Some functions have a `_va_fl()` suffix to indicate that they also
+take a `va_list` argument.
+
+Some functions have a `_printf_fl()` suffix to indicate that they also
+take a varargs argument.
+
+There are CPP wrapper macros and ifdefs to hide most of these details.
+See `trace2.h` for more details.  The following discussion will only
+describe the simplified forms.
+
+== Public API
+
+All Trace2 API functions send a message to all of the active
+Trace2 Targets.  This section describes the set of available
+messages.
+
+It helps to divide these functions into groups for discussion
+purposes.
+
+=== Basic Command Messages
+
+These are concerned with the lifetime of the overall git process.
+e.g: `void trace2_initialize_clock()`, `void trace2_initialize()`,
+`int trace2_is_enabled()`, `void trace2_cmd_start(int argc, const char **argv)`.
+
+=== Command Detail Messages
+
+These are concerned with describing the specific Git command
+after the command line, config, and environment are inspected.
+e.g: `void trace2_cmd_name(const char *name)`,
+`void trace2_cmd_mode(const char *mode)`.
+
+=== Child Process Messages
+
+These are concerned with the various spawned child processes,
+including shell scripts, git commands, editors, pagers, and hooks.
+
+e.g: `void trace2_child_start(struct child_process *cmd)`.
+
+=== Git Thread Messages
+
+These messages are concerned with Git thread usage.
+
+e.g: `void trace2_thread_start(const char *thread_name)`.
+
+=== Region and Data Messages
+
+These are concerned with recording performance data
+over regions or spans of code. e.g:
+`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`.
+
+Refer to trace2.h for details about all trace2 functions.
+
+== Trace2 Target Formats
+
+=== NORMAL Format
+
+Events are written as lines of the form:
+
+------------
+[<time> SP <filename>:<line> SP+] <event-name> [[SP] <event-message>] LF
+------------
+
+`<event-name>`::
+
+	is the event name.
+
+`<event-message>`::
+	is a free-form printf message intended for human consumption.
++
+Note that this may contain embedded LF or CRLF characters that are
+not escaped, so the event may spill across multiple lines.
+
+If `GIT_TRACE2_BRIEF` or `trace2.normalBrief` is true, the `time`, `filename`,
+and `line` fields are omitted.
+
+This target is intended to be more of a summary (like GIT_TRACE) and
+less detailed than the other targets.  It ignores thread, region, and
+data messages, for example.
+
+=== PERF Format
+
+Events are written as lines of the form:
+
+------------
+[<time> SP <filename>:<line> SP+
+    BAR SP] d<depth> SP
+    BAR SP <thread-name> SP+
+    BAR SP <event-name> SP+
+    BAR SP [r<repo-id>] SP+
+    BAR SP [<t_abs>] SP+
+    BAR SP [<t_rel>] SP+
+    BAR SP [<category>] SP+
+    BAR SP DOTS* <perf-event-message>
+    LF
+------------
+
+`<depth>`::
+	is the git process depth.  This is the number of parent
+	git processes.  A top-level git command has depth value "d0".
+	A child of it has depth value "d1".  A second level child
+	has depth value "d2" and so on.
+
+`<thread-name>`::
+	is a unique name for the thread.  The primary thread
+	is called "main".  Other thread names are of the form "th%d:%s"
+	and include a unique number and the name of the thread-proc.
+
+`<event-name>`::
+	is the event name.
+
+`<repo-id>`::
+	when present, is a number indicating the repository
+	in use.  A `def_repo` event is emitted when a repository is
+	opened.  This defines the repo-id and associated worktree.
+	Subsequent repo-specific events will reference this repo-id.
++
+Currently, this is always "r1" for the main repository.
+This field is in anticipation of in-proc submodules in the future.
+
+`<t_abs>`::
+	when present, is the absolute time in seconds since the
+	program started.
+
+`<t_rel>`::
+	when present, is time in seconds relative to the start of
+	the current region.  For a thread-exit event, it is the elapsed
+	time of the thread.
+
+`<category>`::
+	is present on region and data events and is used to
+	indicate a broad category, such as "index" or "status".
+
+`<perf-event-message>`::
+	is a free-form printf message intended for human consumption.
+
+------------
+15:33:33.532712 wt-status.c:2310                  | d0 | main                     | region_enter | r1  |  0.126064 |           | status     | label:print
+15:33:33.532712 wt-status.c:2331                  | d0 | main                     | region_leave | r1  |  0.127568 |  0.001504 | status     | label:print
+------------
+
+If `GIT_TRACE2_PERF_BRIEF` or `trace2.perfBrief` is true, the `time`, `file`,
+and `line` fields are omitted.
+
+------------
+d0 | main                     | region_leave | r1  |  0.011717 |  0.009122 | index      | label:preload
+------------
+
+The PERF target is intended for interactive performance analysis
+during development and is quite noisy.
+
+=== EVENT Format
+
+Each event is a JSON-object containing multiple key/value pairs
+written as a single line and followed by a LF.
+
+------------
+'{' <key> ':' <value> [',' <key> ':' <value>]* '}' LF
+------------
+
+Some key/value pairs are common to all events and some are
+event-specific.
+
+==== Common Key/Value Pairs
+
+The following key/value pairs are common to all events:
+
+------------
+{
+	"event":"version",
+	"sid":"20190408T191827.272759Z-H9b68c35f-P00003510",
+	"thread":"main",
+	"time":"2019-04-08T19:18:27.282761Z",
+	"file":"common-main.c",
+	"line":42,
+	...
+}
+------------
+
+`"event":<event>`::
+	is the event name.
+
+`"sid":<sid>`::
+	is the session-id.  This is a unique string to identify the
+	process instance to allow all events emitted by a process to
+	be identified.  A session-id is used instead of a PID because
+	PIDs are recycled by the OS.  For child git processes, the
+	session-id is prepended with the session-id of the parent git
+	process to allow parent-child relationships to be identified
+	during post-processing.
+
+`"thread":<thread>`::
+	is the thread name.
+
+`"time":<time>`::
+	is the UTC time of the event.
+
+`"file":<filename>`::
+	is source file generating the event.
+
+`"line":<line-number>`::
+	is the integer source line number generating the event.
+
+`"repo":<repo-id>`::
+	when present, is the integer repo-id as described previously.
+
+If `GIT_TRACE2_EVENT_BRIEF` or `trace2.eventBrief` is true, the `file`
+and `line` fields are omitted from all events and the `time` field is
+only present on the "start" and "atexit" events.
+
+==== Event-Specific Key/Value Pairs
+
+`"version"`::
+	This event gives the version of the executable and the EVENT format. It
+	should always be the first event in a trace session. The EVENT format
+	version will be incremented if new event types are added, if existing
+	fields are removed, or if there are significant changes in
+	interpretation of existing events or fields. Smaller changes, such as
+	adding a new field to an existing event, will not require an increment
+	to the EVENT format version.
++
+------------
+{
+	"event":"version",
+	...
+	"evt":"2",		       # EVENT format version
+	"exe":"2.20.1.155.g426c96fcdb" # git version
+}
+------------
+
+`"discard"`::
+	This event is written to the git-trace2-discard sentinel file if there
+	are too many files in the target trace directory (see the
+	trace2.maxFiles config option).
++
+------------
+{
+	"event":"discard",
+	...
+}
+------------
+
+`"start"`::
+	This event contains the complete argv received by main().
++
+------------
+{
+	"event":"start",
+	...
+	"t_abs":0.001227, # elapsed time in seconds
+	"argv":["git","version"]
+}
+------------
+
+`"exit"`::
+	This event is emitted when git calls `exit()`.
++
+------------
+{
+	"event":"exit",
+	...
+	"t_abs":0.001227, # elapsed time in seconds
+	"code":0	  # exit code
+}
+------------
+
+`"atexit"`::
+	This event is emitted by the Trace2 `atexit` routine during
+	final shutdown.  It should be the last event emitted by the
+	process.
++
+(The elapsed time reported here is greater than the time reported in
+the "exit" event because it runs after all other atexit tasks have
+completed.)
++
+------------
+{
+	"event":"atexit",
+	...
+	"t_abs":0.001227, # elapsed time in seconds
+	"code":0          # exit code
+}
+------------
+
+`"signal"`::
+	This event is emitted when the program is terminated by a user
+	signal.  Depending on the platform, the signal event may
+	prevent the "atexit" event from being generated.
++
+------------
+{
+	"event":"signal",
+	...
+	"t_abs":0.001227,  # elapsed time in seconds
+	"signo":13         # SIGTERM, SIGINT, etc.
+}
+------------
+
+`"error"`::
+	This event is emitted when one of the `error()`, `die()`,
+	`warning()`, or `usage()` functions are called.
++
+------------
+{
+	"event":"error",
+	...
+	"msg":"invalid option: --cahced", # formatted error message
+	"fmt":"invalid option: %s"	  # error format string
+}
+------------
++
+The error event may be emitted more than once.  The format string
+allows post-processors to group errors by type without worrying
+about specific error arguments.
+
+`"cmd_path"`::
+	This event contains the discovered full path of the git
+	executable (on platforms that are configured to resolve it).
++
+------------
+{
+	"event":"cmd_path",
+	...
+	"path":"C:/work/gfw/git.exe"
+}
+------------
+
+`"cmd_name"`::
+	This event contains the command name for this git process
+	and the hierarchy of commands from parent git processes.
++
+------------
+{
+	"event":"cmd_name",
+	...
+	"name":"pack-objects",
+	"hierarchy":"push/pack-objects"
+}
+------------
++
+Normally, the "name" field contains the canonical name of the
+command.  When a canonical name is not available, one of
+these special values are used:
++
+------------
+"_query_"            # "git --html-path"
+"_run_dashed_"       # when "git foo" tries to run "git-foo"
+"_run_shell_alias_"  # alias expansion to a shell command
+"_run_git_alias_"    # alias expansion to a git command
+"_usage_"            # usage error
+------------
+
+`"cmd_mode"`::
+	This event, when present, describes the command variant This
+	event may be emitted more than once.
++
+------------
+{
+	"event":"cmd_mode",
+	...
+	"name":"branch"
+}
+------------
++
+The "name" field is an arbitrary string to describe the command mode.
+For example, checkout can checkout a branch or an individual file.
+And these variations typically have different performance
+characteristics that are not comparable.
+
+`"alias"`::
+	This event is present when an alias is expanded.
++
+------------
+{
+	"event":"alias",
+	...
+	"alias":"l",		 # registered alias
+	"argv":["log","--graph"] # alias expansion
+}
+------------
+
+`"child_start"`::
+	This event describes a child process that is about to be
+	spawned.
++
+------------
+{
+	"event":"child_start",
+	...
+	"child_id":2,
+	"child_class":"?",
+	"use_shell":false,
+	"argv":["git","rev-list","--objects","--stdin","--not","--all","--quiet"]
+
+	"hook_name":"<hook_name>"  # present when child_class is "hook"
+	"cd":"<path>"		   # present when cd is required
+}
+------------
++
+The "child_id" field can be used to match this child_start with the
+corresponding child_exit event.
++
+The "child_class" field is a rough classification, such as "editor",
+"pager", "transport/*", and "hook".  Unclassified children are classified
+with "?".
+
+`"child_exit"`::
+	This event is generated after the current process has returned
+	from the waitpid() and collected the exit information from the
+	child.
++
+------------
+{
+	"event":"child_exit",
+	...
+	"child_id":2,
+	"pid":14708,	 # child PID
+	"code":0,	 # child exit-code
+	"t_rel":0.110605 # observed run-time of child process
+}
+------------
++
+Note that the session-id of the child process is not available to
+the current/spawning process, so the child's PID is reported here as
+a hint for post-processing.  (But it is only a hint because the child
+process may be a shell script which doesn't have a session-id.)
++
+Note that the `t_rel` field contains the observed run time in seconds
+for the child process (starting before the fork/exec/spawn and
+stopping after the waitpid() and includes OS process creation overhead).
+So this time will be slightly larger than the atexit time reported by
+the child process itself.
+
+`"exec"`::
+	This event is generated before git attempts to `exec()`
+	another command rather than starting a child process.
++
+------------
+{
+	"event":"exec",
+	...
+	"exec_id":0,
+	"exe":"git",
+	"argv":["foo", "bar"]
+}
+------------
++
+The "exec_id" field is a command-unique id and is only useful if the
+`exec()` fails and a corresponding exec_result event is generated.
+
+`"exec_result"`::
+	This event is generated if the `exec()` fails and control
+	returns to the current git command.
++
+------------
+{
+	"event":"exec_result",
+	...
+	"exec_id":0,
+	"code":1      # error code (errno) from exec()
+}
+------------
+
+`"thread_start"`::
+	This event is generated when a thread is started.  It is
+	generated from *within* the new thread's thread-proc (for TLS
+	reasons).
++
+------------
+{
+	"event":"thread_start",
+	...
+	"thread":"th02:preload_thread" # thread name
+}
+------------
+
+`"thread_exit"`::
+	This event is generated when a thread exits.  It is generated
+	from *within* the thread's thread-proc (for TLS reasons).
++
+------------
+{
+	"event":"thread_exit",
+	...
+	"thread":"th02:preload_thread", # thread name
+	"t_rel":0.007328                # thread elapsed time
+}
+------------
+
+`"def_param"`::
+	This event is generated to log a global parameter, such as a config
+	setting, command-line flag, or environment variable.
++
+------------
+{
+	"event":"def_param",
+	...
+	"param":"core.abbrev",
+	"value":"7"
+}
+------------
+
+`"def_repo"`::
+	This event defines a repo-id and associates it with the root
+	of the worktree.
++
+------------
+{
+	"event":"def_repo",
+	...
+	"repo":1,
+	"worktree":"/Users/jeffhost/work/gfw"
+}
+------------
++
+As stated earlier, the repo-id is currently always 1, so there will
+only be one def_repo event.  Later, if in-proc submodules are
+supported, a def_repo event should be emitted for each submodule
+visited.
+
+`"region_enter"`::
+	This event is generated when entering a region.
++
+------------
+{
+	"event":"region_enter",
+	...
+	"repo":1,                # optional
+	"nesting":1,             # current region stack depth
+	"category":"index",      # optional
+	"label":"do_read_index", # optional
+	"msg":".git/index"       # optional
+}
+------------
++
+The `category` field may be used in a future enhancement to
+do category-based filtering.
++
+`GIT_TRACE2_EVENT_NESTING` or `trace2.eventNesting` can be used to
+filter deeply nested regions and data events.  It defaults to "2".
+
+`"region_leave"`::
+	This event is generated when leaving a region.
++
+------------
+{
+	"event":"region_leave",
+	...
+	"repo":1,                # optional
+	"t_rel":0.002876,        # time spent in region in seconds
+	"nesting":1,             # region stack depth
+	"category":"index",      # optional
+	"label":"do_read_index", # optional
+	"msg":".git/index"       # optional
+}
+------------
+
+`"data"`::
+	This event is generated to log a thread- and region-local
+	key/value pair.
++
+------------
+{
+	"event":"data",
+	...
+	"repo":1,              # optional
+	"t_abs":0.024107,      # absolute elapsed time
+	"t_rel":0.001031,      # elapsed time in region/thread
+	"nesting":2,           # region stack depth
+	"category":"index",
+	"key":"read/cache_nr",
+	"value":"3552"
+}
+------------
++
+The "value" field may be an integer or a string.
+
+`"data-json"`::
+	This event is generated to log a pre-formatted JSON string
+	containing structured data.
++
+------------
+{
+	"event":"data_json",
+	...
+	"repo":1,              # optional
+	"t_abs":0.015905,
+	"t_rel":0.015905,
+	"nesting":1,
+	"category":"process",
+	"key":"windows/ancestry",
+	"value":["bash.exe","bash.exe"]
+}
+------------
+
+== Example Trace2 API Usage
+
+Here is a hypothetical usage of the Trace2 API showing the intended
+usage (without worrying about the actual Git details).
+
+Initialization::
+
+	Initialization happens in `main()`.  Behind the scenes, an
+	`atexit` and `signal` handler are registered.
++
+----------------
+int main(int argc, const char **argv)
+{
+	int exit_code;
+
+	trace2_initialize();
+	trace2_cmd_start(argv);
+
+	exit_code = cmd_main(argc, argv);
+
+	trace2_cmd_exit(exit_code);
+
+	return exit_code;
+}
+----------------
+
+Command Details::
+
+	After the basics are established, additional command
+	information can be sent to Trace2 as it is discovered.
++
+----------------
+int cmd_checkout(int argc, const char **argv)
+{
+	trace2_cmd_name("checkout");
+	trace2_cmd_mode("branch");
+	trace2_def_repo(the_repository);
+
+	// emit "def_param" messages for "interesting" config settings.
+	trace2_cmd_list_config();
+
+	if (do_something())
+	    trace2_cmd_error("Path '%s': cannot do something", path);
+
+	return 0;
+}
+----------------
+
+Child Processes::
+
+	Wrap code spawning child processes.
++
+----------------
+void run_child(...)
+{
+	int child_exit_code;
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	...
+	cmd.trace2_child_class = "editor";
+
+	trace2_child_start(&cmd);
+	child_exit_code = spawn_child_and_wait_for_it();
+	trace2_child_exit(&cmd, child_exit_code);
+}
+----------------
++
+For example, the following fetch command spawned ssh, index-pack,
+rev-list, and gc.  This example also shows that fetch took
+5.199 seconds and of that 4.932 was in ssh.
++
+----------------
+$ export GIT_TRACE2_BRIEF=1
+$ export GIT_TRACE2=~/log.normal
+$ git fetch origin
+...
+----------------
++
+----------------
+$ cat ~/log.normal
+version 2.20.1.vfs.1.1.47.g534dbe1ad1
+start git fetch origin
+worktree /Users/jeffhost/work/gfw
+cmd_name fetch (fetch)
+child_start[0] ssh git@github.com ...
+child_start[1] git index-pack ...
+... (Trace2 events from child processes omitted)
+child_exit[1] pid:14707 code:0 elapsed:0.076353
+child_exit[0] pid:14706 code:0 elapsed:4.931869
+child_start[2] git rev-list ...
+... (Trace2 events from child process omitted)
+child_exit[2] pid:14708 code:0 elapsed:0.110605
+child_start[3] git gc --auto
+... (Trace2 events from child process omitted)
+child_exit[3] pid:14709 code:0 elapsed:0.006240
+exit elapsed:5.198503 code:0
+atexit elapsed:5.198541 code:0
+----------------
++
+When a git process is a (direct or indirect) child of another
+git process, it inherits Trace2 context information.  This
+allows the child to print the command hierarchy.  This example
+shows gc as child[3] of fetch.  When the gc process reports
+its name as "gc", it also reports the hierarchy as "fetch/gc".
+(In this example, trace2 messages from the child process is
+indented for clarity.)
++
+----------------
+$ export GIT_TRACE2_BRIEF=1
+$ export GIT_TRACE2=~/log.normal
+$ git fetch origin
+...
+----------------
++
+----------------
+$ cat ~/log.normal
+version 2.20.1.160.g5676107ecd.dirty
+start git fetch official
+worktree /Users/jeffhost/work/gfw
+cmd_name fetch (fetch)
+...
+child_start[3] git gc --auto
+    version 2.20.1.160.g5676107ecd.dirty
+    start /Users/jeffhost/work/gfw/git gc --auto
+    worktree /Users/jeffhost/work/gfw
+    cmd_name gc (fetch/gc)
+    exit elapsed:0.001959 code:0
+    atexit elapsed:0.001997 code:0
+child_exit[3] pid:20303 code:0 elapsed:0.007564
+exit elapsed:3.868938 code:0
+atexit elapsed:3.868970 code:0
+----------------
+
+Regions::
+
+	Regions can be use to time an interesting section of code.
++
+----------------
+void wt_status_collect(struct wt_status *s)
+{
+	trace2_region_enter("status", "worktrees", s->repo);
+	wt_status_collect_changes_worktree(s);
+	trace2_region_leave("status", "worktrees", s->repo);
+
+	trace2_region_enter("status", "index", s->repo);
+	wt_status_collect_changes_index(s);
+	trace2_region_leave("status", "index", s->repo);
+
+	trace2_region_enter("status", "untracked", s->repo);
+	wt_status_collect_untracked(s);
+	trace2_region_leave("status", "untracked", s->repo);
+}
+
+void wt_status_print(struct wt_status *s)
+{
+	trace2_region_enter("status", "print", s->repo);
+	switch (s->status_format) {
+	    ...
+	}
+	trace2_region_leave("status", "print", s->repo);
+}
+----------------
++
+In this example, scanning for untracked files ran from +0.012568 to
++0.027149 (since the process started) and took 0.014581 seconds.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |            | 2.20.1.160.g5676107ecd.dirty
+d0 | main                     | start        |     |  0.001173 |           |            | git status
+d0 | main                     | def_repo     | r1  |           |           |            | worktree:/Users/jeffhost/work/gfw
+d0 | main                     | cmd_name     |     |           |           |            | status (status)
+...
+d0 | main                     | region_enter | r1  |  0.010988 |           | status     | label:worktrees
+d0 | main                     | region_leave | r1  |  0.011236 |  0.000248 | status     | label:worktrees
+d0 | main                     | region_enter | r1  |  0.011260 |           | status     | label:index
+d0 | main                     | region_leave | r1  |  0.012542 |  0.001282 | status     | label:index
+d0 | main                     | region_enter | r1  |  0.012568 |           | status     | label:untracked
+d0 | main                     | region_leave | r1  |  0.027149 |  0.014581 | status     | label:untracked
+d0 | main                     | region_enter | r1  |  0.027411 |           | status     | label:print
+d0 | main                     | region_leave | r1  |  0.028741 |  0.001330 | status     | label:print
+d0 | main                     | exit         |     |  0.028778 |           |            | code:0
+d0 | main                     | atexit       |     |  0.028809 |           |            | code:0
+----------------
++
+Regions may be nested.  This causes messages to be indented in the
+PERF target, for example.
+Elapsed times are relative to the start of the corresponding nesting
+level as expected.  For example, if we add region message to:
++
+----------------
+static enum path_treatment read_directory_recursive(struct dir_struct *dir,
+	struct index_state *istate, const char *base, int baselen,
+	struct untracked_cache_dir *untracked, int check_only,
+	int stop_at_first_file, const struct pathspec *pathspec)
+{
+	enum path_treatment state, subdir_state, dir_state = path_none;
+
+	trace2_region_enter_printf("dir", "read_recursive", NULL, "%.*s", baselen, base);
+	...
+	trace2_region_leave_printf("dir", "read_recursive", NULL, "%.*s", baselen, base);
+	return dir_state;
+}
+----------------
++
+We can further investigate the time spent scanning for untracked files.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |            | 2.20.1.162.gb4ccea44db.dirty
+d0 | main                     | start        |     |  0.001173 |           |            | git status
+d0 | main                     | def_repo     | r1  |           |           |            | worktree:/Users/jeffhost/work/gfw
+d0 | main                     | cmd_name     |     |           |           |            | status (status)
+...
+d0 | main                     | region_enter | r1  |  0.015047 |           | status     | label:untracked
+d0 | main                     | region_enter |     |  0.015132 |           | dir        | ..label:read_recursive
+d0 | main                     | region_enter |     |  0.016341 |           | dir        | ....label:read_recursive vcs-svn/
+d0 | main                     | region_leave |     |  0.016422 |  0.000081 | dir        | ....label:read_recursive vcs-svn/
+d0 | main                     | region_enter |     |  0.016446 |           | dir        | ....label:read_recursive xdiff/
+d0 | main                     | region_leave |     |  0.016522 |  0.000076 | dir        | ....label:read_recursive xdiff/
+d0 | main                     | region_enter |     |  0.016612 |           | dir        | ....label:read_recursive git-gui/
+d0 | main                     | region_enter |     |  0.016698 |           | dir        | ......label:read_recursive git-gui/po/
+d0 | main                     | region_enter |     |  0.016810 |           | dir        | ........label:read_recursive git-gui/po/glossary/
+d0 | main                     | region_leave |     |  0.016863 |  0.000053 | dir        | ........label:read_recursive git-gui/po/glossary/
+...
+d0 | main                     | region_enter |     |  0.031876 |           | dir        | ....label:read_recursive builtin/
+d0 | main                     | region_leave |     |  0.032270 |  0.000394 | dir        | ....label:read_recursive builtin/
+d0 | main                     | region_leave |     |  0.032414 |  0.017282 | dir        | ..label:read_recursive
+d0 | main                     | region_leave | r1  |  0.032454 |  0.017407 | status     | label:untracked
+...
+d0 | main                     | exit         |     |  0.034279 |           |            | code:0
+d0 | main                     | atexit       |     |  0.034322 |           |            | code:0
+----------------
++
+Trace2 regions are similar to the existing trace_performance_enter()
+and trace_performance_leave() routines, but are thread safe and
+maintain per-thread stacks of timers.
+
+Data Messages::
+
+	Data messages added to a region.
++
+----------------
+int read_index_from(struct index_state *istate, const char *path,
+	const char *gitdir)
+{
+	trace2_region_enter_printf("index", "do_read_index", the_repository, "%s", path);
+
+	...
+
+	trace2_data_intmax("index", the_repository, "read/version", istate->version);
+	trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr);
+
+	trace2_region_leave_printf("index", "do_read_index", the_repository, "%s", path);
+}
+----------------
++
+This example shows that the index contained 3552 entries.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+d0 | main                     | version      |     |           |           |            | 2.20.1.156.gf9916ae094.dirty
+d0 | main                     | start        |     |  0.001173 |           |            | git status
+d0 | main                     | def_repo     | r1  |           |           |            | worktree:/Users/jeffhost/work/gfw
+d0 | main                     | cmd_name     |     |           |           |            | status (status)
+d0 | main                     | region_enter | r1  |  0.001791 |           | index      | label:do_read_index .git/index
+d0 | main                     | data         | r1  |  0.002494 |  0.000703 | index      | ..read/version:2
+d0 | main                     | data         | r1  |  0.002520 |  0.000729 | index      | ..read/cache_nr:3552
+d0 | main                     | region_leave | r1  |  0.002539 |  0.000748 | index      | label:do_read_index .git/index
+...
+----------------
+
+Thread Events::
+
+	Thread messages added to a thread-proc.
++
+For example, the multithreaded preload-index code can be
+instrumented with a region around the thread pool and then
+per-thread start and exit events within the threadproc.
++
+----------------
+static void *preload_thread(void *_data)
+{
+	// start the per-thread clock and emit a message.
+	trace2_thread_start("preload_thread");
+
+	// report which chunk of the array this thread was assigned.
+	trace2_data_intmax("index", the_repository, "offset", p->offset);
+	trace2_data_intmax("index", the_repository, "count", nr);
+
+	do {
+	    ...
+	} while (--nr > 0);
+	...
+
+	// report elapsed time taken by this thread.
+	trace2_thread_exit();
+	return NULL;
+}
+
+void preload_index(struct index_state *index,
+	const struct pathspec *pathspec,
+	unsigned int refresh_flags)
+{
+	trace2_region_enter("index", "preload", the_repository);
+
+	for (i = 0; i < threads; i++) {
+	    ... /* create thread */
+	}
+
+	for (i = 0; i < threads; i++) {
+	    ... /* join thread */
+	}
+
+	trace2_region_leave("index", "preload", the_repository);
+}
+----------------
++
+In this example preload_index() was executed by the `main` thread
+and started the `preload` region.  Seven threads, named
+`th01:preload_thread` through `th07:preload_thread`, were started.
+Events from each thread are atomically appended to the shared target
+stream as they occur so they may appear in random order with respect
+other threads. Finally, the main thread waits for the threads to
+finish and leaves the region.
++
+Data events are tagged with the active thread name.  They are used
+to report the per-thread parameters.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+...
+d0 | main                     | region_enter | r1  |  0.002595 |           | index      | label:preload
+d0 | th01:preload_thread      | thread_start |     |  0.002699 |           |            |
+d0 | th02:preload_thread      | thread_start |     |  0.002721 |           |            |
+d0 | th01:preload_thread      | data         | r1  |  0.002736 |  0.000037 | index      | offset:0
+d0 | th02:preload_thread      | data         | r1  |  0.002751 |  0.000030 | index      | offset:2032
+d0 | th03:preload_thread      | thread_start |     |  0.002711 |           |            |
+d0 | th06:preload_thread      | thread_start |     |  0.002739 |           |            |
+d0 | th01:preload_thread      | data         | r1  |  0.002766 |  0.000067 | index      | count:508
+d0 | th06:preload_thread      | data         | r1  |  0.002856 |  0.000117 | index      | offset:2540
+d0 | th03:preload_thread      | data         | r1  |  0.002824 |  0.000113 | index      | offset:1016
+d0 | th04:preload_thread      | thread_start |     |  0.002710 |           |            |
+d0 | th02:preload_thread      | data         | r1  |  0.002779 |  0.000058 | index      | count:508
+d0 | th06:preload_thread      | data         | r1  |  0.002966 |  0.000227 | index      | count:508
+d0 | th07:preload_thread      | thread_start |     |  0.002741 |           |            |
+d0 | th07:preload_thread      | data         | r1  |  0.003017 |  0.000276 | index      | offset:3048
+d0 | th05:preload_thread      | thread_start |     |  0.002712 |           |            |
+d0 | th05:preload_thread      | data         | r1  |  0.003067 |  0.000355 | index      | offset:1524
+d0 | th05:preload_thread      | data         | r1  |  0.003090 |  0.000378 | index      | count:508
+d0 | th07:preload_thread      | data         | r1  |  0.003037 |  0.000296 | index      | count:504
+d0 | th03:preload_thread      | data         | r1  |  0.002971 |  0.000260 | index      | count:508
+d0 | th04:preload_thread      | data         | r1  |  0.002983 |  0.000273 | index      | offset:508
+d0 | th04:preload_thread      | data         | r1  |  0.007311 |  0.004601 | index      | count:508
+d0 | th05:preload_thread      | thread_exit  |     |  0.008781 |  0.006069 |            |
+d0 | th01:preload_thread      | thread_exit  |     |  0.009561 |  0.006862 |            |
+d0 | th03:preload_thread      | thread_exit  |     |  0.009742 |  0.007031 |            |
+d0 | th06:preload_thread      | thread_exit  |     |  0.009820 |  0.007081 |            |
+d0 | th02:preload_thread      | thread_exit  |     |  0.010274 |  0.007553 |            |
+d0 | th07:preload_thread      | thread_exit  |     |  0.010477 |  0.007736 |            |
+d0 | th04:preload_thread      | thread_exit  |     |  0.011657 |  0.008947 |            |
+d0 | main                     | region_leave | r1  |  0.011717 |  0.009122 | index      | label:preload
+...
+d0 | main                     | exit         |     |  0.029996 |           |            | code:0
+d0 | main                     | atexit       |     |  0.030027 |           |            | code:0
+----------------
++
+In this example, the preload region took 0.009122 seconds.  The 7 threads
+took between 0.006069 and 0.008947 seconds to work on their portion of
+the index.  Thread "th01" worked on 508 items at offset 0.  Thread "th02"
+worked on 508 items at offset 2032.  Thread "th04" worked on 508 items
+at offset 508.
++
+This example also shows that thread names are assigned in a racy manner
+as each thread starts and allocates TLS storage.
+
+== Future Work
+
+=== Relationship to the Existing Trace Api (api-trace.txt)
+
+There are a few issues to resolve before we can completely
+switch to Trace2.
+
+* Updating existing tests that assume GIT_TRACE format messages.
+
+* How to best handle custom GIT_TRACE_<key> messages?
+
+** The GIT_TRACE_<key> mechanism allows each <key> to write to a
+different file (in addition to just stderr).
+
+** Do we want to maintain that ability or simply write to the existing
+Trace2 targets (and convert <key> to a "category").
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 0000000000..f8c18a0f7a
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,164 @@
+GIT bitmap v1 format
+====================
+
+	- A header appears at the beginning:
+
+		4-byte signature: {'B', 'I', 'T', 'M'}
+
+		2-byte version number (network byte order)
+			The current implementation only supports version 1
+			of the bitmap index (the same one as JGit).
+
+		2-byte flags (network byte order)
+
+			The following flags are supported:
+
+			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+			This flag must always be present. It implies that the bitmap
+			index has been generated for a packfile with full closure
+			(i.e. where every single object in the packfile can find
+			 its parent links inside the same packfile). This is a
+			requirement for the bitmap index format, also present in JGit,
+			that greatly reduces the complexity of the implementation.
+
+			- BITMAP_OPT_HASH_CACHE (0x4)
+			If present, the end of the bitmap file contains
+			`N` 32-bit name-hash values, one per object in the
+			pack. The format and meaning of the name-hash is
+			described below.
+
+		4-byte entry count (network byte order)
+
+			The total count of entries (bitmapped commits) in this bitmap index.
+
+		20-byte checksum
+
+			The SHA1 checksum of the pack this bitmap index belongs to.
+
+	- 4 EWAH bitmaps that act as type indexes
+
+		Type indexes are serialized after the hash cache in the shape
+		of four EWAH bitmaps stored consecutively (see Appendix A for
+		the serialization format of an EWAH bitmap).
+
+		There is a bitmap for each Git object type, stored in the following
+		order:
+
+			- Commits
+			- Trees
+			- Blobs
+			- Tags
+
+		In each bitmap, the `n`th bit is set to true if the `n`th object
+		in the packfile is of that type.
+
+		The obvious consequence is that the OR of all 4 bitmaps will result
+		in a full set (all bits set), and the AND of all 4 bitmaps will
+		result in an empty bitmap (no bits set).
+
+	- N entries with compressed bitmaps, one for each indexed commit
+
+		Where `N` is the total amount of entries in this bitmap index.
+		Each entry contains the following:
+
+		- 4-byte object position (network byte order)
+			The position **in the index for the packfile** where the
+			bitmap for this commit is found.
+
+		- 1-byte XOR-offset
+			The xor offset used to compress this bitmap. For an entry
+			in position `x`, a XOR offset of `y` means that the actual
+			bitmap representing this commit is composed by XORing the
+			bitmap for this entry with the bitmap in entry `x-y` (i.e.
+			the bitmap `y` entries before this one).
+
+			Note that this compression can be recursive. In order to
+			XOR this entry with a previous one, the previous entry needs
+			to be decompressed first, and so on.
+
+			The hard-limit for this offset is 160 (an entry can only be
+			xor'ed against one of the 160 entries preceding it). This
+			number is always positive, and hence entries are always xor'ed
+			with **previous** bitmaps, not bitmaps that will come afterwards
+			in the index.
+
+		- 1-byte flags for this bitmap
+			At the moment the only available flag is `0x1`, which hints
+			that this bitmap can be re-used when rebuilding bitmap indexes
+			for the repository.
+
+		- The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+	- 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+	- 4-byte number of words of the COMPRESSED bitmap, when stored
+
+	- N x 8-byte words, as specified by the previous field
+
+		This is the actual content of the compressed bitmap.
+
+	- 4-byte position of the current RLW for the compressed
+		bitmap
+
+All words are stored in network byte order for their corresponding
+sizes.
+
+The compressed bitmap is stored in a form of run-length encoding, as
+follows.  It consists of a concatenation of an arbitrary number of
+chunks.  Each chunk consists of one or more 64-bit words
+
+     H  L_1  L_2  L_3 .... L_M
+
+H is called RLW (run length word).  It consists of (from lower to higher
+order bits):
+
+     - 1 bit: the repeated bit B
+
+     - 32 bits: repetition count K (unsigned)
+
+     - 31 bits: literal word count M (unsigned)
+
+The bitstream represented by the above chunk is then:
+
+     - K repetitions of B
+
+     - The bits stored in `L_1` through `L_M`.  Within a word, bits at
+       lower order come earlier in the stream than those at higher
+       order.
+
+The next word after `L_M` (if any) must again be a RLW, for the next
+chunk.  For efficient appending to the bitstream, the EWAH stores a
+pointer to the last RLW in the stream.
+
+
+== Appendix B: Optional Bitmap Sections
+
+These sections may or may not be present in the `.bitmap` file; their
+presence is indicated by the header flags section described above.
+
+Name-hash cache
+---------------
+
+If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
+a cache of 32-bit values, one per object in the pack. The value at
+position `i` is the hash of the pathname at which the `i`th object
+(counting in index order) in the pack can be found.  This can be fed
+into the delta heuristics to compare objects with similar pathnames.
+
+The hash algorithm used is:
+
+    hash = 0;
+    while ((c = *name++))
+	    if (!isspace(c))
+		    hash = (hash >> 2) + (c << 24);
+
+Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
+If implementations want to choose a different hashing scheme, they are
+free to do so, but MUST allocate a new header flag (because comparing
+hashes made under two different schemes would be pointless).
diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt
new file mode 100644
index 0000000000..bac558d049
--- /dev/null
+++ b/Documentation/technical/bundle-format.txt
@@ -0,0 +1,76 @@
+= Git bundle v2 format
+
+The Git bundle format is a format that represents both refs and Git objects.
+
+== Format
+
+We will use ABNF notation to define the Git bundle format. See
+protocol-common.txt for the details.
+
+A v2 bundle looks like this:
+
+----
+bundle    = signature *prerequisite *reference LF pack
+signature = "# v2 git bundle" LF
+
+prerequisite = "-" obj-id SP comment LF
+comment      = *CHAR
+reference    = obj-id SP refname LF
+
+pack         = ... ; packfile
+----
+
+A v3 bundle looks like this:
+
+----
+bundle    = signature *capability *prerequisite *reference LF pack
+signature = "# v3 git bundle" LF
+
+capability   = "@" key ["=" value] LF
+prerequisite = "-" obj-id SP comment LF
+comment      = *CHAR
+reference    = obj-id SP refname LF
+key          = 1*(ALPHA / DIGIT / "-")
+value        = *(%01-09 / %0b-FF)
+
+pack         = ... ; packfile
+----
+
+== Semantics
+
+A Git bundle consists of several parts.
+
+* "Capabilities", which are only in the v3 format, indicate functionality that
+	the bundle requires to be read properly.
+
+* "Prerequisites" lists the objects that are NOT included in the bundle and the
+  reader of the bundle MUST already have, in order to use the data in the
+  bundle. The objects stored in the bundle may refer to prerequisite objects and
+  anything reachable from them (e.g. a tree object in the bundle can reference
+  a blob that is reachable from a prerequisite) and/or expressed as a delta
+  against prerequisite objects.
+
+* "References" record the tips of the history graph, iow, what the reader of the
+  bundle CAN "git fetch" from it.
+
+* "Pack" is the pack data stream "git fetch" would send, if you fetch from a
+  repository that has the references recorded in the "References" above into a
+  repository that has references pointing at the objects listed in
+  "Prerequisites" above.
+
+In the bundle format, there can be a comment following a prerequisite obj-id.
+This is a comment and it has no specific meaning. The writer of the bundle MAY
+put any string here. The reader of the bundle MUST ignore the comment.
+
+=== Note on the shallow clone and a Git bundle
+
+Note that the prerequisites does not represent a shallow-clone boundary. The
+semantics of the prerequisites and the shallow-clone boundaries are different,
+and the Git bundle v2 format cannot represent a shallow clone repository.
+
+== Capabilities
+
+Because there is no opportunity for negotiation, unknown capabilities cause 'git
+bundle' to abort.  The only known capability is `object-format`, which specifies
+the hash algorithm in use, and can take the same values as the
+`extensions.objectFormat` configuration value.
diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt
new file mode 100644
index 0000000000..593614fced
--- /dev/null
+++ b/Documentation/technical/chunk-format.txt
@@ -0,0 +1,116 @@
+Chunk-based file formats
+========================
+
+Some file formats in Git use a common concept of "chunks" to describe
+sections of the file. This allows structured access to a large file by
+scanning a small "table of contents" for the remaining data. This common
+format is used by the `commit-graph` and `multi-pack-index` files. See
+link:technical/pack-format.html[the `multi-pack-index` format] and
+link:technical/commit-graph-format.html[the `commit-graph` format] for
+how they use the chunks to describe structured data.
+
+A chunk-based file format begins with some header information custom to
+that format. That header should include enough information to identify
+the file type, format version, and number of chunks in the file. From this
+information, that file can determine the start of the chunk-based region.
+
+The chunk-based region starts with a table of contents describing where
+each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
+where C is the number of chunks. Consider the following table:
+
+  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
+  |--------------------|------------------------|
+  | ID[0]              | OFFSET[0]              |
+  | ...                | ...                    |
+  | ID[C]              | OFFSET[C]              |
+  | 0x0000             | OFFSET[C+1]            |
+
+Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
+Each integer is stored in network-byte order.
+
+The chunk identifier `ID[i]` is a label for the data stored within this
+fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
+size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
+and `OFFSET[i]`. This requires that the chunk data appears contiguously
+in the same order as the table of contents.
+
+The final entry in the table of contents must be four zero bytes. This
+confirms that the table of contents is ending and provides the offset for
+the end of the chunk-based data.
+
+Note: The chunk-based format expects that the file contains _at least_ a
+trailing hash after `OFFSET[C+1]`.
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these methods provide extra checks that assist
+developers when creating new file formats.
+
+Writing chunk-based file formats
+--------------------------------
+
+To write a chunk-based file format, create a `struct chunkfile` by
+calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
+caller is responsible for opening the `hashfile` and writing header
+information so the file format is identifiable before the chunk-based
+format begins.
+
+Then, call `add_chunk()` for each chunk that is intended for write. This
+populates the `chunkfile` with information about the order and size of
+each chunk to write. Provide a `chunk_write_fn` function pointer to
+perform the write of the chunk data upon request.
+
+Call `write_chunkfile()` to write the table of contents to the `hashfile`
+followed by each of the chunks. This will verify that each chunk wrote
+the expected amount of data so the table of contents is correct.
+
+Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
+caller is responsible for finalizing the `hashfile` by writing the trailing
+hash and closing the file.
+
+Reading chunk-based file formats
+--------------------------------
+
+To read a chunk-based file format, the file must be opened as a
+memory-mapped region. The chunk-format API expects that the entire file
+is mapped as a contiguous memory region.
+
+Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
+
+After reading the header information from the beginning of the file,
+including the chunk count, call `read_table_of_contents()` to populate
+the `struct chunkfile` with the list of chunks, their offsets, and their
+sizes.
+
+Extract the data information for each chunk using `pair_chunk()` or
+`read_chunk()`:
+
+* `pair_chunk()` assigns a given pointer with the location inside the
+  memory-mapped file corresponding to that chunk's offset. If the chunk
+  does not exist, then the pointer is not modified.
+
+* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
+  with the appropriate initial pointer and size information. The function
+  is not called if the chunk does not exist. Use this method to read chunks
+  if you need to perform immediate parsing or if you need to execute logic
+  based on the size of the chunk.
+
+After calling these methods, call `free_chunkfile()` to clear the
+`struct chunkfile` data. This will not close the memory-mapped region.
+Callers are expected to own that data for the timeframe the pointers into
+the region are needed.
+
+Examples
+--------
+
+These file formats use the chunk-format API, and can be used as examples
+for future formats:
+
+* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
+  in `commit-graph.c` for how the chunk-format API is used to write and
+  parse the commit-graph file format documented in
+  link:technical/commit-graph-format.html[the commit-graph file format].
+
+* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
+  in `midx.c` for how the chunk-format API is used to write and
+  parse the multi-pack-index file format documented in
+  link:technical/pack-format.html[the multi-pack-index file format].
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..87971c27dd
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,158 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit.
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+- The Bloom filter of the commit carrying the paths that were changed between
+  the commit and its first parent, if requested.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position within the list of commit OIDs. Due
+to some special constants we use to track parents, we can store at most
+(1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All multi-byte numbers are in network byte order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version
+      We infer the hash length (H) from this value:
+	1 => SHA-1
+	2 => SHA-256
+      If the hash type does not match the repository's hash algorithm, the
+      commit-graph file should be ignored with a warning presented to the
+      user.
+
+  1-byte number (C) of "chunks"
+
+  1-byte number (B) of base commit-graphs
+      We infer the length (H*B) of the Base Graphs chunk
+      from this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      First 4 bytes describe the chunk id. Value 0 is a terminating label.
+      Other 8 bytes provide the byte-offset in current file for chunk to
+      start. (Chunks are ordered contiguously in the file, so you can infer
+      the length using the next chunk position if necessary.) Each chunk
+      ID appears at most once.
+
+  The CHUNK LOOKUP matches the table of contents from
+  link:technical/chunk-format.html[the chunk-based file format].
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0x70000000 if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Extra Edge List chunk.
+    * The next 8 bytes store the topological level (generation number v1)
+      of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the lowest byte, storing the 33rd and 34th bit of the
+      commit time.
+
+  Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
+    * This list of 4-byte values store corrected commit date offsets for the
+      commits, arranged in the same order as commit data chunk.
+    * If the corrected commit date offset cannot be stored within 31 bits,
+      the value has its most-significant bit on and the other bits store
+      the position of corrected commit date into the Generation Data Overflow
+      chunk.
+    * Generation Data chunk is present only when commit-graph file is written
+      by compatible versions of Git and in case of split commit-graph chains,
+      the topmost layer also has Generation Data chunk.
+
+  Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
+    * This list of 8-byte values stores the corrected commit date offsets
+      for commits with corrected commit date offsets that cannot be
+      stored within 31 bits.
+    * Generation Data Overflow chunk is present only when Generation Data
+      chunk is present and atleast one corrected commit date offset cannot
+      be stored within 31 bits.
+
+  Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values store the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+  Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
+    * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
+      from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
+      filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
+      length), where BIDX[-1] is 0.
+    * The BIDX chunk is ignored if the BDAT chunk is not present.
+
+  Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
+    * It starts with header consisting of three unsigned 32-bit integers:
+      - Version of the hash algorithm being used. We currently only support
+	value 1 which corresponds to the 32-bit version of the murmur3 hash
+	implemented exactly as described in
+	https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
+	hashing technique using seed values 0x293ae76f and 0x7e646e2 as
+	described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
+	in Probabilistic Verification"
+      - The number of times a path is hashed and hence the number of bit positions
+	      that cumulatively determine whether a file is present in the commit.
+      - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
+	      contains 'n' entries, then the filter size is the minimum number of 64-bit
+	      words that contain n*b bits.
+    * The rest of the chunk is the concatenation of all the computed Bloom
+      filters for the commits in lexicographic order.
+    * Note: Commits with no changes or more than 512 changes have Bloom filters
+      of length one, with either all bits set to zero or one respectively.
+    * The BDAT chunk is present if and only if BIDX is present.
+
+  Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
+      This list of H-byte hashes describe a set of B commit-graph files that
+      form a commit-graph chain. The graph position for the ith commit in this
+      file's OID Lookup chunk is equal to i plus the number of commits in all
+      base graphs.  If B is non-zero, this chunk must exist.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..f05e7bda1a
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,401 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit-graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit-graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in
+lexicographic order, we can identify an integer position for each commit
+and refer to the parents of a commit using those integer positions. We
+use binary search to find initial commits and then use the integer
+positions for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+There are two definitions of generation number:
+1. Corrected committer dates (generation number v2)
+2. Topological levels (generation nummber v1)
+
+Define "corrected committer date" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has corrected committer date
+    equal to its committer date.
+
+ * A commit with at least one parent has corrected committer date equal to
+    the maximum of its commiter date and one more than the largest corrected
+    committer date among its parents.
+
+ * As a special case, a root commit with timestamp zero has corrected commit
+    date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO
+    (that is, an uncomputed corrected commit date).
+
+Define the "topological level" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has topological level of one.
+
+ * A commit with at least one parent has topological level one more than
+   the largest topological level among its parents.
+
+Equivalently, the topological level of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+The property applies to both versions of generation number, that is both
+corrected committer dates and topological levels.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+In absence of corrected commit dates (for example, old versions of Git or
+mixed generation graph chains),
+this heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+We use the macro GENERATION_NUMBER_INFINITY to mark commits not
+in the commit-graph file. If a commit-graph file was written by a version
+of Git that did not compute generation numbers, then those commits will
+have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
+
+Since the commit-graph file is closed under reachability, we can guarantee
+the following weaker condition on all commits:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N < M, then A cannot reach B.
+
+Note how the strict inequality differs from the inequality when we have
+fully-computed generation numbers. Using strict inequality may result in
+walking a few extra commits, but the simplicity in dealing with commits
+with generation number *_INFINITY or *_ZERO is valuable.
+
+We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose
+topological levels (generation number v1) are computed to be at least
+this value. We limit at this value since it is the largest value that
+can be stored in the commit-graph file using the 30 bits available
+to topological levels. This presents another case where a commit can
+have generation number equal to that of a parent.
+
+Design Details
+--------------
+
+- The commit-graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+- Commit grafts and replace objects can change the shape of the commit
+  history. The latter can also be enabled/disabled on the fly using
+  `--no-replace-objects`. This leads to difficultly storing both possible
+  interpretations of a commit id, especially when computing generation
+  numbers. The commit-graph will not be read or written when
+  replace-objects or grafts are present.
+
+- Shallow clones create grafts of commits by dropping their parents. This
+  leads the commit-graph to think those commits have generation number 1.
+  If and when those commits are made unshallow, those generation numbers
+  become invalid. Since shallow clones are intended to restrict the commit
+  history to a very small set of commits, the commit-graph feature is less
+  helpful for these clones, anyway. The commit-graph will not be read or
+  written when shallow commits are present.
+
+Commit Graphs Chains
+--------------------
+
+Typically, repos grow with near-constant velocity (commits per day). Over time,
+the number of commits added by a fetch operation is much smaller than the
+number of commits in the full history. By creating a "chain" of commit-graphs,
+we enable fast writes of new commit data without rewriting the entire commit
+history -- at least, most of the time.
+
+## File Layout
+
+A commit-graph chain uses multiple files, and we use a fixed naming convention
+to organize these files. Each commit-graph file has a name
+`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
+valued hash stored in the footer of that file (which is a hash of the file's
+contents before that hash). For a chain of commit-graph files, a plain-text
+file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
+hashes for the files in order from "lowest" to "highest".
+
+For example, if the `commit-graph-chain` file contains the lines
+
+```
+	{hash0}
+	{hash1}
+	{hash2}
+```
+
+then the commit-graph chain looks like the following diagram:
+
+ +-----------------------+
+ |  graph-{hash2}.graph  |
+ +-----------------------+
+	  |
+ +-----------------------+
+ |                       |
+ |  graph-{hash1}.graph  |
+ |                       |
+ +-----------------------+
+	  |
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  graph-{hash0}.graph  |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
+commits in `graph-{hash1}.graph`, and X2 be the number of commits in
+`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
+then we interpret this as being the commit in position (X0 + X1 + i), and that
+will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
+positions to refer to their parents, which may be in `graph-{hash1}.graph` or
+`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
+its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
+X2).
+
+Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data
+specifying the hashes of all files in the lower layers. In the above example,
+`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
+`{hash0}` and `{hash1}`.
+
+## Merging commit-graph files
+
+If we only added a new commit-graph file on every write, we would run into a
+linear search problem through many commit-graph files.  Instead, we use a merge
+strategy to decide when the stack should collapse some number of levels.
+
+The diagram below shows such a collapse. As a set of new commits are added, it
+is determined by the merge strategy that the files should collapse to
+`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and
+the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
+file.
+
+			    +---------------------+
+			    |                     |
+			    |    (new commits)    |
+			    |                     |
+			    +---------------------+
+			    |                     |
+ +-----------------------+  +---------------------+
+ |  graph-{hash2}        |->|                     |
+ +-----------------------+  +---------------------+
+	  |                 |                     |
+ +-----------------------+  +---------------------+
+ |                       |  |                     |
+ |  graph-{hash1}        |->|                     |
+ |                       |  |                     |
+ +-----------------------+  +---------------------+
+	  |                  tmp_graphXXX
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  graph-{hash0}        |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+During this process, the commits to write are combined, sorted and we write the
+contents to a temporary file, all while holding a `commit-graph-chain.lock`
+lock-file.  When the file is flushed, we rename it to `graph-{hash3}`
+according to the computed `{hash3}`. Finally, we write the new chain data to
+`commit-graph-chain.lock`:
+
+```
+	{hash3}
+	{hash0}
+```
+
+We then close the lock-file.
+
+## Merge Strategy
+
+When writing a set of commits that do not exist in the commit-graph stack of
+height N, we default to creating a new file at level N + 1. We then decide to
+merge with the Nth level if one of two conditions hold:
+
+  1. `--size-multiple=<X>` is specified or X = 2, and the number of commits in
+     level N is less than X times the number of commits in level N + 1.
+
+  2. `--max-commits=<C>` is specified with non-zero C and the number of commits
+     in level N + 1 is more than C commits.
+
+This decision cascades down the levels: when we merge a level we create a new
+set of commits that then compares to the next level.
+
+The first condition bounds the number of levels to be logarithmic in the total
+number of commits.  The second condition bounds the total number of commits in
+a `graph-{hashN}` file and not in the `commit-graph` file, preventing
+significant performance issues when the stack merges and another process only
+partially reads the previous stack.
+
+The merge strategy values (2 for the size multiple, 64,000 for the maximum
+number of commits) could be extracted into config settings for full
+flexibility.
+
+## Handling Mixed Generation Number Chains
+
+With the introduction of generation number v2 and generation data chunk, the
+following scenario is possible:
+
+1. "New" Git writes a commit-graph with the corrected commit dates.
+2. "Old" Git writes a split commit-graph on top without corrected commit dates.
+
+A naive approach of using the newest available generation number from
+each layer would lead to violated expectations: the lower layer would
+use corrected commit dates which are much larger than the topological
+levels of the higher layer. For this reason, Git inspects the topmost
+layer to see if the layer is missing corrected commit dates. In such a case
+Git only uses topological level for generation numbers.
+
+When writing a new layer in split commit-graph, we write corrected commit
+dates if the topmost layer has corrected commit dates written. This
+guarantees that if a layer has corrected commit dates, all lower layers
+must have corrected commit dates as well.
+
+When merging layers, we do not consider whether the merged layers had corrected
+commit dates. Instead, the new layer will have corrected commit dates if the
+layer below the new layer has corrected commit dates.
+
+While writing or merging layers, if the new layer is the only layer, it will
+have corrected commit dates when written by compatible versions of Git. Thus,
+rewriting split commit-graph as a single file (`--split=replace`) creates a
+single layer with corrected commit dates.
+
+## Deleting graph-{hash} files
+
+After a new tip file is written, some `graph-{hash}` files may no longer
+be part of a chain. It is important to remove these files from disk, eventually.
+The main reason to delay removal is that another process could read the
+`commit-graph-chain` file before it is rewritten, but then look for the
+`graph-{hash}` files after they are deleted.
+
+To allow holding old split commit-graphs for a while after they are unreferenced,
+we update the modified times of the files when they become unreferenced. Then,
+we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}`
+files whose modified times are older than a given expiry window. This window
+defaults to zero, but can be changed using command-line arguments or a config
+setting.
+
+## Chains across multiple object directories
+
+In a repo with alternates, we look for the `commit-graph-chain` file starting
+in the local object directory and then in each alternate. The first file that
+exists defines our chain. As we look for the `graph-{hash}` files for
+each `{hash}` in the chain file, we follow the same pattern for the host
+directories.
+
+This allows commit-graphs to be split across multiple forks in a fork network.
+The typical case is a large "base" repo with many smaller forks.
+
+As the base repo advances, it will likely update and merge its commit-graph
+chain more frequently than the forks. If a fork updates their commit-graph after
+the base repo, then it should "reparent" the commit-graph chain onto the new
+chain in the base repo. When reading each `graph-{hash}` file, we track
+the object directory containing it. During a write of a new commit-graph file,
+we check for any changes in the source object directory and read the
+`commit-graph-chain` file for that source and create a new file based on those
+files. During this "reparent" operation, we necessarily need to collapse all
+levels in the fork, as all of the files are invalid against the new base file.
+
+It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph`
+files in this scenario. It falls to the user to define the proper settings for
+their custom environment:
+
+ 1. When merging levels in the base repo, the unreferenced files may still be
+    referenced by chains from fork repos.
+
+ 2. The expiry time should be set to a length of time such that every fork has
+    time to recompute their commit-graph chain to "reparent" onto the new base
+    file(s).
+
+ 3. If the commit-graph chain is updated in the base, the fork will not have
+    access to the new chain until its chain is updated to reference those files.
+    (This may change in the future [5].)
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://lore.kernel.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://lore.kernel.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://lore.kernel.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://lore.kernel.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
+
+[5] https://lore.kernel.org/git/f27db281-abad-5043-6d71-cbb083b1c877@gmail.com/
+    A discussion of a "two-dimensional graph position" that can allow reading
+    multiple commit-graph chains at the same time.
diff --git a/Documentation/technical/directory-rename-detection.txt b/Documentation/technical/directory-rename-detection.txt
new file mode 100644
index 0000000000..49b83ef3cc
--- /dev/null
+++ b/Documentation/technical/directory-rename-detection.txt
@@ -0,0 +1,116 @@
+Directory rename detection
+==========================
+
+Rename detection logic in diffcore-rename that checks for renames of
+individual files is aggregated and analyzed in merge-recursive for cases
+where combinations of renames indicate that a full directory has been
+renamed.
+
+Scope of abilities
+------------------
+
+It is perhaps easiest to start with an example:
+
+  * When all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is
+    likely that x/d added in the meantime would also want to move to z/d by
+    taking the hint that the entire directory 'x' moved to 'z'.
+
+More interesting possibilities exist, though, such as:
+
+  * one side of history renames x -> z, and the other renames some file to
+    x/e, causing the need for the merge to do a transitive rename so that
+    the rename ends up at z/e.
+
+  * one side of history renames x -> z, but also renames all files within x.
+    For example, x/a -> z/alpha, x/b -> z/bravo, etc.
+
+  * both 'x' and 'y' being merged into a single directory 'z', with a
+    directory rename being detected for both x->z and y->z.
+
+  * not all files in a directory being renamed to the same location;
+    i.e. perhaps most the files in 'x' are now found under 'z', but a few
+    are found under 'w'.
+
+  * a directory being renamed, which also contained a subdirectory that was
+    renamed to some entirely different location.  (And perhaps the inner
+    directory itself contained inner directories that were renamed to yet
+    other locations).
+
+  * combinations of the above; see t/t6423-merge-rename-directories.sh for
+    various interesting cases.
+
+Limitations -- applicability of directory renames
+-------------------------------------------------
+
+In order to prevent edge and corner cases resulting in either conflicts
+that cannot be represented in the index or which might be too complex for
+users to try to understand and resolve, a couple basic rules limit when
+directory rename detection applies:
+
+  1) If a given directory still exists on both sides of a merge, we do
+     not consider it to have been renamed.
+
+  2) If a subset of to-be-renamed files have a file or directory in the
+     way (or would be in the way of each other), "turn off" the directory
+     rename for those specific sub-paths and report the conflict to the
+     user.
+
+  3) If the other side of history did a directory rename to a path that
+     your side of history renamed away, then ignore that particular
+     rename from the other side of history for any implicit directory
+     renames (but warn the user).
+
+Limitations -- detailed rules and testcases
+-------------------------------------------
+
+t/t6423-merge-rename-directories.sh contains extensive tests and commentary
+which generate and explore the rules listed above.  It also lists a few
+additional rules:
+
+  a) If renames split a directory into two or more others, the directory
+     with the most renames, "wins".
+
+  b) Only apply implicit directory renames to directories if the other side
+     of history is the one doing the renaming.
+
+  c) Do not perform directory rename detection for directories which had no
+     new paths added to them.
+
+Limitations -- support in different commands
+--------------------------------------------
+
+Directory rename detection is supported by 'merge' and 'cherry-pick'.
+Other git commands which users might be surprised to see limited or no
+directory rename detection support in:
+
+  * diff
+
+    Folks have requested in the past that `git diff` detect directory
+    renames and somehow simplify its output.  It is not clear whether this
+    would be desirable or how the output should be simplified, so this was
+    simply not implemented.  Further, to implement this, directory rename
+    detection logic would need to move from merge-recursive to
+    diffcore-rename.
+
+  * am
+
+    git-am tries to avoid a full three way merge, instead calling
+    git-apply.  That prevents us from detecting renames at all, which may
+    defeat the directory rename detection.  There is a fallback, though; if
+    the initial git-apply fails and the user has specified the -3 option,
+    git-am will fall back to a three way merge.  However, git-am lacks the
+    necessary information to do a "real" three way merge.  Instead, it has
+    to use build_fake_ancestor() to get a merge base that is missing files
+    whose rename may have been important to detect for directory rename
+    detection to function.
+
+  * rebase
+
+    Since am-based rebases work by first generating a bunch of patches
+    (which no longer record what the original commits were and thus don't
+    have the necessary info from which we can find a real merge-base), and
+    then calling git-am, this implies that am-based rebases will not always
+    successfully detect directory renames either (see the 'am' section
+    above).  merged-based rebases (rebase -m) and cherry-pick-based rebases
+    (rebase -i) are not affected by this shortcoming, and fully support
+    directory rename detection.
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 0000000000..7c1630bf83
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,830 @@
+Git hash function transition
+============================
+
+Objective
+---------
+Migrate Git from SHA-1 to a stronger hash function.
+
+Background
+----------
+At its core, the Git version control system is a content addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+
+* Integrity checking is easy. Bit flips, for example, are easily
+  detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+
+Using a cryptographically secure hash function brings additional
+advantages:
+
+* Object names can be signed and third parties can trust the hash to
+  address the signed object and all objects it references.
+* Communication using Git protocol and out of band communication
+  methods have a short reliable string that can be used to reliably
+  address stored content.
+
+Over time some flaws in SHA-1 have been discovered by security
+researchers. On 23 February 2017 the SHAttered attack
+(https://shattered.io) demonstrated a practical SHA-1 hash collision.
+
+Git v2.13.0 and later subsequently moved to a hardened SHA-1
+implementation by default, which isn't vulnerable to the SHAttered
+attack, but SHA-1 is still weak.
+
+Thus it's considered prudent to move past any variant of SHA-1
+to a new hash. There's no guarantee that future attacks on SHA-1 won't
+be published in the future, and those attacks may not have viable
+mitigations.
+
+If SHA-1 and its variants were to be truly broken, Git's hash function
+could not be considered cryptographically secure any more. This would
+impact the communication of hash values because we could not trust
+that a given hash value represented the known good version of content
+that the speaker intended.
+
+SHA-1 still possesses the other properties such as fast object lookup
+and safe error checking, but other hash functions are equally suitable
+that are believed to be cryptographically secure.
+
+Choice of Hash
+--------------
+The hash to replace the hardened SHA-1 should be stronger than SHA-1
+was: we would like it to be trustworthy and useful in practice for at
+least 10 years.
+
+Some other relevant properties:
+
+1. A 256-bit hash (long enough to match common security practice; not
+   excessively long to hurt performance and disk usage).
+
+2. High quality implementations should be widely available (e.g., in
+   OpenSSL and Apple CommonCrypto).
+
+3. The hash function's properties should match Git's needs (e.g. Git
+   requires collision and 2nd preimage resistance and does not require
+   length extension resistance).
+
+4. As a tiebreaker, the hash should be fast to compute (fortunately
+   many contenders are faster than SHA-1).
+
+There were several contenders for a successor hash to SHA-1, including
+SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256.
+
+In late 2018 the project picked SHA-256 as its successor hash.
+
+See 0ed8d8da374 (doc hash-function-transition: pick SHA-256 as
+NewHash, 2018-08-04) and numerous mailing list threads at the time,
+particularly the one starting at
+https://lore.kernel.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
+for more information.
+
+Goals
+-----
+1. The transition to SHA-256 can be done one local repository at a time.
+   a. Requiring no action by any other party.
+   b. A SHA-256 repository can communicate with SHA-1 Git servers
+      (push/fetch).
+   c. Users can use SHA-1 and SHA-256 identifiers for objects
+      interchangeably (see "Object names on the command line", below).
+   d. New signed objects make use of a stronger hash function than
+      SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+   a. Local metadata for SHA-1 compatibility can be removed from a
+      repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+   a. The object format is kept simple and consistent.
+   b. Creation of a generalized repository conversion tool.
+
+Non-Goals
+---------
+1. Add SHA-256 support to Git protocol. This is valuable and the
+   logical next step but it is out of scope for this initial design.
+2. Transparently improving the security of existing SHA-1 signed
+   objects.
+3. Intermixing objects using multiple hash functions in a single
+   repository.
+4. Taking the opportunity to fix other bugs in Git's formats and
+   protocols.
+5. Shallow clones and fetches into a SHA-256 repository. (This will
+   change when we add SHA-256 support to Git protocol.)
+6. Skip fetching some submodules of a project into a SHA-256
+   repository. (This also depends on SHA-256 support in Git
+   protocol.)
+
+Overview
+--------
+We introduce a new repository format extension. Repositories with this
+extension enabled use SHA-256 instead of SHA-1 to name their objects.
+This affects both object names and object content -- both the names
+of objects and all references to other objects within an object are
+switched to the new hash function.
+
+SHA-256 repositories cannot be read by older versions of Git.
+
+Alongside the packfile, a SHA-256 repository stores a bidirectional
+mapping between SHA-256 and SHA-1 object names. The mapping is generated
+locally and can be verified using "git fsck". Object lookups use this
+mapping to allow naming objects using either their SHA-1 and SHA-256 names
+interchangeably.
+
+"git cat-file" and "git hash-object" gain options to display an object
+in its SHA-1 form and write an object given its SHA-1 form. This
+requires all objects referenced by that object to be present in the
+object database so that they can be named using the appropriate name
+(using the bidirectional hash mapping).
+
+Fetches from a SHA-1 based server convert the fetched objects into
+SHA-256 form and record the mapping in the bidirectional mapping table
+(see below for details). Pushes to a SHA-1 based server convert the
+objects being pushed into SHA-1 form so the server does not have to be
+aware of the hash function the client is using.
+
+Detailed Design
+---------------
+Repository format extension
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A SHA-256 repository uses repository format version `1` (see
+Documentation/technical/repository-version.txt) with extensions
+`objectFormat` and `compatObjectFormat`:
+
+	[core]
+		repositoryFormatVersion = 1
+	[extensions]
+		objectFormat = sha256
+		compatObjectFormat = sha1
+
+The combination of setting `core.repositoryFormatVersion=1` and
+populating `extensions.*` ensures that all versions of Git later than
+`v0.99.9l` will die instead of trying to operate on the SHA-256
+repository, instead producing an error message.
+
+	# Between v0.99.9l and v2.7.0
+	$ git status
+	fatal: Expected git repo version <= 0, found 1
+	# After v2.7.0
+	$ git status
+	fatal: unknown repository extensions found:
+		objectformat
+		compatobjectformat
+
+See the "Transition plan" section below for more details on these
+repository extensions.
+
+Object names
+~~~~~~~~~~~~
+Objects can be named by their 40 hexadecimal digit SHA-1 name or 64
+hexadecimal digit SHA-256 name, plus names derived from those (see
+gitrevisions(7)).
+
+The SHA-1 name of an object is the SHA-1 of the concatenation of its
+type, length, a nul byte, and the object's SHA-1 content. This is the
+traditional <sha1> used in Git to name objects.
+
+The SHA-256 name of an object is the SHA-256 of the concatenation of its
+type, length, a nul byte, and the object's SHA-256 content.
+
+Object format
+~~~~~~~~~~~~~
+The content as a byte sequence of a tag, commit, or tree object named
+by SHA-1 and SHA-256 differ because an object named by SHA-256 name refers to
+other objects by their SHA-256 names and an object named by SHA-1 name
+refers to other objects by their SHA-1 names.
+
+The SHA-256 content of an object is the same as its SHA-1 content, except
+that objects referenced by the object are named using their SHA-256 names
+instead of SHA-1 names. Because a blob object does not refer to any
+other object, its SHA-1 content and SHA-256 content are the same.
+
+The format allows round-trip conversion between SHA-256 content and
+SHA-1 content.
+
+Object storage
+~~~~~~~~~~~~~~
+Loose objects use zlib compression and packed objects use the packed
+format described in Documentation/technical/pack-format.txt, just like
+today. The content that is compressed and stored uses SHA-256 content
+instead of SHA-1 content.
+
+Pack index
+~~~~~~~~~~
+Pack index (.idx) files use a new v3 format that supports multiple
+hash functions. They have the following format (all integers are in
+network byte order):
+
+- A header appears at the beginning and consists of the following:
+  * The 4-byte pack index signature: '\377t0c'
+  * 4-byte version number: 3
+  * 4-byte length of the header section, including the signature and
+    version number
+  * 4-byte number of objects contained in the pack
+  * 4-byte number of object formats in this pack index: 2
+  * For each object format:
+    ** 4-byte format identifier (e.g., 'sha1' for SHA-1)
+    ** 4-byte length in bytes of shortened object names. This is the
+      shortest possible length needed to make names in the shortened
+      object name table unambiguous.
+    ** 4-byte integer, recording where tables relating to this format
+      are stored in this index file, as an offset from the beginning.
+  * 4-byte offset to the trailer from the beginning of this file.
+  * Zero or more additional key/value pairs (4-byte key, 4-byte
+    value). Only one key is supported: 'PSRC'. See the "Loose objects
+    and unreachable objects" section for supported values and how this
+    is used.  All other keys are reserved. Readers must ignore
+    unrecognized keys.
+- Zero or more NUL bytes. This can optionally be used to improve the
+  alignment of the full object name table below.
+- Tables for the first object format:
+  * A sorted table of shortened object names.  These are prefixes of
+    the names of all objects in this pack file, packed together
+    without offset values to reduce the cache footprint of the binary
+    search for a specific object name.
+
+  * A table of full object names in pack order. This allows resolving
+    a reference to "the nth object in the pack file" (from a
+    reachability bitmap or from the next table of another object
+    format) to its object name.
+
+  * A table of 4-byte values mapping object name order to pack order.
+    For an object in the table of sorted shortened object names, the
+    value at the corresponding index in this table is the index in the
+    previous table for that same object.
+    This can be used to look up the object in reachability bitmaps or
+    to look up its name in another object format.
+
+  * A table of 4-byte CRC32 values of the packed object data, in the
+    order that the objects appear in the pack file. This is to allow
+    compressed data to be copied directly from pack to pack during
+    repacking without undetected data corruption.
+
+  * A table of 4-byte offset values. For an object in the table of
+    sorted shortened object names, the value at the corresponding
+    index in this table indicates where that object can be found in
+    the pack file. These are usually 31-bit pack file offsets, but
+    large offsets are encoded as an index into the next table with the
+    most significant bit set.
+
+  * A table of 8-byte offset entries (empty for pack files less than
+    2 GiB). Pack files are organized with heavily used objects toward
+    the front, so most object references should not need to refer to
+    this table.
+- Zero or more NUL bytes.
+- Tables for the second object format, with the same layout as above,
+  up to and not including the table of CRC32 values.
+- Zero or more NUL bytes.
+- The trailer consists of the following:
+  * A copy of the 20-byte SHA-256 checksum at the end of the
+    corresponding packfile.
+
+  * 20-byte SHA-256 checksum of all of the above.
+
+Loose object index
+~~~~~~~~~~~~~~~~~~
+A new file $GIT_OBJECT_DIR/loose-object-idx contains information about
+all loose objects. Its format is
+
+  # loose-object-idx
+  (sha256-name SP sha1-name LF)*
+
+where the object names are in hexadecimal format. The file is not
+sorted.
+
+The loose object index is protected against concurrent writes by a
+lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
+object:
+
+1. Write the loose object to a temporary file, like today.
+2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
+3. Rename the loose object into place.
+4. Open loose-object-idx with O_APPEND and write the new object
+5. Unlink loose-object-idx.lock to release the lock.
+
+To remove entries (e.g. in "git pack-refs" or "git-prune"):
+
+1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
+   lock.
+2. Write the new content to loose-object-idx.lock.
+3. Unlink any loose objects being removed.
+4. Rename to replace loose-object-idx, releasing the lock.
+
+Translation table
+~~~~~~~~~~~~~~~~~
+The index files support a bidirectional mapping between SHA-1 names
+and SHA-256 names. The lookup proceeds similarly to ordinary object
+lookups. For example, to convert a SHA-1 name to a SHA-256 name:
+
+ 1. Look for the object in idx files. If a match is present in the
+    idx's sorted list of truncated SHA-1 names, then:
+    a. Read the corresponding entry in the SHA-1 name order to pack
+       name order mapping.
+    b. Read the corresponding entry in the full SHA-1 name table to
+       verify we found the right object. If it is, then
+    c. Read the corresponding entry in the full SHA-256 name table.
+       That is the object's SHA-256 name.
+ 2. Check for a loose object. Read lines from loose-object-idx until
+    we find a match.
+
+Step (1) takes the same amount of time as an ordinary object lookup:
+O(number of packs * log(objects per pack)). Step (2) takes O(number of
+loose objects) time. To maintain good performance it will be necessary
+to keep the number of loose objects low. See the "Loose objects and
+unreachable objects" section below for more details.
+
+Since all operations that make new objects (e.g., "git commit") add
+the new objects to the corresponding index, this mapping is possible
+for all objects in the object store.
+
+Reading an object's SHA-1 content
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The SHA-1 content of an object can be read by converting all SHA-256 names
+of its SHA-256 content references to SHA-1 names using the translation table.
+
+Fetch
+~~~~~
+Fetching from a SHA-1 based server requires translating between SHA-1
+and SHA-256 based representations on the fly.
+
+SHA-1s named in the ref advertisement that are present on the client
+can be translated to SHA-256 and looked up as local objects using the
+translation table.
+
+Negotiation proceeds as today. Any "have"s generated locally are
+converted to SHA-1 before being sent to the server, and SHA-1s
+mentioned by the server are converted to SHA-256 when looking them up
+locally.
+
+After negotiation, the server sends a packfile containing the
+requested objects. We convert the packfile to SHA-256 format using
+the following steps:
+
+1. index-pack: inflate each object in the packfile and compute its
+   SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
+   objects the client has locally. These objects can be looked up
+   using the translation table and their SHA-1 content read as
+   described above to resolve the deltas.
+2. topological sort: starting at the "want"s from the negotiation
+   phase, walk through objects in the pack and emit a list of them,
+   excluding blobs, in reverse topologically sorted order, with each
+   object coming later in the list than all objects it references.
+   (This list only contains objects reachable from the "wants". If the
+   pack from the server contained additional extraneous objects, then
+   they will be discarded.)
+3. convert to SHA-256: open a new SHA-256 packfile. Read the topologically
+   sorted list just generated. For each object, inflate its
+   SHA-1 content, convert to SHA-256 content, and write it to the SHA-256
+   pack. Record the new SHA-1<-->SHA-256 mapping entry for use in the idx.
+4. sort: reorder entries in the new pack to match the order of objects
+   in the pack the server generated and include blobs. Write a SHA-256 idx
+   file
+5. clean up: remove the SHA-1 based pack file, index, and
+   topologically sorted list obtained from the server in steps 1
+   and 2.
+
+Step 3 requires every object referenced by the new object to be in the
+translation table. This is why the topological sort step is necessary.
+
+As an optimization, step 1 could write a file describing what non-blob
+objects each object it has inflated from the packfile references. This
+makes the topological sort in step 2 possible without inflating the
+objects in the packfile for a second time. The objects need to be
+inflated again in step 3, for a total of two inflations.
+
+Step 4 is probably necessary for good read-time performance. "git
+pack-objects" on the server optimizes the pack file for good data
+locality (see Documentation/technical/pack-heuristics.txt).
+
+Details of this process are likely to change. It will take some
+experimenting to get this to perform well.
+
+Push
+~~~~
+Push is simpler than fetch because the objects referenced by the
+pushed objects are already in the translation table. The SHA-1 content
+of each object being pushed can be read as described in the "Reading
+an object's SHA-1 content" section to generate the pack written by git
+send-pack.
+
+Signed Commits
+~~~~~~~~~~~~~~
+We add a new field "gpgsig-sha256" to the commit object format to allow
+signing commits without relying on SHA-1. It is similar to the
+existing "gpgsig" field. Its signed payload is the SHA-256 content of the
+commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
+
+This means commits can be signed
+
+1. using SHA-1 only, as in existing signed commit objects
+2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
+   fields.
+3. using only SHA-256, by only using the gpgsig-sha256 field.
+
+Old versions of "git verify-commit" can verify the gpgsig signature in
+cases (1) and (2) without modifications and view case (3) as an
+ordinary unsigned commit.
+
+Signed Tags
+~~~~~~~~~~~
+We add a new field "gpgsig-sha256" to the tag object format to allow
+signing tags without relying on SHA-1. Its signed payload is the
+SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
+SIGNATURE-----" delimited in-body signature removed.
+
+This means tags can be signed
+
+1. using SHA-1 only, as in existing signed tag objects
+2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
+   signature.
+3. using only SHA-256, by only using the gpgsig-sha256 field.
+
+Mergetag embedding
+~~~~~~~~~~~~~~~~~~
+The mergetag field in the SHA-1 content of a commit contains the
+SHA-1 content of a tag that was merged by that commit.
+
+The mergetag field in the SHA-256 content of the same commit contains the
+SHA-256 content of the same tag.
+
+Submodules
+~~~~~~~~~~
+To convert recorded submodule pointers, you need to have the converted
+submodule repository in place. The translation table of the submodule
+can be used to look up the new hash.
+
+Loose objects and unreachable objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Fast lookups in the loose-object-idx require that the number of loose
+objects not grow too high.
+
+"git gc --auto" currently waits for there to be 6700 loose objects
+present before consolidating them into a packfile. We will need to
+measure to find a more appropriate threshold for it to use.
+
+"git gc --auto" currently waits for there to be 50 packs present
+before combining packfiles. Packing loose objects more aggressively
+may cause the number of pack files to grow too quickly. This can be
+mitigated by using a strategy similar to Martin Fick's exponential
+rolling garbage collection script:
+https://gerrit-review.googlesource.com/c/gerrit/+/35215
+
+"git gc" currently expels any unreachable objects it encounters in
+pack files to loose objects in an attempt to prevent a race when
+pruning them (in case another process is simultaneously writing a new
+object that refers to the about-to-be-deleted object). This leads to
+an explosion in the number of loose objects present and disk space
+usage due to the objects in delta form being replaced with independent
+loose objects.  Worse, the race is still present for loose objects.
+
+Instead, "git gc" will need to move unreachable objects to a new
+packfile marked as UNREACHABLE_GARBAGE (using the PSRC field; see
+below). To avoid the race when writing new objects referring to an
+about-to-be-deleted object, code paths that write new objects will
+need to copy any objects from UNREACHABLE_GARBAGE packs that they
+refer to new, non-UNREACHABLE_GARBAGE packs (or loose objects).
+UNREACHABLE_GARBAGE are then safe to delete if their creation time (as
+indicated by the file's mtime) is long enough ago.
+
+To avoid a proliferation of UNREACHABLE_GARBAGE packs, they can be
+combined under certain circumstances. If "gc.garbageTtl" is set to
+greater than one day, then packs created within a single calendar day,
+UTC, can be coalesced together. The resulting packfile would have an
+mtime before midnight on that day, so this makes the effective maximum
+ttl the garbageTtl + 1 day. If "gc.garbageTtl" is less than one day,
+then we divide the calendar day into intervals one-third of that ttl
+in duration. Packs created within the same interval can be coalesced
+together. The resulting packfile would have an mtime before the end of
+the interval, so this makes the effective maximum ttl equal to the
+garbageTtl * 4/3.
+
+This rule comes from Thirumala Reddy Mutchukota's JGit change
+https://git.eclipse.org/r/90465.
+
+The UNREACHABLE_GARBAGE setting goes in the PSRC field of the pack
+index. More generally, that field indicates where a pack came from:
+
+ - 1 (PACK_SOURCE_RECEIVE) for a pack received over the network
+ - 2 (PACK_SOURCE_AUTO) for a pack created by a lightweight
+   "gc --auto" operation
+ - 3 (PACK_SOURCE_GC) for a pack created by a full gc
+ - 4 (PACK_SOURCE_UNREACHABLE_GARBAGE) for potential garbage
+   discovered by gc
+ - 5 (PACK_SOURCE_INSERT) for locally created objects that were
+   written directly to a pack file, e.g. from "git add ."
+
+This information can be useful for debugging and for "gc --auto" to
+make appropriate choices about which packs to coalesce.
+
+Caveats
+-------
+Invalid objects
+~~~~~~~~~~~~~~~
+The conversion from SHA-1 content to SHA-256 content retains any
+brokenness in the original object (e.g., tree entry modes encoded with
+leading 0, tree objects whose paths are not sorted correctly, and
+commit objects without an author or committer). This is a deliberate
+feature of the design to allow the conversion to round-trip.
+
+More profoundly broken objects (e.g., a commit with a truncated "tree"
+header line) cannot be converted but were not usable by current Git
+anyway.
+
+Shallow clone and submodules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Because it requires all referenced objects to be available in the
+locally generated translation table, this design does not support
+shallow clone or unfetched submodules. Protocol improvements might
+allow lifting this restriction.
+
+Alternates
+~~~~~~~~~~
+For the same reason, a SHA-256 repository cannot borrow objects from a
+SHA-1 repository using objects/info/alternates or
+$GIT_ALTERNATE_OBJECT_REPOSITORIES.
+
+git notes
+~~~~~~~~~
+The "git notes" tool annotates objects using their SHA-1 name as key.
+This design does not describe a way to migrate notes trees to use
+SHA-256 names. That migration is expected to happen separately (for
+example using a file at the root of the notes tree to describe which
+hash it uses).
+
+Server-side cost
+~~~~~~~~~~~~~~~~
+Until Git protocol gains SHA-256 support, using SHA-256 based storage
+on public-facing Git servers is strongly discouraged. Once Git
+protocol gains SHA-256 support, SHA-256 based servers are likely not
+to support SHA-1 compatibility, to avoid what may be a very expensive
+hash re-encode during clone and to encourage peers to modernize.
+
+The design described here allows fetches by SHA-1 clients of a
+personal SHA-256 repository because it's not much more difficult than
+allowing pushes from that repository. This support needs to be guarded
+by a configuration option --- servers like git.kernel.org that serve a
+large number of clients would not be expected to bear that cost.
+
+Meaning of signatures
+~~~~~~~~~~~~~~~~~~~~~
+The signed payload for signed commits and tags does not explicitly
+name the hash used to identify objects. If some day Git adopts a new
+hash function with the same length as the current SHA-1 (40
+hexadecimal digit) or SHA-256 (64 hexadecimal digit) objects then the
+intent behind the PGP signed payload in an object signature is
+unclear:
+
+	object e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7
+	type commit
+	tag v2.12.0
+	tagger Junio C Hamano <gitster@pobox.com> 1487962205 -0800
+
+	Git 2.12
+
+Does this mean Git v2.12.0 is the commit with SHA-1 name
+e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
+new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
+
+Fortunately SHA-256 and SHA-1 have different lengths. If Git starts
+using another hash with the same length to name objects, then it will
+need to change the format of signed payloads using that hash to
+address this issue.
+
+Object names on the command line
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To support the transition (see Transition plan below), this design
+supports four different modes of operation:
+
+ 1. ("dark launch") Treat object names input by the user as SHA-1 and
+    convert any object names written to output to SHA-1, but store
+    objects using SHA-256.  This allows users to test the code with no
+    visible behavior change except for performance.  This allows
+    allows running even tests that assume the SHA-1 hash function, to
+    sanity-check the behavior of the new mode.
+
+ 2. ("early transition") Allow both SHA-1 and SHA-256 object names in
+    input. Any object names written to output use SHA-1. This allows
+    users to continue to make use of SHA-1 to communicate with peers
+    (e.g. by email) that have not migrated yet and prepares for mode 3.
+
+ 3. ("late transition") Allow both SHA-1 and SHA-256 object names in
+    input. Any object names written to output use SHA-256. In this
+    mode, users are using a more secure object naming method by
+    default.  The disruption is minimal as long as most of their peers
+    are in mode 2 or mode 3.
+
+ 4. ("post-transition") Treat object names input by the user as
+    SHA-256 and write output using SHA-256. This is safer than mode 3
+    because there is less risk that input is incorrectly interpreted
+    using the wrong hash function.
+
+The mode is specified in configuration.
+
+The user can also explicitly specify which format to use for a
+particular revision specifier and for output, overriding the mode. For
+example:
+
+    git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
+
+Transition plan
+---------------
+Some initial steps can be implemented independently of one another:
+
+- adding a hash function API (vtable)
+- teaching fsck to tolerate the gpgsig-sha256 field
+- excluding gpgsig-* from the fields copied by "git commit --amend"
+- annotating tests that depend on SHA-1 values with a SHA1 test
+  prerequisite
+- using "struct object_id", GIT_MAX_RAWSZ, and GIT_MAX_HEXSZ
+  consistently instead of "unsigned char *" and the hardcoded
+  constants 20 and 40.
+- introducing index v3
+- adding support for the PSRC field and safer object pruning
+
+The first user-visible change is the introduction of the objectFormat
+extension (without compatObjectFormat). This requires:
+
+- teaching fsck about this mode of operation
+- using the hash function API (vtable) when computing object names
+- signing objects and verifying signatures
+- rejecting attempts to fetch from or push to an incompatible
+  repository
+
+Next comes introduction of compatObjectFormat:
+
+- implementing the loose-object-idx
+- translating object names between object formats
+- translating object content between object formats
+- generating and verifying signatures in the compat format
+- adding appropriate index entries when adding a new object to the
+  object store
+- --output-format option
+- ^{sha1} and ^{sha256} revision notation
+- configuration to specify default input and output format (see
+  "Object names on the command line" above)
+
+The next step is supporting fetches and pushes to SHA-1 repositories:
+
+- allow pushes to a repository using the compat format
+- generate a topologically sorted list of the SHA-1 names of fetched
+  objects
+- convert the fetched packfile to SHA-256 format and generate an idx
+  file
+- re-sort to match the order of objects in the fetched packfile
+
+The infrastructure supporting fetch also allows converting an existing
+repository. In converted repositories and new clones, end users can
+gain support for the new hash function without any visible change in
+behavior (see "dark launch" in the "Object names on the command line"
+section). In particular this allows users to verify SHA-256 signatures
+on objects in the repository, and it should ensure the transition code
+is stable in production in preparation for using it more widely.
+
+Over time projects would encourage their users to adopt the "early
+transition" and then "late transition" modes to take advantage of the
+new, more futureproof SHA-256 object names.
+
+When objectFormat and compatObjectFormat are both set, commands
+generating signatures would generate both SHA-1 and SHA-256 signatures
+by default to support both new and old users.
+
+In projects using SHA-256 heavily, users could be encouraged to adopt
+the "post-transition" mode to avoid accidentally making implicit use
+of SHA-1 object names.
+
+Once a critical mass of users have upgraded to a version of Git that
+can verify SHA-256 signatures and have converted their existing
+repositories to support verifying them, we can add support for a
+setting to generate only SHA-256 signatures. This is expected to be at
+least a year later.
+
+That is also a good moment to advertise the ability to convert
+repositories to use SHA-256 only, stripping out all SHA-1 related
+metadata. This improves performance by eliminating translation
+overhead and security by avoiding the possibility of accidentally
+relying on the safety of SHA-1.
+
+Updating Git's protocols to allow a server to specify which hash
+functions it supports is also an important part of this transition. It
+is not discussed in detail in this document but this transition plan
+assumes it happens. :)
+
+Alternatives considered
+-----------------------
+Upgrading everyone working on a particular project on a flag day
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Projects like the Linux kernel are large and complex enough that
+flipping the switch for all projects based on the repository at once
+is infeasible.
+
+Not only would all developers and server operators supporting
+developers have to switch on the same flag day, but supporting tooling
+(continuous integration, code review, bug trackers, etc) would have to
+be adapted as well. This also makes it difficult to get early feedback
+from some project participants testing before it is time for mass
+adoption.
+
+Using hash functions in parallel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+(e.g. https://lore.kernel.org/git/22708.8913.864049.452252@chiark.greenend.org.uk/ )
+Objects newly created would be addressed by the new hash, but inside
+such an object (e.g. commit) it is still possible to address objects
+using the old hash function.
+
+* You cannot trust its history (needed for bisectability) in the
+  future without further work
+* Maintenance burden as the number of supported hash functions grows
+  (they will never go away, so they accumulate). In this proposal, by
+  comparison, converted objects lose all references to SHA-1.
+
+Signed objects with multiple hashes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Instead of introducing the gpgsig-sha256 field in commit and tag objects
+for SHA-256 content based signatures, an earlier version of this design
+added "hash sha256 <SHA-256 name>" fields to strengthen the existing
+SHA-1 content based signatures.
+
+In other words, a single signature was used to attest to the object
+content using both hash functions. This had some advantages:
+
+* Using one signature instead of two speeds up the signing process.
+* Having one signed payload with both hashes allows the signer to
+  attest to the SHA-1 name and SHA-256 name referring to the same object.
+* All users consume the same signature. Broken signatures are likely
+  to be detected quickly using current versions of git.
+
+However, it also came with disadvantages:
+
+* Verifying a signed object requires access to the SHA-1 names of all
+  objects it references, even after the transition is complete and
+  translation table is no longer needed for anything else. To support
+  this, the design added fields such as "hash sha1 tree <SHA-1 name>"
+  and "hash sha1 parent <SHA-1 name>" to the SHA-256 content of a signed
+  commit, complicating the conversion process.
+* Allowing signed objects without a SHA-1 (for after the transition is
+  complete) complicated the design further, requiring a "nohash sha1"
+  field to suppress including "hash sha1" fields in the SHA-256 content
+  and signed payload.
+
+Lazily populated translation table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Some of the work of building the translation table could be deferred to
+push time, but that would significantly complicate and slow down pushes.
+Calculating the SHA-1 name at object creation time at the same time it is
+being streamed to disk and having its SHA-256 name calculated should be
+an acceptable cost.
+
+Document History
+----------------
+
+2017-03-03
+bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com,
+sbeller@google.com
+
+* Initial version sent to https://lore.kernel.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
+
+2017-03-03 jrnieder@gmail.com
+Incorporated suggestions from jonathantanmy and sbeller:
+
+* Describe purpose of signed objects with each hash type
+* Redefine signed object verification using object content under the
+  first hash function
+
+2017-03-06 jrnieder@gmail.com
+
+* Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
+* Make SHA3-based signatures a separate field, avoiding the need for
+  "hash" and "nohash" fields (thanks to peff[3]).
+* Add a sorting phase to fetch (thanks to Junio for noticing the need
+  for this).
+* Omit blobs from the topological sort during fetch (thanks to peff).
+* Discuss alternates, git notes, and git servers in the caveats
+  section (thanks to Junio Hamano, brian m. carlson[4], and Shawn
+  Pearce).
+* Clarify language throughout (thanks to various commenters,
+  especially Junio).
+
+2017-09-27 jrnieder@gmail.com, sbeller@google.com
+
+* Use placeholder NewHash instead of SHA3-256
+* Describe criteria for picking a hash function.
+* Include a transition plan (thanks especially to Brandon Williams
+  for fleshing these ideas out)
+* Define the translation table (thanks, Shawn Pearce[5], Jonathan
+  Tan, and Masaya Suzuki)
+* Avoid loose object overhead by packing more aggressively in
+  "git gc --auto"
+
+Later history:
+
+* See the history of this file in git.git for the history of subsequent
+  edits. This document history is no longer being maintained as it
+  would now be superfluous to the commit log
+
+References:
+
+ [1] https://lore.kernel.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
+ [2] https://lore.kernel.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
+ [3] https://lore.kernel.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
+ [4] https://lore.kernel.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
+ [5] https://lore.kernel.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000000..96d89ea9b2
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,519 @@
+HTTP transfer protocols
+=======================
+
+Git supports two HTTP based transfer protocols.  A "dumb" protocol
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol which requires a Git aware CGI
+(or server module).  This document describes both protocols.
+
+As a design feature smart clients can automatically upgrade "dumb"
+protocol URLs to smart URLs.  This permits all users to have the
+same published URL, and the peers automatically select the most
+efficient transport available to them.
+
+
+URL Format
+----------
+
+URLs for Git repositories accessed by HTTP use the standard HTTP
+URL syntax documented by RFC 1738, so they are of the form:
+
+  http://<host>:<port>/<path>?<searchpart>
+
+Within this documentation the placeholder `$GIT_URL` will stand for
+the http:// repository URL entered by the end-user.
+
+Servers SHOULD handle all requests to locations matching `$GIT_URL`, as
+both the "smart" and "dumb" HTTP protocols used by Git operate
+by appending additional path components onto the end of the user
+supplied `$GIT_URL` string.
+
+An example of a dumb client requesting for a loose object:
+
+  $GIT_URL:     http://example.com:8080/git/repo.git
+  URL request:  http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355
+
+An example of a smart request to a catch-all gateway:
+
+  $GIT_URL:     http://example.com/daemon.cgi?svc=git&q=
+  URL request:  http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack
+
+An example of a request to a submodule:
+
+  $GIT_URL:     http://example.com/git/repo.git/path/submodule.git
+  URL request:  http://example.com/git/repo.git/path/submodule.git/info/refs
+
+Clients MUST strip a trailing `/`, if present, from the user supplied
+`$GIT_URL` string to prevent empty path tokens (`//`) from appearing
+in any URL sent to a server.  Compatible clients MUST expand
+`$GIT_URL/info/refs` as `foo/info/refs` and not `foo//info/refs`.
+
+
+Authentication
+--------------
+
+Standard HTTP authentication is used if authentication is required
+to access a repository, and MAY be configured and enforced by the
+HTTP server software.
+
+Because Git repositories are accessed by standard path components
+server administrators MAY use directory based permissions within
+their HTTP server to control repository access.
+
+Clients SHOULD support Basic authentication as described by RFC 2617.
+Servers SHOULD support Basic authentication by relying upon the
+HTTP server placed in front of the Git server software.
+
+Servers SHOULD NOT require HTTP cookies for the purposes of
+authentication or access control.
+
+Clients and servers MAY support other common forms of HTTP based
+authentication, such as Digest authentication.
+
+
+SSL
+---
+
+Clients and servers SHOULD support SSL, particularly to protect
+passwords when relying on Basic HTTP authentication.
+
+
+Session State
+-------------
+
+The Git over HTTP protocol (much like HTTP itself) is stateless
+from the perspective of the HTTP server side.  All state MUST be
+retained and managed by the client process.  This permits simple
+round-robin load-balancing on the server side, without needing to
+worry about state management.
+
+Clients MUST NOT require state management on the server side in
+order to function correctly.
+
+Servers MUST NOT require HTTP cookies in order to function correctly.
+Clients MAY store and forward HTTP cookies during request processing
+as described by RFC 2616 (HTTP/1.1).  Servers SHOULD ignore any
+cookies sent by a client.
+
+
+General Request Processing
+--------------------------
+
+Except where noted, all standard HTTP behavior SHOULD be assumed
+by both client and server.  This includes (but is not necessarily
+limited to):
+
+If there is no repository at `$GIT_URL`, or the resource pointed to by a
+location matching `$GIT_URL` does not exist, the server MUST NOT respond
+with `200 OK` response.  A server SHOULD respond with
+`404 Not Found`, `410 Gone`, or any other suitable HTTP status code
+which does not imply the resource exists as requested.
+
+If there is a repository at `$GIT_URL`, but access is not currently
+permitted, the server MUST respond with the `403 Forbidden` HTTP
+status code.
+
+Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
+Servers SHOULD support chunked encoding for both request and response
+bodies.
+
+Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
+Clients SHOULD support chunked encoding for both request and response
+bodies.
+
+Servers MAY return ETag and/or Last-Modified headers.
+
+Clients MAY revalidate cached entities by including If-Modified-Since
+and/or If-None-Match request headers.
+
+Servers MAY return `304 Not Modified` if the relevant headers appear
+in the request and the entity has not changed.  Clients MUST treat
+`304 Not Modified` identical to `200 OK` by reusing the cached entity.
+
+Clients MAY reuse a cached entity without revalidation if the
+Cache-Control and/or Expires header permits caching.  Clients and
+servers MUST follow RFC 2616 for cache controls.
+
+
+Discovering References
+----------------------
+
+All HTTP clients MUST begin either a fetch or a push exchange by
+discovering the references available on the remote repository.
+
+Dumb Clients
+~~~~~~~~~~~~
+
+HTTP clients that only support the "dumb" protocol MUST discover
+references by making a request for the special info/refs file of
+the repository.
+
+Dumb HTTP clients MUST make a `GET` request to `$GIT_URL/info/refs`,
+without any search/query parameters.
+
+   C: GET $GIT_URL/info/refs HTTP/1.0
+
+   S: 200 OK
+   S:
+   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+The Content-Type of the returned info/refs entity SHOULD be
+`text/plain; charset=utf-8`, but MAY be any content type.
+Clients MUST NOT attempt to validate the returned Content-Type.
+Dumb servers MUST NOT return a return type starting with
+`application/x-git-`.
+
+Cache-Control headers MAY be returned to disable caching of the
+returned entity.
+
+When examining the response clients SHOULD only examine the HTTP
+status code.  Valid responses are `200 OK`, or `304 Not Modified`.
+
+The returned content is a UNIX formatted text file describing
+each ref and its known value.  The file SHOULD be sorted by name
+according to the C locale ordering.  The file SHOULD NOT include
+the default ref named `HEAD`.
+
+  info_refs   =  *( ref_record )
+  ref_record  =  any_ref / peeled_ref
+
+  any_ref     =  obj-id HTAB refname LF
+  peeled_ref  =  obj-id HTAB refname LF
+		 obj-id HTAB refname "^{}" LF
+
+Smart Clients
+~~~~~~~~~~~~~
+
+HTTP clients that support the "smart" protocol (or both the
+"smart" and "dumb" protocols) MUST discover references by making
+a parameterized request for the info/refs file of the repository.
+
+The request MUST contain exactly one query parameter,
+`service=$servicename`, where `$servicename` MUST be the service
+name the client wishes to contact to complete the operation.
+The request MUST NOT contain additional query parameters.
+
+   C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+
+dumb server reply:
+
+   S: 200 OK
+   S:
+   S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31	refs/heads/maint
+   S: d049f6c27a2244e12041955e262a404c7faba355	refs/heads/master
+   S: 2cb58b79488a98d2721cea644875a8dd0026b115	refs/tags/v1.0
+   S: a3c2e2402b99163d1d59756e5f207ae21cccba4c	refs/tags/v1.0^{}
+
+smart server reply:
+
+   S: 200 OK
+   S: Content-Type: application/x-git-upload-pack-advertisement
+   S: Cache-Control: no-cache
+   S:
+   S: 001e# service=git-upload-pack\n
+   S: 0000
+   S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
+   S: 003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
+   S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
+   S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
+   S: 0000
+
+The client may send Extra Parameters (see
+Documentation/technical/pack-protocol.txt) as a colon-separated string
+in the Git-Protocol HTTP header.
+
+Dumb Server Response
+^^^^^^^^^^^^^^^^^^^^
+Dumb servers MUST respond with the dumb server reply format.
+
+See the prior section under dumb clients for a more detailed
+description of the dumb server response.
+
+Smart Server Response
+^^^^^^^^^^^^^^^^^^^^^
+If the server does not recognize the requested service name, or the
+requested service name has been disabled by the server administrator,
+the server MUST respond with the `403 Forbidden` HTTP status code.
+
+Otherwise, smart servers MUST respond with the smart server reply
+format for the requested service name.
+
+Cache-Control headers SHOULD be used to disable caching of the
+returned entity.
+
+The Content-Type MUST be `application/x-$servicename-advertisement`.
+Clients SHOULD fall back to the dumb protocol if another content
+type is returned.  When falling back to the dumb protocol clients
+SHOULD NOT make an additional request to `$GIT_URL/info/refs`, but
+instead SHOULD use the response already in hand.  Clients MUST NOT
+continue if they do not support the dumb protocol.
+
+Clients MUST validate the status code is either `200 OK` or
+`304 Not Modified`.
+
+Clients MUST validate the first five bytes of the response entity
+matches the regex `^[0-9a-f]{4}#`.  If this test fails, clients
+MUST NOT continue.
+
+Clients MUST parse the entire response as a sequence of pkt-line
+records.
+
+Clients MUST verify the first pkt-line is `# service=$servicename`.
+Servers MUST set $servicename to be the request parameter value.
+Servers SHOULD include an LF at the end of this line.
+Clients MUST ignore an LF at the end of the line.
+
+Servers MUST terminate the response with the magic `0000` end
+pkt-line marker.
+
+The returned response is a pkt-line stream describing each ref and
+its known value.  The stream SHOULD be sorted by name according to
+the C locale ordering.  The stream SHOULD include the default ref
+named `HEAD` as the first ref.  The stream MUST include capability
+declarations behind a NUL on the first ref.
+
+The returned response contains "version 1" if "version=1" was sent as an
+Extra Parameter.
+
+  smart_reply     =  PKT-LINE("# service=$servicename" LF)
+		     "0000"
+		     *1("version 1")
+		     ref_list
+		     "0000"
+  ref_list        =  empty_list / non_empty_list
+
+  empty_list      =  PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF)
+
+  non_empty_list  =  PKT-LINE(obj-id SP name NUL cap_list LF)
+		     *ref_record
+
+  cap-list        =  capability *(SP capability)
+  capability      =  1*(LC_ALPHA / DIGIT / "-" / "_")
+  LC_ALPHA        =  %x61-7A
+
+  ref_record      =  any_ref / peeled_ref
+  any_ref         =  PKT-LINE(obj-id SP name LF)
+  peeled_ref      =  PKT-LINE(obj-id SP name LF)
+		     PKT-LINE(obj-id SP name "^{}" LF
+
+
+Smart Service git-upload-pack
+------------------------------
+This service reads from the repository pointed to by `$GIT_URL`.
+
+Clients MUST first perform ref discovery with
+`$GIT_URL/info/refs?service=git-upload-pack`.
+
+   C: POST $GIT_URL/git-upload-pack HTTP/1.0
+   C: Content-Type: application/x-git-upload-pack-request
+   C:
+   C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n
+   C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n
+   C: 0000
+
+   S: 200 OK
+   S: Content-Type: application/x-git-upload-pack-result
+   S: Cache-Control: no-cache
+   S:
+   S: ....ACK %s, continue
+   S: ....NAK
+
+Clients MUST NOT reuse or revalidate a cached response.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one "want" command in the request body.
+Clients MUST NOT reference an id in a "want" command which did not
+appear in the response obtained through ref discovery unless the
+server advertises capability `allow-tip-sha1-in-want` or
+`allow-reachable-sha1-in-want`.
+
+  compute_request   =  want_list
+		       have_list
+		       request_end
+  request_end       =  "0000" / "done"
+
+  want_list         =  PKT-LINE(want SP cap_list LF)
+		       *(want_pkt)
+  want_pkt          =  PKT-LINE(want LF)
+  want              =  "want" SP id
+  cap_list          =  capability *(SP capability)
+
+  have_list         =  *PKT-LINE("have" SP id LF)
+
+TODO: Document this further.
+
+The Negotiation Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The computation to select the minimal pack proceeds as follows
+(C = client, S = server):
+
+'init step:'
+
+C: Use ref discovery to obtain the advertised refs.
+
+C: Place any object seen into set `advertised`.
+
+C: Build an empty set, `common`, to hold the objects that are later
+   determined to be on both ends.
+
+C: Build a set, `want`, of the objects from `advertised` the client
+   wants to fetch, based on what it saw during ref discovery.
+
+C: Start a queue, `c_pending`, ordered by commit time (popping newest
+   first).  Add all client refs.  When a commit is popped from
+   the queue its parents SHOULD be automatically inserted back.
+   Commits MUST only enter the queue once.
+
+'one compute step:'
+
+C: Send one `$GIT_URL/git-upload-pack` request:
+
+   C: 0032want <want #1>...............................
+   C: 0032want <want #2>...............................
+   ....
+   C: 0032have <common #1>.............................
+   C: 0032have <common #2>.............................
+   ....
+   C: 0032have <have #1>...............................
+   C: 0032have <have #2>...............................
+   ....
+   C: 0000
+
+The stream is organized into "commands", with each command
+appearing by itself in a pkt-line.  Within a command line,
+the text leading up to the first space is the command name,
+and the remainder of the line to the first LF is the value.
+Command lines are terminated with an LF as the last byte of
+the pkt-line value.
+
+Commands MUST appear in the following order, if they appear
+at all in the request stream:
+
+* "want"
+* "have"
+
+The stream is terminated by a pkt-line flush (`0000`).
+
+A single "want" or "have" command MUST have one hex formatted
+object name as its value.  Multiple object names MUST be sent by sending
+multiple commands. Object names MUST be given using the object format
+negotiated through the `object-format` capability (default SHA-1).
+
+The `have` list is created by popping the first 32 commits
+from `c_pending`.  Less can be supplied if `c_pending` empties.
+
+If the client has sent 256 "have" commits and has not yet
+received one of those back from `s_common`, or the client has
+emptied `c_pending` it SHOULD include a "done" command to let
+the server know it won't proceed:
+
+   C: 0009done
+
+S: Parse the git-upload-pack request:
+
+Verify all objects in `want` are directly reachable from refs.
+
+The server MAY walk backwards through history or through
+the reflog to permit slightly stale requests.
+
+If no "want" objects are received, send an error:
+TODO: Define error if no "want" lines are requested.
+
+If any "want" object is not reachable, send an error:
+TODO: Define error if an invalid "want" is requested.
+
+Create an empty list, `s_common`.
+
+If "have" was sent:
+
+Loop through the objects in the order supplied by the client.
+
+For each object, if the server has the object reachable from
+a ref, add it to `s_common`.  If a commit is added to `s_common`,
+do not add any ancestors, even if they also appear in `have`.
+
+S: Send the git-upload-pack response:
+
+If the server has found a closed set of objects to pack or the
+request ends with "done", it replies with the pack.
+TODO: Document the pack based response
+
+   S: PACK...
+
+The returned stream is the side-band-64k protocol supported
+by the git-upload-pack service, and the pack is embedded into
+stream 1.  Progress messages from the server side MAY appear
+in stream 2.
+
+Here a "closed set of objects" is defined to have at least
+one path from every "want" to at least one "common" object.
+
+If the server needs more information, it replies with a
+status continue response:
+TODO: Document the non-pack response
+
+C: Parse the upload-pack response:
+   TODO: Document parsing response
+
+'Do another compute step.'
+
+
+Smart Service git-receive-pack
+------------------------------
+This service reads from the repository pointed to by `$GIT_URL`.
+
+Clients MUST first perform ref discovery with
+`$GIT_URL/info/refs?service=git-receive-pack`.
+
+   C: POST $GIT_URL/git-receive-pack HTTP/1.0
+   C: Content-Type: application/x-git-receive-pack-request
+   C:
+   C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
+   C: 0000
+   C: PACK....
+
+   S: 200 OK
+   S: Content-Type: application/x-git-receive-pack-result
+   S: Cache-Control: no-cache
+   S:
+   S: ....
+
+Clients MUST NOT reuse or revalidate a cached response.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one command in the request body.
+Within the command portion of the request body clients SHOULD send
+the id obtained through ref discovery as old_id.
+
+  update_request  =  command_list
+		     "PACK" <binary data>
+
+  command_list    =  PKT-LINE(command NUL cap_list LF)
+		     *(command_pkt)
+  command_pkt     =  PKT-LINE(command LF)
+  cap_list        =  *(SP capability) SP
+
+  command         =  create / delete / update
+  create          =  zero-id SP new_id SP name
+  delete          =  old_id SP zero-id SP name
+  update          =  old_id SP new_id SP name
+
+TODO: Document this further.
+
+
+References
+----------
+
+http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
+http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
+link:technical/pack-protocol.html
+link:technical/protocol-capabilities.html
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
new file mode 100644
index 0000000000..d363a71c37
--- /dev/null
+++ b/Documentation/technical/index-format.txt
@@ -0,0 +1,387 @@
+Git index format
+================
+
+== The Git index file has the following format
+
+  All binary numbers are in network byte order.
+  In a repository using the traditional SHA-1, checksums and object IDs
+  (object names) mentioned below are all computed using SHA-1.  Similarly,
+  in SHA-256 repositories, these values are computed using SHA-256.
+  Version 2 is described here unless stated otherwise.
+
+   - A 12-byte header consisting of
+
+     4-byte signature:
+       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
+
+     4-byte version number:
+       The current supported versions are 2, 3 and 4.
+
+     32-bit number of index entries.
+
+   - A number of sorted index entries (see below).
+
+   - Extensions
+
+     Extensions are identified by signature. Optional extensions can
+     be ignored if Git does not understand them.
+
+     Git currently supports cache tree and resolve undo extensions.
+
+     4-byte extension signature. If the first byte is 'A'..'Z' the
+     extension is optional and can be ignored.
+
+     32-bit size of the extension
+
+     Extension data
+
+   - Hash checksum over the content of the index file before this checksum.
+
+== Index entry
+
+  Index entries are sorted in ascending order on the name field,
+  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
+  localization, no special casing of directory separator '/'). Entries
+  with the same name are sorted by their stage field.
+
+  32-bit ctime seconds, the last time a file's metadata changed
+    this is stat(2) data
+
+  32-bit ctime nanosecond fractions
+    this is stat(2) data
+
+  32-bit mtime seconds, the last time a file's data changed
+    this is stat(2) data
+
+  32-bit mtime nanosecond fractions
+    this is stat(2) data
+
+  32-bit dev
+    this is stat(2) data
+
+  32-bit ino
+    this is stat(2) data
+
+  32-bit mode, split into (high to low bits)
+
+    4-bit object type
+      valid values in binary are 1000 (regular file), 1010 (symbolic link)
+      and 1110 (gitlink)
+
+    3-bit unused
+
+    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
+    Symbolic links and gitlinks have value 0 in this field.
+
+  32-bit uid
+    this is stat(2) data
+
+  32-bit gid
+    this is stat(2) data
+
+  32-bit file size
+    This is the on-disk size from stat(2), truncated to 32-bit.
+
+  Object name for the represented object
+
+  A 16-bit 'flags' field split into (high to low bits)
+
+    1-bit assume-valid flag
+
+    1-bit extended flag (must be zero in version 2)
+
+    2-bit stage (during merge)
+
+    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
+    is stored in this field.
+
+  (Version 3 or later) A 16-bit field, only applicable if the
+  "extended flag" above is 1, split into (high to low bits).
+
+    1-bit reserved for future
+
+    1-bit skip-worktree flag (used by sparse checkout)
+
+    1-bit intent-to-add flag (used by "git add -N")
+
+    13-bit unused, must be zero
+
+  Entry path name (variable length) relative to top level directory
+    (without leading slash). '/' is used as path separator. The special
+    path components ".", ".." and ".git" (without quotes) are disallowed.
+    Trailing slash is also disallowed.
+
+    The exact encoding is undefined, but the '.' and '/' characters
+    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
+    byte (iow, this is a UNIX pathname).
+
+  (Version 4) In version 4, the entry path name is prefix-compressed
+    relative to the path name for the previous entry (the very first
+    entry is encoded as if the path name for the previous entry is an
+    empty string).  At the beginning of an entry, an integer N in the
+    variable width encoding (the same encoding as the offset is encoded
+    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
+    by a NUL-terminated string S.  Removing N bytes from the end of the
+    path name for the previous entry, and replacing it with the string S
+    yields the path name for this entry.
+
+  1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
+  while keeping the name NUL-terminated.
+
+  (Version 4) In version 4, the padding after the pathname does not
+  exist.
+
+  Interpretation of index entries in split index mode is completely
+  different. See below for details.
+
+== Extensions
+
+=== Cache tree
+
+  Since the index does not record entries for directories, the cache
+  entries cannot describe tree objects that already exist in the object
+  database for regions of the index that are unchanged from an existing
+  commit. The cache tree extension stores a recursive tree structure that
+  describes the trees that already exist and completely match sections of
+  the cache entries. This speeds up tree object generation from the index
+  for a new commit by only computing the trees that are "new" to that
+  commit. It also assists when comparing the index to another tree, such
+  as `HEAD^{tree}`, since sections of the index can be skipped when a tree
+  comparison demonstrates equality.
+
+  The recursive tree structure uses nodes that store a number of cache
+  entries, a list of subnodes, and an object ID (OID). The OID references
+  the existing tree for that node, if it is known to exist. The subnodes
+  correspond to subdirectories that themselves have cache tree nodes. The
+  number of cache entries corresponds to the number of cache entries in
+  the index that describe paths within that tree's directory.
+
+  The extension tracks the full directory structure in the cache tree
+  extension, but this is generally smaller than the full cache entry list.
+
+  When a path is updated in index, Git invalidates all nodes of the
+  recursive cache tree corresponding to the parent directories of that
+  path. We store these tree nodes as being "invalid" by using "-1" as the
+  number of cache entries. Invalid nodes still store a span of index
+  entries, allowing Git to focus its efforts when reconstructing a full
+  cache tree.
+
+  The signature for this extension is { 'T', 'R', 'E', 'E' }.
+
+  A series of entries fill the entire extension; each of which
+  consists of:
+
+  - NUL-terminated path component (relative to its parent directory);
+
+  - ASCII decimal number of entries in the index that is covered by the
+    tree this entry represents (entry_count);
+
+  - A space (ASCII 32);
+
+  - ASCII decimal number that represents the number of subtrees this
+    tree has;
+
+  - A newline (ASCII 10); and
+
+  - Object name for the object that would result from writing this span
+    of index as a tree.
+
+  An entry can be in an invalidated state and is represented by having
+  a negative number in the entry_count field. In this case, there is no
+  object name and the next entry starts immediately after the newline.
+  When writing an invalid entry, -1 should always be used as entry_count.
+
+  The entries are written out in the top-down, depth-first order.  The
+  first entry represents the root level of the repository, followed by the
+  first subtree--let's call this A--of the root level (with its name
+  relative to the root level), followed by the first subtree of A (with
+  its name relative to A), and so on. The specified number of subtrees
+  indicates when the current level of the recursive stack is complete.
+
+=== Resolve undo
+
+  A conflict is represented in the index as a set of higher stage entries.
+  When a conflict is resolved (e.g. with "git add path"), these higher
+  stage entries will be removed and a stage-0 entry with proper resolution
+  is added.
+
+  When these higher stage entries are removed, they are saved in the
+  resolve undo extension, so that conflicts can be recreated (e.g. with
+  "git checkout -m"), in case users want to redo a conflict resolution
+  from scratch.
+
+  The signature for this extension is { 'R', 'E', 'U', 'C' }.
+
+  A series of entries fill the entire extension; each of which
+  consists of:
+
+  - NUL-terminated pathname the entry describes (relative to the root of
+    the repository, i.e. full pathname);
+
+  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
+    stage 1 to 3 (a missing stage is represented by "0" in this field);
+    and
+
+  - At most three object names of the entry in stages from 1 to 3
+    (nothing is written for a missing stage).
+
+=== Split index
+
+  In split index mode, the majority of index entries could be stored
+  in a separate file. This extension records the changes to be made on
+  top of that to produce the final index.
+
+  The signature for this extension is { 'l', 'i', 'n', 'k' }.
+
+  The extension consists of:
+
+  - Hash of the shared index file. The shared index file path
+    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
+    index does not require a shared index file.
+
+  - An ewah-encoded delete bitmap, each bit represents an entry in the
+    shared index. If a bit is set, its corresponding entry in the
+    shared index will be removed from the final index.  Note, because
+    a delete operation changes index entry positions, but we do need
+    original positions in replace phase, it's best to just mark
+    entries for removal, then do a mass deletion after replacement.
+
+  - An ewah-encoded replace bitmap, each bit represents an entry in
+    the shared index. If a bit is set, its corresponding entry in the
+    shared index will be replaced with an entry in this index
+    file. All replaced entries are stored in sorted order in this
+    index. The first "1" bit in the replace bitmap corresponds to the
+    first index entry, the second "1" bit to the second entry and so
+    on. Replaced entries may have empty path names to save space.
+
+  The remaining index entries after replaced ones will be added to the
+  final index. These added entries are also sorted by entry name then
+  stage.
+
+== Untracked cache
+
+  Untracked cache saves the untracked file list and necessary data to
+  verify the cache. The signature for this extension is { 'U', 'N',
+  'T', 'R' }.
+
+  The extension starts with
+
+  - A sequence of NUL-terminated strings, preceded by the size of the
+    sequence in variable width encoding. Each string describes the
+    environment where the cache can be used.
+
+  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
+    ctime field until "file size".
+
+  - Stat data of core.excludesFile
+
+  - 32-bit dir_flags (see struct dir_struct)
+
+  - Hash of $GIT_DIR/info/exclude. A null hash means the file
+    does not exist.
+
+  - Hash of core.excludesFile. A null hash means the file does
+    not exist.
+
+  - NUL-terminated string of per-dir exclude file name. This usually
+    is ".gitignore".
+
+  - The number of following directory blocks, variable width
+    encoding. If this number is zero, the extension ends here with a
+    following NUL.
+
+  - A number of directory blocks in depth-first-search order, each
+    consists of
+
+    - The number of untracked entries, variable width encoding.
+
+    - The number of sub-directory blocks, variable width encoding.
+
+    - The directory name terminated by NUL.
+
+    - A number of untracked file/dir names terminated by NUL.
+
+The remaining data of each directory block is grouped by type:
+
+  - An ewah bitmap, the n-th bit marks whether the n-th directory has
+    valid untracked cache entries.
+
+  - An ewah bitmap, the n-th bit records "check-only" bit of
+    read_directory_recursive() for the n-th directory.
+
+  - An ewah bitmap, the n-th bit indicates whether hash and stat data
+    is valid for the n-th directory and exists in the next data.
+
+  - An array of stat data. The n-th data corresponds with the n-th
+    "one" bit in the previous ewah bitmap.
+
+  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
+    in the previous ewah bitmap.
+
+  - One NUL.
+
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the core.fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported versions are 1 and 2.
+
+  - (Version 1)
+    64-bit time: the extension data reflects all changes through the given
+	time which is stored as the nanoseconds elapsed since midnight,
+	January 1, 1970.
+
+  - (Version 2)
+    A null terminated string: an opaque token defined by the file system
+    monitor application.  The extension data reflects all changes relative
+    to that token.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+  The End of Index Entry (EOIE) is used to locate the end of the variable
+  length index entries and the beginning of the extensions. Code can take
+  advantage of this to quickly locate the index extensions without having
+  to parse through all of the index entries.
+
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+  The extension consists of:
+
+  - 32-bit offset to the end of the index entries
+
+  - Hash over the extension types and their sizes (but not
+	their contents).  E.g. if we have "TREE" extension that is N-bytes
+	long, "REUC" extension that is M-bytes long, followed by "EOIE",
+	then the hash would be:
+
+	Hash("TREE" + <binary representation of N> +
+		"REUC" + <binary representation of M>)
+
+== Index Entry Offset Table
+
+  The Index Entry Offset Table (IEOT) is used to help address the CPU
+  cost of loading the index by enabling multi-threading the process of
+  converting cache entries from the on-disk format to the in-memory format.
+  The signature for this extension is { 'I', 'E', 'O', 'T' }.
+
+  The extension consists of:
+
+  - 32-bit version (currently 1)
+
+  - A number of index offset entries each consisting of:
+
+    - 32-bit offset from the beginning of the file to the first cache entry
+	in this block of entries.
+
+    - 32-bit count of cache entries in this block
diff --git a/Documentation/technical/long-running-process-protocol.txt b/Documentation/technical/long-running-process-protocol.txt
new file mode 100644
index 0000000000..aa0aa9af1c
--- /dev/null
+++ b/Documentation/technical/long-running-process-protocol.txt
@@ -0,0 +1,50 @@
+Long-running process protocol
+=============================
+
+This protocol is used when Git needs to communicate with an external
+process throughout the entire life of a single Git command. All
+communication is in pkt-line format (see technical/protocol-common.txt)
+over standard input and standard output.
+
+Handshake
+---------
+
+Git starts by sending a welcome message (for example,
+"git-filter-client"), a list of supported protocol version numbers, and
+a flush packet. Git expects to read the welcome message with "server"
+instead of "client" (for example, "git-filter-server"), exactly one
+protocol version number from the previously sent list, and a flush
+packet. All further communication will be based on the selected version.
+The remaining protocol description below documents "version=2". Please
+note that "version=42" in the example below does not exist and is only
+there to illustrate how the protocol would look like with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet:          git> git-filter-client
+packet:          git> version=2
+packet:          git> version=42
+packet:          git> 0000
+packet:          git< git-filter-server
+packet:          git< version=2
+packet:          git< 0000
+packet:          git> capability=clean
+packet:          git> capability=smudge
+packet:          git> capability=not-yet-invented
+packet:          git> 0000
+packet:          git< capability=clean
+packet:          git< capability=smudge
+packet:          git< 0000
+------------------------
+
+Shutdown
+--------
+
+Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own. Git will wait until the filter
+process has stopped.
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
new file mode 100644
index 0000000000..e8e377a59f
--- /dev/null
+++ b/Documentation/technical/multi-pack-index.txt
@@ -0,0 +1,105 @@
+Multi-Pack-Index (MIDX) Design Notes
+====================================
+
+The Git object directory contains a 'pack' directory containing
+packfiles (with suffix ".pack") and pack-indexes (with suffix
+".idx"). The pack-indexes provide a way to lookup objects and
+navigate to their offset within the pack, but these must come
+in pairs with the packfiles. This pairing depends on the file
+names, as the pack-index differs only in suffix with its pack-
+file. While the pack-indexes provide fast lookup per packfile,
+this performance degrades as the number of packfiles increases,
+because abbreviations need to inspect every packfile and we are
+more likely to have a miss on our most-recently-used packfile.
+For some large repositories, repacking into a single packfile
+is not feasible due to storage space or excessive repack times.
+
+The multi-pack-index (MIDX for short) stores a list of objects
+and their offsets into multiple packfiles. It contains:
+
+- A list of packfile names.
+- A sorted list of object IDs.
+- A list of metadata for the ith object ID including:
+  - A value j referring to the jth packfile.
+  - An offset within the jth packfile for the object.
+- If large offsets are required, we use another list of large
+  offsets similar to version 2 pack-indexes.
+
+Thus, we can provide O(log N) lookup time for any number
+of packfiles.
+
+Design Details
+--------------
+
+- The MIDX is stored in a file named 'multi-pack-index' in the
+  .git/objects/pack directory. This could be stored in the pack
+  directory of an alternate. It refers only to packfiles in that
+  same directory.
+
+- The core.multiPackIndex config setting must be on to consume MIDX files.
+
+- The file format includes parameters for the object ID hash
+  function, so a future change of hash algorithm does not require
+  a change in format.
+
+- The MIDX keeps only one record per object ID. If an object appears
+  in multiple packfiles, then the MIDX selects the copy in the most-
+  recently modified packfile.
+
+- If there exist packfiles in the pack directory not registered in
+  the MIDX, then those packfiles are loaded into the `packed_git`
+  list and `packed_git_mru` cache.
+
+- The pack-indexes (.idx files) remain in the pack directory so we
+  can delete the MIDX file, set core.midx to false, or downgrade
+  without any loss of information.
+
+- The MIDX file format uses a chunk-based approach (similar to the
+  commit-graph file) that allows optional data to be added.
+
+Future Work
+-----------
+
+- The multi-pack-index allows many packfiles, especially in a context
+  where repacking is expensive (such as a very large repo), or
+  unexpected maintenance time is unacceptable (such as a high-demand
+  build machine). However, the multi-pack-index needs to be rewritten
+  in full every time. We can extend the format to be incremental, so
+  writes are fast. By storing a small "tip" multi-pack-index that
+  points to large "base" MIDX files, we can keep writes fast while
+  still reducing the number of binary searches required for object
+  lookups.
+
+- The reachability bitmap is currently paired directly with a single
+  packfile, using the pack-order as the object order to hopefully
+  compress the bitmaps well using run-length encoding. This could be
+  extended to pair a reachability bitmap with a multi-pack-index. If
+  the multi-pack-index is extended to store a "stable object order"
+  (a function Order(hash) = integer that is constant for a given hash,
+  even as the multi-pack-index is updated) then a reachability bitmap
+  could point to a multi-pack-index and be updated independently.
+
+- Packfiles can be marked as "special" using empty files that share
+  the initial name but replace ".pack" with ".keep" or ".promisor".
+  We can add an optional chunk of data to the multi-pack-index that
+  records flags of information about the packfiles. This allows new
+  states, such as 'repacked' or 'redeltified', that can help with
+  pack maintenance in a multi-pack environment. It may also be
+  helpful to organize packfiles by object type (commit, tree, blob,
+  etc.) and use this metadata to help that maintenance.
+
+- The partial clone feature records special "promisor" packs that
+  may point to objects that are not stored locally, but available
+  on request to a server. The multi-pack-index does not currently
+  track these promisor packs.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=6
+    Chromium work item for: Multi-Pack Index (MIDX)
+
+[1] https://lore.kernel.org/git/20180107181459.222909-1-dstolee@microsoft.com/
+    An earlier RFC for the multi-pack-index feature
+
+[2] https://lore.kernel.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
+    Git Merge 2018 Contributor's summit notes (includes discussion of MIDX)
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
new file mode 100644
index 0000000000..1faa949bf6
--- /dev/null
+++ b/Documentation/technical/pack-format.txt
@@ -0,0 +1,381 @@
+Git pack format
+===============
+
+== Checksums and object IDs
+
+In a repository using the traditional SHA-1, pack checksums, index checksums,
+and object IDs (object names) mentioned below are all computed using SHA-1.
+Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+
+== pack-*.pack files have the following format:
+
+   - A header appears at the beginning and consists of the following:
+
+     4-byte signature:
+         The signature is: {'P', 'A', 'C', 'K'}
+
+     4-byte version number (network byte order):
+	 Git currently accepts version number 2 or 3 but
+         generates version 2 only.
+
+     4-byte number of objects contained in the pack (network byte order)
+
+     Observation: we cannot have more than 4G versions ;-) and
+     more than 4G objects in a pack.
+
+   - The header is followed by number of object entries, each of
+     which looks like this:
+
+     (undeltified representation)
+     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+     compressed data
+
+     (deltified representation)
+     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+     base object name if OBJ_REF_DELTA or a negative relative
+	 offset from the delta object's position in the pack if this
+	 is an OBJ_OFS_DELTA object
+     compressed delta data
+
+     Observation: length of each object is encoded in a variable
+     length format and is not constrained to 32-bit or anything.
+
+  - The trailer records a pack checksum of all of the above.
+
+=== Object types
+
+Valid object types are:
+
+- OBJ_COMMIT (1)
+- OBJ_TREE (2)
+- OBJ_BLOB (3)
+- OBJ_TAG (4)
+- OBJ_OFS_DELTA (6)
+- OBJ_REF_DELTA (7)
+
+Type 5 is reserved for future expansion. Type 0 is invalid.
+
+=== Size encoding
+
+This document uses the following "size encoding" of non-negative
+integers: From each byte, the seven least significant bits are
+used to form the resulting integer. As long as the most significant
+bit is 1, this process continues; the byte with MSB 0 provides the
+last seven bits.  The seven-bit chunks are concatenated. Later
+values are more significant.
+
+This size encoding should not be confused with the "offset encoding",
+which is also used in this document.
+
+=== Deltified representation
+
+Conceptually there are only four object types: commit, tree, tag and
+blob. However to save space, an object could be stored as a "delta" of
+another "base" object. These representations are assigned new types
+ofs-delta and ref-delta, which is only valid in a pack file.
+
+Both ofs-delta and ref-delta store the "delta" to be applied to
+another object (called 'base object') to reconstruct the object. The
+difference between them is, ref-delta directly encodes base object
+name. If the base object is in the same pack, ofs-delta encodes
+the offset of the base object in the pack instead.
+
+The base object could also be deltified if it's in the same pack.
+Ref-delta can also refer to an object outside the pack (i.e. the
+so-called "thin pack"). When stored on disk however, the pack should
+be self contained to avoid cyclic dependency.
+
+The delta data starts with the size of the base object and the
+size of the object to be reconstructed. These sizes are
+encoded using the size encoding from above.  The remainder of
+the delta data is a sequence of instructions to reconstruct the object
+from the base object. If the base object is deltified, it must be
+converted to canonical form first. Each instruction appends more and
+more data to the target object until it's complete. There are two
+supported instructions so far: one for copy a byte range from the
+source object and one for inserting new data embedded in the
+instruction itself.
+
+Each instruction has variable length. Instruction type is determined
+by the seventh bit of the first octet. The following diagrams follow
+the convention in RFC 1951 (Deflate compressed data format).
+
+==== Instruction to copy from base object
+
+  +----------+---------+---------+---------+---------+-------+-------+-------+
+  | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
+  +----------+---------+---------+---------+---------+-------+-------+-------+
+
+This is the instruction format to copy a byte range from the source
+object. It encodes the offset to copy from and the number of bytes to
+copy. Offset and size are in little-endian order.
+
+All offset and size bytes are optional. This is to reduce the
+instruction size when encoding small offsets or sizes. The first seven
+bits in the first octet determines which of the next seven octets is
+present. If bit zero is set, offset1 is present. If bit one is set
+offset2 is present and so on.
+
+Note that a more compact instruction does not change offset and size
+encoding. For example, if only offset2 is omitted like below, offset3
+still contains bits 16-23. It does not become offset2 and contains
+bits 8-15 even if it's right next to offset1.
+
+  +----------+---------+---------+
+  | 10000101 | offset1 | offset3 |
+  +----------+---------+---------+
+
+In its most compact form, this instruction only takes up one byte
+(0x80) with both offset and size omitted, which will have default
+values zero. There is another exception: size zero is automatically
+converted to 0x10000.
+
+==== Instruction to add new data
+
+  +----------+============+
+  | 0xxxxxxx |    data    |
+  +----------+============+
+
+This is the instruction to construct target object without the base
+object. The following data is appended to the target object. The first
+seven bits of the first octet determines the size of data in
+bytes. The size must be non-zero.
+
+==== Reserved instruction
+
+  +----------+============
+  | 00000000 |
+  +----------+============
+
+This is the instruction reserved for future expansion.
+
+== Original (version 1) pack-*.idx files have the following format:
+
+  - The header consists of 256 4-byte network byte order
+    integers.  N-th entry of this table records the number of
+    objects in the corresponding pack, the first byte of whose
+    object name is less than or equal to N.  This is called the
+    'first-level fan-out' table.
+
+  - The header is followed by sorted 24-byte entries, one entry
+    per object in the pack.  Each entry is:
+
+    4-byte network byte order integer, recording where the
+    object is stored in the packfile as the offset from the
+    beginning.
+
+    one object name of the appropriate size.
+
+  - The file is concluded with a trailer:
+
+    A copy of the pack checksum at the end of the corresponding
+    packfile.
+
+    Index checksum of all of the above.
+
+Pack Idx file:
+
+	--  +--------------------------------+
+fanout	    | fanout[0] = 2 (for example)    |-.
+table	    +--------------------------------+ |
+	    | fanout[1]                      | |
+	    +--------------------------------+ |
+	    | fanout[2]                      | |
+	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
+	    | fanout[255] = total objects    |---.
+	--  +--------------------------------+ | |
+main	    | offset                         | | |
+index	    | object name 00XXXXXXXXXXXXXXXX | | |
+table	    +--------------------------------+ | |
+	    | offset                         | | |
+	    | object name 00XXXXXXXXXXXXXXXX | | |
+	    +--------------------------------+<+ |
+	  .-| offset                         |   |
+	  | | object name 01XXXXXXXXXXXXXXXX |   |
+	  | +--------------------------------+   |
+	  | | offset                         |   |
+	  | | object name 01XXXXXXXXXXXXXXXX |   |
+	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
+	  | | offset                         |   |
+	  | | object name FFXXXXXXXXXXXXXXXX |   |
+	--| +--------------------------------+<--+
+trailer	  | | packfile checksum              |
+	  | +--------------------------------+
+	  | | idxfile checksum               |
+	  | +--------------------------------+
+          .-------.
+                  |
+Pack file entry: <+
+
+     packed object header:
+	1-byte size extension bit (MSB)
+	       type (next 3 bit)
+	       size0 (lower 4-bit)
+        n-byte sizeN (as long as MSB is set, each 7-bit)
+		size0..sizeN form 4+7+7+..+7 bit integer, size0
+		is the least significant part, and sizeN is the
+		most significant part.
+     packed object data:
+        If it is not DELTA, then deflated bytes (the size above
+		is the size before compression).
+	If it is REF_DELTA, then
+	  base object name (the size above is the
+		size of the delta data that follows).
+          delta data, deflated.
+	If it is OFS_DELTA, then
+	  n-byte offset (see below) interpreted as a negative
+		offset from the type-byte of the header of the
+		ofs-delta entry (the size above is the size of
+		the delta data that follows).
+	  delta data, deflated.
+
+     offset encoding:
+	  n bytes with MSB set in all but the last one.
+	  The offset is then the number constructed by
+	  concatenating the lower 7 bit of each byte, and
+	  for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
+	  to the result.
+
+
+
+== Version 2 pack-*.idx files support packs larger than 4 GiB, and
+   have some other reorganizations.  They have the format:
+
+  - A 4-byte magic number '\377tOc' which is an unreasonable
+    fanout[0] value.
+
+  - A 4-byte version number (= 2)
+
+  - A 256-entry fan-out table just like v1.
+
+  - A table of sorted object names.  These are packed together
+    without offset values to reduce the cache footprint of the
+    binary search for a specific object name.
+
+  - A table of 4-byte CRC32 values of the packed object data.
+    This is new in v2 so compressed data can be copied directly
+    from pack to pack during repacking without undetected
+    data corruption.
+
+  - A table of 4-byte offset values (in network byte order).
+    These are usually 31-bit pack file offsets, but large
+    offsets are encoded as an index into the next table with
+    the msbit set.
+
+  - A table of 8-byte offset entries (empty for pack files less
+    than 2 GiB).  Pack files are organized with heavily used
+    objects toward the front, so most object references should
+    not need to refer to this table.
+
+  - The same trailer as a v1 pack file:
+
+    A copy of the pack checksum at the end of
+    corresponding packfile.
+
+    Index checksum of all of the above.
+
+== pack-*.rev files have the format:
+
+  - A 4-byte magic number '0x52494458' ('RIDX').
+
+  - A 4-byte version identifier (= 1).
+
+  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
+
+  - A table of index positions (one per packed object, num_objects in
+    total, each a 4-byte unsigned integer in network order), sorted by
+    their corresponding offsets in the packfile.
+
+  - A trailer, containing a:
+
+    checksum of the corresponding packfile, and
+
+    a checksum of all of the above.
+
+All 4-byte numbers are in network order.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+	4-byte signature:
+	    The signature is: {'M', 'I', 'D', 'X'}
+
+	1-byte version number:
+	    Git only writes or recognizes version 1.
+
+	1-byte Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	    If the hash type does not match the repository's hash algorithm,
+	    the multi-pack-index file should be ignored with a warning
+	    presented to the user.
+
+	1-byte number of "chunks"
+
+	1-byte number of base multi-pack-index files:
+	    This value is currently always zero.
+
+	4-byte number of pack files
+
+CHUNK LOOKUP:
+
+	(C + 1) * 12 bytes providing the chunk offsets:
+	    First 4 bytes describe chunk id. Value 0 is a terminating label.
+	    Other 8 bytes provide offset in current file for chunk to start.
+	    (Chunks are provided in file-order, so you can infer the length
+	    using the next chunk position if necessary.)
+
+	The CHUNK LOOKUP matches the table of contents from
+	link:technical/chunk-format.html[the chunk-based file format].
+
+	The remaining data in the body is described one chunk at a time, and
+	these chunks may be given in any order. Chunks are required unless
+	otherwise specified.
+
+CHUNK DATA:
+
+	Packfile Names (ID: {'P', 'N', 'A', 'M'})
+	    Stores the packfile names as concatenated, null-terminated strings.
+	    Packfiles must be listed in lexicographic order for fast lookups by
+	    name. This is the only chunk not guaranteed to be a multiple of four
+	    bytes in length, so should be the last chunk for alignment reasons.
+
+	OID Fanout (ID: {'O', 'I', 'D', 'F'})
+	    The ith entry, F[i], stores the number of OIDs with first
+	    byte at most i. Thus F[255] stores the total
+	    number of objects.
+
+	OID Lookup (ID: {'O', 'I', 'D', 'L'})
+	    The OIDs for all objects in the MIDX are stored in lexicographic
+	    order in this chunk.
+
+	Object Offsets (ID: {'O', 'O', 'F', 'F'})
+	    Stores two 4-byte values for every object.
+	    1: The pack-int-id for the pack storing this object.
+	    2: The offset within the pack.
+		If all offsets are less than 2^32, then the large offset chunk
+		will not exist and offsets are stored as in IDX v1.
+		If there is at least one offset value larger than 2^32-1, then
+		the large offset chunk must exist, and offsets larger than
+		2^31-1 must be stored in it instead. If the large offset chunk
+		exists and the 31st bit is on, then removing that bit reveals
+		the row in the large offsets containing the 8-byte offset of
+		this object.
+
+	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+	    8-byte offsets into large packfiles.
+
+TRAILER:
+
+	Index checksum of the above contents.
diff --git a/Documentation/technical/pack-heuristics.txt b/Documentation/technical/pack-heuristics.txt
new file mode 100644
index 0000000000..95a07db6e8
--- /dev/null
+++ b/Documentation/technical/pack-heuristics.txt
@@ -0,0 +1,460 @@
+Concerning Git's Packing Heuristics
+===================================
+
+        Oh, here's a really stupid question:
+
+                  Where do I go
+               to learn the details
+	    of Git's packing heuristics?
+
+Be careful what you ask!
+
+Followers of the Git, please open the Git IRC Log and turn to
+February 10, 2006.
+
+It's a rare occasion, and we are joined by the King Git Himself,
+Linus Torvalds (linus).  Nathaniel Smith, (njs`), has the floor
+and seeks enlightenment.  Others are present, but silent.
+
+Let's listen in!
+
+    <njs`> Oh, here's a really stupid question -- where do I go to
+	learn the details of Git's packing heuristics?  google avails
+        me not, reading the source didn't help a lot, and wading
+        through the whole mailing list seems less efficient than any
+        of that.
+
+It is a bold start!  A plea for help combined with a simultaneous
+tri-part attack on some of the tried and true mainstays in the quest
+for enlightenment.  Brash accusations of google being useless. Hubris!
+Maligning the source.  Heresy!  Disdain for the mailing list archives.
+Woe.
+
+    <pasky> yes, the packing-related delta stuff is somewhat
+        mysterious even for me ;)
+
+Ah!  Modesty after all.
+
+    <linus> njs, I don't think the docs exist. That's something where
+	 I don't think anybody else than me even really got involved.
+	 Most of the rest of Git others have been busy with (especially
+	 Junio), but packing nobody touched after I did it.
+
+It's cryptic, yet vague.  Linus in style for sure.  Wise men
+interpret this as an apology.  A few argue it is merely a
+statement of fact.
+
+    <njs`> I guess the next step is "read the source again", but I
+        have to build up a certain level of gumption first :-)
+
+Indeed!  On both points.
+
+    <linus> The packing heuristic is actually really really simple.
+
+Bait...
+
+    <linus> But strange.
+
+And switch.  That ought to do it!
+
+    <linus> Remember: Git really doesn't follow files. So what it does is
+        - generate a list of all objects
+        - sort the list according to magic heuristics
+        - walk the list, using a sliding window, seeing if an object
+          can be diffed against another object in the window
+        - write out the list in recency order
+
+The traditional understatement:
+
+    <njs`> I suspect that what I'm missing is the precise definition of
+        the word "magic"
+
+The traditional insight:
+
+    <pasky> yes
+
+And Babel-like confusion flowed.
+
+    <njs`> oh, hmm, and I'm not sure what this sliding window means either
+
+    <pasky> iirc, it appeared to me to be just the sha1 of the object
+        when reading the code casually ...
+
+        ... which simply doesn't sound as a very good heuristics, though ;)
+
+    <njs`> .....and recency order.  okay, I think it's clear I didn't
+       even realize how much I wasn't realizing :-)
+
+Ah, grasshopper!  And thus the enlightenment begins anew.
+
+    <linus> The "magic" is actually in theory totally arbitrary.
+        ANY order will give you a working pack, but no, it's not
+	ordered by SHA-1.
+
+        Before talking about the ordering for the sliding delta
+        window, let's talk about the recency order. That's more
+        important in one way.
+
+    <njs`> Right, but if all you want is a working way to pack things
+        together, you could just use cat and save yourself some
+        trouble...
+
+Waaait for it....
+
+    <linus> The recency ordering (which is basically: put objects
+        _physically_ into the pack in the order that they are
+        "reachable" from the head) is important.
+
+    <njs`> okay
+
+    <linus> It's important because that's the thing that gives packs
+        good locality. It keeps the objects close to the head (whether
+        they are old or new, but they are _reachable_ from the head)
+        at the head of the pack. So packs actually have absolutely
+        _wonderful_ IO patterns.
+
+Read that again, because it is important.
+
+    <linus> But recency ordering is totally useless for deciding how
+        to actually generate the deltas, so the delta ordering is
+        something else.
+
+        The delta ordering is (wait for it):
+        - first sort by the "basename" of the object, as defined by
+          the name the object was _first_ reached through when
+          generating the object list
+        - within the same basename, sort by size of the object
+        - but always sort different types separately (commits first).
+
+        That's not exactly it, but it's very close.
+
+    <njs`> The "_first_ reached" thing is not too important, just you
+        need some way to break ties since the same objects may be
+        reachable many ways, yes?
+
+And as if to clarify:
+
+    <linus> The point is that it's all really just any random
+        heuristic, and the ordering is totally unimportant for
+        correctness, but it helps a lot if the heuristic gives
+        "clumping" for things that are likely to delta well against
+        each other.
+
+It is an important point, so secretly, I did my own research and have
+included my results below.  To be fair, it has changed some over time.
+And through the magic of Revisionistic History, I draw upon this entry
+from The Git IRC Logs on my father's birthday, March 1:
+
+    <gitster> The quote from the above linus should be rewritten a
+        bit (wait for it):
+        - first sort by type.  Different objects never delta with
+	  each other.
+        - then sort by filename/dirname.  hash of the basename
+          occupies the top BITS_PER_INT-DIR_BITS bits, and bottom
+          DIR_BITS are for the hash of leading path elements.
+        - then if we are doing "thin" pack, the objects we are _not_
+          going to pack but we know about are sorted earlier than
+          other objects.
+        - and finally sort by size, larger to smaller.
+
+In one swell-foop, clarification and obscurification!  Nonetheless,
+authoritative.  Cryptic, yet concise.  It even solicits notions of
+quotes from The Source Code.  Clearly, more study is needed.
+
+    <gitster> That's the sort order.  What this means is:
+        - we do not delta different object types.
+	- we prefer to delta the objects with the same full path, but
+          allow files with the same name from different directories.
+	- we always prefer to delta against objects we are not going
+          to send, if there are some.
+	- we prefer to delta against larger objects, so that we have
+          lots of removals.
+
+        The penultimate rule is for "thin" packs.  It is used when
+        the other side is known to have such objects.
+
+There it is again. "Thin" packs.  I'm thinking to myself, "What
+is a 'thin' pack?"  So I ask:
+
+    <jdl> What is a "thin" pack?
+
+    <gitster> Use of --objects-edge to rev-list as the upstream of
+        pack-objects.  The pack transfer protocol negotiates that.
+
+Woo hoo!  Cleared that _right_ up!
+
+    <gitster> There are two directions - push and fetch.
+
+There!  Did you see it?  It is not '"push" and "pull"'!  How often the
+confusion has started here.  So casually mentioned, too!
+
+    <gitster> For push, git-send-pack invokes git-receive-pack on the
+        other end.  The receive-pack says "I have up to these commits".
+        send-pack looks at them, and computes what are missing from
+        the other end.  So "thin" could be the default there.
+
+        In the other direction, fetch, git-fetch-pack and
+        git-clone-pack invokes git-upload-pack on the other end
+	(via ssh or by talking to the daemon).
+
+	There are two cases: fetch-pack with -k and clone-pack is one,
+        fetch-pack without -k is the other.  clone-pack and fetch-pack
+        with -k will keep the downloaded packfile without expanded, so
+        we do not use thin pack transfer.  Otherwise, the generated
+        pack will have delta without base object in the same pack.
+
+        But fetch-pack without -k will explode the received pack into
+        individual objects, so we automatically ask upload-pack to
+        give us a thin pack if upload-pack supports it.
+
+OK then.
+
+Uh.
+
+Let's return to the previous conversation still in progress.
+
+    <njs`> and "basename" means something like "the tail of end of
+        path of file objects and dir objects, as per basename(3), and
+        we just declare all commit and tag objects to have the same
+        basename" or something?
+
+Luckily, that too is a point that gitster clarified for us!
+
+If I might add, the trick is to make files that _might_ be similar be
+located close to each other in the hash buckets based on their file
+names.  It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and
+"Makefile" all landed in the same bucket due to their common basename,
+"Makefile". However, now they land in "close" buckets.
+
+The algorithm allows not just for the _same_ bucket, but for _close_
+buckets to be considered delta candidates.  The rationale is
+essentially that files, like Makefiles, often have very similar
+content no matter what directory they live in.
+
+    <linus> I played around with different delta algorithms, and with
+        making the "delta window" bigger, but having too big of a
+        sliding window makes it very expensive to generate the pack:
+        you need to compare every object with a _ton_ of other objects.
+
+        There are a number of other trivial heuristics too, which
+        basically boil down to "don't bother even trying to delta this
+        pair" if we can tell before-hand that the delta isn't worth it
+        (due to size differences, where we can take a previous delta
+        result into account to decide that "ok, no point in trying
+        that one, it will be worse").
+
+        End result: packing is actually very size efficient. It's
+        somewhat CPU-wasteful, but on the other hand, since you're
+        really only supposed to do it maybe once a month (and you can
+        do it during the night), nobody really seems to care.
+
+Nice Engineering Touch, there.  Find when it doesn't matter, and
+proclaim it a non-issue.  Good style too!
+
+    <njs`> So, just to repeat to see if I'm following, we start by
+        getting a list of the objects we want to pack, we sort it by
+        this heuristic (basically lexicographically on the tuple
+        (type, basename, size)).
+
+        Then we walk through this list, and calculate a delta of
+        each object against the last n (tunable parameter) objects,
+        and pick the smallest of these deltas.
+
+Vastly simplified, but the essence is there!
+
+    <linus> Correct.
+
+    <njs`> And then once we have picked a delta or fulltext to
+        represent each object, we re-sort by recency, and write them
+        out in that order.
+
+    <linus> Yup. Some other small details:
+
+And of course there is the "Other Shoe" Factor too.
+
+    <linus> - We limit the delta depth to another magic value (right
+        now both the window and delta depth magic values are just "10")
+
+    <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO
+        patterns, because the things you want are near by, but to
+        actually reconstruct them you may have to jump all over in
+        random ways.
+
+    <linus> - When we write out a delta, and we haven't yet written
+        out the object it is a delta against, we write out the base
+        object first.  And no, when we reconstruct them, we actually
+        get nice IO patterns, because:
+        - larger objects tend to be "more recent" (Linus' law: files grow)
+        - we actively try to generate deltas from a larger object to a
+          smaller one
+        - this means that the top-of-tree very seldom has deltas
+          (i.e. deltas in _practice_ are "backwards deltas")
+
+Again, we should reread that whole paragraph.  Not just because
+Linus has slipped Linus's Law in there on us, but because it is
+important.  Let's make sure we clarify some of the points here:
+
+    <njs`> So the point is just that in practice, delta order and
+        recency order match each other quite well.
+
+    <linus> Yes. There's another nice side to this (and yes, it was
+	designed that way ;):
+        - the reason we generate deltas against the larger object is
+	  actually a big space saver too!
+
+    <njs`> Hmm, but your last comment (if "we haven't yet written out
+        the object it is a delta against, we write out the base object
+        first"), seems like it would make these facts mostly
+        irrelevant because even if in practice you would not have to
+        wander around much, in fact you just brute-force say that in
+        the cases where you might have to wander, don't do that :-)
+
+    <linus> Yes and no. Notice the rule: we only write out the base
+        object first if the delta against it was more recent.  That
+        means that you can actually have deltas that refer to a base
+        object that is _not_ close to the delta object, but that only
+        happens when the delta is needed to generate an _old_ object.
+
+    <linus> See?
+
+Yeah, no.  I missed that on the first two or three readings myself.
+
+    <linus> This keeps the front of the pack dense. The front of the
+        pack never contains data that isn't relevant to a "recent"
+        object.  The size optimization comes from our use of xdelta
+        (but is true for many other delta algorithms): removing data
+        is cheaper (in size) than adding data.
+
+        When you remove data, you only need to say "copy bytes n--m".
+	In contrast, in a delta that _adds_ data, you have to say "add
+        these bytes: 'actual data goes here'"
+
+    *** njs` has quit: Read error: 104 (Connection reset by peer)
+
+    <linus> Uhhuh. I hope I didn't blow njs` mind.
+
+    *** njs` has joined channel #git
+
+    <pasky> :)
+
+The silent observers are amused.  Of course.
+
+And as if njs` was expected to be omniscient:
+
+    <linus> njs - did you miss anything?
+
+OK, I'll spell it out.  That's Geek Humor.  If njs` was not actually
+connected for a little bit there, how would he know if missed anything
+while he was disconnected?  He's a benevolent dictator with a sense of
+humor!  Well noted!
+
+    <njs`> Stupid router.  Or gremlins, or whatever.
+
+It's a cheap shot at Cisco.  Take 'em when you can.
+
+    <njs`> Yes and no. Notice the rule: we only write out the base
+        object first if the delta against it was more recent.
+
+        I'm getting lost in all these orders, let me re-read :-)
+	So the write-out order is from most recent to least recent?
+        (Conceivably it could be the opposite way too, I'm not sure if
+        we've said) though my connection back at home is logging, so I
+        can just read what you said there :-)
+
+And for those of you paying attention, the Omniscient Trick has just
+been detailed!
+
+    <linus> Yes, we always write out most recent first
+
+    <njs`> And, yeah, I got the part about deeper-in-history stuff
+        having worse IO characteristics, one sort of doesn't care.
+
+    <linus> With the caveat that if the "most recent" needs an older
+        object to delta against (hey, shrinking sometimes does
+        happen), we write out the old object with the delta.
+
+    <njs`> (if only it happened more...)
+
+    <linus> Anyway, the pack-file could easily be denser still, but
+	because it's used both for streaming (the Git protocol) and
+        for on-disk, it has a few pessimizations.
+
+Actually, it is a made-up word. But it is a made-up word being
+used as setup for a later optimization, which is a real word:
+
+    <linus> In particular, while the pack-file is then compressed,
+        it's compressed just one object at a time, so the actual
+        compression factor is less than it could be in theory. But it
+        means that it's all nice random-access with a simple index to
+        do "object name->location in packfile" translation.
+
+    <njs`> I'm assuming the real win for delta-ing large->small is
+        more homogeneous statistics for gzip to run over?
+
+        (You have to put the bytes in one place or another, but
+        putting them in a larger blob wins on compression)
+
+        Actually, what is the compression strategy -- each delta
+        individually gzipped, the whole file gzipped, somewhere in
+        between, no compression at all, ....?
+
+        Right.
+
+Reality IRC sets in.  For example:
+
+    <pasky> I'll read the rest in the morning, I really have to go
+        sleep or there's no hope whatsoever for me at the today's
+        exam... g'nite all.
+
+Heh.
+
+    <linus> pasky: g'nite
+
+    <njs`> pasky: 'luck
+
+    <linus> Right: large->small matters exactly because of compression
+        behaviour. If it was non-compressed, it probably wouldn't make
+        any difference.
+
+    <njs`> yeah
+
+    <linus> Anyway: I'm not even trying to claim that the pack-files
+        are perfect, but they do tend to have a nice balance of
+        density vs ease-of use.
+
+Gasp!  OK, saved.  That's a fair Engineering trade off.  Close call!
+In fact, Linus reflects on some Basic Engineering Fundamentals,
+design options, etc.
+
+    <linus> More importantly, they allow Git to still _conceptually_
+        never deal with deltas at all, and be a "whole object" store.
+
+        Which has some problems (we discussed bad huge-file
+	behaviour on the Git lists the other day), but it does mean
+	that the basic Git concepts are really really simple and
+        straightforward.
+
+        It's all been quite stable.
+
+        Which I think is very much a result of having very simple
+        basic ideas, so that there's never any confusion about what's
+        going on.
+
+        Bugs happen, but they are "simple" bugs. And bugs that
+        actually get some object store detail wrong are almost always
+        so obvious that they never go anywhere.
+
+    <njs`> Yeah.
+
+Nuff said.
+
+    <linus> Anyway.  I'm off for bed. It's not 6AM here, but I've got
+	 three kids, and have to get up early in the morning to send
+	 them off. I need my beauty sleep.
+
+    <njs`> :-)
+
+    <njs`> appreciate the infodump, I really was failing to find the
+	details on Git packs :-)
+
+And now you know the rest of the story.
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
new file mode 100644
index 0000000000..e13a2c064d
--- /dev/null
+++ b/Documentation/technical/pack-protocol.txt
@@ -0,0 +1,709 @@
+Packfile transfer protocols
+===========================
+
+Git supports transferring data in packfiles over the ssh://, git://, http:// and
+file:// transports.  There exist two sets of protocols, one for pushing
+data from a client to a server and another for fetching data from a
+server to a client.  The three transports (ssh, git, file) use the same
+protocol to transfer data. http is documented in http-protocol.txt.
+
+The processes invoked in the canonical Git implementation are 'upload-pack'
+on the server side and 'fetch-pack' on the client side for fetching data;
+then 'receive-pack' on the server and 'send-pack' on the client for pushing
+data.  The protocol functions to have a server tell a client what is
+currently on the server, then for the two to negotiate the smallest amount
+of data to send in order to fully update one or the other.
+
+pkt-line Format
+---------------
+
+The descriptions below build on the pkt-line format described in
+protocol-common.txt. When the grammar indicate `PKT-LINE(...)`, unless
+otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
+include a LF, but the receiver MUST NOT complain if it is not present.
+
+An error packet is a special pkt-line that contains an error string.
+
+----
+  error-line     =  PKT-LINE("ERR" SP explanation-text)
+----
+
+Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
+be sent. Once this packet is sent by a client or a server, the data transfer
+process defined in this protocol is terminated.
+
+Transports
+----------
+There are three transports over which the packfile protocol is
+initiated.  The Git transport is a simple, unauthenticated server that
+takes the command (almost always 'upload-pack', though Git
+servers can be configured to be globally writable, in which 'receive-
+pack' initiation is also allowed) with which the client wishes to
+communicate and executes it and connects it to the requesting
+process.
+
+In the SSH transport, the client just runs the 'upload-pack'
+or 'receive-pack' process on the server over the SSH protocol and then
+communicates with that invoked process over the SSH connection.
+
+The file:// transport runs the 'upload-pack' or 'receive-pack'
+process locally and communicates with it over a pipe.
+
+Extra Parameters
+----------------
+
+The protocol provides a mechanism in which clients can send additional
+information in its first message to the server. These are called "Extra
+Parameters", and are supported by the Git, SSH, and HTTP protocols.
+
+Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
+
+Servers that receive any such Extra Parameters MUST ignore all
+unrecognized keys. Currently, the only Extra Parameter recognized is
+"version" with a value of '1' or '2'.  See protocol-v2.txt for more
+information on protocol version 2.
+
+Git Transport
+-------------
+
+The Git transport starts off by sending the command and repository
+on the wire using the pkt-line format, followed by a NUL byte and a
+hostname parameter, terminated by a NUL byte.
+
+   0033git-upload-pack /project.git\0host=myserver.com\0
+
+The transport may send Extra Parameters by adding an additional NUL
+byte, and then adding one or more NUL-terminated strings:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
+
+--
+   git-proto-request = request-command SP pathname NUL
+		       [ host-parameter NUL ] [ NUL extra-parameters ]
+   request-command   = "git-upload-pack" / "git-receive-pack" /
+		       "git-upload-archive"   ; case sensitive
+   pathname          = *( %x01-ff ) ; exclude NUL
+   host-parameter    = "host=" hostname [ ":" port ]
+   extra-parameters  = 1*extra-parameter
+   extra-parameter   = 1*( %x01-ff ) NUL
+--
+
+host-parameter is used for the
+git-daemon name based virtual hosting.  See --interpolated-path
+option to git daemon, with the %H/%CH format characters.
+
+Basically what the Git client is doing to connect to an 'upload-pack'
+process on the server side over the Git protocol is this:
+
+   $ echo -e -n \
+     "003agit-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+     nc -v example.com 9418
+
+
+SSH Transport
+-------------
+
+Initiating the upload-pack or receive-pack processes over SSH is
+executing the binary on the server via SSH remote execution.
+It is basically equivalent to running this:
+
+   $ ssh git.example.com "git-upload-pack '/project.git'"
+
+For a server to support Git pushing and pulling for a given user over
+SSH, that user needs to be able to execute one or both of those
+commands via the SSH shell that they are provided on login.  On some
+systems, that shell access is limited to only being able to run those
+two commands, or even just one of them.
+
+In an ssh:// format URI, it's absolute in the URI, so the '/' after
+the host name (or port number) is sent as an argument, which is then
+read by the remote git-upload-pack exactly as is, so it's effectively
+an absolute path in the remote filesystem.
+
+       git clone ssh://user@example.com/project.git
+		    |
+		    v
+    ssh user@example.com "git-upload-pack '/project.git'"
+
+In a "user@host:path" format URI, its relative to the user's home
+directory, because the Git client will run:
+
+     git clone user@example.com:project.git
+		    |
+		    v
+  ssh user@example.com "git-upload-pack 'project.git'"
+
+The exception is if a '~' is used, in which case
+we execute it without the leading '/'.
+
+      ssh://user@example.com/~alice/project.git,
+		     |
+		     v
+   ssh user@example.com "git-upload-pack '~alice/project.git'"
+
+Depending on the value of the `protocol.version` configuration variable,
+Git may attempt to send Extra Parameters as a colon-separated string in
+the GIT_PROTOCOL environment variable. This is done only if
+the `ssh.variant` configuration variable indicates that the ssh command
+supports passing environment variables as an argument.
+
+A few things to remember here:
+
+- The "command name" is spelled with dash (e.g. git-upload-pack), but
+  this can be overridden by the client;
+
+- The repository path is always quoted with single quotes.
+
+Fetching Data From a Server
+---------------------------
+
+When one Git repository wants to get data that a second repository
+has, the first can 'fetch' from the second.  This operation determines
+what data the server has that the client does not then streams that
+data down to the client in packfile format.
+
+
+Reference Discovery
+-------------------
+
+When the client initially connects the server will immediately respond
+with a version number (if "version=1" is sent as an Extra Parameter),
+and a listing of each reference it has (all branches and tags) along
+with the object name that each reference currently points to.
+
+   $ echo -e -n "0045git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" |
+      nc -v example.com 9418
+   000eversion 1
+   00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
+		side-band side-band-64k ofs-delta shallow no-progress include-tag
+   00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
+   003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master
+   003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9
+   003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0
+   003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
+   0000
+
+The returned response is a pkt-line stream describing each ref and
+its current value.  The stream MUST be sorted by name according to
+the C locale ordering.
+
+If HEAD is a valid ref, HEAD MUST appear as the first advertised
+ref.  If HEAD is not a valid ref, HEAD MUST NOT appear in the
+advertisement list at all, but other refs may still appear.
+
+The stream MUST include capability declarations behind a NUL on the
+first ref. The peeled value of a ref (that is "ref^{}") MUST be
+immediately after the ref itself, if presented. A conforming server
+MUST peel the ref if it's an annotated tag.
+
+----
+  advertised-refs  =  *1("version 1")
+		      (no-refs / list-of-refs)
+		      *shallow
+		      flush-pkt
+
+  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
+		      NUL capability-list)
+
+  list-of-refs     =  first-ref *other-ref
+  first-ref        =  PKT-LINE(obj-id SP refname
+		      NUL capability-list)
+
+  other-ref        =  PKT-LINE(other-tip / other-peeled)
+  other-tip        =  obj-id SP refname
+  other-peeled     =  obj-id SP refname "^{}"
+
+  shallow          =  PKT-LINE("shallow" SP obj-id)
+
+  capability-list  =  capability *(SP capability)
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
+  LC_ALPHA         =  %x61-7A
+----
+
+Server and client MUST use lowercase for obj-id, both MUST treat obj-id
+as case-insensitive.
+
+See protocol-capabilities.txt for a list of allowed server capabilities
+and descriptions.
+
+Packfile Negotiation
+--------------------
+After reference and capabilities discovery, the client can decide to
+terminate the connection by sending a flush-pkt, telling the server it can
+now gracefully terminate, and disconnect, when it does not need any pack
+data. This can happen with the ls-remote command, and also can happen when
+the client already is up to date.
+
+Otherwise, it enters the negotiation phase, where the client and
+server determine what the minimal packfile necessary for transport is,
+by telling the server what objects it wants, its shallow objects
+(if any), and the maximum commit depth it wants (if any).  The client
+will also send a list of the capabilities it wants to be in effect,
+out of what the server said it could do with the first 'want' line.
+
+----
+  upload-request    =  want-list
+		       *shallow-line
+		       *1depth-request
+		       [filter-request]
+		       flush-pkt
+
+  want-list         =  first-want
+		       *additional-want
+
+  shallow-line      =  PKT-LINE("shallow" SP obj-id)
+
+  depth-request     =  PKT-LINE("deepen" SP depth) /
+		       PKT-LINE("deepen-since" SP timestamp) /
+		       PKT-LINE("deepen-not" SP ref)
+
+  first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
+  additional-want   =  PKT-LINE("want" SP obj-id)
+
+  depth             =  1*DIGIT
+
+  filter-request    =  PKT-LINE("filter" SP filter-spec)
+----
+
+Clients MUST send all the obj-ids it wants from the reference
+discovery phase as 'want' lines. Clients MUST send at least one
+'want' command in the request body. Clients MUST NOT mention an
+obj-id in a 'want' command which did not appear in the response
+obtained through ref discovery.
+
+The client MUST write all obj-ids which it only has shallow copies
+of (meaning that it does not have the parents of a commit) as
+'shallow' lines so that the server is aware of the limitations of
+the client's history.
+
+The client now sends the maximum commit history depth it wants for
+this transaction, which is the number of commits it wants from the
+tip of the history, if any, as a 'deepen' line.  A depth of 0 is the
+same as not making a depth request. The client does not want to receive
+any commits beyond this depth, nor does it want objects needed only to
+complete those commits. Commits whose parents are not received as a
+result are defined as shallow and marked as such in the server. This
+information is sent back to the client in the next step.
+
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial clone and partial fetch
+operations. An object that does not meet a filter-spec value is
+omitted unless explicitly requested in a 'want' line. See `rev-list`
+for possible filter-spec values.
+
+Once all the 'want's and 'shallow's (and optional 'deepen') are
+transferred, clients MUST send a flush-pkt, to tell the server side
+that it is done sending the list.
+
+Otherwise, if the client sent a positive depth request, the server
+will determine which commits will and will not be shallow and
+send this information to the client. If the client did not request
+a positive depth, this step is skipped.
+
+----
+  shallow-update   =  *shallow-line
+		      *unshallow-line
+		      flush-pkt
+
+  shallow-line     =  PKT-LINE("shallow" SP obj-id)
+
+  unshallow-line   =  PKT-LINE("unshallow" SP obj-id)
+----
+
+If the client has requested a positive depth, the server will compute
+the set of commits which are no deeper than the desired depth. The set
+of commits start at the client's wants.
+
+The server writes 'shallow' lines for each
+commit whose parents will not be sent as a result. The server writes
+an 'unshallow' line for each commit which the client has indicated is
+shallow, but is no longer shallow at the currently requested depth
+(that is, its parents will now be sent). The server MUST NOT mark
+as unshallow anything which the client has not indicated was shallow.
+
+Now the client will send a list of the obj-ids it has using 'have'
+lines, so the server can make a packfile that only contains the objects
+that the client needs. In multi_ack mode, the canonical implementation
+will send up to 32 of these at a time, then will send a flush-pkt. The
+canonical implementation will skip ahead and send the next 32 immediately,
+so that there is always a block of 32 "in-flight on the wire" at a time.
+
+----
+  upload-haves      =  have-list
+		       compute-end
+
+  have-list         =  *have-line
+  have-line         =  PKT-LINE("have" SP obj-id)
+  compute-end       =  flush-pkt / PKT-LINE("done")
+----
+
+If the server reads 'have' lines, it then will respond by ACKing any
+of the obj-ids the client said it had that the server also has. The
+server will ACK obj-ids differently depending on which ack mode is
+chosen by the client.
+
+In multi_ack mode:
+
+  * the server will respond with 'ACK obj-id continue' for any common
+    commits.
+
+  * once the server has found an acceptable common base commit and is
+    ready to make a packfile, it will blindly ACK all 'have' obj-ids
+    back to the client.
+
+  * the server will then send a 'NAK' and then wait for another response
+    from the client - either a 'done' or another list of 'have' lines.
+
+In multi_ack_detailed mode:
+
+  * the server will differentiate the ACKs where it is signaling
+    that it is ready to send data with 'ACK obj-id ready' lines, and
+    signals the identified common commits with 'ACK obj-id common' lines.
+
+Without either multi_ack or multi_ack_detailed:
+
+ * upload-pack sends "ACK obj-id" on the first common object it finds.
+   After that it says nothing until the client gives it a "done".
+
+ * upload-pack sends "NAK" on a flush-pkt if no common object
+   has been found yet.  If one has been found, and thus an ACK
+   was already sent, it's silent on the flush-pkt.
+
+After the client has gotten enough ACK responses that it can determine
+that the server has enough information to send an efficient packfile
+(in the canonical implementation, this is determined when it has received
+enough ACKs that it can color everything left in the --date-order queue
+as common with the server, or the --date-order queue is empty), or the
+client determines that it wants to give up (in the canonical implementation,
+this is determined when the client sends 256 'have' lines without getting
+any of them ACKed by the server - meaning there is nothing in common and
+the server should just send all of its objects), then the client will send
+a 'done' command.  The 'done' command signals to the server that the client
+is ready to receive its packfile data.
+
+However, the 256 limit *only* turns on in the canonical client
+implementation if we have received at least one "ACK %s continue"
+during a prior round.  This helps to ensure that at least one common
+ancestor is found before we give up entirely.
+
+Once the 'done' line is read from the client, the server will either
+send a final 'ACK obj-id' or it will send a 'NAK'. 'obj-id' is the object
+name of the last commit determined to be common. The server only sends
+ACK after 'done' if there is at least one common base and multi_ack or
+multi_ack_detailed is enabled. The server always sends NAK after 'done'
+if there is no common base found.
+
+Instead of 'ACK' or 'NAK', the server may send an error message (for
+example, if it does not recognize an object in a 'want' line received
+from the client).
+
+Then the server will start sending its packfile data.
+
+----
+  server-response = *ack_multi ack / nak
+  ack_multi       = PKT-LINE("ACK" SP obj-id ack_status)
+  ack_status      = "continue" / "common" / "ready"
+  ack             = PKT-LINE("ACK" SP obj-id)
+  nak             = PKT-LINE("NAK")
+----
+
+A simple clone may look like this (with no 'have' lines):
+
+----
+   C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
+     side-band-64k ofs-delta\n
+   C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+   C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+   C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+   C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   C: 0000
+   C: 0009done\n
+
+   S: 0008NAK\n
+   S: [PACKFILE]
+----
+
+An incremental update (fetch) response might look like this:
+
+----
+   C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
+     side-band-64k ofs-delta\n
+   C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+   C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+   C: 0000
+   C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+   C: [30 more have lines]
+   C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   C: 0000
+
+   S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n
+   S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n
+   S: 0008NAK\n
+
+   C: 0009done\n
+
+   S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   S: [PACKFILE]
+----
+
+
+Packfile Data
+-------------
+
+Now that the client and server have finished negotiation about what
+the minimal amount of data that needs to be sent to the client is, the server
+will construct and send the required data in packfile format.
+
+See pack-format.txt for what the packfile itself actually looks like.
+
+If 'side-band' or 'side-band-64k' capabilities have been specified by
+the client, the server will send the packfile data multiplexed.
+
+Each packet starting with the packet-line length of the amount of data
+that follows, followed by a single byte specifying the sideband the
+following data is coming in on.
+
+In 'side-band' mode, it will send up to 999 data bytes plus 1 control
+code, for a total of up to 1000 bytes in a pkt-line.  In 'side-band-64k'
+mode it will send up to 65519 data bytes plus 1 control code, for a
+total of up to 65520 bytes in a pkt-line.
+
+The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain
+packfile data, sideband '2' will be used for progress information that the
+client will generally print to stderr and sideband '3' is used for error
+information.
+
+If no 'side-band' capability was specified, the server will stream the
+entire packfile without multiplexing.
+
+
+Pushing Data To a Server
+------------------------
+
+Pushing data to a server will invoke the 'receive-pack' process on the
+server, which will allow the client to tell it which references it should
+update and then send all the data the server will need for those new
+references to be complete.  Once all the data is received and validated,
+the server will then update its references to what the client specified.
+
+Authentication
+--------------
+
+The protocol itself contains no authentication mechanisms.  That is to be
+handled by the transport, such as SSH, before the 'receive-pack' process is
+invoked.  If 'receive-pack' is configured over the Git transport, those
+repositories will be writable by anyone who can access that port (9418) as
+that transport is unauthenticated.
+
+Reference Discovery
+-------------------
+
+The reference discovery phase is done nearly the same way as it is in the
+fetching protocol. Each reference obj-id and name on the server is sent
+in packet-line format to the client, followed by a flush-pkt.  The only
+real difference is that the capability listing is different - the only
+possible values are 'report-status', 'report-status-v2', 'delete-refs',
+'ofs-delta', 'atomic' and 'push-options'.
+
+Reference Update Request and Packfile Transfer
+----------------------------------------------
+
+Once the client knows what references the server is at, it can send a
+list of reference update requests.  For each reference on the server
+that it wants to update, it sends a line listing the obj-id currently on
+the server, the obj-id the client would like to update it to and the name
+of the reference.
+
+This list is followed by a flush-pkt.
+
+----
+  update-requests   =  *shallow ( command-list | push-cert )
+
+  shallow           =  PKT-LINE("shallow" SP obj-id)
+
+  command-list      =  PKT-LINE(command NUL capability-list)
+		       *PKT-LINE(command)
+		       flush-pkt
+
+  command           =  create / delete / update
+  create            =  zero-id SP new-id  SP name
+  delete            =  old-id  SP zero-id SP name
+  update            =  old-id  SP new-id  SP name
+
+  old-id            =  obj-id
+  new-id            =  obj-id
+
+  push-cert         = PKT-LINE("push-cert" NUL capability-list LF)
+		      PKT-LINE("certificate version 0.1" LF)
+		      PKT-LINE("pusher" SP ident LF)
+		      PKT-LINE("pushee" SP url LF)
+		      PKT-LINE("nonce" SP nonce LF)
+		      *PKT-LINE("push-option" SP push-option LF)
+		      PKT-LINE(LF)
+		      *PKT-LINE(command LF)
+		      *PKT-LINE(gpg-signature-lines LF)
+		      PKT-LINE("push-cert-end" LF)
+
+  push-option       =  1*( VCHAR | SP )
+----
+
+If the server has advertised the 'push-options' capability and the client has
+specified 'push-options' as part of the capability list above, the client then
+sends its push options followed by a flush-pkt.
+
+----
+  push-options      =  *PKT-LINE(push-option) flush-pkt
+----
+
+For backwards compatibility with older Git servers, if the client sends a push
+cert and push options, it MUST send its push options both embedded within the
+push cert and after the push cert. (Note that the push options within the cert
+are prefixed, but the push options after the cert are not.) Both these lists
+MUST be the same, modulo the prefix.
+
+After that the packfile that
+should contain all the objects that the server will need to complete the new
+references will be sent.
+
+----
+  packfile          =  "PACK" 28*(OCTET)
+----
+
+If the receiving end does not support delete-refs, the sending end MUST
+NOT ask for delete command.
+
+If the receiving end does not support push-cert, the sending end
+MUST NOT send a push-cert command.  When a push-cert command is
+sent, command-list MUST NOT be sent; the commands recorded in the
+push certificate is used instead.
+
+The packfile MUST NOT be sent if the only command used is 'delete'.
+
+A packfile MUST be sent if either create or update command is used,
+even if the server already has all the necessary objects.  In this
+case the client MUST send an empty packfile.   The only time this
+is likely to happen is if the client is creating
+a new branch or a tag that points to an existing obj-id.
+
+The server will receive the packfile, unpack it, then validate each
+reference that is being updated that it hasn't changed while the request
+was being processed (the obj-id is still the same as the old-id), and
+it will run any update hooks to make sure that the update is acceptable.
+If all of that is fine, the server will then update the references.
+
+Push Certificate
+----------------
+
+A push certificate begins with a set of header lines.  After the
+header and an empty line, the protocol commands follow, one per
+line. Note that the trailing LF in push-cert PKT-LINEs is _not_
+optional; it must be present.
+
+Currently, the following header fields are defined:
+
+`pusher` ident::
+	Identify the GPG key in "Human Readable Name <email@address>"
+	format.
+
+`pushee` url::
+	The repository URL (anonymized, if the URL contains
+	authentication material) the user who ran `git push`
+	intended to push into.
+
+`nonce` nonce::
+	The 'nonce' string the receiving repository asked the
+	pushing user to include in the certificate, to prevent
+	replay attacks.
+
+The GPG signature lines are a detached signature for the contents
+recorded in the push certificate before the signature block begins.
+The detached signature is used to certify that the commands were
+given by the pusher, who must be the signer.
+
+Report Status
+-------------
+
+After receiving the pack data from the sender, the receiver sends a
+report if 'report-status' or 'report-status-v2' capability is in effect.
+It is a short listing of what happened in that update.  It will first
+list the status of the packfile unpacking as either 'unpack ok' or
+'unpack [error]'.  Then it will list the status for each of the references
+that it tried to update.  Each line is either 'ok [refname]' if the
+update was successful, or 'ng [refname] [error]' if the update was not.
+
+----
+  report-status     = unpack-status
+		      1*(command-status)
+		      flush-pkt
+
+  unpack-status     = PKT-LINE("unpack" SP unpack-result)
+  unpack-result     = "ok" / error-msg
+
+  command-status    = command-ok / command-fail
+  command-ok        = PKT-LINE("ok" SP refname)
+  command-fail      = PKT-LINE("ng" SP refname SP error-msg)
+
+  error-msg         = 1*(OCTET) ; where not "ok"
+----
+
+The 'report-status-v2' capability extends the protocol by adding new option
+lines in order to support reporting of reference rewritten by the
+'proc-receive' hook.  The 'proc-receive' hook may handle a command for a
+pseudo-reference which may create or update one or more references, and each
+reference may have different name, different new-oid, and different old-oid.
+
+----
+  report-status-v2  = unpack-status
+		      1*(command-status-v2)
+		      flush-pkt
+
+  unpack-status     = PKT-LINE("unpack" SP unpack-result)
+  unpack-result     = "ok" / error-msg
+
+  command-status-v2 = command-ok-v2 / command-fail
+  command-ok-v2     = command-ok
+		      *option-line
+
+  command-ok        = PKT-LINE("ok" SP refname)
+  command-fail      = PKT-LINE("ng" SP refname SP error-msg)
+
+  error-msg         = 1*(OCTET) ; where not "ok"
+
+  option-line       = *1(option-refname)
+		      *1(option-old-oid)
+		      *1(option-new-oid)
+		      *1(option-forced-update)
+
+  option-refname    = PKT-LINE("option" SP "refname" SP refname)
+  option-old-oid    = PKT-LINE("option" SP "old-oid" SP obj-id)
+  option-new-oid    = PKT-LINE("option" SP "new-oid" SP obj-id)
+  option-force      = PKT-LINE("option" SP "forced-update")
+
+----
+
+Updates can be unsuccessful for a number of reasons.  The reference can have
+changed since the reference discovery phase was originally sent, meaning
+someone pushed in the meantime.  The reference being pushed could be a
+non-fast-forward reference and the update hooks or configuration could be
+set to not allow that, etc.  Also, some references can be updated while others
+can be rejected.
+
+An example client/server communication might look like this:
+
+----
+   S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
+   S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
+   S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
+   S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
+   S: 0000
+
+   C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
+   C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
+   C: 0000
+   C: [PACKDATA]
+
+   S: 000eunpack ok\n
+   S: 0018ok refs/heads/debug\n
+   S: 002ang refs/heads/master non-fast-forward\n
+----
diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
new file mode 100644
index 0000000000..f7eabc6c76
--- /dev/null
+++ b/Documentation/technical/packfile-uri.txt
@@ -0,0 +1,81 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises the `packfile-uris` capability.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients should then download and index all the given URIs (in addition to
+downloading and indexing the packfile given in the `packfile` section of the
+response) before performing the connectivity check.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, all such
+blobs are excluded, replaced with URIs. As noted in "Future work" below, the
+server can evolve in the future to support excluding other objects (or other
+implementations of servers could be made that support excluding other objects)
+without needing a protocol change, so clients should not expect that packfiles
+downloaded in this way only contain single blobs.
+
+Client design
+-------------
+
+The client has a config variable `fetch.uriprotocols` that determines which
+protocols the end user is willing to use. By default, this is empty.
+
+When the client downloads the given URIs, it should store them with "keep"
+files, just like it does with the packfile in the `packfile` section. These
+additional "keep" files can only be removed after the refs have been updated -
+just like the "keep" file for the packfile in the `packfile` section.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, more sophisticated means of excluding objects (e.g. by
+   specifying a commit to represent that commit and all objects that it
+   references).
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
new file mode 100644
index 0000000000..0780d30cac
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,368 @@
+Partial Clone Design Notes
+==========================
+
+The "Partial Clone" feature is a performance optimization for Git that
+allows Git to function without having a complete copy of the repository.
+The goal of this work is to allow Git better handle extremely large
+repositories.
+
+During clone and fetch operations, Git downloads the complete contents
+and history of the repository.  This includes all commits, trees, and
+blobs for the complete life of the repository.  For extremely large
+repositories, clones can take hours (or days) and consume 100+GiB of disk
+space.
+
+Often in these repositories there are many blobs and trees that the user
+does not need such as:
+
+  1. files outside of the user's work area in the tree.  For example, in
+     a repository with 500K directories and 3.5M files in every commit,
+     we can avoid downloading many objects if the user only needs a
+     narrow "cone" of the source tree.
+
+  2. large binary assets.  For example, in a repository where large build
+     artifacts are checked into the tree, we can avoid downloading all
+     previous versions of these non-mergeable binary assets and only
+     download versions that are actually referenced.
+
+Partial clone allows us to avoid downloading such unneeded objects *in
+advance* during clone and fetch operations and thereby reduce download
+times and disk usage.  Missing objects can later be "demand fetched"
+if/when needed.
+
+A remote that can later provide the missing objects is called a
+promisor remote, as it promises to send the objects when
+requested. Initially Git supported only one promisor remote, the origin
+remote from which the user cloned and that was configured in the
+"extensions.partialClone" config option. Later support for more than
+one promisor remote has been implemented.
+
+Use of partial clone requires that the user be online and the origin
+remote or other promisor remotes be available for on-demand fetching
+of missing objects.  This may or may not be problematic for the user.
+For example, if the user can stay within the pre-selected subset of
+the source tree, they may not encounter any missing objects.
+Alternatively, the user could try to pre-fetch various objects if they
+know that they are going offline.
+
+
+Non-Goals
+---------
+
+Partial clone is a mechanism to limit the number of blobs and trees downloaded
+*within* a given range of commits -- and is therefore independent of and not
+intended to conflict with existing DAG-level mechanisms to limit the set of
+requested commits (i.e. shallow clone, single branch, or fetch '<refspec>').
+
+
+Design Overview
+---------------
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+  the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+  sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+  were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--------------
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+  upload-pack negotiation.
++
+This uses the existing capability discovery mechanism.
+See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+  server to request filtering during packfile construction.
++
+There are various filters available to accommodate different situations.
+See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+  creates "filtered" packfiles for the client.
++
+These filtered packfiles are *incomplete* in the traditional sense because
+they may contain objects that reference objects not contained in the
+packfile and that the client doesn't already have.  For example, the
+filtered packfile may contain trees or tags that reference missing blobs
+or commits that reference missing trees.
+
+- On the client these incomplete packfiles are marked as "promisor packfiles"
+  and treated differently by various commands.
+
+- On the client a repository extension is added to the local config to
+  prevent older versions of git from failing mid-operation because of
+  missing objects that they cannot handle.
+  See "extensions.partialClone" in Documentation/technical/repository-version.txt"
+
+
+Handling Missing Objects
+------------------------
+
+- An object may be missing due to a partial clone or fetch, or missing
+  due to repository corruption.  To differentiate these cases, the
+  local repository specially indicates such filtered packfiles
+  obtained from promisor remotes as "promisor packfiles".
++
+These promisor packfiles consist of a "<name>.promisor" file with
+arbitrary contents (like the "<name>.keep" files), in addition to
+their "<name>.pack" and "<name>.idx" files.
+
+- The local repository considers a "promisor object" to be an object that
+  it knows (to the best of its ability) that promisor remotes have promised
+  that they have, either because the local repository has that object in one of
+  its promisor packfiles, or because another promisor object refers to it.
++
+When Git encounters a missing object, Git can see if it is a promisor object
+and handle it appropriately.  If not, Git can report a corruption.
++
+This means that there is no need for the client to explicitly maintain an
+expensive-to-modify list of missing objects.[a]
+
+- Since almost all Git code currently expects any referenced object to be
+  present locally and because we do not want to force every command to do
+  a dry-run first, a fallback mechanism is added to allow Git to attempt
+  to dynamically fetch missing objects from promisor remotes.
++
+When the normal object lookup fails to find an object, Git invokes
+promisor_remote_get_direct() to try to get the object from a promisor
+remote and then retry the object lookup.  This allows objects to be
+"faulted in" without complicated prediction algorithms.
++
+For efficiency reasons, no check as to whether the missing object is
+actually a promisor object is performed.
++
+Dynamic object fetching tends to be slow as objects are fetched one at
+a time.
+
+- `checkout` (and any other command using `unpack-trees`) has been taught
+  to bulk pre-fetch all required missing blobs in a single batch.
+
+- `rev-list` has been taught to print missing objects.
++
+This can be used by other commands to bulk prefetch objects.
+For example, a "git log -p A..B" may internally want to first do
+something like "git rev-list --objects --quiet --missing=print A..B"
+and prefetch those objects in bulk.
+
+- `fsck` has been updated to be fully aware of promisor objects.
+
+- `repack` in GC has been updated to not touch promisor packfiles at all,
+  and to only repack other objects.
+
+- The global variable "fetch_if_missing" is used to control whether an
+  object lookup will attempt to dynamically fetch a missing object or
+  report an error.
++
+We are not happy with this global variable and would like to remove it,
+but that requires significant refactoring of the object code to pass an
+additional flag.
+
+
+Fetching Missing Objects
+------------------------
+
+- Fetching of objects is done by invoking a "git fetch" subprocess.
+
+- The local repository sends a request with the hashes of all requested
+  objects, and does not perform any packfile negotiation.
+  It then receives a packfile.
+
+- Because we are reusing the existing fetch mechanism, fetching
+  currently fetches all objects referred to by the requested objects, even
+  though they are not necessary.
+
+
+Using many promisor remotes
+---------------------------
+
+Many promisor remotes can be configured and used.
+
+This allows for example a user to have multiple geographically-close
+cache servers for fetching missing blobs while continuing to do
+filtered `git-fetch` commands from the central server.
+
+When fetching objects, promisor remotes are tried one after the other
+until all the objects have been fetched.
+
+Remotes that are considered "promisor" remotes are those specified by
+the following configuration variables:
+
+- `extensions.partialClone = <name>`
+
+- `remote.<name>.promisor = true`
+
+- `remote.<name>.partialCloneFilter = ...`
+
+Only one promisor remote can be configured using the
+`extensions.partialClone` config variable. This promisor remote will
+be the last one tried when fetching objects.
+
+We decided to make it the last one we try, because it is likely that
+someone using many promisor remotes is doing so because the other
+promisor remotes are better for some reason (maybe they are closer or
+faster for some kind of objects) than the origin, and the origin is
+likely to be the remote specified by extensions.partialClone.
+
+This justification is not very strong, but one choice had to be made,
+and anyway the long term plan should be to make the order somehow
+fully configurable.
+
+For now though the other promisor remotes will be tried in the order
+they appear in the config file.
+
+Current Limitations
+-------------------
+
+- It is not possible to specify the order in which the promisor
+  remotes are tried in other ways than the order in which they appear
+  in the config file.
++
+It is also not possible to specify an order to be used when fetching
+from one remote and a different order when fetching from another
+remote.
+
+- It is not possible to push only specific objects to a promisor
+  remote.
++
+It is not possible to push at the same time to multiple promisor
+remote in a specific order.
+
+- Dynamic object fetching will only ask promisor remotes for missing
+  objects.  We assume that promisor remotes have a complete view of the
+  repository and can satisfy all such requests.
+
+- Repack essentially treats promisor and non-promisor packfiles as 2
+  distinct partitions and does not mix them.  Repack currently only works
+  on non-promisor packfiles and loose objects.
+
+- Dynamic object fetching invokes fetch-pack once *for each item*
+  because most algorithms stumble upon a missing object and need to have
+  it resolved before continuing their work.  This may incur significant
+  overhead -- and multiple authentication requests -- if many objects are
+  needed.
+
+- Dynamic object fetching currently uses the existing pack protocol V0
+  which means that each object is requested via fetch-pack.  The server
+  will send a full set of info/refs when the connection is established.
+  If there are large number of refs, this may incur significant overhead.
+
+
+Future Work
+-----------
+
+- Improve the way to specify the order in which promisor remotes are
+  tried.
++
+For example this could allow to specify explicitly something like:
+"When fetching from this remote, I want to use these promisor remotes
+in this order, though, when pushing or fetching to that remote, I want
+to use those promisor remotes in that order."
+
+- Allow pushing to promisor remotes.
++
+The user might want to work in a triangular work flow with multiple
+promisor remotes that each have an incomplete view of the repository.
+
+- Allow repack to work on promisor packfiles (while keeping them distinct
+  from non-promisor packfiles).
+
+- Allow non-pathname-based filters to make use of packfile bitmaps (when
+  present).  This was just an omission during the initial implementation.
+
+- Investigate use of a long-running process to dynamically fetch a series
+  of objects, such as proposed in [5,6] to reduce process startup and
+  overhead costs.
++
+It would be nice if pack protocol V2 could allow that long-running
+process to make a series of requests over a single long-running
+connection.
+
+- Investigate pack protocol V2 to avoid the info/refs broadcast on
+  each connection with the server to dynamically fetch missing objects.
+
+- Investigate the need to handle loose promisor objects.
++
+Objects in promisor packfiles are allowed to reference missing objects
+that can be dynamically fetched from the server.  An assumption was
+made that loose objects are only created locally and therefore should
+not reference a missing object.  We may need to revisit that assumption
+if, for example, we dynamically fetch a missing tree and store it as a
+loose object rather than a single object packfile.
++
+This does not necessarily mean we need to mark loose objects as promisor;
+it may be sufficient to relax the object lookup or is-promisor functions.
+
+
+Non-Tasks
+---------
+
+- Every time the subject of "demand loading blobs" comes up it seems
+  that someone suggests that the server be allowed to "guess" and send
+  additional objects that may be related to the requested objects.
++
+No work has gone into actually doing that; we're just documenting that
+it is a common suggestion.  We're not sure how it would work and have
+no plans to work on it.
++
+It is valid for the server to send more objects than requested (even
+for a dynamic object fetch), but we are not building on that.
+
+
+Footnotes
+---------
+
+[a] expensive-to-modify list of missing objects:  Earlier in the design of
+    partial clone we discussed the need for a single list of missing objects.
+    This would essentially be a sorted linear list of OIDs that the were
+    omitted by the server during a clone or subsequent fetches.
+
+This file would need to be loaded into memory on every object lookup.
+It would need to be read, updated, and re-written (like the .git/index)
+on every explicit "git fetch" command *and* on any dynamic object fetch.
+
+The cost to read, update, and write this file could add significant
+overhead to every command if there are many missing objects.  For example,
+if there are 100M missing blobs, this file would be at least 2GiB on disk.
+
+With the "promisor" concept, we *infer* a missing object based upon the
+type of packfile that references it.
+
+
+Related Links
+-------------
+[0] https://crbug.com/git/2
+    Bug#2: Partial Clone
+
+[1] https://lore.kernel.org/git/20170113155253.1644-1-benpeart@microsoft.com/ +
+    Subject: [RFC] Add support for downloading blobs on demand +
+    Date: Fri, 13 Jan 2017 10:52:53 -0500
+
+[2] https://lore.kernel.org/git/cover.1506714999.git.jonathantanmy@google.com/ +
+    Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) +
+    Date: Fri, 29 Sep 2017 13:11:36 -0700
+
+[3] https://lore.kernel.org/git/20170426221346.25337-1-jonathantanmy@google.com/ +
+    Subject: Proposal for missing blob support in Git repos +
+    Date: Wed, 26 Apr 2017 15:13:46 -0700
+
+[4] https://lore.kernel.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/ +
+    Subject: [PATCH 00/10] RFC Partial Clone and Fetch +
+    Date: Wed,  8 Mar 2017 18:50:29 +0000
+
+[5] https://lore.kernel.org/git/20170505152802.6724-1-benpeart@microsoft.com/ +
+    Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module +
+    Date: Fri,  5 May 2017 11:27:52 -0400
+
+[6] https://lore.kernel.org/git/20170714132651.170708-1-benpeart@microsoft.com/ +
+    Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand +
+    Date: Fri, 14 Jul 2017 09:26:50 -0400
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
new file mode 100644
index 0000000000..9dfade930d
--- /dev/null
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -0,0 +1,380 @@
+Git Protocol Capabilities
+=========================
+
+NOTE: this document describes capabilities for versions 0 and 1 of the pack
+protocol. For version 2, please refer to the link:protocol-v2.html[protocol-v2]
+doc.
+
+Servers SHOULD support all capabilities defined in this document.
+
+On the very first line of the initial server response of either
+receive-pack and upload-pack the first reference is followed by
+a NUL byte and then a list of space delimited server capabilities.
+These allow the server to declare what it can and cannot support
+to the client.
+
+Client will then send a space separated list of capabilities it wants
+to be in effect. The client MUST NOT ask for capabilities the server
+did not say it supports.
+
+Server MUST diagnose and abort if capabilities it does not understand
+was sent.  Server MUST NOT ignore capabilities that client requested
+and server advertised.  As a consequence of these rules, server MUST
+NOT advertise capabilities it does not understand.
+
+The 'atomic', 'report-status', 'report-status-v2', 'delete-refs', 'quiet',
+and 'push-cert' capabilities are sent and recognized by the receive-pack
+(push to server) process.
+
+The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
+by both upload-pack and receive-pack protocols.  The 'agent' and 'session-id'
+capabilities may optionally be sent in both protocols.
+
+All other capabilities are only recognized by the upload-pack (fetch
+from server) process.
+
+multi_ack
+---------
+
+The 'multi_ack' capability allows the server to return "ACK obj-id
+continue" as soon as it finds a commit that it can use as a common
+base, between the client's wants and the client's have set.
+
+By sending this early, the server can potentially head off the client
+from walking any further down that particular branch of the client's
+repository history.  The client may still need to walk down other
+branches, sending have lines for those, until the server has a
+complete cut across the DAG, or the client has said "done".
+
+Without multi_ack, a client sends have lines in --date-order until
+the server has found a common base.  That means the client will send
+have lines that are already known by the server to be common, because
+they overlap in time with another branch that the server hasn't found
+a common base on yet.
+
+For example suppose the client has commits in caps that the server
+doesn't and the server has commits in lower case that the client
+doesn't, as in the following diagram:
+
+       +---- u ---------------------- x
+      /              +----- y
+     /              /
+    a -- b -- c -- d -- E -- F
+       \
+	+--- Q -- R -- S
+
+If the client wants x,y and starts out by saying have F,S, the server
+doesn't know what F,S is.  Eventually the client says "have d" and
+the server sends "ACK d continue" to let the client know to stop
+walking down that line (so don't send c-b-a), but it's not done yet,
+it needs a base for x. The client keeps going with S-R-Q, until a
+gets reached, at which point the server has a clear base and it all
+ends.
+
+Without multi_ack the client would have sent that c-b-a chain anyway,
+interleaved with S-R-Q.
+
+multi_ack_detailed
+------------------
+This is an extension of multi_ack that permits client to better
+understand the server's in-memory state. See pack-protocol.txt,
+section "Packfile Negotiation" for more information.
+
+no-done
+-------
+This capability should only be used with the smart HTTP protocol. If
+multi_ack_detailed and no-done are both present, then the sender is
+free to immediately send a pack following its first "ACK obj-id ready"
+message.
+
+Without no-done in the smart HTTP protocol, the server session would
+end and the client has to make another trip to send "done" before
+the server can send the pack. no-done removes the last round and
+thus slightly reduces latency.
+
+thin-pack
+---------
+
+A thin pack is one with deltas which reference base objects not
+contained within the pack (but are known to exist at the receiving
+end). This can reduce the network traffic significantly, but it
+requires the receiving end to know how to "thicken" these packs by
+adding the missing bases to the pack.
+
+The upload-pack server advertises 'thin-pack' when it can generate
+and send a thin pack. A client requests the 'thin-pack' capability
+when it understands how to "thicken" it, notifying the server that
+it can receive such a pack. A client MUST NOT request the
+'thin-pack' capability if it cannot turn a thin pack into a
+self-contained pack.
+
+Receive-pack, on the other hand, is assumed by default to be able to
+handle thin packs, but can ask the client not to use the feature by
+advertising the 'no-thin' capability. A client MUST NOT send a thin
+pack if the server advertises the 'no-thin' capability.
+
+The reasons for this asymmetry are historical. The receive-pack
+program did not exist until after the invention of thin packs, so
+historically the reference implementation of receive-pack always
+understood thin packs. Adding 'no-thin' later allowed receive-pack
+to disable the feature in a backwards-compatible manner.
+
+
+side-band, side-band-64k
+------------------------
+
+This capability means that server can send, and client understand multiplexed
+progress reports and error info interleaved with the packfile itself.
+
+These two options are mutually exclusive. A modern client always
+favors 'side-band-64k'.
+
+Either mode indicates that the packfile data will be streamed broken
+up into packets of up to either 1000 bytes in the case of 'side_band',
+or 65520 bytes in the case of 'side_band_64k'. Each packet is made up
+of a leading 4-byte pkt-line length of how much data is in the packet,
+followed by a 1-byte stream code, followed by the actual data.
+
+The stream code can be one of:
+
+ 1 - pack data
+ 2 - progress messages
+ 3 - fatal error message just before stream aborts
+
+The "side-band-64k" capability came about as a way for newer clients
+that can handle much larger packets to request packets that are
+actually crammed nearly full, while maintaining backward compatibility
+for the older clients.
+
+Further, with side-band and its up to 1000-byte messages, it's actually
+999 bytes of payload and 1 byte for the stream code. With side-band-64k,
+same deal, you have up to 65519 bytes of data and 1 byte for the stream
+code.
+
+The client MUST send only maximum of one of "side-band" and "side-
+band-64k".  Server MUST diagnose it as an error if client requests
+both.
+
+ofs-delta
+---------
+
+Server can send, and client understand PACKv2 with delta referring to
+its base by position in pack rather than by an obj-id.  That is, they can
+send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
+
+agent
+-----
+
+The server may optionally send a capability of the form `agent=X` to
+notify the client that the server is running version `X`. The client may
+optionally return its own agent string by responding with an `agent=Y`
+capability (but it MUST NOT do so if the server did not mention the
+agent capability). The `X` and `Y` strings may contain any printable
+ASCII characters except space (i.e., the byte range 32 < x < 127), and
+are typically of the form "package/version" (e.g., "git/1.8.3.1"). The
+agent strings are purely informative for statistics and debugging
+purposes, and MUST NOT be used to programmatically assume the presence
+or absence of particular features.
+
+object-format
+-------------
+
+This capability, which takes a hash algorithm as an argument, indicates
+that the server supports the given hash algorithms.  It may be sent
+multiple times; if so, the first one given is the one used in the ref
+advertisement.
+
+When provided by the client, this indicates that it intends to use the
+given hash algorithm to communicate.  The algorithm provided must be one
+that the server supports.
+
+If this capability is not provided, it is assumed that the only
+supported algorithm is SHA-1.
+
+symref
+------
+
+This parameterized capability is used to inform the receiver which symbolic ref
+points to which ref; for example, "symref=HEAD:refs/heads/master" tells the
+receiver that HEAD points to master. This capability can be repeated to
+represent multiple symrefs.
+
+Servers SHOULD include this capability for the HEAD symref if it is one of the
+refs being sent.
+
+Clients MAY use the parameters from this capability to select the proper initial
+branch when cloning a repository.
+
+shallow
+-------
+
+This capability adds "deepen", "shallow" and "unshallow" commands to
+the  fetch-pack/upload-pack protocol so clients can request shallow
+clones.
+
+deepen-since
+------------
+
+This capability adds "deepen-since" command to fetch-pack/upload-pack
+protocol so the client can request shallow clones that are cut at a
+specific time, instead of depth. Internally it's equivalent of doing
+"rev-list --max-age=<timestamp>" on the server side. "deepen-since"
+cannot be used with "deepen".
+
+deepen-not
+----------
+
+This capability adds "deepen-not" command to fetch-pack/upload-pack
+protocol so the client can request shallow clones that are cut at a
+specific revision, instead of depth. Internally it's equivalent of
+doing "rev-list --not <rev>" on the server side. "deepen-not"
+cannot be used with "deepen", but can be used with "deepen-since".
+
+deepen-relative
+---------------
+
+If this capability is requested by the client, the semantics of
+"deepen" command is changed. The "depth" argument is the depth from
+the current shallow boundary, instead of the depth from remote refs.
+
+no-progress
+-----------
+
+The client was started with "git clone -q" or something, and doesn't
+want that side band 2.  Basically the client just says "I do not
+wish to receive stream 2 on sideband, so do not send it to me, and if
+you did, I will drop it on the floor anyway".  However, the sideband
+channel 3 is still used for error responses.
+
+include-tag
+-----------
+
+The 'include-tag' capability is about sending annotated tags if we are
+sending objects they point to.  If we pack an object to the client, and
+a tag object points exactly at that object, we pack the tag object too.
+In general this allows a client to get all new annotated tags when it
+fetches a branch, in a single network connection.
+
+Clients MAY always send include-tag, hardcoding it into a request when
+the server advertises this capability. The decision for a client to
+request include-tag only has to do with the client's desires for tag
+data, whether or not a server had advertised objects in the
+refs/tags/* namespace.
+
+Servers MUST pack the tags if their referrant is packed and the client
+has requested include-tags.
+
+Clients MUST be prepared for the case where a server has ignored
+include-tag and has not actually sent tags in the pack.  In such
+cases the client SHOULD issue a subsequent fetch to acquire the tags
+that include-tag would have otherwise given the client.
+
+The server SHOULD send include-tag, if it supports it, regardless
+of whether or not there are tags available.
+
+report-status
+-------------
+
+The receive-pack process can receive a 'report-status' capability,
+which tells it that the client wants a report of what happened after
+a packfile upload and reference update.  If the pushing client requests
+this capability, after unpacking and updating references the server
+will respond with whether the packfile unpacked successfully and if
+each reference was updated successfully.  If any of those were not
+successful, it will send back an error message.  See pack-protocol.txt
+for example messages.
+
+report-status-v2
+----------------
+
+Capability 'report-status-v2' extends capability 'report-status' by
+adding new "option" directives in order to support reference rewritten by
+the "proc-receive" hook.  The "proc-receive" hook may handle a command
+for a pseudo-reference which may create or update a reference with
+different name, new-oid, and old-oid.  While the capability
+'report-status' cannot report for such case.  See pack-protocol.txt
+for details.
+
+delete-refs
+-----------
+
+If the server sends back the 'delete-refs' capability, it means that
+it is capable of accepting a zero-id value as the target
+value of a reference update.  It is not sent back by the client, it
+simply informs the client that it can be sent zero-id values
+to delete references.
+
+quiet
+-----
+
+If the receive-pack server advertises the 'quiet' capability, it is
+capable of silencing human-readable progress output which otherwise may
+be shown when processing the received pack. A send-pack client should
+respond with the 'quiet' capability to suppress server-side progress
+reporting if the local progress reporting is also being suppressed
+(e.g., via `push -q`, or if stderr does not go to a tty).
+
+atomic
+------
+
+If the server sends the 'atomic' capability it is capable of accepting
+atomic pushes. If the pushing client requests this capability, the server
+will update the refs in one atomic transaction. Either all refs are
+updated or none.
+
+push-options
+------------
+
+If the server sends the 'push-options' capability it is able to accept
+push options after the update commands have been sent, but before the
+packfile is streamed. If the pushing client requests this capability,
+the server will pass the options to the pre- and post- receive hooks
+that process this push request.
+
+allow-tip-sha1-in-want
+----------------------
+
+If the upload-pack server advertises this capability, fetch-pack may
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. For historical reasons, the name of this
+capability contains "sha1". Object names are always given using the
+object format negotiated through the 'object-format' capability.
+
+allow-reachable-sha1-in-want
+----------------------------
+
+If the upload-pack server advertises this capability, fetch-pack may
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. For historical reasons, the name of this
+capability contains "sha1". Object names are always given using the
+object format negotiated through the 'object-format' capability.
+
+push-cert=<nonce>
+-----------------
+
+The receive-pack server that advertises this capability is willing
+to accept a signed push certificate, and asks the <nonce> to be
+included in the push certificate.  A send-pack client MUST NOT
+send a push-cert packet unless the receive-pack server advertises
+this capability.
+
+filter
+------
+
+If the upload-pack server advertises the 'filter' capability,
+fetch-pack may send "filter" commands to request a partial clone
+or partial fetch and request that the server omit various objects
+from the packfile.
+
+session-id=<session id>
+-----------------------
+
+The server may advertise a session ID that can be used to identify this process
+across multiple requests. The client may advertise its own session ID back to
+the server as well.
+
+Session IDs should be unique to a given process. They must fit within a
+packet-line, and must not contain non-printable or whitespace characters. The
+current implementation uses trace2 session IDs (see
+link:api-trace2.html[api-trace2] for details), but this may change and users of
+the session ID should not rely on this fact.
diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt
new file mode 100644
index 0000000000..ecedb34bba
--- /dev/null
+++ b/Documentation/technical/protocol-common.txt
@@ -0,0 +1,99 @@
+Documentation Common to Pack and Http Protocols
+===============================================
+
+ABNF Notation
+-------------
+
+ABNF notation as described by RFC 5234 is used within the protocol documents,
+except the following replacement core rules are used:
+----
+  HEXDIG    =  DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
+----
+
+We also define the following common rules:
+----
+  NUL       =  %x00
+  zero-id   =  40*"0"
+  obj-id    =  40*(HEXDIGIT)
+
+  refname  =  "HEAD"
+  refname /=  "refs/" <see discussion below>
+----
+
+A refname is a hierarchical octet string beginning with "refs/" and
+not violating the 'git-check-ref-format' command's validation rules.
+More specifically, they:
+
+. They can include slash `/` for hierarchical (directory)
+  grouping, but no slash-separated component can begin with a
+  dot `.`.
+
+. They must contain at least one `/`. This enforces the presence of a
+  category like `heads/`, `tags/` etc. but the actual names are not
+  restricted.
+
+. They cannot have two consecutive dots `..` anywhere.
+
+. They cannot have ASCII control characters (i.e. bytes whose
+  values are lower than \040, or \177 `DEL`), space, tilde `~`,
+  caret `^`, colon `:`, question-mark `?`, asterisk `*`,
+  or open bracket `[` anywhere.
+
+. They cannot end with a slash `/` or a dot `.`.
+
+. They cannot end with the sequence `.lock`.
+
+. They cannot contain a sequence `@{`.
+
+. They cannot contain a `\\`.
+
+
+pkt-line Format
+---------------
+
+Much (but not all) of the payload is described around pkt-lines.
+
+A pkt-line is a variable length binary string.  The first four bytes
+of the line, the pkt-len, indicates the total length of the line,
+in hexadecimal.  The pkt-len includes the 4 bytes used to contain
+the length's hexadecimal representation.
+
+A pkt-line MAY contain binary data, so implementors MUST ensure
+pkt-line parsing/formatting routines are 8-bit clean.
+
+A non-binary line SHOULD BE terminated by an LF, which if present
+MUST be included in the total length. Receivers MUST treat pkt-lines
+with non-binary data the same whether or not they contain the trailing
+LF (stripping the LF if present, and not complaining when it is
+missing).
+
+The maximum length of a pkt-line's data component is 65516 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65520
+(65516 bytes of payload + 4 bytes of length data).
+
+Implementations SHOULD NOT send an empty pkt-line ("0004").
+
+A pkt-line with a length field of 0 ("0000"), called a flush-pkt,
+is a special case and MUST be handled differently than an empty
+pkt-line ("0004").
+
+----
+  pkt-line     =  data-pkt / flush-pkt
+
+  data-pkt     =  pkt-len pkt-payload
+  pkt-len      =  4*(HEXDIG)
+  pkt-payload  =  (pkt-len - 4)*(OCTET)
+
+  flush-pkt    = "0000"
+----
+
+Examples (as C-style strings):
+
+----
+  pkt-line          actual value
+  ---------------------------------
+  "0006a\n"         "a\n"
+  "0005a"           "a"
+  "000bfoobar\n"    "foobar\n"
+  "0004"            ""
+----
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 0000000000..a7c806a73e
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,516 @@
+Git Wire Protocol, Version 2
+============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+    supported by a single service
+  * Easily extendable as capabilities are moved into their own section
+    of the protocol, no longer being hidden behind a NUL byte and
+    limited by the size of a pkt-line
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+    string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+  * Designed with http and stateless-rpc in mind.  With clear flush
+    semantics the http remote helper can simply act as a proxy
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+Packet-Line Framing
+-------------------
+
+All communication is done using packet-line framing, just as in v1.  See
+`Documentation/technical/pack-protocol.txt` and
+`Documentation/technical/protocol-common.txt` for more information.
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+  * '0002' Response End Packet (response-end-pkt) - indicates the end of a
+    response for stateless connections
+
+Initial Client Request
+----------------------
+
+In general a client can request to speak protocol v2 by sending
+`version=2` through the respective side-channel for the transport being
+used which inevitably sets `GIT_PROTOCOL`.  More information can be
+found in `pack-protocol.txt` and `http-protocol.txt`.  In all cases the
+response from the server is the capability advertisement.
+
+Git Transport
+~~~~~~~~~~~~~
+
+When using the git:// transport, you can request to use protocol v2 by
+sending "version=2" as an extra parameter:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
+
+SSH and File Transport
+~~~~~~~~~~~~~~~~~~~~~~
+
+When using either the ssh:// or file:// transport, the GIT_PROTOCOL
+environment variable must be set explicitly to include "version=2".
+
+HTTP Transport
+~~~~~~~~~~~~~~
+
+When using the http:// or https:// transport a client makes a "smart"
+info/refs request as described in `http-protocol.txt` and requests that
+v2 be used by supplying "version=2" in the `Git-Protocol` header.
+
+   C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+   C: Git-Protocol: version=2
+
+A v2 server would reply:
+
+   S: 200 OK
+   S: <Some headers>
+   S: ...
+   S:
+   S: 000eversion 2\n
+   S: <capability-advertisement>
+
+Subsequent requests are then made directly to the service
+`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
+
+Capability Advertisement
+------------------------
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value.  Clients must ignore all
+unknown keys.  Semantics of unknown values are left to the definition of
+each key.  Some capabilities will describe commands which can be requested
+to be executed by the client.
+
+    capability-advertisement = protocol-version
+			       capability-list
+			       flush-pkt
+
+    protocol-version = PKT-LINE("version 2" LF)
+    capability-list = *capability
+    capability = PKT-LINE(key[=value] LF)
+
+    key = 1*(ALPHA | DIGIT | "-_")
+    value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;")
+
+Command Request
+---------------
+
+After receiving the capability advertisement, a client can then issue a
+request to select the command it wants with any particular capabilities
+or arguments.  There is then an optional section where the client can
+provide any command specific parameters or queries.  Only a single
+command can be requested at a time.
+
+    request = empty-request | command-request
+    empty-request = flush-pkt
+    command-request = command
+		      capability-list
+		      [command-args]
+		      flush-pkt
+    command = PKT-LINE("command=" key LF)
+    command-args = delim-pkt
+		   *command-specific-arg
+
+    command-specific-args are packet line framed arguments defined by
+    each individual command.
+
+The server will then check to ensure that the client's request is
+comprised of a valid command as well as valid capabilities which were
+advertised.  If the request is valid the server will then execute the
+command.  A server MUST wait till it has received the client's entire
+request before issuing a response.  The format of the response is
+determined by the command being executed, but in all cases a flush-pkt
+indicates the end of the response.
+
+When a command has finished, and the client has received the entire
+response from the server, a client can either request that another
+command be executed or can terminate the connection.  A client may
+optionally send an empty request consisting of just a flush-pkt to
+indicate that no more requests will be made.
+
+Capabilities
+------------
+
+There are two different types of capabilities: normal capabilities,
+which can be used to convey information or alter the behavior of a
+request, and commands, which are the core actions that a client wants to
+perform (fetch, push, etc).
+
+Protocol version 2 is stateless by default.  This means that all commands
+must only last a single round and be stateless from the perspective of the
+server side, unless the client has requested a capability indicating that
+state should be maintained by the server.  Clients MUST NOT require state
+management on the server side in order to function correctly.  This
+permits simple round-robin load-balancing on the server side, without
+needing to worry about state management.
+
+agent
+~~~~~
+
+The server can advertise the `agent` capability with a value `X` (in the
+form `agent=X`) to notify the client that the server is running version
+`X`.  The client may optionally send its own agent string by including
+the `agent` capability with a value `Y` (in the form `agent=Y`) in its
+request to the server (but it MUST NOT do so if the server did not
+advertise the agent capability). The `X` and `Y` strings may contain any
+printable ASCII characters except space (i.e., the byte range 32 < x <
+127), and are typically of the form "package/version" (e.g.,
+"git/1.8.3.1"). The agent strings are purely informative for statistics
+and debugging purposes, and MUST NOT be used to programmatically assume
+the presence or absence of particular features.
+
+ls-refs
+~~~~~~~
+
+`ls-refs` is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in arguments
+which can be used to limit the refs sent from the server.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+ls-refs takes in the following arguments:
+
+    symrefs
+	In addition to the object pointed by it, show the underlying ref
+	pointed by it when showing a symbolic ref.
+    peel
+	Show peeled tags.
+    ref-prefix <prefix>
+	When specified, only references having a prefix matching one of
+	the provided prefixes are displayed.
+
+If the 'unborn' feature is advertised the following argument can be
+included in the client's request.
+
+    unborn
+	The server will send information about HEAD even if it is a symref
+	pointing to an unborn branch in the form "unborn HEAD
+	symref-target:<target>".
+
+The output of ls-refs is as follows:
+
+    output = *ref
+	     flush-pkt
+    obj-id-or-unborn = (obj-id | "unborn")
+    ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
+    ref-attribute = (symref | peeled)
+    symref = "symref-target:" symref-target
+    peeled = "peeled:" obj-id
+
+fetch
+~~~~~
+
+`fetch` is the command used to fetch a packfile in v2.  It can be looked
+at as a modified version of the v1 fetch where the ref-advertisement is
+stripped out (since the `ls-refs` command fills that role) and the
+message format is tweaked to eliminate redundancies and permit easy
+addition of future extensions.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+A `fetch` request can take the following arguments:
+
+    want <oid>
+	Indicates to the server an object which the client wants to
+	retrieve.  Wants can be anything and are not limited to
+	advertised objects.
+
+    have <oid>
+	Indicates to the server an object which the client has locally.
+	This allows the server to make a packfile which only contains
+	the objects that the client needs. Multiple 'have' lines can be
+	supplied.
+
+    done
+	Indicates to the server that negotiation should terminate (or
+	not even begin if performing a clone) and that the server should
+	use the information supplied in the request to construct the
+	packfile.
+
+    thin-pack
+	Request that a thin pack be sent, which is a pack with deltas
+	which reference base objects not contained within the pack (but
+	are known to exist at the receiving end). This can reduce the
+	network traffic significantly, but it requires the receiving end
+	to know how to "thicken" these packs by adding the missing bases
+	to the pack.
+
+    no-progress
+	Request that progress information that would normally be sent on
+	side-band channel 2, during the packfile transfer, should not be
+	sent.  However, the side-band channel 3 is still used for error
+	responses.
+
+    include-tag
+	Request that annotated tags should be sent if the objects they
+	point to are being sent.
+
+    ofs-delta
+	Indicate that the client understands PACKv2 with delta referring
+	to its base by position in pack rather than by an oid.  That is,
+	they can read OBJ_OFS_DELTA (aka type 6) in a packfile.
+
+If the 'shallow' feature is advertised the following arguments can be
+included in the clients request as well as the potential addition of the
+'shallow-info' section in the server's response as explained below.
+
+    shallow <oid>
+	A client must notify the server of all commits for which it only
+	has shallow copies (meaning that it doesn't have the parents of
+	a commit) by supplying a 'shallow <oid>' line for each such
+	object so that the server is aware of the limitations of the
+	client's history.  This is so that the server is aware that the
+	client may not have all objects reachable from such commits.
+
+    deepen <depth>
+	Requests that the fetch/clone should be shallow having a commit
+	depth of <depth> relative to the remote side.
+
+    deepen-relative
+	Requests that the semantics of the "deepen" command be changed
+	to indicate that the depth requested is relative to the client's
+	current shallow boundary, instead of relative to the requested
+	commits.
+
+    deepen-since <timestamp>
+	Requests that the shallow clone/fetch should be cut at a
+	specific time, instead of depth.  Internally it's equivalent to
+	doing "git rev-list --max-age=<timestamp>". Cannot be used with
+	"deepen".
+
+    deepen-not <rev>
+	Requests that the shallow clone/fetch should be cut at a
+	specific revision specified by '<rev>', instead of a depth.
+	Internally it's equivalent of doing "git rev-list --not <rev>".
+	Cannot be used with "deepen", but can be used with
+	"deepen-since".
+
+If the 'filter' feature is advertised, the following argument can be
+included in the client's request:
+
+    filter <filter-spec>
+	Request that various objects from the packfile be omitted
+	using one of several filtering techniques. These are intended
+	for use with partial clone and partial fetch operations. See
+	`rev-list` for possible "filter-spec" values. When communicating
+	with other processes, senders SHOULD translate scaled integers
+	(e.g. "1k") into a fully-expanded form (e.g. "1024") to aid
+	interoperability with older receivers that may not understand
+	newly-invented scaling suffixes. However, receivers SHOULD
+	accept the following suffixes: 'k', 'm', and 'g' for 1024,
+	1048576, and 1073741824, respectively.
+
+If the 'ref-in-want' feature is advertised, the following argument can
+be included in the client's request as well as the potential addition of
+the 'wanted-refs' section in the server's response as explained below.
+
+    want-ref <ref>
+	Indicates to the server that the client wants to retrieve a
+	particular ref, where <ref> is the full name of a ref on the
+	server.
+
+If the 'sideband-all' feature is advertised, the following argument can be
+included in the client's request:
+
+    sideband-all
+	Instruct the server to send the whole response multiplexed, not just
+	the packfile section. All non-flush and non-delim PKT-LINE in the
+	response (not only in the packfile section) will then start with a byte
+	indicating its sideband (1, 2, or 3), and the server may send "0005\2"
+	(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
+
+If the 'packfile-uris' feature is advertised, the following argument
+can be included in the client's request as well as the potential
+addition of the 'packfile-uris' section in the server's response as
+explained below.
+
+    packfile-uris <comma-separated list of protocols>
+	Indicates to the server that the client is willing to receive
+	URIs of any of the given protocols in place of objects in the
+	sent packfile. Before performing the connectivity check, the
+	client should download from all given URIs. Currently, the
+	protocols supported are "http" and "https".
+
+The response of `fetch` is broken into a number of sections separated by
+delimiter packets (0001), with each section beginning with its section
+header. Most sections are sent only when the packfile is sent.
+
+    output = acknowledgements flush-pkt |
+	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
+
+    acknowledgments = PKT-LINE("acknowledgments" LF)
+		      (nak | *ack)
+		      (ready)
+    ready = PKT-LINE("ready" LF)
+    nak = PKT-LINE("NAK" LF)
+    ack = PKT-LINE("ACK" SP obj-id LF)
+
+    shallow-info = PKT-LINE("shallow-info" LF)
+		   *PKT-LINE((shallow | unshallow) LF)
+    shallow = "shallow" SP obj-id
+    unshallow = "unshallow" SP obj-id
+
+    wanted-refs = PKT-LINE("wanted-refs" LF)
+		  *PKT-LINE(wanted-ref LF)
+    wanted-ref = obj-id SP refname
+
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)
+
+    packfile = PKT-LINE("packfile" LF)
+	       *PKT-LINE(%x01-03 *%x00-ff)
+
+    acknowledgments section
+	* If the client determines that it is finished with negotiations by
+	  sending a "done" line (thus requiring the server to send a packfile),
+	  the acknowledgments sections MUST be omitted from the server's
+	  response.
+
+	* Always begins with the section header "acknowledgments"
+
+	* The server will respond with "NAK" if none of the object ids sent
+	  as have lines were common.
+
+	* The server will respond with "ACK obj-id" for all of the
+	  object ids sent as have lines which are common.
+
+	* A response cannot have both "ACK" lines as well as a "NAK"
+	  line.
+
+	* The server will respond with a "ready" line indicating that
+	  the server has found an acceptable common base and is ready to
+	  make and send a packfile (which will be found in the packfile
+	  section of the same response)
+
+	* If the server has found a suitable cut point and has decided
+	  to send a "ready" line, then the server can decide to (as an
+	  optimization) omit any "ACK" lines it would have sent during
+	  its response.  This is because the server will have already
+	  determined the objects it plans to send to the client and no
+	  further negotiation is needed.
+
+    shallow-info section
+	* If the client has requested a shallow fetch/clone, a shallow
+	  client requests a fetch or the server is shallow then the
+	  server's response may include a shallow-info section.  The
+	  shallow-info section will be included if (due to one of the
+	  above conditions) the server needs to inform the client of any
+	  shallow boundaries or adjustments to the clients already
+	  existing shallow boundaries.
+
+	* Always begins with the section header "shallow-info"
+
+	* If a positive depth is requested, the server will compute the
+	  set of commits which are no deeper than the desired depth.
+
+	* The server sends a "shallow obj-id" line for each commit whose
+	  parents will not be sent in the following packfile.
+
+	* The server sends an "unshallow obj-id" line for each commit
+	  which the client has indicated is shallow, but is no longer
+	  shallow as a result of the fetch (due to its parents being
+	  sent in the following packfile).
+
+	* The server MUST NOT send any "unshallow" lines for anything
+	  which the client has not indicated was shallow as a part of
+	  its request.
+
+    wanted-refs section
+	* This section is only included if the client has requested a
+	  ref using a 'want-ref' line and if a packfile section is also
+	  included in the response.
+
+	* Always begins with the section header "wanted-refs".
+
+	* The server will send a ref listing ("<oid> <refname>") for
+	  each reference requested using 'want-ref' lines.
+
+	* The server MUST NOT send any refs which were not requested
+	  using 'want-ref' lines.
+
+    packfile-uris section
+	* This section is only included if the client sent
+	  'packfile-uris' and the server has at least one such URI to
+	  send.
+
+	* Always begins with the section header "packfile-uris".
+
+	* For each URI the server sends, it sends a hash of the pack's
+	  contents (as output by git index-pack) followed by the URI.
+
+	* The hashes are 40 hex characters long. When Git upgrades to a new
+	  hash algorithm, this might need to be updated. (It should match
+	  whatever index-pack outputs after "pack\t" or "keep\t".
+
+    packfile section
+	* This section is only included if the client has sent 'want'
+	  lines in its request and either requested that no more
+	  negotiation be done by sending 'done' or if the server has
+	  decided it has found a sufficient cut point to produce a
+	  packfile.
+
+	* Always begins with the section header "packfile"
+
+	* The transmission of the packfile begins immediately after the
+	  section header
+
+	* The data transfer of the packfile is always multiplexed, using
+	  the same semantics of the 'side-band-64k' capability from
+	  protocol version 1.  This means that each packet, during the
+	  packfile data stream, is made up of a leading 4-byte pkt-line
+	  length (typical of the pkt-line format), followed by a 1-byte
+	  stream code, followed by the actual data.
+
+	  The stream code can be one of:
+		1 - pack data
+		2 - progress messages
+		3 - fatal error message just before stream aborts
+
+server-option
+~~~~~~~~~~~~~
+
+If advertised, indicates that any number of server specific options can be
+included in a request.  This is done by sending each option as a
+"server-option=<option>" capability line in the capability-list section of
+a request.
+
+The provided options must not contain a NUL or LF character.
+
+ object-format
+~~~~~~~~~~~~~~~
+
+The server can advertise the `object-format` capability with a value `X` (in the
+form `object-format=X`) to notify the client that the server is able to deal
+with objects using hash algorithm X.  If not specified, the server is assumed to
+only handle SHA-1.  If the client would like to use a hash algorithm other than
+SHA-1, it should specify its object-format string.
+
+session-id=<session id>
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The server may advertise a session ID that can be used to identify this process
+across multiple requests. The client may advertise its own session ID back to
+the server as well.
+
+Session IDs should be unique to a given process. They must fit within a
+packet-line, and must not contain non-printable or whitespace characters. The
+current implementation uses trace2 session IDs (see
+link:api-trace2.html[api-trace2] for details), but this may change and users of
+the session ID should not rely on this fact.
diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt
new file mode 100644
index 0000000000..ceda4bbfda
--- /dev/null
+++ b/Documentation/technical/racy-git.txt
@@ -0,0 +1,201 @@
+Use of index and Racy Git problem
+=================================
+
+Background
+----------
+
+The index is one of the most important data structures in Git.
+It represents a virtual working tree state by recording list of
+paths and their object names and serves as a staging area to
+write out the next tree object to be committed.  The state is
+"virtual" in the sense that it does not necessarily have to, and
+often does not, match the files in the working tree.
+
+There are cases Git needs to examine the differences between the
+virtual working tree state in the index and the files in the
+working tree.  The most obvious case is when the user asks `git
+diff` (or its low level implementation, `git diff-files`) or
+`git-ls-files --modified`.  In addition, Git internally checks
+if the files in the working tree are different from what are
+recorded in the index to avoid stomping on local changes in them
+during patch application, switching branches, and merging.
+
+In order to speed up this comparison between the files in the
+working tree and the index entries, the index entries record the
+information obtained from the filesystem via `lstat(2)` system
+call when they were last updated.  When checking if they differ,
+Git first runs `lstat(2)` on the files and compares the result
+with this information (this is what was originally done by the
+`ce_match_stat()` function, but the current code does it in
+`ce_match_stat_basic()` function).  If some of these "cached
+stat information" fields do not match, Git can tell that the
+files are modified without even looking at their contents.
+
+Note: not all members in `struct stat` obtained via `lstat(2)`
+are used for this comparison.  For example, `st_atime` obviously
+is not useful.  Currently, Git compares the file type (regular
+files vs symbolic links) and executable bits (only for regular
+files) from `st_mode` member, `st_mtime` and `st_ctime`
+timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members.
+With a `USE_STDEV` compile-time option, `st_dev` is also
+compared, but this is not enabled by default because this member
+is not stable on network filesystems.  With `USE_NSEC`
+compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
+members are also compared. On Linux, this is not enabled by default
+because in-core timestamps can have finer granularity than
+on-disk timestamps, resulting in meaningless changes when an
+inode is evicted from the inode cache.  See commit 8ce13b0
+of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
+([PATCH] Sync in core time granularity with filesystems,
+2005-01-04). This patch is included in kernel 2.6.11 and newer, but
+only fixes the issue for file systems with exactly 1 ns or 1 s
+resolution. Other file systems are still broken in current Linux
+kernels (e.g. CEPH, CIFS, NTFS, UDF), see
+https://lore.kernel.org/lkml/5577240D.7020309@gmail.com/
+
+Racy Git
+--------
+
+There is one slight problem with the optimization based on the
+cached stat information.  Consider this sequence:
+
+  : modify 'foo'
+  $ git update-index 'foo'
+  : modify 'foo' again, in-place, without changing its size
+
+The first `update-index` computes the object name of the
+contents of file `foo` and updates the index entry for `foo`
+along with the `struct stat` information.  If the modification
+that follows it happens very fast so that the file's `st_mtime`
+timestamp does not change, after this sequence, the cached stat
+information the index entry records still exactly match what you
+would see in the filesystem, even though the file `foo` is now
+different.
+This way, Git can incorrectly think files in the working tree
+are unmodified even though they actually are.  This is called
+the "racy Git" problem (discovered by Pasky), and the entries
+that appear clean when they may not be because of this problem
+are called "racily clean".
+
+To avoid this problem, Git does two things:
+
+. When the cached stat information says the file has not been
+  modified, and the `st_mtime` is the same as (or newer than)
+  the timestamp of the index file itself (which is the time `git
+  update-index foo` finished running in the above example), it
+  also compares the contents with the object registered in the
+  index entry to make sure they match.
+
+. When the index file is updated that contains racily clean
+  entries, cached `st_size` information is truncated to zero
+  before writing a new version of the index file.
+
+Because the index file itself is written after collecting all
+the stat information from updated paths, `st_mtime` timestamp of
+it is usually the same as or newer than any of the paths the
+index contains.  And no matter how quick the modification that
+follows `git update-index foo` finishes, the resulting
+`st_mtime` timestamp on `foo` cannot get a value earlier
+than the index file.  Therefore, index entries that can be
+racily clean are limited to the ones that have the same
+timestamp as the index file itself.
+
+The callers that want to check if an index entry matches the
+corresponding file in the working tree continue to call
+`ce_match_stat()`, but with this change, `ce_match_stat()` uses
+`ce_modified_check_fs()` to see if racily clean ones are
+actually clean after comparing the cached stat information using
+`ce_match_stat_basic()`.
+
+The problem the latter solves is this sequence:
+
+  $ git update-index 'foo'
+  : modify 'foo' in-place without changing its size
+  : wait for enough time
+  $ git update-index 'bar'
+
+Without the latter, the timestamp of the index file gets a newer
+value, and falsely clean entry `foo` would not be caught by the
+timestamp comparison check done with the former logic anymore.
+The latter makes sure that the cached stat information for `foo`
+would never match with the file in the working tree, so later
+checks by `ce_match_stat_basic()` would report that the index entry
+does not match the file and Git does not have to fall back on more
+expensive `ce_modified_check_fs()`.
+
+
+Runtime penalty
+---------------
+
+The runtime penalty of falling back to `ce_modified_check_fs()`
+from `ce_match_stat()` can be very expensive when there are many
+racily clean entries.  An obvious way to artificially create
+this situation is to give the same timestamp to all the files in
+the working tree in a large project, run `git update-index` on
+them, and give the same timestamp to the index file:
+
+  $ date >.datestamp
+  $ git ls-files | xargs touch -r .datestamp
+  $ git ls-files | git update-index --stdin
+  $ touch -r .datestamp .git/index
+
+This will make all index entries racily clean.  The linux project, for
+example, there are over 20,000 files in the working tree.  On my
+Athlon 64 X2 3800+, after the above:
+
+  $ /usr/bin/time git diff-files
+  1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+  0inputs+0outputs (0major+67111minor)pagefaults 0swaps
+  $ git update-index MAINTAINERS
+  $ /usr/bin/time git diff-files
+  0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+  0inputs+0outputs (0major+935minor)pagefaults 0swaps
+
+Running `git update-index` in the middle checked the racily
+clean entries, and left the cached `st_mtime` for all the paths
+intact because they were actually clean (so this step took about
+the same amount of time as the first `git diff-files`).  After
+that, they are not racily clean anymore but are truly clean, so
+the second invocation of `git diff-files` fully took advantage
+of the cached stat information.
+
+
+Avoiding runtime penalty
+------------------------
+
+In order to avoid the above runtime penalty, post 1.4.2 Git used
+to have a code that made sure the index file
+got timestamp newer than the youngest files in the index when
+there are many young files with the same timestamp as the
+resulting index file would otherwise would have by waiting
+before finishing writing the index file out.
+
+I suspected that in practice the situation where many paths in the
+index are all racily clean was quite rare.  The only code paths
+that can record recent timestamp for large number of paths are:
+
+. Initial `git add .` of a large project.
+
+. `git checkout` of a large project from an empty index into an
+  unpopulated working tree.
+
+Note: switching branches with `git checkout` keeps the cached
+stat information of existing working tree files that are the
+same between the current branch and the new branch, which are
+all older than the resulting index file, and they will not
+become racily clean.  Only the files that are actually checked
+out can become racily clean.
+
+In a large project where raciness avoidance cost really matters,
+however, the initial computation of all object names in the
+index takes more than one second, and the index file is written
+out after all that happens.  Therefore the timestamp of the
+index file will be more than one seconds later than the
+youngest file in the working tree.  This means that in these
+cases there actually will not be any racily clean entry in
+the resulting index.
+
+Based on this discussion, the current code does not use the
+"workaround" to avoid the runtime penalty that does not exist in
+practice anymore.  This was done with commit 0fc82cff on Aug 15,
+2006.
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 0000000000..3ef169af27
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1093 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if an object name is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient enumeration of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to. This is analogous to storage in the packed-refs format.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file's block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header's `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. a single ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file's
+`max_update_index + 1` is the next file's `min_update_index`.
+
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 2 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+For maximum backward compatibility, it is recommended to use version 1 when
+writing SHA1 reftables.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file's
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record's name should be copied to obtain this reference's
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one object name; value of the ref
+* `0x2`: two object names; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format's max block size of
+`16777215` bytes (15.99 MiB). To achieve constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file's configured block size, but not the format's max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it's a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file's block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file's
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the object name to ref mapping.
+
+Object blocks use unique, abbreviated 2-32 object name keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file's standard block size. The abbreviation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by object name.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object names are abbreviated by writers to the shortest unique
+abbreviation within the reftable, obj key lengths have a variable length. Their
+length must be at least 2 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file's block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference's object names
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+object name within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block's `block_len` may
+exceed the file's block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexicographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two object names (old id, new id)
+* varint string of committer's name
+* varint string of committer's email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it's the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.  A deletion is a log
+record with a zero new_id, and a creation is a log record with a zero old_id.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+....
+    HEADER
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object names in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalties for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+object names valued with common prefixes, such as Gerrit Code Review's
+`refs/changes/` namespace, GitHub's `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional object name per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+Object names are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
+Their names should have a random element, such that each filename is globally
+unique; this helps avoid spurious failures on Windows, where open files cannot
+be removed or overwritten. It suggested to use
+`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001-RANDOM1.log
+00000002-00000002-RANDOM2.ref
+00000003-00000003-RANDOM3.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don't look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file's
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file's `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files' `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}-${random}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Windows
+^^^^^^^
+
+On windows, and other systems that do not allow deleting or renaming to open
+files, compaction may succeed, but other readers may prevent obsolete tables
+from being deleted.
+
+On these platforms, the following strategy can be followed: on closing a
+reftable stack, reload `tables.list`, and delete any tables no longer mentioned
+in `tables.list`.
+
+Irregular program exit may still leave about unused files. In this case, a
+cleanup operation can read `tables.list`, note its modification timestamp, and
+delete any unreferenced `*.ref` files that are older.
+
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable's encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty's alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+link:https://lore.kernel.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe%2BwGvEJ7A%40mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+link:https://lore.kernel.org/git/CAJo%3DhJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg%40mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository's object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer's delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub's `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://lore.kernel.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplementation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
new file mode 100644
index 0000000000..7844ef30ff
--- /dev/null
+++ b/Documentation/technical/repository-version.txt
@@ -0,0 +1,102 @@
+== Git Repository Format Versions
+
+Every git repository is marked with a numeric version in the
+`core.repositoryformatversion` key of its `config` file. This version
+specifies the rules for operating on the on-disk repository data. An
+implementation of git which does not understand a particular version
+advertised by an on-disk repository MUST NOT operate on that repository;
+doing so risks not only producing wrong results, but actually losing
+data.
+
+Because of this rule, version bumps should be kept to an absolute
+minimum. Instead, we generally prefer these strategies:
+
+  - bumping format version numbers of individual data files (e.g.,
+    index, packfiles, etc). This restricts the incompatibilities only to
+    those files.
+
+  - introducing new data that gracefully degrades when used by older
+    clients (e.g., pack bitmap files are ignored by older clients, which
+    simply do not take advantage of the optimization they provide).
+
+A whole-repository format version bump should only be part of a change
+that cannot be independently versioned. For instance, if one were to
+change the reachability rules for objects, or the rules for locking
+refs, that would require a bump of the repository format version.
+
+Note that this applies only to accessing the repository's disk contents
+directly. An older client which understands only format `0` may still
+connect via `git://` to a repository using format `1`, as long as the
+server process understands format `1`.
+
+The preferred strategy for rolling out a version bump (whether whole
+repository or for a single file) is to teach git to read the new format,
+and allow writing the new format with a config switch or command line
+option (for experimentation or for those who do not care about backwards
+compatibility with older gits). Then after a long period to allow the
+reading capability to become common, we may switch to writing the new
+format by default.
+
+The currently defined format versions are:
+
+=== Version `0`
+
+This is the format defined by the initial version of git, including but
+not limited to the format of the repository directory, the repository
+configuration file, and the object and ref storage. Specifying the
+complete behavior of git is beyond the scope of this document.
+
+=== Version `1`
+
+This format is identical to version `0`, with the following exceptions:
+
+  1. When reading the `core.repositoryformatversion` variable, a git
+     implementation which supports version 1 MUST also read any
+     configuration keys found in the `extensions` section of the
+     configuration file.
+
+  2. If a version-1 repository specifies any `extensions.*` keys that
+     the running git has not implemented, the operation MUST NOT
+     proceed. Similarly, if the value of any known key is not understood
+     by the implementation, the operation MUST NOT proceed.
+
+Note that if no extensions are specified in the config file, then
+`core.repositoryformatversion` SHOULD be set to `0` (setting it to `1`
+provides no benefit, and makes the repository incompatible with older
+implementations of git).
+
+This document will serve as the master list for extensions. Any
+implementation wishing to define a new extension should make a note of
+it here, in order to claim the name.
+
+The defined extensions are:
+
+==== `noop`
+
+This extension does not change git's behavior at all. It is useful only
+for testing format-1 compatibility.
+
+==== `preciousObjects`
+
+When the config key `extensions.preciousObjects` is set to `true`,
+objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
+`git repack -d`).
+
+==== `partialclone`
+
+When the config key `extensions.partialclone` is set, it indicates
+that the repo was created with a partial clone (or later performed
+a partial fetch) and that the remote may have omitted sending
+certain unwanted objects.  Such a remote is called a "promisor remote"
+and it promises that all such omitted objects can be fetched from it
+in the future.
+
+The value of this key is the name of the promisor remote.
+
+==== `worktreeConfig`
+
+If set, by default "git config" reads from both "config" and
+"config.worktree" file from GIT_DIR in that order. In
+multiple working directory mode, "config" file is shared while
+"config.worktree" is per-working directory (i.e., it's in
+GIT_COMMON_DIR/worktrees/<id>/config.worktree)
diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..af5f9fc24f
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,186 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing the common ancestor
+version from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks.  More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Removing the common ancestor version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document).  In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    =======
+    C
+    >>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    ||||||| merged common ancestors
+    A
+    =======
+    C
+    >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+    After examining what branches AB and AC did, I believe that making
+    line A into line D is the best thing to do that is compatible with
+    what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the common ancestor version is removed.  The above
+examples would both result in the following normalized conflict:
+
+    <<<<<<<
+    B
+    =======
+    C
+    >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+    - AB: changes A to B
+    - AC: changes A to C
+    - XY: changes X to Y
+    - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+    After examining what branches AB and AC did, I believe that
+    making line A into line D is the best thing to do that is
+    compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part became X or Y", and whenever
+any of these four patterns appear, and we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters.  The conflict markers are stripped out
+before the sha1 is calculated.  So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line C, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Similar to simple conflicts, the conflict is first normalized by
+stripping the labels from conflict markers, stripping the common ancestor
+version, and the sorting the conflict hunks, both for the outer and the
+inner conflict.  This is done recursively, so any number of nested
+conflicts can be handled.
+
+Note that this only works for conflict markers that "cleanly nest".  If
+there are any unmatched conflict markers, rerere will fail to handle
+the conflict and record a conflict resolution.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+    <<<<<<< HEAD
+    1
+    =======
+    <<<<<<< HEAD
+    3
+    =======
+    2
+    >>>>>>> branch-2
+    >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, and sorting
+the hunks, the conflict would look as follows:
+
+    <<<<<<<
+    1
+    =======
+    <<<<<<<
+    2
+    =======
+    3
+    >>>>>>>
+    >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
diff --git a/Documentation/technical/send-pack-pipeline.txt b/Documentation/technical/send-pack-pipeline.txt
new file mode 100644
index 0000000000..9b5a0bc186
--- /dev/null
+++ b/Documentation/technical/send-pack-pipeline.txt
@@ -0,0 +1,63 @@
+Git-send-pack internals
+=======================
+
+Overall operation
+-----------------
+
+. Connects to the remote side and invokes git-receive-pack.
+
+. Learns what refs the remote has and what commit they point at.
+  Matches them to the refspecs we are pushing.
+
+. Checks if there are non-fast-forwards.  Unlike fetch-pack,
+  the repository send-pack runs in is supposed to be a superset
+  of the recipient in fast-forward cases, so there is no need
+  for want/have exchanges, and fast-forward check can be done
+  locally.  Tell the result to the other end.
+
+. Calls pack_objects() which generates a packfile and sends it
+  over to the other end.
+
+. If the remote side is new enough (v1.1.0 or later), wait for
+  the unpack and hook status from the other end.
+
+. Exit with appropriate error codes.
+
+
+Pack_objects pipeline
+---------------------
+
+This function gets one file descriptor (`fd`) which is either a
+socket (over the network) or a pipe (local).  What's written to
+this fd goes to git-receive-pack to be unpacked.
+
+    send-pack ---> fd ---> receive-pack
+
+The function pack_objects creates a pipe and then forks.  The
+forked child execs pack-objects with --revs to receive revision
+parameters from its standard input. This process will write the
+packfile to the other end.
+
+    send-pack
+       |
+       pack_objects() ---> fd ---> receive-pack
+          | ^ (pipe)
+	  v |
+         (child)
+
+The child dup2's to arrange its standard output to go back to
+the other end, and read its standard input to come from the
+pipe.  After that it exec's pack-objects.  On the other hand,
+the parent process, before starting to feed the child pipeline,
+closes the reading side of the pipe and fd to receive-pack.
+
+    send-pack
+       |
+       pack_objects(parent)
+          |
+	  v [0]
+         pack-objects [0] ---> receive-pack
+
+
+[jc: the pipeline was much more complex and needed documentation before
+ I understood an earlier bug, but now it is trivial and straightforward.]
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
new file mode 100644
index 0000000000..f3738baa0f
--- /dev/null
+++ b/Documentation/technical/shallow.txt
@@ -0,0 +1,60 @@
+Shallow commits
+===============
+
+.Definition
+*********************************************************
+Shallow commits do have parents, but not in the shallow
+repo, and therefore grafts are introduced pretending that
+these commits have no parents.
+*********************************************************
+
+$GIT_DIR/shallow lists commit object names and tells Git to
+pretend as if they are root commits (e.g. "git log" traversal
+stops after showing them; "git fsck" does not complain saying
+the commits listed on their "parent" lines do not exist).
+
+Each line contains exactly one object name. When read, a commit_graft
+will be constructed, which has nr_parent < 0 to make it easier
+to discern from user provided grafts.
+
+Note that the shallow feature could not be changed easily to
+use replace refs: a commit containing a `mergetag` is not allowed
+to be replaced, not even by a root commit. Such a commit can be
+made shallow, though. Also, having a `shallow` file explicitly
+listing all the commits made shallow makes it a *lot* easier to
+do shallow-specific things such as to deepen the history.
+
+Since fsck-objects relies on the library to read the objects,
+it honours shallow commits automatically.
+
+There are some unfinished ends of the whole shallow business:
+
+- maybe we have to force non-thin packs when fetching into a
+  shallow repo (ATM they are forced non-thin).
+
+- A special handling of a shallow upstream is needed. At some
+  stage, upload-pack has to check if it sends a shallow commit,
+  and it should send that information early (or fail, if the
+  client does not support shallow repositories). There is no
+  support at all for this in this patch series.
+
+- Instead of locking $GIT_DIR/shallow at the start, just
+  the timestamp of it is noted, and when it comes to writing it,
+  a check is performed if the mtime is still the same, dying if
+  it is not.
+
+- It is unclear how "push into/from a shallow repo" should behave.
+
+- If you deepen a history, you'd want to get the tags of the
+  newly stored (but older!) commits. This does not work right now.
+
+To make a shallow clone, you can call "git-clone --depth 20 repo".
+The result contains only commit chains with a length of at most 20.
+It also writes an appropriate $GIT_DIR/shallow.
+
+You can deepen a shallow repository with "git-fetch --depth 20
+repo branch", which will fetch branch from repo, but stop at depth
+20, updating $GIT_DIR/shallow.
+
+The special depth 2147483647 (or 0x7fffffff, the largest positive
+number a signed 32-bit integer can contain) means infinite depth.
diff --git a/Documentation/technical/signature-format.txt b/Documentation/technical/signature-format.txt
new file mode 100644
index 0000000000..2c9406a56a
--- /dev/null
+++ b/Documentation/technical/signature-format.txt
@@ -0,0 +1,186 @@
+Git signature format
+====================
+
+== Overview
+
+Git uses cryptographic signatures in various places, currently objects (tags,
+commits, mergetags) and transactions (pushes). In every case, the command which
+is about to create an object or transaction determines a payload from that,
+calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and
+embeds the signature into the object or transaction.
+
+Signatures always begin with `-----BEGIN PGP SIGNATURE-----`
+and end with `-----END PGP SIGNATURE-----`, unless gpg is told to
+produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`.
+
+The signed payload and the way the signature is embedded depends
+on the type of the object resp. transaction.
+
+== Tag signatures
+
+- created by: `git tag -s`
+- payload: annotated tag object
+- embedding: append the signature to the unsigned tag object
+- example: tag `signedtag` with subject `signed tag`
+
+----
+object 04b871796dc0420f8e7561a895b52484b701d51a
+type commit
+tag signedtag
+tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+signed tag
+
+signed tag message body
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
+rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
+8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
+q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
+rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
+lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
+=jpXa
+-----END PGP SIGNATURE-----
+----
+
+- verify with: `git verify-tag [-v]` or `git tag -v`
+
+----
+gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+object 04b871796dc0420f8e7561a895b52484b701d51a
+type commit
+tag signedtag
+tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+signed tag
+
+signed tag message body
+----
+
+== Commit signatures
+
+- created by: `git commit -S`
+- payload: commit object
+- embedding: header entry `gpgsig`
+  (content is preceded by a space)
+- example: commit with subject `signed commit`
+
+----
+tree eebfed94e75e7760540d1485c740902590a00332
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465981137 +0000
+committer C O Mitter <committer@example.com> 1465981137 +0000
+gpgsig -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1
+
+ iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
+ HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
+ DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
+ zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
+ HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
+ EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
+ =jKHM
+ -----END PGP SIGNATURE-----
+
+signed commit
+
+signed commit message body
+----
+
+- verify with: `git verify-commit [-v]` (or `git show --show-signature`)
+
+----
+gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+tree eebfed94e75e7760540d1485c740902590a00332
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465981137 +0000
+committer C O Mitter <committer@example.com> 1465981137 +0000
+
+signed commit
+
+signed commit message body
+----
+
+== Mergetag signatures
+
+- created by: `git merge` on signed tag
+- payload/embedding: the whole signed tag object is embedded into
+  the (merge) commit object as header entry `mergetag`
+- example: merge of the signed tag `signedtag` as above
+
+----
+tree c7b1cff039a93f3600a1d18b82d26688668c7dea
+parent c33429be94b5f2d3ee9b0adad223f877f174b05d
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465982009 +0000
+committer C O Mitter <committer@example.com> 1465982009 +0000
+mergetag object 04b871796dc0420f8e7561a895b52484b701d51a
+ type commit
+ tag signedtag
+ tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+ signed tag
+
+ signed tag message body
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1
+
+ iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
+ rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
+ 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
+ q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
+ rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
+ lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
+ =jpXa
+ -----END PGP SIGNATURE-----
+
+Merge tag 'signedtag' into downstream
+
+signed tag
+
+signed tag message body
+
+# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
+# gpg: Good signature from "Eris Discordia <discord@example.net>"
+# gpg: WARNING: This key is not certified with a trusted signature!
+# gpg:          There is no indication that the signature belongs to the owner.
+# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+----
+
+- verify with: verification is embedded in merge commit message by default,
+  alternatively with `git show --show-signature`:
+
+----
+commit 9863f0c76ff78712b6800e199a46aa56afbcbd49
+merged tag 'signedtag'
+gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg:          There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+Merge: c33429b 04b8717
+Author: A U Thor <author@example.com>
+Date:   Wed Jun 15 09:13:29 2016 +0000
+
+    Merge tag 'signedtag' into downstream
+
+    signed tag
+
+    signed tag message body
+
+    # gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
+    # gpg: Good signature from "Eris Discordia <discord@example.net>"
+    # gpg: WARNING: This key is not certified with a trusted signature!
+    # gpg:          There is no indication that the signature belongs to the owner.
+    # Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA  29A4 6109 2E85 B722 7189
+----
diff --git a/Documentation/technical/trivial-merge.txt b/Documentation/technical/trivial-merge.txt
new file mode 100644
index 0000000000..1f1c33d0da
--- /dev/null
+++ b/Documentation/technical/trivial-merge.txt
@@ -0,0 +1,121 @@
+Trivial merge rules
+===================
+
+This document describes the outcomes of the trivial merge logic in read-tree.
+
+One-way merge
+-------------
+
+This replaces the index with a different tree, keeping the stat info
+for entries that don't change, and allowing -u to make the minimum
+required changes to the working tree to have it match.
+
+Entries marked '+' have stat information. Spaces marked '*' don't
+affect the result.
+
+   index   tree    result
+   -----------------------
+   *       (empty) (empty)
+   (empty) tree    tree
+   index+  tree    tree
+   index+  index   index+
+
+Two-way merge
+-------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the old
+or the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result which changes the index is an error if the index is not empty
+and not up to date.
+
+Entries marked '+' have stat information. Spaces marked '*' don't
+affect the result.
+
+ case  index   old     new     result
+ -------------------------------------
+ 0/2   (empty) *       (empty) (empty)
+ 1/3   (empty) *       new     new
+ 4/5   index+  (empty) (empty) index+
+ 6/7   index+  (empty) index   index+
+ 10    index+  index   (empty) (empty)
+ 14/15 index+  old     old     index+
+ 18/19 index+  old     index   index+
+ 20    index+  index   new     new
+
+Three-way merge
+---------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the
+head or (if the merge is trivial) the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result of "no merge" means that index is left in stage 0, ancest in
+stage 1, head in stage 2, and remote in stage 3 (if any of these are
+empty, no entry is left for that stage). Otherwise, the given entry is
+left in stage 0, and there are no other entries.
+
+A result of "no merge" is an error if the index is not empty and not
+up to date.
+
+*empty* means that the tree must not have a directory-file conflict
+ with the entry.
+
+For multiple ancestors, a '+' means that this case applies even if
+only one ancestor or remote fits; a '^' means all of the ancestors
+must be the same.
+
+ case  ancest    head    remote    result
+ ----------------------------------------
+ 1     (empty)+  (empty) (empty)   (empty)
+ 2ALT  (empty)+  *empty* remote    remote
+ 2     (empty)^  (empty) remote    no merge
+ 3ALT  (empty)+  head    *empty*   head
+ 3     (empty)^  head    (empty)   no merge
+ 4     (empty)^  head    remote    no merge
+ 5ALT  *         head    head      head
+ 6     ancest+   (empty) (empty)   no merge
+ 8     ancest^   (empty) ancest    no merge
+ 7     ancest+   (empty) remote    no merge
+ 10    ancest^   ancest  (empty)   no merge
+ 9     ancest+   head    (empty)   no merge
+ 16    anc1/anc2 anc1    anc2      no merge
+ 13    ancest+   head    ancest    head
+ 14    ancest+   ancest  remote    remote
+ 11    ancest+   head    remote    no merge
+
+Only #2ALT and #3ALT use *empty*, because these are the only cases
+where there can be conflicts that didn't exist before. Note that we
+allow directory-file conflicts between things in different stages
+after the trivial merge.
+
+A possible alternative for #6 is (empty), which would make it like
+#1. This is not used, due to the likelihood that it arises due to
+moving the file to multiple different locations or moving and deleting
+it in different branches.
+
+Case #1 is included for completeness, and also in case we decide to
+put on '+' markings; any path that is never mentioned at all isn't
+handled.
+
+Note that #16 is when both #13 and #14 apply; in this case, we refuse
+the trivial merge, because we can't tell from this data which is
+right. This is a case of a reverted patch (in some direction, maybe
+multiple times), and the right answer depends on looking at crossings
+of history or common ancestors of the ancestors.
+
+Note that, between #6, #7, #9, and #11, all cases not otherwise
+covered are handled in this table.
+
+For #8 and #10, there is alternative behavior, not currently
+implemented, where the result is (empty). As currently implemented,
+the automatic merge will generally give this effect.