Diffstat (limited to 'Documentation/technical')
-rw-r--r--  Documentation/technical/.gitignore | 1
-rw-r--r--  Documentation/technical/api-allocation-growing.txt | 39
-rw-r--r--  Documentation/technical/api-argv-array.txt | 65
-rw-r--r--  Documentation/technical/api-builtin.txt | 73
-rw-r--r--  Documentation/technical/api-config.txt | 317
-rw-r--r--  Documentation/technical/api-credentials.txt | 271
-rw-r--r--  Documentation/technical/api-decorate.txt | 6
-rw-r--r--  Documentation/technical/api-diff.txt | 174
-rw-r--r--  Documentation/technical/api-directory-listing.txt | 105
-rw-r--r--  Documentation/technical/api-error-handling.txt | 75
-rw-r--r--  Documentation/technical/api-gitattributes.txt | 128
-rw-r--r--  Documentation/technical/api-grep.txt | 8
-rw-r--r--  Documentation/technical/api-hashmap.txt | 285
-rw-r--r--  Documentation/technical/api-history-graph.txt | 174
-rw-r--r--  Documentation/technical/api-in-core-index.txt | 21
-rw-r--r--  Documentation/technical/api-index-skel.txt | 13
-rwxr-xr-x  Documentation/technical/api-index.sh | 28
-rw-r--r--  Documentation/technical/api-merge.txt | 104
-rw-r--r--  Documentation/technical/api-object-access.txt | 15
-rw-r--r--  Documentation/technical/api-parse-options.txt | 310
-rw-r--r--  Documentation/technical/api-quote.txt | 10
-rw-r--r--  Documentation/technical/api-ref-iteration.txt | 81
-rw-r--r--  Documentation/technical/api-remote.txt | 127
-rw-r--r--  Documentation/technical/api-revision-walking.txt | 72
-rw-r--r--  Documentation/technical/api-run-command.txt | 264
-rw-r--r--  Documentation/technical/api-setup.txt | 49
-rw-r--r--  Documentation/technical/api-sha1-array.txt | 80
-rw-r--r--  Documentation/technical/api-sigchain.txt | 41
-rw-r--r--  Documentation/technical/api-string-list.txt | 209
-rw-r--r--  Documentation/technical/api-submodule-config.txt | 66
-rw-r--r--  Documentation/technical/api-trace.txt | 140
-rw-r--r--  Documentation/technical/api-tree-walking.txt | 147
-rw-r--r--  Documentation/technical/api-xdiff-interface.txt | 7
-rw-r--r--  Documentation/technical/bitmap-format.txt | 164
-rw-r--r--  Documentation/technical/http-protocol.txt | 507
-rw-r--r--  Documentation/technical/index-format.txt | 297
-rw-r--r--  Documentation/technical/pack-format.txt | 162
-rw-r--r--  Documentation/technical/pack-heuristics.txt | 460
-rw-r--r--  Documentation/technical/pack-protocol.txt | 607
-rw-r--r--  Documentation/technical/protocol-capabilities.txt | 311
-rw-r--r--  Documentation/technical/protocol-common.txt | 99
-rw-r--r--  Documentation/technical/racy-git.txt | 201
-rw-r--r--  Documentation/technical/repository-version.txt | 88
-rw-r--r--  Documentation/technical/send-pack-pipeline.txt | 63
-rw-r--r--  Documentation/technical/shallow.txt | 58
-rw-r--r--  Documentation/technical/signature-format.txt | 186
-rw-r--r--  Documentation/technical/trivial-merge.txt | 121
47 files changed, 6829 insertions, 0 deletions
diff --git a/Documentation/technical/.gitignore b/Documentation/technical/.gitignore
new file mode 100644
index 0000000000..8aa891daee
--- /dev/null
+++ b/Documentation/technical/.gitignore
@@ -0,0 +1 @@
+api-index.txt
diff --git a/Documentation/technical/api-allocation-growing.txt b/Documentation/technical/api-allocation-growing.txt
new file mode 100644
index 0000000000..5a59b54844
--- /dev/null
+++ b/Documentation/technical/api-allocation-growing.txt
@@ -0,0 +1,39 @@
+allocation growing API
+======================
+
+Dynamically growing an array using realloc() is error prone and boring.
+
+Define your array with:
+
+* a pointer (`item`) that points at the array, initialized to `NULL`
+ (although please name the variable based on its contents, not on its
+ type);
+
+* an integer variable (`alloc`) that keeps track of how big the current
+ allocation is, initialized to `0`;
+
+* another integer variable (`nr`) to keep track of how many elements the
+ array currently has, initialized to `0`.
+
+Then, before adding the `n`th element to the array, call `ALLOC_GROW(item,
+n, alloc)`. This ensures that the array can hold at least `n` elements by
+calling `realloc(3)` and adjusting the `alloc` variable.
+
+------------
+sometype *item;
+size_t nr;
+size_t alloc;
+
+for (i = 0; i < nr; i++)
+ if (we like item[i] already)
+ return;
+
+/* we did not like any existing one, so add one */
+ALLOC_GROW(item, nr + 1, alloc);
+item[nr++] = value you like;
+------------
+
+You are responsible for updating the `nr` variable.
+
+If you need to specify the number of elements to allocate explicitly
+then use the macro `REALLOC_ARRAY(item, alloc)` instead of `ALLOC_GROW`.
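+
+For example, to size the array for exactly `want` elements up front (a
+sketch; `want` is an illustrative count):
+
+------------
+if (want > alloc) {
+	alloc = want;
+	REALLOC_ARRAY(item, alloc);
+}
+------------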
diff --git a/Documentation/technical/api-argv-array.txt b/Documentation/technical/api-argv-array.txt
new file mode 100644
index 0000000000..cfc063018c
--- /dev/null
+++ b/Documentation/technical/api-argv-array.txt
@@ -0,0 +1,65 @@
+argv-array API
+==============
+
+The argv-array API allows one to dynamically build and store
+NULL-terminated lists. An argv-array maintains the invariant that the
+`argv` member always points to a non-NULL array, and that the array is
+always NULL-terminated at the element pointed to by `argv[argc]`. This
+makes the result suitable for passing to functions expecting to receive
+argv from main(), or the link:api-run-command.html[run-command API].
+
+The link:api-string-list.html[string-list API] is similar, but cannot be
+used for these purposes; instead of storing a straight string pointer,
+it contains an item structure with a `util` field that is not compatible
+with the traditional argv interface.
+
+Each `argv_array` manages its own memory. Any strings pushed into the
+array are duplicated, and all memory is freed by argv_array_clear().
+
+Data Structures
+---------------
+
+`struct argv_array`::
+
+ A single array. This should be initialized by assignment from
+ `ARGV_ARRAY_INIT`, or by calling `argv_array_init`. The `argv`
+ member contains the actual array; the `argc` member contains the
+ number of elements in the array, not including the terminating
+ NULL.
+
+Functions
+---------
+
+`argv_array_init`::
+ Initialize an array. This is no different than assigning from
+ `ARGV_ARRAY_INIT`.
+
+`argv_array_push`::
+ Push a copy of a string onto the end of the array.
+
+`argv_array_pushl`::
+ Push a list of strings onto the end of the array. The arguments
+ should be a list of `const char *` strings, terminated by a NULL
+ argument.
+
+`argv_array_pushf`::
+ Format a string and push it onto the end of the array. This is a
+ convenience wrapper combining `strbuf_addf` and `argv_array_push`.
+
+`argv_array_pushv`::
+ Push a null-terminated array of strings onto the end of the array.
+
+`argv_array_pop`::
+ Remove the final element from the array. If there are no
+ elements in the array, do nothing.
+
+`argv_array_clear`::
+ Free all memory associated with the array and return it to the
+ initial, empty state.
+
+`argv_array_detach`::
+ Disconnect the `argv` member from the `argv_array` struct and
+ return it. The caller is responsible for freeing the memory used
+ by the array, and by the strings it references. After detaching,
+ the `argv_array` is in a reinitialized state and can be pushed
+ into again.
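+
+A minimal sketch of typical use (the command being built and the use of
+`run_command_v_opt()` from the link:api-run-command.html[run-command API]
+are purely illustrative):
+
+------------
+const char *author = "A U Thor <author@example.com>"; /* illustrative */
+struct argv_array args = ARGV_ARRAY_INIT;
+
+argv_array_push(&args, "commit");
+argv_array_pushl(&args, "-m", "testing", NULL);
+argv_array_pushf(&args, "--author=%s", author);
+
+/* args.argv is now NULL-terminated and args.argc == 4 */
+run_command_v_opt(args.argv, RUN_GIT_CMD);
+
+argv_array_clear(&args);
+------------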
diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt
new file mode 100644
index 0000000000..22a39b9299
--- /dev/null
+++ b/Documentation/technical/api-builtin.txt
@@ -0,0 +1,73 @@
+builtin API
+===========
+
+Adding a new built-in
+---------------------
+
+There are 4 things to do to add a built-in command implementation to
+Git:
+
+. Define the implementation of the built-in command `foo` with
+ signature:
+
+ int cmd_foo(int argc, const char **argv, const char *prefix);
+
+. Add the external declaration for the function to `builtin.h`.
+
+. Add the command to the `commands[]` table defined in `git.c`.
+ The entry should look like:
+
+ { "foo", cmd_foo, <options> },
++
+where options is the bitwise-or of:
+
+`RUN_SETUP`::
+ If there is not a Git directory to work on, abort. If there
+ is a work tree, chdir to the top of it if the command was
+ invoked in a subdirectory. If there is no work tree, no
+ chdir() is done.
+
+`RUN_SETUP_GENTLY`::
+ If there is a Git directory, chdir as per RUN_SETUP, otherwise,
+ don't chdir anywhere.
+
+`USE_PAGER`::
+
+ If the standard output is connected to a tty, spawn a pager and
+ feed our output to it.
+
+`NEED_WORK_TREE`::
+
+ Make sure there is a work tree, i.e. the command cannot act
+ on bare repositories.
+ This only makes sense when `RUN_SETUP` is also set.
+
+. Add `builtin/foo.o` to `BUILTIN_OBJS` in `Makefile`.
+
+Additionally, if `foo` is a new command, there are 4 more things to do:
+
+. Add tests to `t/` directory.
+
+. Write documentation in `Documentation/git-foo.txt`.
+
+. Add an entry for `git-foo` to `command-list.txt`.
+
+. Add an entry for `/git-foo` to `.gitignore`.
+
+
+How a built-in is called
+------------------------
+
+The implementation `cmd_foo()` takes three parameters, `argc`, `argv`,
+and `prefix`. The first two are similar to what `main()` of a
+standalone command would be called with.
+
+When `RUN_SETUP` is specified in the `commands[]` table, and when you
+were started from a subdirectory of the work tree, `cmd_foo()` is called
+after chdir(2) to the top of the work tree, and `prefix` gets the path
+to the subdirectory the command started from. This allows you to
+convert a user-supplied pathname (typically relative to that directory)
+to a pathname relative to the top of the work tree.
+
+The return value from `cmd_foo()` becomes the exit status of the
+command.
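+
+A minimal sketch of such an implementation, using the
+link:api-parse-options.html[parse-options API] (the option shown is
+purely illustrative):
+
+------------
+static const char * const foo_usage[] = {
+	N_("git foo [--quiet]"),
+	NULL
+};
+
+int cmd_foo(int argc, const char **argv, const char *prefix)
+{
+	int quiet = 0;
+	struct option options[] = {
+		OPT__QUIET(&quiet, N_("suppress output")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, foo_usage, 0);
+	/* act on the remaining arguments in argv[0..argc-1] */
+	return 0;
+}
+------------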
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
new file mode 100644
index 0000000000..20741f345e
--- /dev/null
+++ b/Documentation/technical/api-config.txt
@@ -0,0 +1,317 @@
+config API
+==========
+
+The config API gives callers a way to access Git configuration files
+(and files which have the same syntax). See linkgit:git-config[1] for a
+discussion of the config file syntax.
+
+General Usage
+-------------
+
+Config files are parsed linearly, and each variable found is passed to a
+caller-provided callback function. The callback function is responsible
+for any actions to be taken on the config option, and is free to ignore
+some options. It is not uncommon for the configuration to be parsed
+several times during the run of a Git program, with different callbacks
+picking out different variables useful to themselves.
+
+A config callback function takes three parameters:
+
+- the name of the parsed variable. This is in canonical "flat" form: the
+ section, subsection, and variable segments will be separated by dots,
+ and the section and variable segments will be all lowercase. E.g.,
+ `core.ignorecase`, `diff.SomeType.textconv`.
+
+- the value of the found variable, as a string. If the variable had no
+ value specified, the value will be NULL (typically this means it
+ should be interpreted as boolean true).
+
+- a void pointer passed in by the caller of the config API; this can
+ contain callback-specific data
+
+A config callback should return 0 for success, or -1 if the variable
+could not be parsed properly.
+
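+A minimal sketch of such a callback (the `foo.*` variables and the
+`struct foo_settings` type are illustrative, not part of Git):
+
+------------
+struct foo_settings {
+	int enabled;
+	const char *path;
+};
+
+static int foo_config_cb(const char *var, const char *value, void *cb)
+{
+	struct foo_settings *s = cb;
+
+	if (!strcmp(var, "foo.enabled")) {
+		s->enabled = git_config_bool(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "foo.path"))
+		return git_config_string(&s->path, var, value);
+	return 0; /* ignore everything else */
+}
+------------
+
+Such a callback would typically be handed to `git_config()` together with
+a pointer to the settings struct, as described in the next section.
+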
+Basic Config Querying
+---------------------
+
+Most programs will simply want to look up variables in all config files
+that Git knows about, using the normal precedence rules. To do this,
+call `git_config` with a callback function and void data pointer.
+
+`git_config` will read all config sources in order of increasing
+priority. Thus a callback should typically overwrite previously-seen
+entries with new ones (e.g., if both the user-wide `~/.gitconfig` and
+repo-specific `.git/config` contain `color.ui`, the config machinery
+will first feed the user-wide one to the callback, and then the
+repo-specific one; by overwriting, the higher-priority repo-specific
+value is left at the end).
+
+The `git_config_with_options` function lets the caller examine config
+while adjusting some of the default behavior of `git_config`. It should
+almost never be used by "regular" Git code that is looking up
+configuration variables. It is intended for advanced callers like
+`git-config`, which are intentionally tweaking the normal config-lookup
+process. It takes two extra parameters:
+
+`filename`::
+If this parameter is non-NULL, it specifies the name of a file to
+parse for configuration, rather than looking in the usual files. Regular
+`git_config` defaults to `NULL`.
+
+`respect_includes`::
+Specify whether include directives should be followed in parsed files.
+Regular `git_config` defaults to `1`.
+
+Reading Specific Files
+----------------------
+
+To read a specific file in git-config format, use
+`git_config_from_file`. This takes the same callback and data parameters
+as `git_config`.
+
+Querying For Specific Variables
+-------------------------------
+
+For programs wanting to query for specific variables in a non-callback
+manner, the config API provides two functions `git_config_get_value`
+and `git_config_get_value_multi`. They both read values from an internal
+cache generated previously from reading the config files.
+
+`int git_config_get_value(const char *key, const char **value)`::
+
+ Finds the highest-priority value for the configuration variable `key`,
+ stores the pointer to it in `value` and returns 0. When the
+ configuration variable `key` is not found, returns 1 without touching
+ `value`. The caller should not free or modify `value`, as it is owned
+ by the cache.
+
+`const struct string_list *git_config_get_value_multi(const char *key)`::
+
+ Finds and returns the value list, sorted in order of increasing priority
+ for the configuration variable `key`. When the configuration variable
+ `key` is not found, returns NULL. The caller should not free or modify
+ the returned pointer, as it is owned by the cache.
+
+`void git_config_clear(void)`::
+
+ Resets and invalidates the config cache.
+
+The config API also provides type specific API functions which do conversion
+as well as retrieval for the queried variable, including:
+
+`int git_config_get_int(const char *key, int *dest)`::
+
+ Finds and parses the value to an integer for the configuration variable
+ `key`. Dies on error; otherwise, stores the value of the parsed integer in
+ `dest` and returns 0. When the configuration variable `key` is not found,
+ returns 1 without touching `dest`.
+
+`int git_config_get_ulong(const char *key, unsigned long *dest)`::
+
+ Similar to `git_config_get_int` but for unsigned longs.
+
+`int git_config_get_bool(const char *key, int *dest)`::
+
+ Finds and parses the value into a boolean value, for the configuration
+ variable `key` respecting keywords like "true" and "false". Integer
+ values are converted into true/false values (when they are non-zero or
+ zero, respectively). Other values cause a die(). If parsing is successful,
+ stores the value of the parsed result in `dest` and returns 0. When the
+ configuration variable `key` is not found, returns 1 without touching
+ `dest`.
+
+`int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)`::
+
+ Similar to `git_config_get_bool`, except that integers are copied as-is,
+ and `is_bool` flag is unset.
+
+`int git_config_get_maybe_bool(const char *key, int *dest)`::
+
+ Similar to `git_config_get_bool`, except that it returns -1 on error
+ rather than dying.
+
+`int git_config_get_string_const(const char *key, const char **dest)`::
+
+ Allocates and copies the retrieved string into the `dest` parameter for
+ the configuration variable `key`; if NULL string is given, prints an
+ error message and returns -1. When the configuration variable `key` is
+ not found, returns 1 without touching `dest`.
+
+`int git_config_get_string(const char *key, char **dest)`::
+
+ Similar to `git_config_get_string_const`, except that retrieved value
+ copied into the `dest` parameter is a mutable string.
+
+`int git_config_get_pathname(const char *key, const char **dest)`::
+
+ Similar to `git_config_get_string`, but expands `~` or `~user` into
+ the user's home directory when found at the beginning of the path.
+
+`git_die_config(const char *key, const char *err, ...)`::
+
+ First prints the error message specified by the caller in `err` and then
+ dies printing the line number and the file name of the highest priority
+ value for the configuration variable `key`.
+
+`void git_die_config_linenr(const char *key, const char *filename, int linenr)`::
+
+ Helper function which formats the die error message according to the
+ parameters entered. Used by `git_die_config()`. It can be used by callers
+ handling `git_config_get_value_multi()` to print the correct error message
+ for the desired value.
+
+See test-config.c for usage examples.
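+
+A non-callback lookup might look like this (a sketch; the configuration
+keys and the fallback values are illustrative):
+
+------------
+int threads;
+const char *editor;
+
+if (git_config_get_int("pack.threads", &threads))
+	threads = 1;	/* not configured; fall back to a default */
+
+if (git_config_get_string_const("core.editor", &editor))
+	editor = "vi";	/* not configured; fall back to a default */
+------------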
+
+Value Parsing Helpers
+---------------------
+
+To aid in parsing string values, the config API provides callbacks with
+a number of helper functions, including:
+
+`git_config_int`::
+Parse the string to an integer, including unit factors. Dies on error;
+otherwise, returns the parsed result.
+
+`git_config_ulong`::
+Identical to `git_config_int`, but for unsigned longs.
+
+`git_config_bool`::
+Parse a string into a boolean value, respecting keywords like "true" and
+"false". Integer values are converted into true/false values (when they
+are non-zero or zero, respectively). Other values cause a die(). If
+parsing is successful, the return value is the result.
+
+`git_config_bool_or_int`::
+Same as `git_config_bool`, except that integers are returned as-is, and
+an `is_bool` flag is unset.
+
+`git_config_maybe_bool`::
+Same as `git_config_bool`, except that it returns -1 on error rather
+than dying.
+
+`git_config_string`::
+Allocates and copies the value string into the `dest` parameter; if no
+string is given, prints an error message and returns -1.
+
+`git_config_pathname`::
+Similar to `git_config_string`, but expands `~` or `~user` into the
+user's home directory when found at the beginning of the path.
+
+Include Directives
+------------------
+
+By default, the config parser does not respect include directives.
+However, a caller can use the special `git_config_include` wrapper
+callback to support them. To do so, you simply wrap your "real" callback
+function and data pointer in a `struct config_include_data`, and pass
+the wrapper to the regular config-reading functions. For example:
+
+-------------------------------------------
+int read_file_with_include(const char *file, config_fn_t fn, void *data)
+{
+ struct config_include_data inc = CONFIG_INCLUDE_INIT;
+ inc.fn = fn;
+ inc.data = data;
+ return git_config_from_file(git_config_include, file, &inc);
+}
+-------------------------------------------
+
+`git_config` respects includes automatically. The lower-level
+`git_config_from_file` does not.
+
+Custom Configsets
+-----------------
+
+A `config_set` can be used to construct an in-memory cache for
+config-like files that the caller specifies (i.e., files like `.gitmodules`,
+`~/.gitconfig` etc.). For example,
+
+---------------------------------------
+struct config_set gm_config;
+int b;
+
+git_configset_init(&gm_config);
+
+/* we add config files to the config_set */
+git_configset_add_file(&gm_config, ".gitmodules");
+git_configset_add_file(&gm_config, ".gitmodules_alt");
+
+if (!git_configset_get_bool(&gm_config, "submodule.frotz.ignore", &b)) {
+ /* hack hack hack */
+}
+
+/* when we are done with the configset */
+git_configset_clear(&gm_config);
+----------------------------------------
+
+The configset API provides functions for the above-mentioned workflow, including:
+
+`void git_configset_init(struct config_set *cs)`::
+
+ Initializes the config_set `cs`.
+
+`int git_configset_add_file(struct config_set *cs, const char *filename)`::
+
+ Parses the file and adds the variable-value pairs to the `config_set`,
+ dies if there is an error in parsing the file. Returns 0 on success, or
+ -1 if the file does not exist or is inaccessible. When the function
+ returns -1, the caller has to decide whether to free the incomplete
+ configset or continue using it.
+
+`int git_configset_get_value(struct config_set *cs, const char *key, const char **value)`::
+
+ Finds the highest-priority value for the configuration variable `key`
+ and config set `cs`, stores the pointer to it in `value` and returns 0.
+ When the configuration variable `key` is not found, returns 1 without
+ touching `value`. The caller should not free or modify `value`, as it
+ is owned by the cache.
+
+`const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)`::
+
+ Finds and returns the value list, sorted in order of increasing priority
+ for the configuration variable `key` and config set `cs`. When the
+ configuration variable `key` is not found, returns NULL. The caller
+ should not free or modify the returned pointer, as it is owned by the cache.
+
+`void git_configset_clear(struct config_set *cs)`::
+
+ Clears `config_set` structure, removes all saved variable-value pairs.
+
+In addition to the above functions, the `config_set` API provides type-specific
+functions in the vein of `git_config_get_int` and family, but with an extra
+parameter: a pointer to struct `config_set`.
+They all behave similarly to the `git_config_get*()` family described in
+"Querying For Specific Variables" above.
+
+Writing Config Files
+--------------------
+
+Git provides multiple entry points in the config API to write config values
+to files, namely `git_config_set_in_file` and `git_config_set`, which write
+to a specific config file or to `.git/config` respectively. They both take a
+key/value pair as parameters.
+In the end they both call `git_config_set_multivar_in_file`, which takes five
+parameters:
+
+- the name of the file, as a string, to which key/value pairs will be written.
+
+- the name of key, as a string. This is in canonical "flat" form: the section,
+ subsection, and variable segments will be separated by dots, and the section
+ and variable segments will be all lowercase.
+ E.g., `core.ignorecase`, `diff.SomeType.textconv`.
+
+- the value of the variable, as a string. If value is equal to NULL, it will
+ remove the matching key from the config file.
+
+- the value regex, as a string. It will disregard key/value pairs where value
+ does not match.
+
+- a multi_replace value, as an int. If the value is zero, at most one matching
+  key/value pair is replaced; otherwise all matching key/value pairs (regardless
+  of how many) are removed before the new pair is written.
+
+It returns 0 on success.
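+
+For example (a sketch; the file names, keys and values are illustrative):
+
+------------
+/* set a value in .git/config */
+git_config_set("core.ignorecase", "true");
+
+/* set a value in a specific file */
+git_config_set_in_file(".gitmodules", "submodule.frotz.path", "frotz");
+
+/* replace every matching fetch refspec with a single new value */
+git_config_set_multivar_in_file(".git/config", "remote.origin.fetch",
+				"+refs/heads/*:refs/remotes/origin/*",
+				".*", 1);
+------------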
+
+Also, there are functions `git_config_rename_section` and
+`git_config_rename_section_in_file` with parameters `old_name` and `new_name`
+for renaming or removing sections in the config files. If NULL is passed
+through `new_name` parameter, the section will be removed from the config file.
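+
+For example (a sketch; the section names are illustrative):
+
+------------
+/* rename [branch "topic"] to [branch "feature"] */
+git_config_rename_section("branch.topic", "branch.feature");
+
+/* passing NULL removes the [branch "old"] section entirely */
+git_config_rename_section("branch.old", NULL);
+------------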
diff --git a/Documentation/technical/api-credentials.txt b/Documentation/technical/api-credentials.txt
new file mode 100644
index 0000000000..75368f26ca
--- /dev/null
+++ b/Documentation/technical/api-credentials.txt
@@ -0,0 +1,271 @@
+credentials API
+===============
+
+The credentials API provides an abstracted way of gathering username and
+password credentials from the user (even though credentials in the wider
+world can take many forms, in this document the word "credential" always
+refers to a username and password pair).
+
+This document describes two interfaces: the C API that the credential
+subsystem provides to the rest of Git, and the protocol that Git uses to
+communicate with system-specific "credential helpers". If you are
+writing Git code that wants to look up or prompt for credentials, see
+the section "C API" below. If you want to write your own helper, see
+the section on "Credential Helpers" below.
+
+Typical setup
+-------------
+
+------------
++-----------------------+
+| Git code (C) |--- to server requiring --->
+| | authentication
+|.......................|
+| C credential API |--- prompt ---> User
++-----------------------+
+ ^ |
+ | pipe |
+ | v
++-----------------------+
+| Git credential helper |
++-----------------------+
+------------
+
+The Git code (typically a remote-helper) will call the C API to obtain
+credential data like a login/password pair (credential_fill). The
+API will itself call a credential helper (e.g. "git credential-cache" or
+"git credential-store") that may retrieve credential data from a
+store. If the credential helper cannot find the information, the C API
+will prompt the user. Then, the caller of the API takes care of
+contacting the server, and does the actual authentication.
+
+C API
+-----
+
+The credential C API is meant to be called by Git code which needs to
+acquire or store a credential. It is centered around an object
+representing a single credential and provides three basic operations:
+fill (acquire credentials by calling helpers and/or prompting the user),
+approve (mark a credential as successfully used so that it can be stored
+for later use), and reject (mark a credential as unsuccessful so that it
+can be erased from any persistent storage).
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+`struct credential`::
+
+ This struct represents a single username/password combination
+ along with any associated context. All string fields should be
+ heap-allocated (or NULL if they are not known or not applicable).
+ The meaning of the individual context fields is the same as
+ their counterparts in the helper protocol; see the section below
+ for a description of each field.
++
+The `helpers` member of the struct is a `string_list` of helpers. Each
+string specifies an external helper which will be run, in order, to
+either acquire or store credentials. See the section on credential
+helpers below. This list is filled-in by the API functions
+according to the corresponding configuration variables before
+consulting helpers, so there usually is no need for a caller to
+modify the helpers field at all.
++
+This struct should always be initialized with `CREDENTIAL_INIT` or
+`credential_init`.
+
+
+Functions
+~~~~~~~~~
+
+`credential_init`::
+
+ Initialize a credential structure, setting all fields to empty.
+
+`credential_clear`::
+
+ Free any resources associated with the credential structure,
+ returning it to a pristine initialized state.
+
+`credential_fill`::
+
+ Instruct the credential subsystem to fill the username and
+ password fields of the passed credential struct by first
+ consulting helpers, then asking the user. After this function
+ returns, the username and password fields of the credential are
+ guaranteed to be non-NULL. If an error occurs, the function will
+ die().
+
+`credential_reject`::
+
+ Inform the credential subsystem that the provided credentials
+ have been rejected. This will cause the credential subsystem to
+ notify any helpers of the rejection (which allows them, for
+ example, to purge the invalid credentials from storage). It
+ will also free() the username and password fields of the
+ credential and set them to NULL (readying the credential for
+ another call to `credential_fill`). Any errors from helpers are
+ ignored.
+
+`credential_approve`::
+
+ Inform the credential subsystem that the provided credentials
+ were successfully used for authentication. This will cause the
+ credential subsystem to notify any helpers of the approval, so
+ that they may store the result to be used again. Any errors
+ from helpers are ignored.
+
+`credential_from_url`::
+
+ Parse a URL into broken-down credential fields.
+
+Example
+~~~~~~~
+
+The example below shows how the functions of the credential API could be
+used to login to a fictitious "foo" service on a remote host:
+
+-----------------------------------------------------------------------
+int foo_login(struct foo_connection *f)
+{
+ int status;
+ /*
+ * Create a credential with some context; we don't yet know the
+ * username or password.
+ */
+
+ struct credential c = CREDENTIAL_INIT;
+ c.protocol = xstrdup("foo");
+ c.host = xstrdup(f->hostname);
+
+ /*
+ * Fill in the username and password fields by contacting
+ * helpers and/or asking the user. The function will die if it
+ * fails.
+ */
+ credential_fill(&c);
+
+ /*
+ * Otherwise, we have a username and password. Try to use it.
+ */
+ status = send_foo_login(f, c.username, c.password);
+ switch (status) {
+ case FOO_OK:
+ /* It worked. Store the credential for later use. */
+ credential_approve(&c);
+ break;
+ case FOO_BAD_LOGIN:
+ /* Erase the credential from storage so we don't try it
+ * again. */
+ credential_reject(&c);
+ break;
+ default:
+ /*
+ * Some other error occurred. We don't know if the
+ * credential is good or bad, so report nothing to the
+ * credential subsystem.
+ */
+ }
+
+ /* Free any associated resources. */
+ credential_clear(&c);
+
+ return status;
+}
+-----------------------------------------------------------------------
+
+
+Credential Helpers
+------------------
+
+Credential helpers are programs executed by Git to fetch or save
+credentials from and to long-term storage (where "long-term" is simply
+longer than a single Git process; e.g., credentials may be stored
+in-memory for a few minutes, or indefinitely on disk).
+
+Each helper is specified by a single string in the configuration
+variable `credential.helper` (and others, see linkgit:git-config[1]).
+The string is transformed by Git into a command to be executed using
+these rules:
+
+ 1. If the helper string begins with "!", it is considered a shell
+ snippet, and everything after the "!" becomes the command.
+
+ 2. Otherwise, if the helper string begins with an absolute path, the
+ verbatim helper string becomes the command.
+
+ 3. Otherwise, the string "git credential-" is prepended to the helper
+ string, and the result becomes the command.
+
+The resulting command then has an "operation" argument appended to it
+(see below for details), and the result is executed by the shell.
+
+Here are some example specifications:
+
+----------------------------------------------------
+# run "git credential-foo"
+foo
+
+# same as above, but pass an argument to the helper
+foo --bar=baz
+
+# the arguments are parsed by the shell, so use shell
+# quoting if necessary
+foo --bar="whitespace arg"
+
+# you can also use an absolute path, which will not use the git wrapper
+/path/to/my/helper --with-arguments
+
+# or you can specify your own shell snippet
+!f() { echo "password=`cat $HOME/.secret`"; }; f
+----------------------------------------------------
+
+Generally speaking, rule (3) above is the simplest for users to specify.
+Authors of credential helpers should make an effort to assist their
+users by naming their program "git-credential-$NAME", and putting it in
+the $PATH or $GIT_EXEC_PATH during installation, which will allow a user
+to enable it with `git config credential.helper $NAME`.
+
+When a helper is executed, it will have one "operation" argument
+appended to its command line, which is one of:
+
+`get`::
+
+ Return a matching credential, if any exists.
+
+`store`::
+
+ Store the credential, if applicable to the helper.
+
+`erase`::
+
+ Remove a matching credential, if any, from the helper's storage.
+
+The details of the credential will be provided on the helper's stdin
+stream. The exact format is the same as the input/output format of the
+`git credential` plumbing command (see the section `INPUT/OUTPUT
+FORMAT` in linkgit:git-credential[1] for a detailed specification).
+
+For a `get` operation, the helper should produce a list of attributes
+on stdout in the same format. A helper is free to produce a subset, or
+even no values at all if it has nothing useful to provide. Any provided
+attributes will overwrite those already known about by Git. If a helper
+outputs a `quit` attribute with a value of `true` or `1`, no further
+helpers will be consulted, nor will the user be prompted (if no
+credential has been provided, the operation will then fail).
+
+For a `store` or `erase` operation, the helper's output is ignored.
+If it fails to perform the requested operation, it may complain to
+stderr to inform the user. If it does not support the requested
+operation (e.g., a read-only store), it should silently ignore the
+request.
+
+If a helper receives any other operation, it should silently ignore the
+request. This leaves room for future operations to be added (older
+helpers will just ignore the new requests).
+
+See also
+--------
+
+linkgit:gitcredentials[7]
+
+linkgit:git-config[1] (See configuration variables `credential.*`)
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
new file mode 100644
index 0000000000..1d52a6ce14
--- /dev/null
+++ b/Documentation/technical/api-decorate.txt
@@ -0,0 +1,6 @@
+decorate API
+============
+
+Talk about <decorate.h>
+
+(Linus)
diff --git a/Documentation/technical/api-diff.txt b/Documentation/technical/api-diff.txt
new file mode 100644
index 0000000000..8b001de0db
--- /dev/null
+++ b/Documentation/technical/api-diff.txt
@@ -0,0 +1,174 @@
+diff API
+========
+
+The diff API is for programs that compare two sets of files (e.g. two
+trees, one tree and the index) and present the found difference in
+various ways. The calling program is responsible for feeding the API
+pairs of files, one from the "old" set and the corresponding one from
+"new" set, that are different. The library called through this API is
+called diffcore, and is responsible for two things.
+
+* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and
+ changes that touch a string (`-S`), as specified by the caller.
+
+* outputting the differences in various formats, as specified by the
+ caller.
+
+Calling sequence
+----------------
+
+* Prepare `struct diff_options` to record the set of diff options, and
+ then call `diff_setup()` to initialize this structure. This sets up
+ the vanilla default.
+
+* Fill in the options structure to specify desired output format, rename
+ detection, etc. `diff_opt_parse()` can be used to parse options given
+ from the command line in a way consistent with existing git-diff
+ family of programs.
+
+* Call `diff_setup_done()`; this inspects the options set up so far for
+ internal consistency and makes necessary tweaks to them (e.g. if
+ textual patch output was asked for, recursive behaviour is turned on);
+ the `set_default` callback in `diff_options` can be used to tweak this more.
+
+* As you find different pairs of files, call `diff_change()` to feed
+ modified files, `diff_addremove()` to feed created or deleted files,
+ or `diff_unmerge()` to feed a file whose state is 'unmerged' to the
+ API. These are thin wrappers to a lower-level `diff_queue()` function
+ that is flexible enough to record any of these kinds of changes.
+
+* Once you finish feeding the pairs of files, call `diffcore_std()`.
+ This will tell the diffcore library to go ahead and do its work.
+
+* Calling `diff_flush()` will produce the output.
+
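+A sketch of that sequence (the output format and rename detection
+settings chosen here are only illustrative):
+
+------------
+struct diff_options opts;
+
+diff_setup(&opts);
+opts.output_format = DIFF_FORMAT_PATCH;
+opts.detect_rename = DIFF_DETECT_RENAME;
+diff_setup_done(&opts);
+
+/* ... feed pairs with diff_change()/diff_addremove()/diff_unmerge() ... */
+
+diffcore_std(&opts);
+diff_flush(&opts);
+------------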
+
+Data structures
+---------------
+
+* `struct diff_filespec`
+
+This is the internal representation for a single file (blob). It
+records the blob object name (if known -- for a work tree file it
+typically is a null SHA-1), filemode and pathname. This is what
+`diff_addremove()`, `diff_change()` and `diff_unmerge()` synthesize and
+feed to the `diff_queue()` function.
+
+* `struct diff_filepair`
+
+This records a pair of `struct diff_filespec`; the filespec for a file
+in the "old" set (i.e. preimage) is called `one`, and the filespec for a
+file in the "new" set (i.e. postimage) is called `two`. A change that
+represents file creation has NULL in `one`, and file deletion has NULL
+in `two`.
+
+A `filepair` starts pointing at `one` and `two` that are from the same
+filename, but `diffcore_std()` can break pairs and match component
+filespecs with other filespecs from a different filepair to form a new
+filepair. This is called 'rename detection'.
+
+* `struct diff_queue`
+
+This is a collection of filepairs. Notable members are:
+
+`queue`::
+
+ An array of pointers to `struct diff_filepair`. This
+ dynamically grows as you add filepairs;
+
+`alloc`::
+
+ The allocated size of the `queue` array;
+
+`nr`::
+
+ The number of elements in the `queue` array.
+
+
+* `struct diff_options`
+
+This describes the set of options the calling program wants to use to
+affect the operation of the diffcore library.
+
+Notable members are:
+
+`output_format`::
+ The output format used when `diff_flush()` is run.
+
+`context`::
+ Number of context lines to generate in patch output.
+
+`break_opt`, `detect_rename`, `rename_score`, `rename_limit`::
+ Affect the way the detection logic for complete rewrites, renames
+ and copies works.
+
+`abbrev`::
+ Number of hexdigits to abbreviate raw format output to.
+
+`pickaxe`::
+ A constant string (which can, and typically does, contain newlines to
+ look for a block of text rather than a single line) used to filter out
+ the filepairs whose preimage and postimage contain the same number of
+ occurrences of the string.
+
+`flags`::
+ This is mostly a collection of boolean options that affect the
+ operation, but some do not have anything to do with the diffcore
+ library.
+
+`touched_flags`::
+ Records whether a flag has been changed due to user request
+ (rather than just set/unset by default).
+
+`set_default`::
+ Callback which allows tweaking the options in diff_setup_done().
+
+BINARY, TEXT;;
+ Affects how a file that appears to be binary is treated.
+
+FULL_INDEX;;
+ Tells the patch output format not to use abbreviated object
+ names on the "index" lines.
+
+FIND_COPIES_HARDER;;
+ Tells the diffcore library that the caller is feeding unchanged
+ filepairs to allow copies from unmodified files to be detected.
+
+COLOR_DIFF;;
+ Output should be colored.
+
+COLOR_DIFF_WORDS;;
+ Output is a colored word-diff.
+
+NO_INDEX;;
+ Tells diff-files that the input is not tracked files but files
+ in random locations on the filesystem.
+
+ALLOW_EXTERNAL;;
+ Tells the output routine that it is OK to call the user-specified patch
+ output routine. Plumbing disables this to ensure stable output.
+
+QUIET;;
+ Do not show any output.
+
+REVERSE_DIFF;;
+ Tells the library that the calling program is feeding the
+ filepairs reversed; `one` is two, and `two` is one.
+
+EXIT_WITH_STATUS;;
+ For communication between the calling program and the options
+ parser; tell the calling program to signal the presence of
+ difference using program exit code.
+
+HAS_CHANGES;;
+ Internal; used for optimization to see if there is any change.
+
+SILENT_ON_REMOVE;;
+ Affects whether diff-files shows removed files.
+
+RECURSIVE, TREE_IN_RECURSIVE;;
+ Tell whether the tree traversal done by tree-diff should recursively
+ descend into tree object pairs that differ between the preimage
+ and postimage sets.
+
+(JC)
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
new file mode 100644
index 0000000000..7f8e78d916
--- /dev/null
+++ b/Documentation/technical/api-directory-listing.txt
@@ -0,0 +1,105 @@
+directory listing API
+=====================
+
+The directory listing API is used to enumerate paths in the work tree,
+optionally taking `.git/info/exclude` and `.gitignore` files per
+directory into account.
+
+Data structure
+--------------
+
+`struct dir_struct` structure is used to pass directory traversal
+options to the library and to record the paths discovered. A single
+`struct dir_struct` is used regardless of whether or not the traversal
+recursively descends into subdirectories.
+
+The notable options are:
+
+`exclude_per_dir`::
+
+ The name of the file to be read in each directory for excluded
+ files (typically `.gitignore`).
+
+`flags`::
+
+ A bit-field of options (the `*IGNORED*` flags are mutually exclusive):
+
+`DIR_SHOW_IGNORED`:::
+
+ Return just ignored files in `entries[]`, not untracked files.
+
+`DIR_SHOW_IGNORED_TOO`:::
+
+ Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
+ in addition to untracked files in `entries[]`.
+
+`DIR_COLLECT_IGNORED`:::
+
+ Special mode for git-add. Return ignored files in `ignored[]` and
+ untracked files in `entries[]`. Only returns ignored files that match
+ pathspec exactly (no wildcards). Does not recurse into ignored
+ directories.
+
+`DIR_SHOW_OTHER_DIRECTORIES`:::
+
+ Include a directory that is not tracked.
+
+`DIR_HIDE_EMPTY_DIRECTORIES`:::
+
+ Do not include a directory that is not tracked and is empty.
+
+`DIR_NO_GITLINKS`:::
+
+ If set, recurse into a directory that looks like a Git
+ directory. Otherwise it is shown as a directory.
+
+The result of the enumeration is left in these fields:
+
+`entries[]`::
+
+ An array of `struct dir_entry`, each element of which describes
+ a path.
+
+`nr`::
+
+ The number of members in `entries[]` array.
+
+`alloc`::
+
+ Internal use; keeps track of allocation of `entries[]` array.
+
+`ignored[]`::
+
+ An array of `struct dir_entry`, used for ignored paths with the
+ `DIR_SHOW_IGNORED_TOO` and `DIR_COLLECT_IGNORED` flags.
+
+`ignored_nr`::
+
+ The number of members in `ignored[]` array.
+
+Calling sequence
+----------------
+
+Note: the index may be looked at for .gitignore files that are marked
+CE_SKIP_WORKTREE. If you want to exclude files, make sure you have
+loaded the index first.
+
+* Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
+ sizeof(dir))`.
+
+* To add a single exclude pattern, call `add_exclude_list()` and then
+ `add_exclude()`.
+
+* To add patterns from a file (e.g. `.git/info/exclude`), call
+ `add_excludes_from_file()` , and/or set `dir.exclude_per_dir`. A
+ short-hand function `setup_standard_excludes()` can be used to set
+ up the standard set of exclude settings.
+
+* Set options described in the Data Structure section above.
+
+* Call `read_directory()`.
+
+* Use `dir.entries[]`.
+
+* Call `clear_directory()` when the contained elements are no longer in use.
+
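+A sketch of that sequence, assuming the `read_directory()` prototype that
+takes a path, its length and a pathspec (check dir.h for the exact
+signature in your version):
+
+------------
+struct dir_struct dir;
+struct pathspec pathspec;
+int i;
+
+memset(&dir, 0, sizeof(dir));
+memset(&pathspec, 0, sizeof(pathspec));	/* empty pathspec: no limiting */
+setup_standard_excludes(&dir);
+dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
+
+read_directory(&dir, "", 0, &pathspec);
+for (i = 0; i < dir.nr; i++)
+	printf("%s\n", dir.entries[i]->name);
+
+clear_directory(&dir);
+------------
+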
+(JC)
diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
new file mode 100644
index 0000000000..ceeedd485c
--- /dev/null
+++ b/Documentation/technical/api-error-handling.txt
@@ -0,0 +1,75 @@
+Error reporting in git
+======================
+
+`die`, `usage`, `error`, and `warning` report errors of various
+kinds.
+
+- `die` is for fatal application errors. It prints a message to
+ the user and exits with status 128.
+
+- `usage` is for errors in command line usage. After printing its
+ message, it exits with status 129. (See also `usage_with_options`
+ in the link:api-parse-options.html[parse-options API].)
+
+- `error` is for non-fatal library errors. It prints a message
+ to the user and returns -1 for convenience in signaling the error
+ to the caller.
+
+- `warning` is for reporting situations that probably should not
+ occur but which the user (and Git) can continue to work around
+ without running into too many problems. Like `error`, it
+ returns -1 after reporting the situation to the caller.
+
+Customizable error handlers
+---------------------------
+
+The default behavior of `die` and `error` is to write a message to
+stderr and then exit or return as appropriate. This behavior can be
+overridden using `set_die_routine` and `set_error_routine`. For
+example, "git daemon" uses set_die_routine to write the reason `die`
+was called to syslog before exiting.
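+
+A sketch of such an override (the syslog destination mirrors what "git
+daemon" does, but the details here are illustrative and assume syslog(3)
+is available):
+
+------------
+static NORETURN void die_to_syslog(const char *err, va_list params)
+{
+	char msg[4096];
+
+	vsnprintf(msg, sizeof(msg), err, params);
+	syslog(LOG_ERR, "fatal: %s", msg);
+	exit(128);
+}
+
+/* during program setup */
+set_die_routine(die_to_syslog);
+------------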
+
+Library errors
+--------------
+
+Functions return a negative integer on error. Details beyond that
+vary from function to function:
+
+- Some functions return -1 for all errors. Others return a more
+ specific value depending on how the caller might want to react
+ to the error.
+
+- Some functions report the error to stderr with `error`,
+ while others leave that for the caller to do.
+
+- errno is not meaningful on return from most functions (except
+ for thin wrappers for system calls).
+
+Check the function's API documentation to be sure.
+
+Caller-handled errors
+---------------------
+
+An increasing number of functions take a parameter 'struct strbuf *err'.
+On error, such functions append a message about what went wrong to the
+'err' strbuf. The message is meant to be complete enough to be passed
+to `die` or `error` as-is. For example:
+
+ if (ref_transaction_commit(transaction, &err))
+ die("%s", err.buf);
+
+The 'err' parameter will be untouched if no error occurred, so multiple
+function calls can be chained:
+
+ t = ref_transaction_begin(&err);
+ if (!t ||
+ ref_transaction_update(t, "HEAD", ..., &err) ||
+ ref_transaction_commit(t, &err))
+ die("%s", err.buf);
+
+The 'err' parameter must be a pointer to a valid strbuf. To silence
+a message, pass a strbuf that is explicitly ignored:
+
+ if (thing_that_can_fail_in_an_ignorable_way(..., &err))
+ /* This failure is okay. */
+ strbuf_reset(&err);
diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
new file mode 100644
index 0000000000..2602668677
--- /dev/null
+++ b/Documentation/technical/api-gitattributes.txt
@@ -0,0 +1,128 @@
+gitattributes API
+=================
+
+The gitattributes mechanism gives a uniform way to associate various
+attributes with sets of paths.
+
+
+Data Structure
+--------------
+
+`struct git_attr`::
+
+ An attribute is an opaque object that is identified by its name.
+ Pass the name to `git_attr()` function to obtain the object of
+ this type. The internal representation of this structure is
+ of no interest to the calling programs. The name of the
+ attribute can be retrieved by calling `git_attr_name()`.
+
+`struct git_attr_check`::
+
+ This structure represents a set of attributes to check in a call
+ to `git_check_attr()` function, and receives the results.
+
+
+Attribute Values
+----------------
+
+An attribute for a path can be in one of four states: Set, Unset,
+Unspecified or set to a string, and `.value` member of `struct
+git_attr_check` records it. There are three macros to check these:
+
+`ATTR_TRUE()`::
+
+ Returns true if the attribute is Set for the path.
+
+`ATTR_FALSE()`::
+
+ Returns true if the attribute is Unset for the path.
+
+`ATTR_UNSET()`::
+
+ Returns true if the attribute is Unspecified for the path.
+
+If none of the above returns true, `.value` member points at a string
+value of the attribute for the path.
+
+
+Querying Specific Attributes
+----------------------------
+
+* Prepare an array of `struct git_attr_check` to define the list of
+ attributes you would want to check. To populate this array, you would
+ need to define necessary attributes by calling `git_attr()` function.
+
+* Call `git_check_attr()` to check the attributes for the path.
+
+* Inspect the `git_attr_check` structure to see how each of the attributes in
+ the array is defined for the path.
+
+
+Example
+-------
+
+To see how the attributes "crlf" and "ident" are set for different paths:
+
+. Prepare an array of `struct git_attr_check` with two elements (because
+ we are checking two attributes). Initialize their `attr` member with
+ pointers to `struct git_attr` obtained by calling `git_attr()`:
+
+------------
+static struct git_attr_check check[2];
+static void setup_check(void)
+{
+ if (check[0].attr)
+ return; /* already done */
+ check[0].attr = git_attr("crlf");
+ check[1].attr = git_attr("ident");
+}
+------------
+
+. Call `git_check_attr()` with the prepared array of `struct git_attr_check`:
+
+------------
+ const char *path;
+
+ setup_check();
+ git_check_attr(path, ARRAY_SIZE(check), check);
+------------
+
+. Act on `.value` member of the result, left in `check[]`:
+
+------------
+ const char *value = check[0].value;
+
+ if (ATTR_TRUE(value)) {
+ The attribute is Set, by listing only the name of the
+ attribute in the gitattributes file for the path.
+ } else if (ATTR_FALSE(value)) {
+ The attribute is Unset, by listing the name of the
+ attribute prefixed with a dash - for the path.
+ } else if (ATTR_UNSET(value)) {
+ The attribute is neither set nor unset for the path.
+ } else if (!strcmp(value, "input")) {
+ If none of ATTR_TRUE(), ATTR_FALSE(), or ATTR_UNSET() is
+ true, the value is a string set in the gitattributes
+ file for the path by saying "attr=value".
+ } else if (... other check using value as string ...) {
+ ...
+ }
+------------
+
+
+Querying All Attributes
+-----------------------
+
+To get the values of all attributes associated with a file:
+
+* Call `git_all_attrs()`, which returns an array of `git_attr_check`
+ structures.
+
+* Iterate over the `git_attr_check` array to examine the attribute
+ names and values. The name of the attribute described by a
+ `git_attr_check` object can be retrieved via
+ `git_attr_name(check[i].attr)`. (Please note that no items will be
+ returned for unset attributes, so `ATTR_UNSET()` will return false
+ for all returned `git_attr_check` objects.)
+
+* Free the `git_attr_check` array.
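+
+A sketch of that sequence for a given `path`, under the assumption that
+`git_all_attrs()` reports the number of attributes and the array through
+out-parameters (check attr.h for the exact prototype):
+
+------------
+struct git_attr_check *check = NULL;
+int i, num = 0;
+
+git_all_attrs(path, &num, &check);
+for (i = 0; i < num; i++) {
+	const char *value = check[i].value;
+
+	if (ATTR_TRUE(value))
+		printf("%s: set\n", git_attr_name(check[i].attr));
+	else if (ATTR_FALSE(value))
+		printf("%s: unset\n", git_attr_name(check[i].attr));
+	else
+		printf("%s: %s\n", git_attr_name(check[i].attr), value);
+}
+free(check);
+------------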
diff --git a/Documentation/technical/api-grep.txt b/Documentation/technical/api-grep.txt
new file mode 100644
index 0000000000..a69cc8964d
--- /dev/null
+++ b/Documentation/technical/api-grep.txt
@@ -0,0 +1,8 @@
+grep API
+========
+
+Talk about <grep.h>, things like:
+
+* grep_buffer()
+
+(JC)
diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
new file mode 100644
index 0000000000..28f5a8b715
--- /dev/null
+++ b/Documentation/technical/api-hashmap.txt
@@ -0,0 +1,285 @@
+hashmap API
+===========
+
+The hashmap API is a generic implementation of hash-based key-value mappings.
+
+Data Structures
+---------------
+
+`struct hashmap`::
+
+ The hash table structure. Members can be used as follows, but should
+ not be modified directly:
++
+The `size` member keeps track of the total number of entries (0 means the
+hashmap is empty).
++
+`tablesize` is the allocated size of the hash table. A non-0 value indicates
+that the hashmap is initialized. It may also be useful for statistical purposes
+(i.e. `size / tablesize` is the current load factor).
++
+`cmpfn` stores the comparison function specified in `hashmap_init()`. In
+advanced scenarios, it may be useful to change this, e.g. to switch between
+case-sensitive and case-insensitive lookup.
+
+`struct hashmap_entry`::
+
+ An opaque structure representing an entry in the hash table, which must
+ be used as first member of user data structures. Ideally it should be
+ followed by an int-sized member to prevent unused memory on 64-bit
+ systems due to alignment.
++
+The `hash` member is the entry's hash code and the `next` member points to the
+next entry in case of collisions (i.e. if multiple entries map to the same
+bucket).
+
+`struct hashmap_iter`::
+
+ An iterator structure, to be used with hashmap_iter_* functions.
+
+Types
+-----
+
+`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
+
+ User-supplied function to test two hashmap entries for equality. Shall
+ return 0 if the entries are equal.
++
+This function is always called with non-NULL `entry` / `entry_or_key`
+parameters that have the same hash code. When looking up an entry, the `key`
+and `keydata` parameters to hashmap_get and hashmap_remove are always passed
+as second and third argument, respectively. Otherwise, `keydata` is NULL.
+
+Functions
+---------
+
+`unsigned int strhash(const char *buf)`::
+`unsigned int strihash(const char *buf)`::
+`unsigned int memhash(const void *buf, size_t len)`::
+`unsigned int memihash(const void *buf, size_t len)`::
+
+ Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+ http://www.isthe.com/chongo/tech/comp/fnv).
++
+`strhash` and `strihash` take 0-terminated strings, while `memhash` and
+`memihash` operate on arbitrary-length memory.
++
+`strihash` and `memihash` are case insensitive versions.
+
+`unsigned int sha1hash(const unsigned char *sha1)`::
+
+ Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
+ for use in hash tables. Cryptographic hashes are supposed to have
+ uniform distribution, so in contrast to `memhash()`, this just copies
+ the first `sizeof(int)` bytes without shuffling any bits. Note that
+ the results will be different on big-endian and little-endian
+ platforms, so they should not be stored or transferred over the net.
+
+`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
+
+ Initializes a hashmap structure.
++
+`map` is the hashmap to initialize.
++
+The `equals_function` can be specified to compare two entries for equality.
+If NULL, entries are considered equal if their hash codes are equal.
++
+If the total number of entries is known in advance, the `initial_size`
+parameter may be used to preallocate a sufficiently large table and thus
+prevent expensive resizing. If 0, the table is dynamically resized.
+
+`void hashmap_free(struct hashmap *map, int free_entries)`::
+
+ Frees a hashmap structure and allocated memory.
++
+`map` is the hashmap to free.
++
+If `free_entries` is true, each hashmap_entry in the map is freed as well
+(using stdlib's free()).
+
+`void hashmap_entry_init(void *entry, unsigned int hash)`::
+
+ Initializes a hashmap_entry structure.
++
+`entry` points to the entry to initialize.
++
+`hash` is the hash code of the entry.
++
+The hashmap_entry structure does not hold references to external resources,
+and it is safe to just discard it once you are done with it (i.e. if
+your structure was allocated with xmalloc(), you can just free(3) it,
+and if it is on stack, you can just let it go out of scope).
+
+`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
+
+ Returns the hashmap entry for the specified key, or NULL if not found.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are passed
+to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
+`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
+
+ Returns the hashmap entry for the specified hash code and key data,
+ or NULL if not found.
++
+`map` is the hashmap structure.
++
+`hash` is the hash code of the entry to look up.
++
+If an entry with matching hash code is found, `keydata` is passed to
+`hashmap_cmp_fn` to decide whether the entry matches the key. The
+`entry_or_key` parameter points to a bogus hashmap_entry structure that
+should not be used in the comparison.
+
+`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
+
+ Returns the next equal hashmap entry, or NULL if not found. This can be
+ used to iterate over duplicate entries (see `hashmap_add`).
++
+`map` is the hashmap structure.
++
+`entry` is the hashmap_entry to start the search from, obtained via a previous
+call to `hashmap_get` or `hashmap_get_next`.
+
+`void hashmap_add(struct hashmap *map, void *entry)`::
+
+ Adds a hashmap entry. This allows adding duplicate entries (i.e.
+ separate values with the same key according to hashmap_cmp_fn).
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add.
+
+`void *hashmap_put(struct hashmap *map, void *entry)`::
+
+ Adds or replaces a hashmap entry. If the hashmap contains duplicate
+ entries equal to the specified entry, only one of them will be replaced.
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add or replace.
++
+Returns the replaced entry, or NULL if not found (i.e. the entry was added).
+
+`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
+
+ Removes a hashmap entry matching the specified key. If the hashmap
+ contains duplicate entries equal to the specified key, only one of
+ them will be removed.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are
+passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
++
+Returns the removed entry, or NULL if not found.
+
+`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
+`void *hashmap_iter_next(struct hashmap_iter *iter)`::
+`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
+
+ Used to iterate over all entries of a hashmap.
++
+`hashmap_iter_init` initializes a `hashmap_iter` structure.
++
+`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
+more entries.
++
+`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
+and returns the first entry, if any).
+
+`const char *strintern(const char *string)`::
+`const void *memintern(const void *data, size_t len)`::
+
+ Returns the unique, interned version of the specified string or data,
+ similar to the `String.intern` API in Java and .NET, respectively.
+ Interned strings remain valid for the entire lifetime of the process.
++
+Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+strings / data must not be modified or freed.
++
+Interned strings are best used for short strings with high probability of
+duplicates.
++
+Uses a hashmap to store the pool of interned strings.
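+
+For illustration, a tiny hedged sketch of what interning buys you: because
+`strintern()` returns one shared instance per distinct content, interned
+strings can be compared by pointer.
+
+------------
+const char *a = strintern("refs/heads/master");
+const char *b = strintern("refs/heads/master");
+
+/* a == b here: equal content maps to the same interned instance,
+ * so a pointer comparison suffices; never modify or free() these. */
+------------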
+
+Usage example
+-------------
+
+Here's a simple usage example that maps long keys to double values.
+------------
+struct hashmap map;
+
+struct long2double {
+ struct hashmap_entry ent; /* must be the first member! */
+ long key;
+ double value;
+};
+
+static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
+{
+ return !(e1->key == e2->key);
+}
+
+void long2double_init(void)
+{
+ hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
+}
+
+void long2double_free(void)
+{
+ hashmap_free(&map, 1);
+}
+
+static struct long2double *find_entry(long key)
+{
+ struct long2double k;
+ hashmap_entry_init(&k, memhash(&key, sizeof(long)));
+ k.key = key;
+ return hashmap_get(&map, &k, NULL);
+}
+
+double get_value(long key)
+{
+ struct long2double *e = find_entry(key);
+ return e ? e->value : 0;
+}
+
+void set_value(long key, double value)
+{
+ struct long2double *e = find_entry(key);
+ if (!e) {
+ e = malloc(sizeof(struct long2double));
+ hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ e->key = key;
+ hashmap_add(&map, e);
+ }
+ e->value = value;
+}
+------------
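+
+The example above never walks the whole table. Here is a short sketch of
+iterating over the same (hypothetical) long2double map, using the iterator
+functions described earlier:
+
+------------
+void print_all_entries(void)
+{
+	struct hashmap_iter iter;
+	struct long2double *e;
+
+	hashmap_iter_init(&map, &iter);
+	while ((e = hashmap_iter_next(&iter)))
+		printf("%ld -> %f\n", e->key, e->value);
+}
+------------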
+
+Using variable-sized keys
+-------------------------
+
+The `hashmap_get` and `hashmap_remove` functions expect an ordinary
+`hashmap_entry` structure as key to find the correct entry.
+variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
+to create a full-fledged entry structure on the heap and copy all the key data
+into the structure.
+
+In this case, the `keydata` parameter can be used to pass
+variable-sized key data directly to the comparison function, and the `key`
+parameter can be a stripped-down, fixed size entry structure allocated on the
+stack.
+
+See test-hashmap.c for an example using arbitrary-length strings as keys.
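+
+Loosely following test-hashmap.c, a hedged sketch of the idea (the
+`str_entry` type and helpers are made up for illustration): the stored
+entries carry their string key in a FLEX_ARRAY member, while lookups use a
+plain `hashmap_entry` on the stack and pass the string via `keydata`.
+
+------------
+struct str_entry {
+	struct hashmap_entry ent; /* must be the first member! */
+	char key[FLEX_ARRAY];     /* variable-sized key data */
+};
+
+static int str_entry_cmp(const struct str_entry *e1,
+			 const struct str_entry *e2,
+			 const char *keydata)
+{
+	/* lookups pass the string via keydata; internal comparisons
+	 * between two full entries pass NULL and use e2's own key */
+	return strcmp(e1->key, keydata ? keydata : e2->key);
+}
+
+static struct str_entry *find_str_entry(struct hashmap *map, const char *str)
+{
+	struct hashmap_entry key; /* stripped-down, fixed-size key */
+	hashmap_entry_init(&key, strhash(str));
+	return hashmap_get(map, &key, str);
+}
+------------
+
+As with long2double_cmp in the usage example, str_entry_cmp would be
+registered with a cast to hashmap_cmp_fn when calling hashmap_init().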
diff --git a/Documentation/technical/api-history-graph.txt b/Documentation/technical/api-history-graph.txt
new file mode 100644
index 0000000000..18142b6d29
--- /dev/null
+++ b/Documentation/technical/api-history-graph.txt
@@ -0,0 +1,174 @@
+history graph API
+=================
+
+The graph API is used to draw a text-based representation of the commit
+history. The API generates the graph in a line-by-line fashion.
+
+Functions
+---------
+
+Core functions:
+
+* `graph_init()` creates a new `struct git_graph`
+
+* `graph_update()` moves the graph to a new commit.
+
+* `graph_next_line()` outputs the next line of the graph into a strbuf. It
+ does not add a terminating newline.
+
+* `graph_padding_line()` outputs a line of vertical padding in the graph. It
+ is similar to `graph_next_line()`, but is guaranteed to never print the line
+ containing the current commit. Where `graph_next_line()` would print the
+ commit line next, `graph_padding_line()` prints a line that simply extends
+ all branch lines downwards one row, leaving their positions unchanged.
+
+* `graph_is_commit_finished()` determines if the graph has output all lines
+ necessary for the current commit. If `graph_update()` is called before all
+ lines for the current commit have been printed, the next call to
+ `graph_next_line()` will output an ellipsis, to indicate that a portion of
+ the graph was omitted.
+
+The following utility functions are wrappers around `graph_next_line()` and
+`graph_is_commit_finished()`. They always print the output to stdout.
+They can all be called with a NULL graph argument, in which case no graph
+output will be printed.
+
+* `graph_show_commit()` calls `graph_next_line()` and
+  `graph_is_commit_finished()` until one of them returns non-zero. This prints
+ all graph lines up to, and including, the line containing this commit.
+ Output is printed to stdout. The last line printed does not contain a
+ terminating newline.
+
+* `graph_show_oneline()` calls `graph_next_line()` and prints the result to
+ stdout. The line printed does not contain a terminating newline.
+
+* `graph_show_padding()` calls `graph_padding_line()` and prints the result to
+ stdout. The line printed does not contain a terminating newline.
+
+* `graph_show_remainder()` calls `graph_next_line()` until
+ `graph_is_commit_finished()` returns non-zero. Output is printed to stdout.
+ The last line printed does not contain a terminating newline. Returns 1 if
+ output was printed, and 0 if no output was necessary.
+
+* `graph_show_strbuf()` prints the specified strbuf to stdout, prefixing all
+ lines but the first with a graph line. The caller is responsible for
+ ensuring graph output for the first line has already been printed to stdout.
+ (This can be done with `graph_show_commit()` or `graph_show_oneline()`.) If
+ a NULL graph is supplied, the strbuf is printed as-is.
+
+* `graph_show_commit_msg()` is similar to `graph_show_strbuf()`, but it also
+ prints the remainder of the graph, if more lines are needed after the strbuf
+ ends. It is better than directly calling `graph_show_strbuf()` followed by
+ `graph_show_remainder()` since it properly handles buffers that do not end in
+ a terminating newline. The output printed by `graph_show_commit_msg()` will
+ end in a newline if and only if the strbuf ends in a newline.
+
+Data structure
+--------------
+`struct git_graph` is an opaque data type used to store the current graph
+state.
+
+Calling sequence
+----------------
+
+* Create a `struct git_graph` by calling `graph_init()`. When using the
+ revision walking API, this is done automatically by `setup_revisions()` if
+ the '--graph' option is supplied.
+
+* Use the revision walking API to walk through a group of contiguous commits.
+ The `get_revision()` function automatically calls `graph_update()` each time
+ it is invoked.
+
+* For each commit, call `graph_next_line()` repeatedly, until
+  `graph_is_commit_finished()` returns non-zero. Each call to
+ `graph_next_line()` will output a single line of the graph. The resulting
+ lines will not contain any newlines. `graph_next_line()` returns 1 if the
+ resulting line contains the current commit, or 0 if this is merely a line
+ needed to adjust the graph before or after the current commit. This return
+ value can be used to determine where to print the commit summary information
+ alongside the graph output.
+
+Limitations
+-----------
+
+* `graph_update()` must be called with commits in topological order. It should
+ not be called on a commit if it has already been invoked with an ancestor of
+ that commit, or the graph output will be incorrect.
+
+* `graph_update()` must be called on a contiguous group of commits. If
+ `graph_update()` is called on a particular commit, it should later be called
+ on all parents of that commit. Parents must not be skipped, or the graph
+ output will appear incorrect.
++
+`graph_update()` may be used on a pruned set of commits only if the parent list
+has been rewritten so as to include only ancestors from the pruned set.
+
+* The graph API does not currently support reverse commit ordering. In
+ order to implement reverse ordering, the graphing API needs an
+ (efficient) mechanism to find the children of a commit.
+
+Sample usage
+------------
+
+------------
+struct commit *commit;
+struct git_graph *graph = graph_init(opts);
+
+while ((commit = get_revision(opts)) != NULL) {
+ graph_update(graph, commit);
+	while (!graph_is_commit_finished(graph)) {
+		struct strbuf sb;
+		int is_commit_line;
+
+		strbuf_init(&sb, 0);
+		is_commit_line = graph_next_line(graph, &sb);
+		fputs(sb.buf, stdout);
+
+		if (is_commit_line)
+			log_tree_commit(opts, commit);
+		else
+			putchar(opts->diffopt.line_termination);
+
+		strbuf_release(&sb);	/* avoid leaking the line buffer */
+	}
+}
+------------
+
+Sample output
+-------------
+
+The following is an example of the output from the graph API. This output does
+not include any commit summary information--callers are responsible for
+outputting that information, if desired.
+
+------------
+*
+*
+*
+|\
+* |
+| | *
+| \ \
+| \ \
+*-. \ \
+|\ \ \ \
+| | * | |
+| | | | | *
+| | | | | *
+| | | | | *
+| | | | | |\
+| | | | | | *
+| * | | | | |
+| | | | | * \
+| | | | | |\ |
+| | | | * | | |
+| | | | * | | |
+* | | | | | | |
+| |/ / / / / /
+|/| / / / / /
+* | | | | | |
+|/ / / / / /
+* | | | | |
+| | | | | *
+| | | | |/
+| | | | *
+------------
diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
new file mode 100644
index 0000000000..adbdbf5d75
--- /dev/null
+++ b/Documentation/technical/api-in-core-index.txt
@@ -0,0 +1,21 @@
+in-core index API
+=================
+
+Talk about <read-cache.c> and <cache-tree.c>, things like:
+
+* cache -> the_index macros
+* read_index()
+* write_index()
+* ie_match_stat() and ie_modified(); how they are different and when to
+ use which.
+* index_name_pos()
+* remove_index_entry_at()
+* remove_file_from_index()
+* add_file_to_index()
+* add_index_entry()
+* refresh_index()
+* discard_index()
+* cache_tree_invalidate_path()
+* cache_tree_update()
+
+(JC, Linus)
diff --git a/Documentation/technical/api-index-skel.txt b/Documentation/technical/api-index-skel.txt
new file mode 100644
index 0000000000..eda8c195c1
--- /dev/null
+++ b/Documentation/technical/api-index-skel.txt
@@ -0,0 +1,13 @@
+Git API Documents
+=================
+
+Git has grown a set of internal API over time. This collection
+documents them.
+
+////////////////////////////////////////////////////////////////
+// table of contents begin
+////////////////////////////////////////////////////////////////
+
+////////////////////////////////////////////////////////////////
+// table of contents end
+////////////////////////////////////////////////////////////////
diff --git a/Documentation/technical/api-index.sh b/Documentation/technical/api-index.sh
new file mode 100755
index 0000000000..9c3f4131b8
--- /dev/null
+++ b/Documentation/technical/api-index.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+(
+ c=////////////////////////////////////////////////////////////////
+ skel=api-index-skel.txt
+ sed -e '/^\/\/ table of contents begin/q' "$skel"
+ echo "$c"
+
+ ls api-*.txt |
+ while read filename
+ do
+ case "$filename" in
+ api-index-skel.txt | api-index.txt) continue ;;
+ esac
+ title=$(sed -e 1q "$filename")
+ html=${filename%.txt}.html
+ echo "* link:$html[$title]"
+ done
+ echo "$c"
+ sed -n -e '/^\/\/ table of contents end/,$p' "$skel"
+) >api-index.txt+
+
+if test -f api-index.txt && cmp api-index.txt api-index.txt+ >/dev/null
+then
+ rm -f api-index.txt+
+else
+ mv api-index.txt+ api-index.txt
+fi
diff --git a/Documentation/technical/api-merge.txt b/Documentation/technical/api-merge.txt
new file mode 100644
index 0000000000..9dc1bed768
--- /dev/null
+++ b/Documentation/technical/api-merge.txt
@@ -0,0 +1,104 @@
+merge API
+=========
+
+The merge API helps a program to reconcile two competing sets of
+improvements to some files (e.g., unregistered changes from the work
+tree versus changes involved in switching to a new branch), reporting
+conflicts if found. The library called through this API is
+responsible for a few things.
+
+ * determining which trees to merge (recursive ancestor consolidation);
+
+ * lining up corresponding files in the trees to be merged (rename
+ detection, subtree shifting), reporting edge cases like add/add
+ and rename/rename conflicts to the user;
+
+ * performing a three-way merge of corresponding files, taking
+ path-specific merge drivers (specified in `.gitattributes`)
+ into account.
+
+Data structures
+---------------
+
+* `mmbuffer_t`, `mmfile_t`
+
+These store data for use by the xdiff backend, for writing and
+for reading, respectively. See `xdiff/xdiff.h` for the definitions
+and `diff.c` for examples.
+
+* `struct ll_merge_options`
+
+This describes the set of options the calling program wants to affect
+the operation of a low-level (single file) merge. Some options:
+
+`virtual_ancestor`::
+ Behave as though this were part of a merge between common
+ ancestors in a recursive merge.
+ If a helper program is specified by the
+ `[merge "<driver>"] recursive` configuration, it will
+ be used (see linkgit:gitattributes[5]).
+
+`variant`::
+ Resolve local conflicts automatically in favor
+ of one side or the other (as in 'git merge-file'
+ `--ours`/`--theirs`/`--union`). Can be `0`,
+ `XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`, or
+ `XDL_MERGE_FAVOR_UNION`.
+
+`renormalize`::
+ Resmudge and clean the "base", "theirs" and "ours" files
+ before merging. Use this when the merge is likely to have
+ overlapped with a change in smudge/clean or end-of-line
+ normalization rules.
+
+Low-level (single file) merge
+-----------------------------
+
+`ll_merge`::
+
+ Perform a three-way single-file merge in core. This is
+ a thin wrapper around `xdl_merge` that takes the path and
+ any merge backend specified in `.gitattributes` or
+ `.git/info/attributes` into account. Returns 0 for a
+ clean merge.
+
+Calling sequence:
+
+* Prepare a `struct ll_merge_options` to record options.
+ If you have no special requests, skip this and pass `NULL`
+ as the `opts` parameter to use the default options.
+
+* Allocate an mmbuffer_t variable for the result.
+
+* Allocate and fill variables with the file's original content
+ and two modified versions (using `read_mmfile`, for example).
+
+* Call `ll_merge()`.
+
+* Read the merged content from `result_buf.ptr` and `result_buf.size`.
+
+* Release buffers when finished. A simple
+ `free(ancestor.ptr); free(ours.ptr); free(theirs.ptr);
+ free(result_buf.ptr);` will do.
+
+If the modifications do not merge cleanly, `ll_merge` will return a
+nonzero value and `result_buf` will generally include a description of
+the conflict bracketed by markers such as the traditional `<<<<<<<`
+and `>>>>>>>`.
+
+The `ancestor_label`, `our_label`, and `their_label` parameters are
+used to label the different sides of a conflict if the merge driver
+supports this.
+
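+A hedged sketch of that calling sequence, assuming the usual `ll_merge()`
+argument order (result buffer, path, then ancestor/ours/theirs each with a
+label, then the options pointer) and `read_mmfile()` for loading the inputs;
+the file names are made up for illustration:
+
+------------
+mmfile_t ancestor, ours, theirs;
+mmbuffer_t result_buf;
+int conflicts;
+
+if (read_mmfile(&ancestor, "base.txt") ||
+    read_mmfile(&ours, "ours.txt") ||
+    read_mmfile(&theirs, "theirs.txt"))
+	die("could not read the three versions");
+
+conflicts = ll_merge(&result_buf, "file.txt",
+		     &ancestor, "base",
+		     &ours, "ours",
+		     &theirs, "theirs",
+		     NULL); /* NULL: use the default options */
+
+if (conflicts)
+	warning("merge of file.txt produced conflicts");
+write_in_full(1, result_buf.ptr, result_buf.size);
+
+free(ancestor.ptr);
+free(ours.ptr);
+free(theirs.ptr);
+free(result_buf.ptr);
+------------
+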
+Everything else
+---------------
+
+Talk about <merge-recursive.h> and merge_file():
+
+ - merge_trees() to merge with rename detection
+ - merge_recursive() for ancestor consolidation
+ - try_merge_command() for other strategies
+ - conflict format
+ - merge options
+
+(Daniel, Miklos, Stephan, JC)
diff --git a/Documentation/technical/api-object-access.txt b/Documentation/technical/api-object-access.txt
new file mode 100644
index 0000000000..03bb0e950d
--- /dev/null
+++ b/Documentation/technical/api-object-access.txt
@@ -0,0 +1,15 @@
+object access API
+=================
+
+Talk about <sha1_file.c> and <object.h> family, things like
+
+* read_sha1_file()
+* read_object_with_reference()
+* has_sha1_file()
+* write_sha1_file()
+* pretend_sha1_file()
+* lookup_{object,commit,tag,blob,tree}
+* parse_{object,commit,tag,blob,tree}
+* Use of object flags
+
+(JC, Shawn, Daniel, Dscho, Linus)
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
new file mode 100644
index 0000000000..27bd701c0d
--- /dev/null
+++ b/Documentation/technical/api-parse-options.txt
@@ -0,0 +1,310 @@
+parse-options API
+=================
+
+The parse-options API is used to parse and massage options in Git
+and to provide a usage help with consistent look.
+
+Basics
+------
+
+The argument vector `argv[]` may usually contain mandatory or optional
+'non-option arguments', e.g. a filename or a branch, and 'options'.
+Options are optional arguments that start with a dash and
+that allow changing the behavior of a command.
+
+* There are basically three types of options:
+ 'boolean' options,
+ options with (mandatory) 'arguments' and
+ options with 'optional arguments'
+ (i.e. a boolean option that can be adjusted).
+
+* There are basically two forms of options:
+ 'Short options' consist of one dash (`-`) and one alphanumeric
+ character.
+ 'Long options' begin with two dashes (`--`) and some
+ alphanumeric characters.
+
+* Options are case-sensitive.
+ Please define 'lower-case long options' only.
+
+The parse-options API allows:
+
+* 'stuck' and 'separate form' of options with arguments.
+ `-oArg` is stuck, `-o Arg` is separate form.
+ `--option=Arg` is stuck, `--option Arg` is separate form.
+
+* Long options may be 'abbreviated', as long as the abbreviation
+ is unambiguous.
+
+* Short options may be bundled, e.g. `-a -b` can be specified as `-ab`.
+
+* Boolean long options can be 'negated' (or 'unset') by prepending
+ `no-`, e.g. `--no-abbrev` instead of `--abbrev`. Conversely,
+ options that begin with `no-` can be 'negated' by removing it.
+ Other long options can be unset (e.g., set string to NULL, set
+ integer to 0) by prepending `no-`.
+
+* Options and non-option arguments can clearly be separated using the `--`
+ option, e.g. `-a -b --option -- --this-is-a-file` indicates that
+ `--this-is-a-file` must not be processed as an option.
+
+Steps to parse options
+----------------------
+
+. `#include "parse-options.h"`
+
+. define a NULL-terminated
+ `static const char * const builtin_foo_usage[]` array
+ containing alternative usage strings
+
+. define `builtin_foo_options` array as described below
+ in section 'Data Structure'.
+
+. in `cmd_foo(int argc, const char **argv, const char *prefix)`
+ call
+
+ argc = parse_options(argc, argv, prefix, builtin_foo_options, builtin_foo_usage, flags);
++
+`parse_options()` will filter out the processed options of `argv[]` and leave the
+non-option arguments in `argv[]`.
+`argc` is updated appropriately because of the assignment.
++
+You can also pass NULL instead of a usage array as the fifth parameter of
+parse_options(), to avoid displaying a help screen with usage info and
+option list. This should only be done if necessary, e.g. to implement
+a limited parser for only a subset of the options that needs to be run
+before the full parser, which in turn shows the full help message.
++
+Flags are the bitwise-or of:
+
+`PARSE_OPT_KEEP_DASHDASH`::
+ Keep the `--` that usually separates options from
+ non-option arguments.
+
+`PARSE_OPT_STOP_AT_NON_OPTION`::
+ Usually the whole argument vector is massaged and reordered.
+ Using this flag, processing is stopped at the first non-option
+ argument.
+
+`PARSE_OPT_KEEP_ARGV0`::
+ Keep the first argument, which contains the program name. It's
+ removed from argv[] by default.
+
+`PARSE_OPT_KEEP_UNKNOWN`::
+ Keep unknown arguments instead of erroring out. This doesn't
+ work for all combinations of arguments as users might expect
+ it to do. E.g. if the first argument in `--unknown --known`
+ takes a value (which we can't know), the second one is
+ mistakenly interpreted as a known option. Similarly, if
+ `PARSE_OPT_STOP_AT_NON_OPTION` is set, the second argument in
+ `--unknown value` will be mistakenly interpreted as a
+	non-option, not as a value belonging to the unknown option, and will
+	stop the parser early. That's why parse_options() errors out if
+ both options are set.
+
+`PARSE_OPT_NO_INTERNAL_HELP`::
+ By default, parse_options() handles `-h`, `--help` and
+ `--help-all` internally, by showing a help screen. This option
+ turns it off and allows one to add custom handlers for these
+ options, or to just leave them unknown.
+
+Data Structure
+--------------
+
+The main data structure is an array of the `option` struct,
+say `static struct option builtin_add_options[]`.
+There are some macros to easily define options:
+
+`OPT__ABBREV(&int_var)`::
+ Add `--abbrev[=<n>]`.
+
+`OPT__COLOR(&int_var, description)`::
+ Add `--color[=<when>]` and `--no-color`.
+
+`OPT__DRY_RUN(&int_var, description)`::
+ Add `-n, --dry-run`.
+
+`OPT__FORCE(&int_var, description)`::
+ Add `-f, --force`.
+
+`OPT__QUIET(&int_var, description)`::
+ Add `-q, --quiet`.
+
+`OPT__VERBOSE(&int_var, description)`::
+ Add `-v, --verbose`.
+
+`OPT_GROUP(description)`::
+ Start an option group. `description` is a short string that
+ describes the group or an empty string.
+ Start the description with an upper-case letter.
+
+`OPT_BOOL(short, long, &int_var, description)`::
+ Introduce a boolean option. `int_var` is set to one with
+ `--option` and set to zero with `--no-option`.
+
+`OPT_COUNTUP(short, long, &int_var, description)`::
+ Introduce a count-up option.
+ Each use of `--option` increments `int_var`, starting from zero
+ (even if initially negative), and `--no-option` resets it to
+ zero. To determine if `--option` or `--no-option` was encountered at
+ all, initialize `int_var` to a negative value, and if it is still
+ negative after parse_options(), then neither `--option` nor
+ `--no-option` was seen.
+
+`OPT_BIT(short, long, &int_var, description, mask)`::
+ Introduce a boolean option.
+ If used, `int_var` is bitwise-ored with `mask`.
+
+`OPT_NEGBIT(short, long, &int_var, description, mask)`::
+ Introduce a boolean option.
+ If used, `int_var` is bitwise-anded with the inverted `mask`.
+
+`OPT_SET_INT(short, long, &int_var, description, integer)`::
+ Introduce an integer option.
+ `int_var` is set to `integer` with `--option`, and
+ reset to zero with `--no-option`.
+
+`OPT_STRING(short, long, &str_var, arg_str, description)`::
+ Introduce an option with string argument.
+ The string argument is put into `str_var`.
+
+`OPT_INTEGER(short, long, &int_var, description)`::
+ Introduce an option with integer argument.
+ The integer is put into `int_var`.
+
+`OPT_MAGNITUDE(short, long, &unsigned_long_var, description)`::
+ Introduce an option with a size argument. The argument must be a
+ non-negative integer and may include a suffix of 'k', 'm' or 'g' to
+ scale the provided value by 1024, 1024^2 or 1024^3 respectively.
+ The scaled value is put into `unsigned_long_var`.
+
+`OPT_DATE(short, long, &int_var, description)`::
+ Introduce an option with date argument, see `approxidate()`.
+ The timestamp is put into `int_var`.
+
+`OPT_EXPIRY_DATE(short, long, &int_var, description)`::
+ Introduce an option with expiry date argument, see `parse_expiry_date()`.
+ The timestamp is put into `int_var`.
+
+`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`::
+ Introduce an option with argument.
+ The argument will be fed into the function given by `func_ptr`
+ and the result will be put into `var`.
+ See 'Option Callbacks' below for a more elaborate description.
+
+`OPT_FILENAME(short, long, &var, description)`::
+ Introduce an option with a filename argument.
+ The filename will be prefixed by passing the filename along with
+ the prefix argument of `parse_options()` to `prefix_filename()`.
+
+`OPT_ARGUMENT(long, description)`::
+ Introduce a long-option argument that will be kept in `argv[]`.
+
+`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`::
+ Recognize numerical options like -123 and feed the integer as
+ if it was an argument to the function given by `func_ptr`.
+ The result will be put into `var`. There can be only one such
+ option definition. It cannot be negated and it takes no
+ arguments. Short options that happen to be digits take
+ precedence over it.
+
+`OPT_COLOR_FLAG(short, long, &int_var, description)`::
+ Introduce an option that takes an optional argument that can
+ have one of three values: "always", "never", or "auto". If the
+ argument is not given, it defaults to "always". The `--no-` form
+ works like `--long=never`; it cannot take an argument. If
+ "always", set `int_var` to 1; if "never", set `int_var` to 0; if
+ "auto", set `int_var` to 1 if stdout is a tty or a pager,
+ 0 otherwise.
+
+`OPT_NOOP_NOARG(short, long)`::
+ Introduce an option that has no effect and takes no arguments.
+ Use it to hide deprecated options that are still to be recognized
+ and ignored silently.
+
+`OPT_PASSTHRU(short, long, &char_var, arg_str, description, flags)`::
+ Introduce an option that will be reconstructed into a char* string,
+ which must be initialized to NULL. This is useful when you need to
+ pass the command-line option to another command. Any previous value
+ will be overwritten, so this should only be used for options where
+ the last one specified on the command line wins.
+
+`OPT_PASSTHRU_ARGV(short, long, &argv_array_var, arg_str, description, flags)`::
+ Introduce an option where all instances of it on the command-line will
+ be reconstructed into an argv_array. This is useful when you need to
+ pass the command-line option, which can be specified multiple times,
+ to another command.
+
+`OPT_CMDMODE(short, long, &int_var, description, enum_val)`::
+ Define an "operation mode" option, only one of which in the same
+ group of "operating mode" options that share the same `int_var`
+	can be given by the user. `int_var` is set to `enum_val` when the
+	option is used, but an error is reported if another "operating mode"
+	option has already set the same `int_var` to a different value.
+
+
+The last element of the array must be `OPT_END()`.
+
+If not stated otherwise, interpret the arguments as follows:
+
+* `short` is a character for the short option
+ (e.g. `'e'` for `-e`, use `0` to omit),
+
+* `long` is a string for the long option
+ (e.g. `"example"` for `--example`, use `NULL` to omit),
+
+* `int_var` is an integer variable,
+
+* `str_var` is a string variable (`char *`),
+
+* `arg_str` is the string that is shown as argument
+ (e.g. `"branch"` will result in `<branch>`).
+ If set to `NULL`, three dots (`...`) will be displayed.
+
+* `description` is a short string to describe the effect of the option.
+ It shall begin with a lower-case letter and a full stop (`.`) shall be
+ omitted at the end.
+
+Option Callbacks
+----------------
+
+The function must be defined in this form:
+
+ int func(const struct option *opt, const char *arg, int unset)
+
+The callback mechanism is as follows:
+
+* Inside `func`, the only interesting member of the structure
+ given by `opt` is the void pointer `opt->value`.
+ `*opt->value` will be the value that is saved into `var`, if you
+ use `OPT_CALLBACK()`.
+ For example, do `*(unsigned long *)opt->value = 42;` to get 42
+ into an `unsigned long` variable.
+
+* Return value `0` indicates success and non-zero return
+ value will invoke `usage_with_options()` and, thus, die.
+
+* If the user negates the option, `arg` is `NULL` and `unset` is 1.
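+
+A minimal hedged sketch of such a callback (the option name and variable are
+made up for illustration); it would be wired up with
+`OPT_CALLBACK('w', "width", &width, "n", "set output width", parse_width)`:
+
+------------
+static int width;
+
+static int parse_width(const struct option *opt, const char *arg, int unset)
+{
+	int *value = opt->value;	/* points at 'width' */
+
+	if (unset) {			/* --no-width was given */
+		*value = 0;
+		return 0;
+	}
+	*value = atoi(arg);
+	if (*value <= 0)
+		return error("expected a positive width, got '%s'", arg);
+	return 0;
+}
+------------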
+
+Sophisticated option parsing
+----------------------------
+
+If you need, for example, option callbacks with optional arguments
+or without arguments at all, or if you need other special cases,
+that are not handled by the macros above, you need to specify the
+members of the `option` structure manually.
+
+This is not covered in this document, but well documented
+in `parse-options.h` itself.
+
+Examples
+--------
+
+See `test-parse-options.c` and
+`builtin/add.c`,
+`builtin/clone.c`,
+`builtin/commit.c`,
+`builtin/fetch.c`,
+`builtin/fsck.c`,
+`builtin/rm.c`
+for real-world examples.
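+
+In addition, a small hedged sketch of the pieces fitting together (the
+`frotz` command, its options and variables are all made up for
+illustration):
+
+------------
+static int verbose;
+static const char *message;
+
+static const char * const builtin_frotz_usage[] = {
+	"git frotz [-v] [-m <msg>] [<file>...]",
+	NULL
+};
+
+static struct option builtin_frotz_options[] = {
+	OPT__VERBOSE(&verbose, "be verbose"),
+	OPT_STRING('m', "message", &message, "msg", "use the given message"),
+	OPT_END()
+};
+
+int cmd_frotz(int argc, const char **argv, const char *prefix)
+{
+	argc = parse_options(argc, argv, prefix, builtin_frotz_options,
+			     builtin_frotz_usage, 0);
+	/* argv[0..argc-1] now holds only the non-option arguments */
+	return 0;
+}
+------------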
diff --git a/Documentation/technical/api-quote.txt b/Documentation/technical/api-quote.txt
new file mode 100644
index 0000000000..e8a1bce94e
--- /dev/null
+++ b/Documentation/technical/api-quote.txt
@@ -0,0 +1,10 @@
+quote API
+=========
+
+Talk about <quote.h>, things like
+
+* sq_quote and unquote
+* c_style quote and unquote
+* quoting for foreign languages
+
+(JC)
diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
new file mode 100644
index 0000000000..37379d8337
--- /dev/null
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -0,0 +1,81 @@
+ref iteration API
+=================
+
+
+Iteration of refs is done by using an iterate function which will call a
+callback function for every ref. The callback function has this
+signature:
+
+ int handle_one_ref(const char *refname, const struct object_id *oid,
+ int flags, void *cb_data);
+
+There are different kinds of iterate functions which all take a
+callback of this type. The callback is then called for each found ref
+until the callback returns nonzero. The returned value is then also
+returned by the iterate function.
+
+Iteration functions
+-------------------
+
+* `head_ref()` just iterates the head ref.
+
+* `for_each_ref()` iterates all refs.
+
+* `for_each_ref_in()` iterates all refs which have a defined prefix and
+ strips that prefix from the passed variable refname.
+
+* `for_each_tag_ref()`, `for_each_branch_ref()`, `for_each_remote_ref()`,
+ `for_each_replace_ref()` iterate refs from the respective area.
+
+* `for_each_glob_ref()` iterates all refs that match the specified glob
+ pattern.
+
+* `for_each_glob_ref_in()` the previous and `for_each_ref_in()` combined.
+
+* `head_ref_submodule()`, `for_each_ref_submodule()`,
+ `for_each_ref_in_submodule()`, `for_each_tag_ref_submodule()`,
+ `for_each_branch_ref_submodule()`, `for_each_remote_ref_submodule()`
+ do the same as the functions described above but for a specified
+ submodule.
+
+* `for_each_rawref()` can be used to learn about broken refs and symrefs.
+
+* `for_each_reflog()` iterates each reflog file.
+
+Submodules
+----------
+
+If you want to iterate the refs of a submodule you first need to add the
+submodules object database. You can do this by a code-snippet like
+this:
+
+	const char *path = "path/to/submodule";
+	if (add_submodule_odb(path))
+		die("Error submodule '%s' not populated.", path);
+
+`add_submodule_odb()` will return zero on success. If you
+do not do this, you will get an error for each ref that does not point
+to a valid object.
+
+Note: As a side effect of this you cannot safely assume that every
+object you look up belongs to the superproject; all submodule objects
+become available in the same way as the superproject's objects.
+
+Example:
+--------
+
+----
+static int handle_remote_ref(const char *refname,
+		const struct object_id *oid, int flags, void *cb_data)
+{
+ struct strbuf *output = cb_data;
+ strbuf_addf(output, "%s\n", refname);
+ return 0;
+}
+
+...
+
+ struct strbuf output = STRBUF_INIT;
+ for_each_remote_ref(handle_remote_ref, &output);
+ printf("%s", output.buf);
+----
diff --git a/Documentation/technical/api-remote.txt b/Documentation/technical/api-remote.txt
new file mode 100644
index 0000000000..f10941b2e8
--- /dev/null
+++ b/Documentation/technical/api-remote.txt
@@ -0,0 +1,127 @@
+Remotes configuration API
+=========================
+
+The API in remote.h gives access to the configuration related to
+remotes. It handles all three configuration mechanisms historically
+and currently used by Git, and presents the information in a uniform
+fashion. Note that the code also handles plain URLs without any
+configuration, giving them just the default information.
+
+struct remote
+-------------
+
+`name`::
+
+ The user's nickname for the remote
+
+`url`::
+
+ An array of all of the url_nr URLs configured for the remote
+
+`pushurl`::
+
+ An array of all of the pushurl_nr push URLs configured for the remote
+
+`push`::
+
+ An array of refspecs configured for pushing, with
+ push_refspec being the literal strings, and push_refspec_nr
+ being the quantity.
+
+`fetch`::
+
+ An array of refspecs configured for fetching, with
+ fetch_refspec being the literal strings, and fetch_refspec_nr
+ being the quantity.
+
+`fetch_tags`::
+
+ The setting for whether to fetch tags (as a separate rule from
+ the configured refspecs); -1 means never to fetch tags, 0
+ means to auto-follow tags based on the default heuristic, 1
+ means to always auto-follow tags, and 2 means to fetch all
+ tags.
+
+`receivepack`, `uploadpack`::
+
+ The configured helper programs to run on the remote side, for
+ Git-native protocols.
+
+`http_proxy`::
+
+ The proxy to use for curl (http, https, ftp, etc.) URLs.
+
+`http_proxy_authmethod`::
+
+ The method used for authenticating against `http_proxy`.
+
+A struct remote can be found by name with remote_get(), and all remotes
+can be iterated through with for_each_remote(). remote_get(NULL) will
+return the default remote, given the current branch and configuration.
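+
+A hedged sketch of iterating the configured remotes (assuming the usual
+`each_remote_fn` callback signature taking the remote and a private pointer):
+
+------------
+static int show_one_remote(struct remote *remote, void *priv)
+{
+	int i;
+
+	for (i = 0; i < remote->url_nr; i++)
+		printf("%s\t%s\n", remote->name, remote->url[i]);
+	return 0;	/* a non-zero return would stop the iteration */
+}
+
+...
+
+	for_each_remote(show_one_remote, NULL);
+------------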
+
+struct refspec
+--------------
+
+A struct refspec holds the parsed interpretation of a refspec. If it
+will force updates (starts with a '+'), force is true. If it is a
+pattern (sides end with '*') pattern is true. src and dest are the
+two sides (including '*' characters if present); if there is only one
+side, it is src, and dst is NULL; if sides exist but are empty (i.e.,
+the refspec either starts or ends with ':'), the corresponding side is
+"".
+
+An array of strings can be parsed into an array of struct refspecs
+using parse_fetch_refspec() or parse_push_refspec().
+
+remote_find_tracking(), given a remote and a struct refspec with
+either src or dst filled out, will fill out the other such that the
+result is in the "fetch" specification for the remote (note that this
+evaluates patterns and returns a single result).
+
+struct branch
+-------------
+
+Note that this may end up moving to branch.h
+
+struct branch holds the configuration for a branch. It can be looked
+up with branch_get(name) for "refs/heads/{name}", or with
+branch_get(NULL) for HEAD.
+
+It contains:
+
+`name`::
+
+ The short name of the branch.
+
+`refname`::
+
+ The full path for the branch ref.
+
+`remote_name`::
+
+ The name of the remote listed in the configuration.
+
+`merge_name`::
+
+ An array of the "merge" lines in the configuration.
+
+`merge`::
+
+ An array of the struct refspecs used for the merge lines. That
+ is, merge[i]->dst is a local tracking ref which should be
+ merged into this branch by default.
+
+`merge_nr`::
+
+ The number of merge configurations
+
+branch_has_merge_config() returns true if the given branch has any
+merge configuration.
+
+Other stuff
+-----------
+
+There is other stuff in remote.h that is related, in general, to the
+process of interacting with remotes.
+
+(Daniel Barkalow)
diff --git a/Documentation/technical/api-revision-walking.txt b/Documentation/technical/api-revision-walking.txt
new file mode 100644
index 0000000000..55b878ade8
--- /dev/null
+++ b/Documentation/technical/api-revision-walking.txt
@@ -0,0 +1,72 @@
+revision walking API
+====================
+
+The revision walking API offers functions to build a list of revisions
+and then iterate over that list.
+
+Calling sequence
+----------------
+
+The walking API has a given calling sequence: first you need to
+initialize a rev_info structure, then add revisions to control what kind
+of revision list you want to get, and finally you can iterate over the
+revision list.
+
+Functions
+---------
+
+`init_revisions`::
+
+ Initialize a rev_info structure with default values. The second
+	parameter may be NULL or a prefix path, in which case the `.prefix`
+	member will be set to it. This is typically the first function you
+ want to call when you want to deal with a revision list. After calling
+ this function, you are free to customize options, like set
+ `.ignore_merges` to 0 if you don't want to ignore merges, and so on. See
+ `revision.h` for a complete list of available options.
+
+`add_pending_object`::
+
+ This function can be used if you want to add commit objects as revision
+ information. You can use the `UNINTERESTING` object flag to indicate if
+ you want to include or exclude the given commit (and commits reachable
+ from the given commit) from the revision list.
++
+NOTE: If you have the commits as a string list then you probably want to
+use setup_revisions(), instead of parsing each string and using this
+function.
+
+`setup_revisions`::
+
+ Parse revision information, filling in the `rev_info` structure, and
+ removing the used arguments from the argument list. Returns the number
+ of arguments left that weren't recognized, which are also moved to the
+ head of the argument list. The last parameter is used in case no
+	head of the argument list. The last parameter is used to provide
+	defaults in case no revision is given by the first two arguments.
+`prepare_revision_walk`::
+
+ Prepares the rev_info structure for a walk. You should check if it
+ returns any error (non-zero return code) and if it does not, you can
+ start using get_revision() to do the iteration.
+
+`get_revision`::
+
+ Takes a pointer to a `rev_info` structure and iterates over it,
+ returning a `struct commit *` each time you call it. The end of the
+ revision list is indicated by returning a NULL pointer.
+
+`reset_revision_walk`::
+
+ Reset the flags used by the revision walking api. You can use
+ this to do multiple sequential revision walks.
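+
+Putting the sequence together, a hedged sketch (assuming `prefix`, `argc` and
+`argv` come from the usual cmd_* entry point, and that commits carry a
+`struct object_id` in `object.oid`):
+
+------------
+struct rev_info revs;
+struct commit *commit;
+
+init_revisions(&revs, prefix);
+argc = setup_revisions(argc, argv, &revs, NULL);
+if (prepare_revision_walk(&revs))
+	die("revision walk setup failed");
+while ((commit = get_revision(&revs)) != NULL)
+	printf("%s\n", oid_to_hex(&commit->object.oid));
+------------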
+
+Data structures
+---------------
+
+Talk about <revision.h>, things like:
+
+* two diff_options, one for path limiting, another for output;
+* remaining functions;
+
+(Linus, JC, Dscho)
diff --git a/Documentation/technical/api-run-command.txt b/Documentation/technical/api-run-command.txt
new file mode 100644
index 0000000000..8bf3e37f53
--- /dev/null
+++ b/Documentation/technical/api-run-command.txt
@@ -0,0 +1,264 @@
+run-command API
+===============
+
+The run-command API offers a versatile tool to run sub-processes with
+redirected input and output as well as with a modified environment
+and an alternate current directory.
+
+A similar API offers the capability to run a function asynchronously,
+which is primarily used so that the caller can capture and process
+the output that the function produces.
+
+
+Functions
+---------
+
+`child_process_init`::
+
+ Initialize a struct child_process variable.
+
+`start_command`::
+
+ Start a sub-process. Takes a pointer to a `struct child_process`
+ that specifies the details and returns pipe FDs (if requested).
+ See below for details.
+
+`finish_command`::
+
+ Wait for the completion of a sub-process that was started with
+ start_command().
+
+`run_command`::
+
+ A convenience function that encapsulates a sequence of
+ start_command() followed by finish_command(). Takes a pointer
+ to a `struct child_process` that specifies the details.
+
+`run_command_v_opt`, `run_command_v_opt_cd_env`::
+
+ Convenience functions that encapsulate a sequence of
+ start_command() followed by finish_command(). The argument argv
+ specifies the program and its arguments. The argument opt is zero
+ or more of the flags `RUN_COMMAND_NO_STDIN`, `RUN_GIT_CMD`,
+ `RUN_COMMAND_STDOUT_TO_STDERR`, or `RUN_SILENT_EXEC_FAILURE`
+ that correspond to the members .no_stdin, .git_cmd,
+ .stdout_to_stderr, .silent_exec_failure of `struct child_process`.
+	The argument dir corresponds to the member .dir. The argument env
+ corresponds to the member .env.
+
+`child_process_clear`::
+
+ Release the memory associated with the struct child_process.
+ Most users of the run-command API don't need to call this
+ function explicitly because `start_command` invokes it on
+ failure and `finish_command` calls it automatically already.
+
+The functions above do the following:
+
+. If a system call failed, errno is set and -1 is returned. A diagnostic
+ is printed.
+
+. If the program was not found, then -1 is returned and errno is set to
+ ENOENT; a diagnostic is printed only if .silent_exec_failure is 0.
+
+. Otherwise, the program is run. If it terminates regularly, its exit
+ code is returned. No diagnostic is printed, even if the exit code is
+ non-zero.
+
+. If the program terminated due to a signal, then the return value is the
+ signal number + 128, ie. the same value that a POSIX shell's $? would
+ report. A diagnostic is printed.
+
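+A hedged sketch of the common case, running a git sub-command and only
+caring about its exit status (the arguments are made up for illustration):
+
+------------
+struct child_process cp = CHILD_PROCESS_INIT;
+
+argv_array_push(&cp.args, "rev-parse");
+argv_array_push(&cp.args, "--verify");
+argv_array_push(&cp.args, "HEAD");
+cp.git_cmd = 1;		/* run "git rev-parse ..." */
+cp.no_stdin = 1;
+if (run_command(&cp))
+	die("rev-parse failed");
+------------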
+
+`start_async`::
+
+ Run a function asynchronously. Takes a pointer to a `struct
+ async` that specifies the details and returns a set of pipe FDs
+ for communication with the function. See below for details.
+
+`finish_async`::
+
+ Wait for the completion of an asynchronous function that was
+ started with start_async().
+
+`run_hook`::
+
+ Run a hook.
+ The first argument is a pathname to an index file, or NULL
+ if the hook uses the default index file or no index is needed.
+ The second argument is the name of the hook.
+ The further arguments correspond to the hook arguments.
+ The last argument has to be NULL to terminate the arguments list.
+ If the hook does not exist or is not executable, the return
+ value will be zero.
+ If it is executable, the hook will be executed and the exit
+ status of the hook is returned.
+ On execution, .stdout_to_stderr and .no_stdin will be set.
+ (See below.)
+
+
+Data structures
+---------------
+
+* `struct child_process`
+
+This describes the arguments, redirections, and environment of a
+command to run in a sub-process.
+
+The caller:
+
+1. allocates and clears (using child_process_init() or
+ CHILD_PROCESS_INIT) a struct child_process variable;
+2. initializes the members;
+3. calls start_command();
+4. processes the data;
+5. closes file descriptors (if necessary; see below);
+6. calls finish_command().
+
+The .argv member is set up as an array of string pointers (NULL
+terminated), of which .argv[0] is the program name to run (usually
+without a path). If the command to run is a git command, set argv[0] to
+the command name without the 'git-' prefix and set .git_cmd = 1.
+
+Note that the ownership of the memory pointed to by .argv stays with the
+caller, but it should survive until `finish_command` completes. If the
+.argv member is NULL, `start_command` will point it at the .args
+`argv_array` (so you may use one or the other, but you must use exactly
+one). The memory in .args will be cleaned up automatically during
+`finish_command` (or during `start_command` when it is unsuccessful).
+
+The members .in, .out, .err are used to redirect stdin, stdout,
+stderr as follows:
+
+. Specify 0 to request no special redirection. No new file descriptor
+ is allocated. The child process simply inherits the channel from the
+ parent.
+
+. Specify -1 to have a pipe allocated; start_command() replaces -1
+ by the pipe FD in the following way:
+
+ .in: Returns the writable pipe end into which the caller writes;
+ the readable end of the pipe becomes the child's stdin.
+
+	.out, .err: Returns the readable pipe end from which the caller
+		reads; the writable end of the pipe becomes the child's
+		stdout/stderr.
+
+	The caller of start_command() must close the returned FDs
+	after it has completed reading from/writing to them!
+
+. Specify a file descriptor > 0 to be used by the child:
+
+ .in: The FD must be readable; it becomes child's stdin.
+ .out: The FD must be writable; it becomes child's stdout.
+ .err: The FD must be writable; it becomes child's stderr.
+
+ The specified FD is closed by start_command(), even if it fails to
+ run the sub-process!
+
+. Special forms of redirection are available by setting these members
+ to 1:
+
+ .no_stdin, .no_stdout, .no_stderr: The respective channel is
+ redirected to /dev/null.
+
+ .stdout_to_stderr: stdout of the child is redirected to its
+ stderr. This happens after stderr is itself redirected.
+ So stdout will follow stderr to wherever it is
+ redirected.
+
+To modify the environment of the sub-process, specify an array of
+string pointers (NULL terminated) in .env:
+
+. If the string is of the form "VAR=value", i.e. it contains '=',
+ the variable is added to the child process's environment.
+
+. If the string does not contain '=', it names an environment
+ variable that will be removed from the child process's environment.
+
+If the .env member is NULL, `start_command` will point it at the
+.env_array `argv_array` (so you may use one or the other, but not both).
+The memory in .env_array will be cleaned up automatically during
+`finish_command` (or during `start_command` when it is unsuccessful).
+
+To specify a new initial working directory for the sub-process,
+specify it in the .dir member.
+
+If the program cannot be found, the functions return -1 and set
+errno to ENOENT. Normally, an error message is printed, but if
+.silent_exec_failure is set to 1, no message is printed for this
+special error condition.
+
+
+* `struct async`
+
+This describes a function to run asynchronously, whose purpose is
+to produce output that the caller reads.
+
+The caller:
+
+1. allocates and clears (memset(&asy, 0, sizeof(asy));) a
+ struct async variable;
+2. initializes .proc and .data;
+3. calls start_async();
+4. processes communication with proc through .in and .out;
+5. closes .in and .out;
+6. calls finish_async().
+
+The members .in, .out are used to provide a set of fd's for
+communication between the caller and the callee as follows:
+
+. Specify 0 to have no file descriptor passed. The callee will
+ receive -1 in the corresponding argument.
+
+. Specify < 0 to have a pipe allocated; start_async() replaces
+ with the pipe FD in the following way:
+
+ .in: Returns the writable pipe end into which the caller
+ writes; the readable end of the pipe becomes the function's
+ in argument.
+
+ .out: Returns the readable pipe end from which the caller
+ reads; the writable end of the pipe becomes the function's
+ out argument.
+
+ The caller of start_async() must close the returned FDs after it
+	has completed reading from/writing to them.
+
+. Specify a file descriptor > 0 to be used by the function:
+
+ .in: The FD must be readable; it becomes the function's in.
+ .out: The FD must be writable; it becomes the function's out.
+
+ The specified FD is closed by start_async(), even if it fails to
+ run the function.
+
+The function pointer in .proc has the following signature:
+
+ int proc(int in, int out, void *data);
+
+. in, out specifies a set of file descriptors to which the function
+ must read/write the data that it needs/produces. The function
+ *must* close these descriptors before it returns. A descriptor
+ may be -1 if the caller did not configure a descriptor for that
+ direction.
+
+. data is the value that the caller has specified in the .data member
+ of struct async.
+
+. The return value of the function is 0 on success and non-zero
+ on failure. If the function indicates failure, finish_async() will
+ report failure as well.
+
+
+There are serious restrictions on what the asynchronous function can do
+because this facility is implemented by a thread in the same address
+space on most platforms (when pthreads is available), but by a pipe to
+a forked process otherwise:
+
+. It cannot change the program's state (global variables, environment,
+ etc.) in a way that the caller notices; in other words, .in and .out
+ are the only communication channels to the caller.
+
+. It must not change the program's state that the caller of the
+ facility also uses.
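+
+A hedged sketch of the asynchronous calling sequence (the producer function
+and its payload are made up for illustration):
+
+------------
+static int produce_greeting(int in, int out, void *data)
+{
+	/* in is -1 because the caller left .in set to 0 */
+	write_in_full(out, "hello\n", 6);
+	close(out);	/* the function must close its descriptors */
+	return 0;
+}
+
+...
+
+	struct async proc;
+
+	memset(&proc, 0, sizeof(proc));
+	proc.proc = produce_greeting;
+	proc.out = -1;	/* ask start_async() for a pipe to read from */
+	if (start_async(&proc))
+		die("could not start the producer");
+	/* read the output from proc.out here ... */
+	close(proc.out);
+	if (finish_async(&proc))
+		die("producer reported failure");
+------------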
diff --git a/Documentation/technical/api-setup.txt b/Documentation/technical/api-setup.txt
new file mode 100644
index 0000000000..540e455689
--- /dev/null
+++ b/Documentation/technical/api-setup.txt
@@ -0,0 +1,49 @@
+setup API
+=========
+
+Talk about
+
+* setup_git_directory()
+* setup_git_directory_gently()
+* is_inside_git_dir()
+* is_inside_work_tree()
+* setup_work_tree()
+
+(Dscho)
+
+Pathspec
+--------
+
+See glossary-context.txt for the syntax of pathspec. In memory, a
+pathspec set is represented by "struct pathspec" and is prepared by
+parse_pathspec(). This function takes several arguments:
+
+- magic_mask specifies which features are NOT supported by the
+ following code. If a user attempts to use such a feature,
+ parse_pathspec() can reject it early.
+
+- flags specifies other things that the caller wants parse_pathspec to
+ perform.
+
+- prefix and args come from cmd_* functions
+
+get_pathspec() is obsolete and should never be used in new code.
+
+parse_pathspec() helps catch unsupported features and reject them
+politely. At a lower level, different pathspec-related functions may
+not support the same set of features. Such pathspec-sensitive
+functions are guarded with GUARD_PATHSPEC(), which will die in an
+unfriendly way when an unsupported feature is requested.
+
+The command designers are supposed to make sure that GUARD_PATHSPEC()
+never dies. They have to make sure all unsupported features are caught
+by parse_pathspec(), not by GUARD_PATHSPEC. grepping GUARD_PATHSPEC()
+should give the designers all pathspec-sensitive codepaths and what
+features they support.
+
+A similar process is applied when a new pathspec magic is added. The
+designer lifts the GUARD_PATHSPEC restriction in the functions that
+support the new magic. At the same time (s)he has to make sure this
+new feature will be caught at parse_pathspec() in commands that cannot
+handle the new magic in some cases. grepping parse_pathspec() should
+help.
diff --git a/Documentation/technical/api-sha1-array.txt b/Documentation/technical/api-sha1-array.txt
new file mode 100644
index 0000000000..dcc52943a5
--- /dev/null
+++ b/Documentation/technical/api-sha1-array.txt
@@ -0,0 +1,80 @@
+sha1-array API
+==============
+
+The sha1-array API provides storage and manipulation of sets of SHA-1
+identifiers. The emphasis is on storage and processing efficiency,
+making them suitable for large lists. Note that the ordering of items is
+not preserved over some operations.
+
+Data Structures
+---------------
+
+`struct sha1_array`::
+
+ A single array of SHA-1 hashes. This should be initialized by
+ assignment from `SHA1_ARRAY_INIT`. The `sha1` member contains
+ the actual data. The `nr` member contains the number of items in
+ the set. The `alloc` and `sorted` members are used internally,
+ and should not be needed by API callers.
+
+Functions
+---------
+
+`sha1_array_append`::
+ Add an item to the set. The sha1 will be placed at the end of
+ the array (but note that some operations below may lose this
+ ordering).
+
+`sha1_array_lookup`::
+ Perform a binary search of the array for a specific sha1.
+ If found, returns the offset (in number of elements) of the
+ sha1. If not found, returns a negative integer. If the array is
+ not sorted, this function has the side effect of sorting it.
+
+`sha1_array_clear`::
+ Free all memory associated with the array and return it to the
+ initial, empty state.
+
+`sha1_array_for_each_unique`::
+ Efficiently iterate over each unique element of the list,
+ executing the callback function for each one. If the array is
+ not sorted, this function has the side effect of sorting it. If
+ the callback returns a non-zero value, the iteration ends
+ immediately and the callback's return is propagated; otherwise,
+ 0 is returned.
+
+Examples
+--------
+
+-----------------------------------------
+int print_callback(const unsigned char sha1[20],
+ void *data)
+{
+ printf("%s\n", sha1_to_hex(sha1));
+ return 0; /* always continue */
+}
+
+void some_func(void)
+{
+ struct sha1_array hashes = SHA1_ARRAY_INIT;
+ unsigned char sha1[20];
+
+ /* Read objects into our set */
+ while (read_object_from_stdin(sha1))
+ sha1_array_append(&hashes, sha1);
+
+ /* Check if some objects are in our set */
+	while (read_object_from_stdin(sha1)) {
+		if (sha1_array_lookup(&hashes, sha1) >= 0)
+			printf("it's in there!\n");
+	}
+
+ /*
+ * Print the unique set of objects. We could also have
+ * avoided adding duplicate objects in the first place,
+ * but we would end up re-sorting the array repeatedly.
+ * Instead, this will sort once and then skip duplicates
+ * in linear time.
+ */
+ sha1_array_for_each_unique(&hashes, print_callback, NULL);
+}
+-----------------------------------------
diff --git a/Documentation/technical/api-sigchain.txt b/Documentation/technical/api-sigchain.txt
new file mode 100644
index 0000000000..9e1189ef01
--- /dev/null
+++ b/Documentation/technical/api-sigchain.txt
@@ -0,0 +1,41 @@
+sigchain API
+============
+
+Code often wants to set a signal handler to clean up temporary files or
+other work-in-progress when we die unexpectedly. For multiple pieces of
+code to do this without conflicting, each piece of code must remember
+the old value of the handler and restore it either when:
+
+ 1. The work-in-progress is finished, and the handler is no longer
+ necessary. The handler should revert to the original behavior
+ (either another handler, SIG_DFL, or SIG_IGN).
+
+ 2. The signal is received. We should then do our cleanup, then chain
+ to the next handler (or die if it is SIG_DFL).
+
+Sigchain is a tiny library for keeping a stack of handlers. Your handler
+and installation code should look something like:
+
+------------------------------------------
+ void clean_foo_on_signal(int sig)
+ {
+ clean_foo();
+ sigchain_pop(sig);
+ raise(sig);
+ }
+
+ void other_func()
+ {
+ sigchain_push_common(clean_foo_on_signal);
+ mess_up_foo();
+ clean_foo();
+ }
+------------------------------------------
+
+Handlers are given the typedef of sigchain_fun. This is the same type
+that is given to signal() or sigaction(). It is perfectly reasonable to
+push SIG_DFL or SIG_IGN onto the stack.
+
+You can sigchain_push and sigchain_pop individual signals. For
+convenience, sigchain_push_common will push the handler onto the stack
+for many common signals.
diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt
new file mode 100644
index 0000000000..c08402b12e
--- /dev/null
+++ b/Documentation/technical/api-string-list.txt
@@ -0,0 +1,209 @@
+string-list API
+===============
+
+The string_list API offers a data structure and functions to handle
+sorted and unsorted string lists. A "sorted" list is one whose
+entries are sorted by string value in `strcmp()` order.
+
+The 'string_list' struct used to be called 'path_list', but was renamed
+because it is not specific to paths.
+
+The caller:
+
+. Allocates and clears a `struct string_list` variable.
+
+. Initializes the members. You might want to set the flag `strdup_strings`
+ if the strings should be strdup()ed. For example, this is necessary
+ when you add something like git_path("..."), since that function returns
+ a static buffer that will change with the next call to git_path().
++
+If you need something advanced, you can manually malloc() the `items`
+member (you need this if you add things later) and you should set the
+`nr` and `alloc` members in that case, too.
+
+. Adds new items to the list, using `string_list_append`,
+ `string_list_append_nodup`, `string_list_insert`,
+ `string_list_split`, and/or `string_list_split_in_place`.
+
+. Can check if a string is in the list using `string_list_has_string` or
+ `unsorted_string_list_has_string` and get it from the list using
+ `string_list_lookup` for sorted lists.
+
+. Can sort an unsorted list using `string_list_sort`.
+
+. Can remove duplicate items from a sorted list using
+ `string_list_remove_duplicates`.
+
+. Can remove individual items of an unsorted list using
+ `unsorted_string_list_delete_item`.
+
+. Can remove items not matching a criterion from a sorted or unsorted
+ list using `filter_string_list`, or remove empty strings using
+ `string_list_remove_empty_items`.
+
+. Finally it should free the list using `string_list_clear`.
+
+Example:
+
+----
+struct string_list list = STRING_LIST_INIT_NODUP;
+int i;
+
+string_list_append(&list, "foo");
+string_list_append(&list, "bar");
+for (i = 0; i < list.nr; i++)
+	printf("%s\n", list.items[i].string);
+----
+
+NOTE: It is more efficient to build an unsorted list and sort it
+afterwards, instead of building a sorted list (`O(n log n)` instead of
+`O(n^2)`).
++
+However, if you need to check whether a certain string was already added
+while you are still building the list, do not rely on
+unsorted_string_list_has_string() for that, because the overall complexity
+would then be quadratic again (and with a worse constant factor).
+
+Functions
+---------
+
+* General ones (work with both sorted and unsorted lists)
+
+`string_list_init`::
+
+ Initialize the members of the string_list, set `strdup_strings`
+ member according to the value of the second parameter.
+
+`filter_string_list`::
+
+ Apply a function to each item in a list, retaining only the
+ items for which the function returns true. If free_util is
+ true, call free() on the util members of any items that have
+ to be deleted. Preserve the order of the items that are
+ retained.
+
+`string_list_remove_empty_items`::
+
+ Remove any empty strings from the list. If free_util is true,
+ call free() on the util members of any items that have to be
+ deleted. Preserve the order of the items that are retained.
+
+`print_string_list`::
+
+ Dump a string_list to stdout, useful mainly for debugging purposes. It
+ can take an optional header argument and it writes out the
+	string-pointer pairs of the string_list, each one on its own line.
+
+`string_list_clear`::
+
+ Free a string_list. The `string` pointer of the items will be freed in
+ case the `strdup_strings` member of the string_list is set. The second
+ parameter controls if the `util` pointer of the items should be freed
+ or not.
+
+* Functions for sorted lists only
+
+`string_list_has_string`::
+
+ Determine if the string_list has a given string or not.
+
+`string_list_insert`::
+
+	Insert a new element into the string_list. The returned pointer can be
+ handy if you want to write something to the `util` pointer of the
+ string_list_item containing the just added string. If the given
+ string already exists the insertion will be skipped and the
+ pointer to the existing item returned.
++
+Since this function uses xrealloc() (which die()s if it fails) if the
+list needs to grow, it is safe not to check the pointer. I.e. you may
+write `string_list_insert(...)->util = ...;`.
+
+`string_list_lookup`::
+
+ Look up a given string in the string_list, returning the containing
+ string_list_item. If the string is not found, NULL is returned.
+
+`string_list_remove_duplicates`::
+
+ Remove all but the first of consecutive entries that have the
+ same string value. If free_util is true, call free() on the
+ util members of any items that have to be deleted.
+
+* Functions for unsorted lists only
+
+`string_list_append`::
+
+	Append a new string to the end of the string_list. If
+	`strdup_strings` is set, then the string argument is copied;
+	otherwise the new `string_list_item` refers to the input
+	string.
+
+`string_list_append_nodup`::
+
+	Append a new string to the end of the string_list. The new
+	`string_list_item` always refers to the input string, even if
+	`strdup_strings` is set. This function can be used to hand
+	ownership of a malloc()ed string to a `string_list` that has
+	`strdup_strings` set.
+
+`string_list_sort`::
+
+ Sort the list's entries by string value in `strcmp()` order.
+
+`unsorted_string_list_has_string`::
+
+ It's like `string_list_has_string()` but for unsorted lists.
+
+`unsorted_string_list_lookup`::
+
+ It's like `string_list_lookup()` but for unsorted lists.
++
+The above two functions need to look through all items, as opposed to their
+counterparts for sorted lists, which perform a binary search.
+
+`unsorted_string_list_delete_item`::
+
+	Remove an item from a string_list. The `string` pointer of the item
+	will be freed if the `strdup_strings` member of the string_list
+	is set. The third parameter controls whether the `util` pointer
+	of the item should be freed or not.
+
+`string_list_split`::
+`string_list_split_in_place`::
+
+ Split a string into substrings on a delimiter character and
+ append the substrings to a `string_list`. If `maxsplit` is
+ non-negative, then split at most `maxsplit` times. Return the
+ number of substrings appended to the list.
++
+`string_list_split` requires a `string_list` that has `strdup_strings`
+set to true; it leaves the input string untouched and makes copies of
+the substrings in newly-allocated memory.
+`string_list_split_in_place` requires a `string_list` that has
+`strdup_strings` set to false; it splits the input string in place,
+overwriting the delimiter characters with NULs and creating new
+string_list_items that point into the original string (the original
+string must therefore not be modified or freed while the `string_list`
+is in use).
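++
+For example, to split a colon-separated list into strdup()ed pieces
+(a sketch):
++
+----
+struct string_list parts = STRING_LIST_INIT_DUP;
+
+string_list_split(&parts, "foo:bar:baz", ':', -1);
+/* parts.items[0..2].string are now "foo", "bar" and "baz" */
+string_list_clear(&parts, 0);
+----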
+
+
+Data structures
+---------------
+
+* `struct string_list_item`
+
+Represents an item of the list. The `string` member is a pointer to the
+string, and you may use the `util` member for any purpose, if you want.
+
+* `struct string_list`
+
+Represents the list itself.
+
+. The array of items is available via the `items` member.
+. The `nr` member contains the number of items stored in the list.
+. The `alloc` member is used to avoid reallocating at every insertion.
+ You should not tamper with it.
+. Setting the `strdup_strings` member to 1 will strdup() the strings
+ before adding them, see above.
+. The `compare_strings_fn` member is used to specify a custom compare
+  function; otherwise `strcmp()` is used as the default function.
diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
new file mode 100644
index 0000000000..3dce003fda
--- /dev/null
+++ b/Documentation/technical/api-submodule-config.txt
@@ -0,0 +1,66 @@
+submodule config cache API
+==========================
+
+The submodule config cache API allows reading submodule
+configuration/information from specified revisions. Internally,
+information is lazily read into a cache to avoid unnecessary parsing
+of the same .gitmodules files. Lookups can be done by submodule path
+or name.
+
+Usage
+-----
+
+To initialize the cache with configurations from the worktree, the caller
+typically first calls `gitmodules_config()` to read values from the
+worktree .gitmodules file, and then `parse_submodule_config_option()`
+(via the config parsing infrastructure) to overlay the local git
+config values.
+
+The caller can look up information about submodules by using the
+`submodule_from_path()` or `submodule_from_name()` functions. They return
+a `struct submodule` which contains the values. The API automatically
+initializes and allocates the needed infrastructure on demand. If the
+caller only wants to look up values from revisions, the initialization
+can be skipped.
+
+If the internal cache might grow too big or when the caller is done with
+the API, all internally cached values can be freed with submodule_free().
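+
+A minimal sketch of a worktree lookup (the submodule path "lib/foo" is
+just an example; the local config overlay described above is omitted):
+
+----
+const struct submodule *sm;
+
+gitmodules_config();
+
+sm = submodule_from_path(null_sha1, "lib/foo");
+if (sm)
+	printf("submodule '%s' has url %s\n", sm->name, sm->url);
+
+submodule_free();
+----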
+
+Data Structures
+---------------
+
+`struct submodule`::
+
+ This structure is used to return the information about one
+ submodule for a certain revision. It is returned by the lookup
+ functions.
+
+Functions
+---------
+
+`void submodule_free()`::
+
+	Use this to free the internally cached values.
+
+`int parse_submodule_config_option(const char *var, const char *value)`::
+
+ Can be passed to the config parsing infrastructure to parse
+ local (worktree) submodule configurations.
+
+`const struct submodule *submodule_from_path(const unsigned char *treeish_name, const char *path)`::
+
+ Given a tree-ish in the superproject and a path, return the
+ submodule that is bound at the path in the named tree.
+
+`const struct submodule *submodule_from_name(const unsigned char *treeish_name, const char *name)`::
+
+	The same as above, but the lookup is done by name.
+
+Whenever a submodule configuration is parsed in `parse_submodule_config_option`
+via e.g. `gitmodules_config()`, it will overwrite the null_sha1 entry.
+So in the normal case, when HEAD:.gitmodules is parsed first and then overlaid
+with the repository configuration, the null_sha1 entry contains the local
+configuration of a submodule (e.g. consolidated values from local git
+configuration and the .gitmodules file in the worktree).
+
+For an example usage see test-submodule-config.c.
diff --git a/Documentation/technical/api-trace.txt b/Documentation/technical/api-trace.txt
new file mode 100644
index 0000000000..fadb5979c4
--- /dev/null
+++ b/Documentation/technical/api-trace.txt
@@ -0,0 +1,140 @@
+trace API
+=========
+
+The trace API can be used to print debug messages to stderr or a file. Trace
+code is inactive unless explicitly enabled by setting `GIT_TRACE*` environment
+variables.
+
+The trace implementation automatically adds `timestamp file:line ... \n` to
+all trace messages. E.g.:
+
+------------
+23:59:59.123456 git.c:312 trace: built-in: git 'foo'
+00:00:00.000001 builtin/foo.c:99 foo: some message
+------------
+
+Data Structures
+---------------
+
+`struct trace_key`::
+
+ Defines a trace key (or category). The default (for API functions that
+ don't take a key) is `GIT_TRACE`.
++
+E.g. to define a trace key controlled by environment variable `GIT_TRACE_FOO`:
++
+------------
+static struct trace_key trace_foo = TRACE_KEY_INIT(FOO);
+
+static void trace_print_foo(const char *message)
+{
+ trace_printf_key(&trace_foo, "%s", message);
+}
+------------
++
+Note: don't use `const` as the trace implementation stores internal state in
+the `trace_key` structure.
+
+Functions
+---------
+
+`int trace_want(struct trace_key *key)`::
+
+ Checks whether the trace key is enabled. Used to prevent expensive
+ string formatting before calling one of the printing APIs.
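++
+For example, to avoid building an expensive message when tracing is
+disabled (a sketch; `trace_foo` is the key from the example above and
+`compute_state()` is a hypothetical expensive helper):
++
+------------
+if (trace_want(&trace_foo)) {
+	struct strbuf buf = STRBUF_INIT;
+	strbuf_addf(&buf, "state: %d", compute_state());
+	trace_strbuf(&trace_foo, &buf);
+	strbuf_release(&buf);
+}
+------------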
+
+`void trace_disable(struct trace_key *key)`::
+
+ Disables tracing for the specified key, even if the environment
+ variable was set.
+
+`void trace_printf(const char *format, ...)`::
+`void trace_printf_key(struct trace_key *key, const char *format, ...)`::
+
+ Prints a formatted message, similar to printf.
+
+`void trace_argv_printf(const char **argv, const char *format, ...)`::
+
+ Prints a formatted message, followed by a quoted list of arguments.
+
+`void trace_strbuf(struct trace_key *key, const struct strbuf *data)`::
+
+ Prints the strbuf, without additional formatting (i.e. doesn't
+ choke on `%` or even `\0`).
+
+`uint64_t getnanotime(void)`::
+
+ Returns nanoseconds since the epoch (01/01/1970), typically used
+ for performance measurements.
++
+Currently there are high precision timer implementations for Linux (using
+`clock_gettime(CLOCK_MONOTONIC)`) and Windows (`QueryPerformanceCounter`).
+Other platforms use `gettimeofday` as time source.
+
+`void trace_performance(uint64_t nanos, const char *format, ...)`::
+`void trace_performance_since(uint64_t start, const char *format, ...)`::
+
+ Prints the elapsed time (in nanoseconds), or elapsed time since
+ `start`, followed by a formatted message. Enabled via environment
+ variable `GIT_TRACE_PERFORMANCE`. Used for manual profiling, e.g.:
++
+------------
+uint64_t start = getnanotime();
+/* code section to measure */
+trace_performance_since(start, "foobar");
+------------
++
+------------
+uint64_t t = 0;
+for (;;) {
+ /* ignore */
+ t -= getnanotime();
+ /* code section to measure */
+ t += getnanotime();
+ /* ignore */
+}
+trace_performance(t, "frotz");
+------------
+
+Bugs & Caveats
+--------------
+
+GIT_TRACE_* environment variables can be used to tell Git to show
+trace output to its standard error stream. Git can often spawn a pager
+internally to run its subcommand and send its standard output and
+standard error to it.
+
+Because GIT_TRACE_PERFORMANCE trace is generated only at the very end
+of the program with atexit(), which happens after the pager exits, it
+would not work well if you send its log to the standard error output
+and let Git spawn the pager at the same time.
+
+As a workaround, you can for example use '--no-pager', or set
+GIT_TRACE_PERFORMANCE to another file descriptor which is redirected
+to stderr, or set GIT_TRACE_PERFORMANCE to a file specified by its
+absolute path.
+
+For example, instead of the following command, which by default may not
+print any performance information:
+
+------------
+GIT_TRACE_PERFORMANCE=2 git log -1
+------------
+
+you may want to use:
+
+------------
+GIT_TRACE_PERFORMANCE=2 git --no-pager log -1
+------------
+
+or:
+
+------------
+GIT_TRACE_PERFORMANCE=3 3>&2 git log -1
+------------
+
+or:
+
+------------
+GIT_TRACE_PERFORMANCE=/path/to/log/file git log -1
+------------
diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt
new file mode 100644
index 0000000000..14af37c3f1
--- /dev/null
+++ b/Documentation/technical/api-tree-walking.txt
@@ -0,0 +1,147 @@
+tree walking API
+================
+
+The tree walking API is used to traverse and inspect trees.
+
+Data Structures
+---------------
+
+`struct name_entry`::
+
+ An entry in a tree. Each entry has a sha1 identifier, pathname, and
+ mode.
+
+`struct tree_desc`::
+
+ A semi-opaque data structure used to maintain the current state of the
+ walk.
++
+* `buffer` is a pointer into the memory representation of the tree. It always
+points at the current entry being visited.
+
+* `size` counts the number of bytes left in the `buffer`.
+
+* `entry` points to the current entry being visited.
+
+`struct traverse_info`::
+
+ A structure used to maintain the state of a traversal.
++
+* `prev` points to the traverse_info which was used to descend into the
+current tree. If this is the top-level tree `prev` will point to
+a dummy traverse_info.
+
+* `name` is the entry for the current tree (if the tree is a subtree).
+
+* `pathlen` is the length of the full path for the current tree.
+
+* `conflicts` can be used by callbacks to maintain directory-file conflicts.
+
+* `fn` is a callback called for each entry in the tree. See Traversing for more
+information.
+
+* `data` can be anything the `fn` callback would want to use.
+
+* `show_all_errors` tells whether to stop at the first error or not.
+
+Initializing
+------------
+
+`init_tree_desc`::
+
+ Initialize a `tree_desc` and decode its first entry. The buffer and
+ size parameters are assumed to be the same as the buffer and size
+ members of `struct tree`.
+
+`fill_tree_descriptor`::
+
+ Initialize a `tree_desc` and decode its first entry given the sha1 of
+ a tree. Returns the `buffer` member if the sha1 is a valid tree
+ identifier and NULL otherwise.
+
+`setup_traverse_info`::
+
+ Initialize a `traverse_info` given the pathname of the tree to start
+ traversing from. The `base` argument is assumed to be the `path`
+ member of the `name_entry` being recursed into unless the tree is a
+ top-level tree in which case the empty string ("") is used.
+
+Walking
+-------
+
+`tree_entry`::
+
+ Visit the next entry in a tree. Returns 1 when there are more entries
+ left to visit and 0 when all entries have been visited. This is
+ commonly used in the test of a while loop.
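++
+For example, to print every entry of a tree (a sketch; `tree` is
+assumed to be an already-parsed `struct tree`):
++
+----
+struct tree_desc desc;
+struct name_entry entry;
+
+init_tree_desc(&desc, tree->buffer, tree->size);
+while (tree_entry(&desc, &entry))
+	printf("%06o %s\n", entry.mode, entry.path);
+----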
+
+`tree_entry_len`::
+
+ Calculate the length of a tree entry's pathname. This utilizes the
+ memory structure of a tree entry to avoid the overhead of using a
+ generic strlen().
+
+`update_tree_entry`::
+
+ Walk to the next entry in a tree. This is commonly used in conjunction
+ with `tree_entry_extract` to inspect the current entry.
+
+`tree_entry_extract`::
+
+ Decode the entry currently being visited (the one pointed to by
+	`tree_desc`'s `entry` member) and return the sha1 of the entry. The
+ `pathp` and `modep` arguments are set to the entry's pathname and mode
+ respectively.
+
+`get_tree_entry`::
+
+ Find an entry in a tree given a pathname and the sha1 of a tree to
+ search. Returns 0 if the entry is found and -1 otherwise. The third
+ and fourth parameters are set to the entry's sha1 and mode
+ respectively.
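++
+For example (a sketch; `tree_sha1` is assumed to name a tree object):
++
+----
+unsigned char sha1[20];
+unsigned mode;
+
+if (!get_tree_entry(tree_sha1, "Documentation/Makefile", sha1, &mode))
+	printf("found it, mode %06o\n", mode);
+----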
+
+Traversing
+----------
+
+`traverse_trees`::
+
+	Traverse `n` trees in parallel. The `fn` callback member of
+ `traverse_info` is called once for each tree entry.
+
+`traverse_callback_t`::
+ The arguments passed to the traverse callback are as follows:
++
+* `n` counts the number of trees being traversed.
+
+* `mask` has its nth bit set if something exists in the nth entry.
+
+* `dirmask` has its nth bit set if the nth tree's entry is a directory.
+
+* `entry` is an array of size `n` where the nth entry is from the nth tree.
+
+* `info` maintains the state of the traversal.
+
++
+Returning a negative value will terminate the traversal. Otherwise the
+return value is treated as an update mask. If the nth bit is set the nth tree
+will be updated and if the bit is not set the nth tree entry will be the
+same in the next callback invocation.
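++
+A minimal callback sketch (parameter order follows the list above),
+printing the entry from the first tree whenever one is present and
+asking for every tree that had an entry to be advanced:
++
+----
+static int show_first(int n, unsigned long mask, unsigned long dirmask,
+		      struct name_entry *entry, struct traverse_info *info)
+{
+	if (mask & 1)
+		printf("%s\n", entry[0].path);
+	return mask;	/* advance every tree that had an entry */
+}
+----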
+
+`make_traverse_path`::
+
+	Generate the full pathname of a tree entry, starting from the root of the
+ traversal. For example, if the traversal has recursed into another
+ tree named "bar" the pathname of an entry "baz" in the "bar"
+ tree would be "bar/baz".
+
+`traverse_path_len`::
+
+ Calculate the length of a pathname returned by `make_traverse_path`.
+ This utilizes the memory structure of a tree entry to avoid the
+ overhead of using a generic strlen().
+
+Authors
+-------
+
+Written by Junio C Hamano <gitster@pobox.com> and Linus Torvalds
+<torvalds@linux-foundation.org>
diff --git a/Documentation/technical/api-xdiff-interface.txt b/Documentation/technical/api-xdiff-interface.txt
new file mode 100644
index 0000000000..6296ecad1d
--- /dev/null
+++ b/Documentation/technical/api-xdiff-interface.txt
@@ -0,0 +1,7 @@
+xdiff interface API
+===================
+
+Talk about our calling convention to xdiff library, including
+xdiff_emit_consume_fn.
+
+(Dscho, JC)
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 0000000000..f8c18a0f7a
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,164 @@
+GIT bitmap v1 format
+====================
+
+ - A header appears at the beginning:
+
+ 4-byte signature: {'B', 'I', 'T', 'M'}
+
+ 2-byte version number (network byte order)
+ The current implementation only supports version 1
+ of the bitmap index (the same one as JGit).
+
+ 2-byte flags (network byte order)
+
+ The following flags are supported:
+
+ - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+ This flag must always be present. It implies that the bitmap
+ index has been generated for a packfile with full closure
+ (i.e. where every single object in the packfile can find
+ its parent links inside the same packfile). This is a
+ requirement for the bitmap index format, also present in JGit,
+ that greatly reduces the complexity of the implementation.
+
+ - BITMAP_OPT_HASH_CACHE (0x4)
+ If present, the end of the bitmap file contains
+ `N` 32-bit name-hash values, one per object in the
+ pack. The format and meaning of the name-hash is
+ described below.
+
+ 4-byte entry count (network byte order)
+
+ The total count of entries (bitmapped commits) in this bitmap index.
+
+ 20-byte checksum
+
+ The SHA1 checksum of the pack this bitmap index belongs to.
+
+ - 4 EWAH bitmaps that act as type indexes
+
+ Type indexes are serialized after the hash cache in the shape
+ of four EWAH bitmaps stored consecutively (see Appendix A for
+ the serialization format of an EWAH bitmap).
+
+ There is a bitmap for each Git object type, stored in the following
+ order:
+
+ - Commits
+ - Trees
+ - Blobs
+ - Tags
+
+ In each bitmap, the `n`th bit is set to true if the `n`th object
+ in the packfile is of that type.
+
+ The obvious consequence is that the OR of all 4 bitmaps will result
+ in a full set (all bits set), and the AND of all 4 bitmaps will
+ result in an empty bitmap (no bits set).
+
+ - N entries with compressed bitmaps, one for each indexed commit
+
+ Where `N` is the total amount of entries in this bitmap index.
+ Each entry contains the following:
+
+ - 4-byte object position (network byte order)
+ The position **in the index for the packfile** where the
+ bitmap for this commit is found.
+
+ - 1-byte XOR-offset
+ The xor offset used to compress this bitmap. For an entry
+ in position `x`, a XOR offset of `y` means that the actual
+ bitmap representing this commit is composed by XORing the
+ bitmap for this entry with the bitmap in entry `x-y` (i.e.
+ the bitmap `y` entries before this one).
+
+ Note that this compression can be recursive. In order to
+ XOR this entry with a previous one, the previous entry needs
+ to be decompressed first, and so on.
+
+ The hard-limit for this offset is 160 (an entry can only be
+ xor'ed against one of the 160 entries preceding it). This
+ number is always positive, and hence entries are always xor'ed
+ with **previous** bitmaps, not bitmaps that will come afterwards
+ in the index.
+
+ - 1-byte flags for this bitmap
+ At the moment the only available flag is `0x1`, which hints
+ that this bitmap can be re-used when rebuilding bitmap indexes
+ for the repository.
+
+ - The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+ - 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+ - 4-byte number of words of the COMPRESSED bitmap, when stored
+
+ - N x 8-byte words, as specified by the previous field
+
+ This is the actual content of the compressed bitmap.
+
+ - 4-byte position of the current RLW for the compressed
+ bitmap
+
+All words are stored in network byte order for their corresponding
+sizes.
+
+The compressed bitmap is stored in a form of run-length encoding, as
+follows. It consists of a concatenation of an arbitrary number of
+chunks. Each chunk consists of one or more 64-bit words
+
+ H L_1 L_2 L_3 .... L_M
+
+H is called RLW (run length word). It consists of (from lower to higher
+order bits):
+
+ - 1 bit: the repeated bit B
+
+ - 32 bits: repetition count K (unsigned)
+
+ - 31 bits: literal word count M (unsigned)
+
+The bitstream represented by the above chunk is then:
+
+ - K repetitions of B
+
+ - The bits stored in `L_1` through `L_M`. Within a word, bits at
+ lower order come earlier in the stream than those at higher
+ order.
+
+The next word after `L_M` (if any) must again be a RLW, for the next
+chunk. For efficient appending to the bitstream, the EWAH stores a
+pointer to the last RLW in the stream.
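+
+As a C sketch (assuming `rlw` holds one such header word, already read
+from the stream as a 64-bit integer):
+
+    int      bit      = rlw & 1;                 /* the repeated bit B */
+    uint64_t run_len  = (rlw >> 1) & 0xffffffff; /* repetition count K */
+    uint64_t literals = rlw >> 33;               /* literal word count M */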
+
+
+== Appendix B: Optional Bitmap Sections
+
+These sections may or may not be present in the `.bitmap` file; their
+presence is indicated by the header flags section described above.
+
+Name-hash cache
+---------------
+
+If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
+a cache of 32-bit values, one per object in the pack. The value at
+position `i` is the hash of the pathname at which the `i`th object
+(counting in index order) in the pack can be found. This can be fed
+into the delta heuristics to compare objects with similar pathnames.
+
+The hash algorithm used is:
+
+ hash = 0;
+ while ((c = *name++))
+ if (!isspace(c))
+ hash = (hash >> 2) + (c << 24);
+
+Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
+If implementations want to choose a different hashing scheme, they are
+free to do so, but MUST allocate a new header flag (because comparing
+hashes made under two different schemes would be pointless).
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
new file mode 100644
index 0000000000..1c561bdd92
--- /dev/null
+++ b/Documentation/technical/http-protocol.txt
@@ -0,0 +1,507 @@
+HTTP transfer protocols
+=======================
+
+Git supports two HTTP-based transfer protocols: a "dumb" protocol,
+which requires only a standard HTTP server on the server end of the
+connection, and a "smart" protocol, which requires a Git-aware CGI
+(or server module). This document describes both protocols.
+
+As a design feature smart clients can automatically upgrade "dumb"
+protocol URLs to smart URLs. This permits all users to have the
+same published URL, and the peers automatically select the most
+efficient transport available to them.
+
+
+URL Format
+----------
+
+URLs for Git repositories accessed by HTTP use the standard HTTP
+URL syntax documented by RFC 1738, so they are of the form:
+
+ http://<host>:<port>/<path>?<searchpart>
+
+Within this documentation the placeholder `$GIT_URL` will stand for
+the http:// repository URL entered by the end-user.
+
+Servers SHOULD handle all requests to locations matching `$GIT_URL`, as
+both the "smart" and "dumb" HTTP protocols used by Git operate
+by appending additional path components onto the end of the user
+supplied `$GIT_URL` string.
+
+An example of a dumb client requesting a loose object:
+
+ $GIT_URL: http://example.com:8080/git/repo.git
+ URL request: http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355
+
+An example of a smart request to a catch-all gateway:
+
+ $GIT_URL: http://example.com/daemon.cgi?svc=git&q=
+ URL request: http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack
+
+An example of a request to a submodule:
+
+ $GIT_URL: http://example.com/git/repo.git/path/submodule.git
+ URL request: http://example.com/git/repo.git/path/submodule.git/info/refs
+
+Clients MUST strip a trailing `/`, if present, from the user supplied
+`$GIT_URL` string to prevent empty path tokens (`//`) from appearing
+in any URL sent to a server. Compatible clients MUST expand
+`$GIT_URL/info/refs` as `foo/info/refs` and not `foo//info/refs`.
+
+
+Authentication
+--------------
+
+Standard HTTP authentication is used if authentication is required
+to access a repository, and MAY be configured and enforced by the
+HTTP server software.
+
+Because Git repositories are accessed by standard path components,
+server administrators MAY use directory-based permissions within
+their HTTP server to control repository access.
+
+Clients SHOULD support Basic authentication as described by RFC 2617.
+Servers SHOULD support Basic authentication by relying upon the
+HTTP server placed in front of the Git server software.
+
+Servers SHOULD NOT require HTTP cookies for the purposes of
+authentication or access control.
+
+Clients and servers MAY support other common forms of HTTP based
+authentication, such as Digest authentication.
+
+
+SSL
+---
+
+Clients and servers SHOULD support SSL, particularly to protect
+passwords when relying on Basic HTTP authentication.
+
+
+Session State
+-------------
+
+The Git over HTTP protocol (much like HTTP itself) is stateless
+from the perspective of the HTTP server side. All state MUST be
+retained and managed by the client process. This permits simple
+round-robin load-balancing on the server side, without needing to
+worry about state management.
+
+Clients MUST NOT require state management on the server side in
+order to function correctly.
+
+Servers MUST NOT require HTTP cookies in order to function correctly.
+Clients MAY store and forward HTTP cookies during request processing
+as described by RFC 2616 (HTTP/1.1). Servers SHOULD ignore any
+cookies sent by a client.
+
+
+General Request Processing
+--------------------------
+
+Except where noted, all standard HTTP behavior SHOULD be assumed
+by both client and server. This includes (but is not necessarily
+limited to):
+
+If there is no repository at `$GIT_URL`, or the resource pointed to by a
+location matching `$GIT_URL` does not exist, the server MUST NOT respond
+with `200 OK` response. A server SHOULD respond with
+`404 Not Found`, `410 Gone`, or any other suitable HTTP status code
+which does not imply the resource exists as requested.
+
+If there is a repository at `$GIT_URL`, but access is not currently
+permitted, the server MUST respond with the `403 Forbidden` HTTP
+status code.
+
+Servers SHOULD support both HTTP 1.0 and HTTP 1.1.
+Servers SHOULD support chunked encoding for both request and response
+bodies.
+
+Clients SHOULD support both HTTP 1.0 and HTTP 1.1.
+Clients SHOULD support chunked encoding for both request and response
+bodies.
+
+Servers MAY return ETag and/or Last-Modified headers.
+
+Clients MAY revalidate cached entities by including If-Modified-Since
+and/or If-None-Match request headers.
+
+Servers MAY return `304 Not Modified` if the relevant headers appear
+in the request and the entity has not changed. Clients MUST treat
+`304 Not Modified` identically to `200 OK` by reusing the cached entity.
+
+Clients MAY reuse a cached entity without revalidation if the
+Cache-Control and/or Expires header permits caching. Clients and
+servers MUST follow RFC 2616 for cache controls.
+
+
+Discovering References
+----------------------
+
+All HTTP clients MUST begin either a fetch or a push exchange by
+discovering the references available on the remote repository.
+
+Dumb Clients
+~~~~~~~~~~~~
+
+HTTP clients that only support the "dumb" protocol MUST discover
+references by making a request for the special info/refs file of
+the repository.
+
+Dumb HTTP clients MUST make a `GET` request to `$GIT_URL/info/refs`,
+without any search/query parameters.
+
+ C: GET $GIT_URL/info/refs HTTP/1.0
+
+ S: 200 OK
+ S:
+ S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
+ S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+ S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
+ S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
+
+The Content-Type of the returned info/refs entity SHOULD be
+`text/plain; charset=utf-8`, but MAY be any content type.
+Clients MUST NOT attempt to validate the returned Content-Type.
+Dumb servers MUST NOT return a Content-Type starting with
+`application/x-git-`.
+
+Cache-Control headers MAY be returned to disable caching of the
+returned entity.
+
+When examining the response clients SHOULD only examine the HTTP
+status code. Valid responses are `200 OK`, or `304 Not Modified`.
+
+The returned content is a UNIX formatted text file describing
+each ref and its known value. The file SHOULD be sorted by name
+according to the C locale ordering. The file SHOULD NOT include
+the default ref named `HEAD`.
+
+ info_refs = *( ref_record )
+ ref_record = any_ref / peeled_ref
+
+ any_ref = obj-id HTAB refname LF
+ peeled_ref = obj-id HTAB refname LF
+ obj-id HTAB refname "^{}" LF
+
+Smart Clients
+~~~~~~~~~~~~~
+
+HTTP clients that support the "smart" protocol (or both the
+"smart" and "dumb" protocols) MUST discover references by making
+a parameterized request for the info/refs file of the repository.
+
+The request MUST contain exactly one query parameter,
+`service=$servicename`, where `$servicename` MUST be the service
+name the client wishes to contact to complete the operation.
+The request MUST NOT contain additional query parameters.
+
+ C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
+
+dumb server reply:
+
+ S: 200 OK
+ S:
+ S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint
+ S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master
+ S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0
+ S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}
+
+smart server reply:
+
+ S: 200 OK
+ S: Content-Type: application/x-git-upload-pack-advertisement
+ S: Cache-Control: no-cache
+ S:
+ S: 001e# service=git-upload-pack\n
+ S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n
+ S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n
+ S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n
+ S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n
+
+Dumb Server Response
+^^^^^^^^^^^^^^^^^^^^
+Dumb servers MUST respond with the dumb server reply format.
+
+See the prior section under dumb clients for a more detailed
+description of the dumb server response.
+
+Smart Server Response
+^^^^^^^^^^^^^^^^^^^^^
+If the server does not recognize the requested service name, or the
+requested service name has been disabled by the server administrator,
+the server MUST respond with the `403 Forbidden` HTTP status code.
+
+Otherwise, smart servers MUST respond with the smart server reply
+format for the requested service name.
+
+Cache-Control headers SHOULD be used to disable caching of the
+returned entity.
+
+The Content-Type MUST be `application/x-$servicename-advertisement`.
+Clients SHOULD fall back to the dumb protocol if another content
+type is returned. When falling back to the dumb protocol clients
+SHOULD NOT make an additional request to `$GIT_URL/info/refs`, but
+instead SHOULD use the response already in hand. Clients MUST NOT
+continue if they do not support the dumb protocol.
+
+Clients MUST validate the status code is either `200 OK` or
+`304 Not Modified`.
+
+Clients MUST validate the first five bytes of the response entity
+matches the regex `^[0-9a-f]{4}#`. If this test fails, clients
+MUST NOT continue.
+
+Clients MUST parse the entire response as a sequence of pkt-line
+records.
+
+Clients MUST verify the first pkt-line is `# service=$servicename`.
+Servers MUST set $servicename to be the request parameter value.
+Servers SHOULD include an LF at the end of this line.
+Clients MUST ignore an LF at the end of the line.
+
+Servers MUST terminate the response with the magic `0000` end
+pkt-line marker.
+
+The returned response is a pkt-line stream describing each ref and
+its known value. The stream SHOULD be sorted by name according to
+the C locale ordering. The stream SHOULD include the default ref
+named `HEAD` as the first ref. The stream MUST include capability
+declarations behind a NUL on the first ref.
+
+ smart_reply = PKT-LINE("# service=$servicename" LF)
+ ref_list
+ "0000"
+ ref_list = empty_list / non_empty_list
+
+ empty_list = PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF)
+
+   non_empty_list  =  PKT-LINE(obj-id SP name NUL cap-list LF)
+ *ref_record
+
+ cap-list = capability *(SP capability)
+ capability = 1*(LC_ALPHA / DIGIT / "-" / "_")
+ LC_ALPHA = %x61-7A
+
+ ref_record = any_ref / peeled_ref
+ any_ref = PKT-LINE(obj-id SP name LF)
+ peeled_ref = PKT-LINE(obj-id SP name LF)
+                      PKT-LINE(obj-id SP name "^{}" LF)
+
+
+Smart Service git-upload-pack
+------------------------------
+This service reads from the repository pointed to by `$GIT_URL`.
+
+Clients MUST first perform ref discovery with
+`$GIT_URL/info/refs?service=git-upload-pack`.
+
+ C: POST $GIT_URL/git-upload-pack HTTP/1.0
+ C: Content-Type: application/x-git-upload-pack-request
+ C:
+ C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n
+ C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n
+ C: 0000
+
+ S: 200 OK
+ S: Content-Type: application/x-git-upload-pack-result
+ S: Cache-Control: no-cache
+ S:
+ S: ....ACK %s, continue
+ S: ....NAK
+
+Clients MUST NOT reuse or revalidate a cached response.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one "want" command in the request body.
+Clients MUST NOT reference an id in a "want" command which did not
+appear in the response obtained through ref discovery unless the
+server advertises capability `allow-tip-sha1-in-want` or
+`allow-reachable-sha1-in-want`.
+
+ compute_request = want_list
+ have_list
+ request_end
+ request_end = "0000" / "done"
+
+ want_list = PKT-LINE(want NUL cap_list LF)
+ *(want_pkt)
+ want_pkt = PKT-LINE(want LF)
+ want = "want" SP id
+ cap_list = *(SP capability) SP
+
+ have_list = *PKT-LINE("have" SP id LF)
+
+TODO: Document this further.
+
+The Negotiation Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The computation to select the minimal pack proceeds as follows
+(C = client, S = server):
+
+'init step:'
+
+C: Use ref discovery to obtain the advertised refs.
+
+C: Place any object seen into set `advertised`.
+
+C: Build an empty set, `common`, to hold the objects that are later
+ determined to be on both ends.
+
+C: Build a set, `want`, of the objects from `advertised` the client
+ wants to fetch, based on what it saw during ref discovery.
+
+C: Start a queue, `c_pending`, ordered by commit time (popping newest
+ first). Add all client refs. When a commit is popped from
+ the queue its parents SHOULD be automatically inserted back.
+ Commits MUST only enter the queue once.
+
+'one compute step:'
+
+C: Send one `$GIT_URL/git-upload-pack` request:
+
+ C: 0032want <want #1>...............................
+ C: 0032want <want #2>...............................
+ ....
+ C: 0032have <common #1>.............................
+ C: 0032have <common #2>.............................
+ ....
+ C: 0032have <have #1>...............................
+ C: 0032have <have #2>...............................
+ ....
+ C: 0000
+
+The stream is organized into "commands", with each command
+appearing by itself in a pkt-line. Within a command line,
+the text leading up to the first space is the command name,
+and the remainder of the line to the first LF is the value.
+Command lines are terminated with an LF as the last byte of
+the pkt-line value.
+
+Commands MUST appear in the following order, if they appear
+at all in the request stream:
+
+* "want"
+* "have"
+
+The stream is terminated by a pkt-line flush (`0000`).
+
+A single "want" or "have" command MUST have one hex formatted
+SHA-1 as its value. Multiple SHA-1s MUST be sent by sending
+multiple commands.
+
+The `have` list is created by popping the first 32 commits
+from `c_pending`. Fewer can be supplied if `c_pending` empties.
+
+If the client has sent 256 "have" commits and has not yet
+received one of those back from `s_common`, or the client has
+emptied `c_pending`, it SHOULD include a "done" command to let
+the server know it won't proceed:
+
+ C: 0009done
+
+S: Parse the git-upload-pack request:
+
+Verify all objects in `want` are directly reachable from refs.
+
+The server MAY walk backwards through history or through
+the reflog to permit slightly stale requests.
+
+If no "want" objects are received, send an error:
+TODO: Define error if no "want" lines are requested.
+
+If any "want" object is not reachable, send an error:
+TODO: Define error if an invalid "want" is requested.
+
+Create an empty list, `s_common`.
+
+If "have" was sent:
+
+Loop through the objects in the order supplied by the client.
+
+For each object, if the server has the object reachable from
+a ref, add it to `s_common`. If a commit is added to `s_common`,
+do not add any ancestors, even if they also appear in `have`.
+
+S: Send the git-upload-pack response:
+
+If the server has found a closed set of objects to pack or the
+request ends with "done", it replies with the pack.
+TODO: Document the pack based response
+
+ S: PACK...
+
+The returned stream is the side-band-64k protocol supported
+by the git-upload-pack service, and the pack is embedded into
+stream 1. Progress messages from the server side MAY appear
+in stream 2.
+
+Here a "closed set of objects" is defined to have at least
+one path from every "want" to at least one "common" object.
+
+If the server needs more information, it replies with a
+status continue response:
+TODO: Document the non-pack response
+
+C: Parse the upload-pack response:
+ TODO: Document parsing response
+
+'Do another compute step.'
+
+
+Smart Service git-receive-pack
+------------------------------
+This service writes to the repository pointed to by `$GIT_URL`.
+
+Clients MUST first perform ref discovery with
+`$GIT_URL/info/refs?service=git-receive-pack`.
+
+ C: POST $GIT_URL/git-receive-pack HTTP/1.0
+ C: Content-Type: application/x-git-receive-pack-request
+ C:
+ C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
+ C: 0000
+ C: PACK....
+
+ S: 200 OK
+ S: Content-Type: application/x-git-receive-pack-result
+ S: Cache-Control: no-cache
+ S:
+ S: ....
+
+Clients MUST NOT reuse or revalidate a cached response.
+Servers MUST include sufficient Cache-Control headers
+to prevent caching of the response.
+
+Servers SHOULD support all capabilities defined here.
+
+Clients MUST send at least one command in the request body.
+Within the command portion of the request body clients SHOULD send
+the id obtained through ref discovery as old_id.
+
+ update_request = command_list
+ "PACK" <binary data>
+
+ command_list = PKT-LINE(command NUL cap_list LF)
+ *(command_pkt)
+ command_pkt = PKT-LINE(command LF)
+ cap_list = *(SP capability) SP
+
+ command = create / delete / update
+ create = zero-id SP new_id SP name
+ delete = old_id SP zero-id SP name
+ update = old_id SP new_id SP name
+
+TODO: Document this further.
+
+
+References
+----------
+
+http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)]
+http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1]
+link:technical/pack-protocol.html
+link:technical/protocol-capabilities.html
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
new file mode 100644
index 0000000000..ade0b0c445
--- /dev/null
+++ b/Documentation/technical/index-format.txt
@@ -0,0 +1,297 @@
+Git index format
+================
+
+== The Git index file has the following format
+
+ All binary numbers are in network byte order. Version 2 is described
+ here unless stated otherwise.
+
+ - A 12-byte header consisting of
+
+ 4-byte signature:
+ The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
+
+ 4-byte version number:
+ The current supported versions are 2, 3 and 4.
+
+ 32-bit number of index entries.
+
+ - A number of sorted index entries (see below).
+
+ - Extensions
+
+ Extensions are identified by signature. Optional extensions can
+ be ignored if Git does not understand them.
+
+ Git currently supports cached tree and resolve undo extensions.
+
+ 4-byte extension signature. If the first byte is 'A'..'Z' the
+ extension is optional and can be ignored.
+
+ 32-bit size of the extension
+
+ Extension data
+
+ - 160-bit SHA-1 over the content of the index file before this
+ checksum.
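+
+  The 12-byte header above corresponds, as a C sketch, to (field names
+  are illustrative; all values are stored in network byte order on disk):
+
+    struct index_header {
+            uint32_t signature;   /* "DIRC" */
+            uint32_t version;     /* 2, 3 or 4 */
+            uint32_t entries;     /* number of index entries */
+    };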
+
+== Index entry
+
+ Index entries are sorted in ascending order on the name field,
+ interpreted as a string of unsigned bytes (i.e. memcmp() order, no
+ localization, no special casing of directory separator '/'). Entries
+ with the same name are sorted by their stage field.
+
+ 32-bit ctime seconds, the last time a file's metadata changed
+ this is stat(2) data
+
+ 32-bit ctime nanosecond fractions
+ this is stat(2) data
+
+ 32-bit mtime seconds, the last time a file's data changed
+ this is stat(2) data
+
+ 32-bit mtime nanosecond fractions
+ this is stat(2) data
+
+ 32-bit dev
+ this is stat(2) data
+
+ 32-bit ino
+ this is stat(2) data
+
+ 32-bit mode, split into (high to low bits)
+
+ 4-bit object type
+ valid values in binary are 1000 (regular file), 1010 (symbolic link)
+ and 1110 (gitlink)
+
+ 3-bit unused
+
+ 9-bit unix permission. Only 0755 and 0644 are valid for regular files.
+ Symbolic links and gitlinks have value 0 in this field.
+
+ 32-bit uid
+ this is stat(2) data
+
+ 32-bit gid
+ this is stat(2) data
+
+ 32-bit file size
+ This is the on-disk size from stat(2), truncated to 32-bit.
+
+ 160-bit SHA-1 for the represented object
+
+ A 16-bit 'flags' field split into (high to low bits)
+
+ 1-bit assume-valid flag
+
+ 1-bit extended flag (must be zero in version 2)
+
+ 2-bit stage (during merge)
+
+ 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
+ is stored in this field.
+
+ (Version 3 or later) A 16-bit field, only applicable if the
+ "extended flag" above is 1, split into (high to low bits).
+
+ 1-bit reserved for future
+
+ 1-bit skip-worktree flag (used by sparse checkout)
+
+ 1-bit intent-to-add flag (used by "git add -N")
+
+ 13-bit unused, must be zero
+
+ Entry path name (variable length) relative to top level directory
+ (without leading slash). '/' is used as path separator. The special
+ path components ".", ".." and ".git" (without quotes) are disallowed.
+ Trailing slash is also disallowed.
+
+ The exact encoding is undefined, but the '.' and '/' characters
+ are encoded in 7-bit ASCII and the encoding cannot contain a NUL
+ byte (iow, this is a UNIX pathname).
+
+ (Version 4) In version 4, the entry path name is prefix-compressed
+ relative to the path name for the previous entry (the very first
+ entry is encoded as if the path name for the previous entry is an
+ empty string). At the beginning of an entry, an integer N in the
+ variable width encoding (the same encoding as the offset is encoded
+ for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
+ by a NUL-terminated string S. Removing N bytes from the end of the
+ path name for the previous entry, and replacing it with the string S
+ yields the path name for this entry.
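+
+  For example, if the path name of the previous entry is "foo/bar.c"
+  and the path name of this entry is "foo/baz.c", this entry is encoded
+  as N = 3 (dropping "r.c" from the previous path) followed by the
+  NUL-terminated string "z.c".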
+
+ 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
+ while keeping the name NUL-terminated.
+
+ (Version 4) In version 4, the padding after the pathname does not
+ exist.
+
+ Interpretation of index entries in split index mode is completely
+ different. See below for details.
+
+== Extensions
+
+=== Cached tree
+
+ Cached tree extension contains pre-computed hashes for trees that can
+ be derived from the index. It helps speed up tree object generation
+ from index for a new commit.
+
+ When a path is updated in index, the path must be invalidated and
+ removed from tree cache.
+
+ The signature for this extension is { 'T', 'R', 'E', 'E' }.
+
+ A series of entries fill the entire extension; each of which
+ consists of:
+
+ - NUL-terminated path component (relative to its parent directory);
+
+ - ASCII decimal number of entries in the index that is covered by the
+ tree this entry represents (entry_count);
+
+ - A space (ASCII 32);
+
+ - ASCII decimal number that represents the number of subtrees this
+ tree has;
+
+ - A newline (ASCII 10); and
+
+ - 160-bit object name for the object that would result from writing
+ this span of index as a tree.
+
+ An entry can be in an invalidated state and is represented by having
+ a negative number in the entry_count field. In this case, there is no
+ object name and the next entry starts immediately after the newline.
+ When writing an invalid entry, -1 should always be used as entry_count.
+
+ The entries are written out in the top-down, depth-first order. The
+ first entry represents the root level of the repository, followed by the
+ first subtree--let's call this A--of the root level (with its name
+ relative to the root level), followed by the first subtree of A (with
+ its name relative to A), ...
+
+=== Resolve undo
+
+ A conflict is represented in the index as a set of higher stage entries.
+ When a conflict is resolved (e.g. with "git add path"), these higher
+ stage entries will be removed and a stage-0 entry with proper resolution
+ is added.
+
+ When these higher stage entries are removed, they are saved in the
+ resolve undo extension, so that conflicts can be recreated (e.g. with
+ "git checkout -m"), in case users want to redo a conflict resolution
+ from scratch.
+
+ The signature for this extension is { 'R', 'E', 'U', 'C' }.
+
+ A series of entries fill the entire extension; each of which
+ consists of:
+
+ - NUL-terminated pathname the entry describes (relative to the root of
+ the repository, i.e. full pathname);
+
+ - Three NUL-terminated ASCII octal numbers, entry mode of entries in
+ stage 1 to 3 (a missing stage is represented by "0" in this field);
+ and
+
+ - At most three 160-bit object names of the entry in stages from 1 to 3
+ (nothing is written for a missing stage).
+
+=== Split index
+
+ In split index mode, the majority of index entries could be stored
+ in a separate file. This extension records the changes to be made on
+ top of that to produce the final index.
+
+ The signature for this extension is { 'l', 'i', 'n', 'k' }.
+
+ The extension consists of:
+
+ - 160-bit SHA-1 of the shared index file. The shared index file path
+ is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+ index does not require a shared index file.
+
+ - An ewah-encoded delete bitmap, each bit represents an entry in the
+ shared index. If a bit is set, its corresponding entry in the
+    shared index will be removed from the final index. Note: because
+    a delete operation changes index entry positions, while the
+    original positions are still needed during the replace phase, it is
+    best to just mark entries for removal, then do a mass deletion
+    after replacement.
+
+ - An ewah-encoded replace bitmap, each bit represents an entry in
+ the shared index. If a bit is set, its corresponding entry in the
+ shared index will be replaced with an entry in this index
+ file. All replaced entries are stored in sorted order in this
+ index. The first "1" bit in the replace bitmap corresponds to the
+ first index entry, the second "1" bit to the second entry and so
+ on. Replaced entries may have empty path names to save space.
+
+ The remaining index entries after replaced ones will be added to the
+ final index. These added entries are also sorted by entry name then
+ stage.
+
+== Untracked cache
+
+ Untracked cache saves the untracked file list and necessary data to
+ verify the cache. The signature for this extension is { 'U', 'N',
+ 'T', 'R' }.
+
+ The extension starts with
+
+ - A sequence of NUL-terminated strings, preceded by the size of the
+ sequence in variable width encoding. Each string describes the
+ environment where the cache can be used.
+
+ - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
+ ctime field until "file size".
+
+ - Stat data of core.excludesfile
+
+ - 32-bit dir_flags (see struct dir_struct)
+
+ - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+ does not exist.
+
+ - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+ not exist.
+
+ - NUL-terminated string of per-dir exclude file name. This usually
+ is ".gitignore".
+
+ - The number of following directory blocks, variable width
+ encoding. If this number is zero, the extension ends here with a
+ following NUL.
+
+ - A number of directory blocks in depth-first-search order, each
+ consists of
+
+ - The number of untracked entries, variable width encoding.
+
+ - The number of sub-directory blocks, variable width encoding.
+
+ - The directory name terminated by NUL.
+
+ - A number of untracked file/dir names terminated by NUL.
+
+The remaining data of each directory block is grouped by type:
+
+ - An ewah bitmap, the n-th bit marks whether the n-th directory has
+ valid untracked cache entries.
+
+ - An ewah bitmap, the n-th bit records "check-only" bit of
+ read_directory_recursive() for the n-th directory.
+
+ - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+ is valid for the n-th directory and exists in the next data.
+
+ - An array of stat data. The n-th data corresponds with the n-th
+ "one" bit in the previous ewah bitmap.
+
+ - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+ in the previous ewah bitmap.
+
+ - One NUL.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
new file mode 100644
index 0000000000..8e5bf60be3
--- /dev/null
+++ b/Documentation/technical/pack-format.txt
@@ -0,0 +1,162 @@
+Git pack format
+===============
+
+== pack-*.pack files have the following format:
+
+ - A header appears at the beginning and consists of the following:
+
+ 4-byte signature:
+ The signature is: {'P', 'A', 'C', 'K'}
+
+ 4-byte version number (network byte order):
+ Git currently accepts version number 2 or 3 but
+ generates version 2 only.
+
+ 4-byte number of objects contained in the pack (network byte order)
+
+ Observation: we cannot have more than 4G versions ;-) and
+ more than 4G objects in a pack.
+
+ - The header is followed by number of object entries, each of
+ which looks like this:
+
+ (undeltified representation)
+ n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+ compressed data
+
+ (deltified representation)
+ n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+ 20-byte base object name if OBJ_REF_DELTA or a negative relative
+ offset from the delta object's position in the pack if this
+ is an OBJ_OFS_DELTA object
+ compressed delta data
+
+ Observation: length of each object is encoded in a variable
+ length format and is not constrained to 32-bit or anything.
+
+ - The trailer records 20-byte SHA-1 checksum of all of the above.
+
+== Original (version 1) pack-*.idx files have the following format:
+
+ - The header consists of 256 4-byte network byte order
+ integers. N-th entry of this table records the number of
+ objects in the corresponding pack, the first byte of whose
+ object name is less than or equal to N. This is called the
+ 'first-level fan-out' table.
+
+ - The header is followed by sorted 24-byte entries, one entry
+ per object in the pack. Each entry is:
+
+ 4-byte network byte order integer, recording where the
+ object is stored in the packfile as the offset from the
+ beginning.
+
+ 20-byte object name.
+
+ - The file is concluded with a trailer:
+
+ A copy of the 20-byte SHA-1 checksum at the end of
+ corresponding packfile.
+
+ 20-byte SHA-1-checksum of all of the above.
+
+Pack Idx file:
+
+ -- +--------------------------------+
+fanout | fanout[0] = 2 (for example) |-.
+table +--------------------------------+ |
+ | fanout[1] | |
+ +--------------------------------+ |
+ | fanout[2] | |
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
+ | fanout[255] = total objects |---.
+ -- +--------------------------------+ | |
+main | offset | | |
+index | object name 00XXXXXXXXXXXXXXXX | | |
+table +--------------------------------+ | |
+ | offset | | |
+ | object name 00XXXXXXXXXXXXXXXX | | |
+ +--------------------------------+<+ |
+ .-| offset | |
+ | | object name 01XXXXXXXXXXXXXXXX | |
+ | +--------------------------------+ |
+ | | offset | |
+ | | object name 01XXXXXXXXXXXXXXXX | |
+ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
+ | | offset | |
+ | | object name FFXXXXXXXXXXXXXXXX | |
+ --| +--------------------------------+<--+
+trailer | | packfile checksum |
+ | +--------------------------------+
+ | | idxfile checksum |
+ | +--------------------------------+
+ .-------.
+ |
+Pack file entry: <+
+
+ packed object header:
+ 1-byte size extension bit (MSB)
+ type (next 3 bit)
+ size0 (lower 4-bit)
+ n-byte sizeN (as long as MSB is set, each 7-bit)
+ size0..sizeN form 4+7+7+..+7 bit integer, size0
+ is the least significant part, and sizeN is the
+ most significant part.
+ packed object data:
+ If it is not DELTA, then deflated bytes (the size above
+ is the size before compression).
+ If it is REF_DELTA, then
+ 20-byte base object name SHA-1 (the size above is the
+ size of the delta data that follows).
+ delta data, deflated.
+ If it is OFS_DELTA, then
+ n-byte offset (see below) interpreted as a negative
+ offset from the type-byte of the header of the
+ ofs-delta entry (the size above is the size of
+ the delta data that follows).
+ delta data, deflated.
+
+ offset encoding:
+ n bytes with MSB set in all but the last one.
+ The offset is then the number constructed by
+ concatenating the lower 7 bit of each byte, and
+ for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
+ to the result.
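+
+A C sketch of decoding the two variable-length encodings above
+(function names are illustrative; bounds checking omitted):
+
+    /* "n-byte type and length": 3-bit type, 4+7*k bit size. */
+    static const unsigned char *decode_type_and_size(const unsigned char *p,
+                                                     unsigned *type,
+                                                     unsigned long *size)
+    {
+            unsigned char c = *p++;
+            unsigned shift = 4;
+            *type = (c >> 4) & 7;
+            *size = c & 15;
+            while (c & 0x80) {
+                    c = *p++;
+                    *size += (unsigned long)(c & 0x7f) << shift;
+                    shift += 7;
+            }
+            return p;
+    }
+
+    /* OFS_DELTA offset: concatenated 7-bit groups, plus
+       2^7 + 2^14 + ... for each byte beyond the first. */
+    static const unsigned char *decode_ofs_delta_offset(const unsigned char *p,
+                                                        unsigned long *offset)
+    {
+            unsigned char c = *p++;
+            unsigned long ofs = c & 0x7f;
+            while (c & 0x80) {
+                    c = *p++;
+                    ofs = ((ofs + 1) << 7) + (c & 0x7f);
+            }
+            *offset = ofs;
+            return p;
+    }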
+
+
+
+== Version 2 pack-*.idx files support packs larger than 4 GiB, and
+ have some other reorganizations. They have the format:
+
+ - A 4-byte magic number '\377tOc' which is an unreasonable
+ fanout[0] value.
+
+ - A 4-byte version number (= 2)
+
+ - A 256-entry fan-out table just like v1.
+
+ - A table of sorted 20-byte SHA-1 object names. These are
+ packed together without offset values to reduce the cache
+ footprint of the binary search for a specific object name.
+
+ - A table of 4-byte CRC32 values of the packed object data.
+ This is new in v2 so compressed data can be copied directly
+ from pack to pack during repacking without undetected
+ data corruption.
+
+ - A table of 4-byte offset values (in network byte order).
+ These are usually 31-bit pack file offsets, but large
+ offsets are encoded as an index into the next table with
+ the msbit set.
+
+ - A table of 8-byte offset entries (empty for pack files less
+ than 2 GiB). Pack files are organized with heavily used
+ objects toward the front, so most object references should
+ not need to refer to this table.
+
+ - The same trailer as a v1 pack file:
+
+ A copy of the 20-byte SHA-1 checksum at the end of
+ corresponding packfile.
+
+ 20-byte SHA-1-checksum of all of the above.
diff --git a/Documentation/technical/pack-heuristics.txt b/Documentation/technical/pack-heuristics.txt
new file mode 100644
index 0000000000..95a07db6e8
--- /dev/null
+++ b/Documentation/technical/pack-heuristics.txt
@@ -0,0 +1,460 @@
+Concerning Git's Packing Heuristics
+===================================
+
+ Oh, here's a really stupid question:
+
+ Where do I go
+ to learn the details
+ of Git's packing heuristics?
+
+Be careful what you ask!
+
+Followers of the Git, please open the Git IRC Log and turn to
+February 10, 2006.
+
+It's a rare occasion, and we are joined by the King Git Himself,
+Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor
+and seeks enlightenment. Others are present, but silent.
+
+Let's listen in!
+
+ <njs`> Oh, here's a really stupid question -- where do I go to
+ learn the details of Git's packing heuristics? google avails
+ me not, reading the source didn't help a lot, and wading
+ through the whole mailing list seems less efficient than any
+ of that.
+
+It is a bold start! A plea for help combined with a simultaneous
+tri-part attack on some of the tried and true mainstays in the quest
+for enlightenment. Brash accusations of google being useless. Hubris!
+Maligning the source. Heresy! Disdain for the mailing list archives.
+Woe.
+
+ <pasky> yes, the packing-related delta stuff is somewhat
+ mysterious even for me ;)
+
+Ah! Modesty after all.
+
+ <linus> njs, I don't think the docs exist. That's something where
+ I don't think anybody else than me even really got involved.
+ Most of the rest of Git others have been busy with (especially
+ Junio), but packing nobody touched after I did it.
+
+It's cryptic, yet vague. Linus in style for sure. Wise men
+interpret this as an apology. A few argue it is merely a
+statement of fact.
+
+ <njs`> I guess the next step is "read the source again", but I
+ have to build up a certain level of gumption first :-)
+
+Indeed! On both points.
+
+ <linus> The packing heuristic is actually really really simple.
+
+Bait...
+
+ <linus> But strange.
+
+And switch. That ought to do it!
+
+ <linus> Remember: Git really doesn't follow files. So what it does is
+ - generate a list of all objects
+ - sort the list according to magic heuristics
+ - walk the list, using a sliding window, seeing if an object
+ can be diffed against another object in the window
+ - write out the list in recency order
+
+The traditional understatement:
+
+ <njs`> I suspect that what I'm missing is the precise definition of
+ the word "magic"
+
+The traditional insight:
+
+ <pasky> yes
+
+And Babel-like confusion flowed.
+
+ <njs`> oh, hmm, and I'm not sure what this sliding window means either
+
+ <pasky> iirc, it appeared to me to be just the sha1 of the object
+ when reading the code casually ...
+
+ ... which simply doesn't sound as a very good heuristics, though ;)
+
+ <njs`> .....and recency order. okay, I think it's clear I didn't
+ even realize how much I wasn't realizing :-)
+
+Ah, grasshopper! And thus the enlightenment begins anew.
+
+ <linus> The "magic" is actually in theory totally arbitrary.
+ ANY order will give you a working pack, but no, it's not
+ ordered by SHA-1.
+
+ Before talking about the ordering for the sliding delta
+ window, let's talk about the recency order. That's more
+ important in one way.
+
+ <njs`> Right, but if all you want is a working way to pack things
+ together, you could just use cat and save yourself some
+ trouble...
+
+Waaait for it....
+
+ <linus> The recency ordering (which is basically: put objects
+ _physically_ into the pack in the order that they are
+ "reachable" from the head) is important.
+
+ <njs`> okay
+
+ <linus> It's important because that's the thing that gives packs
+ good locality. It keeps the objects close to the head (whether
+ they are old or new, but they are _reachable_ from the head)
+ at the head of the pack. So packs actually have absolutely
+ _wonderful_ IO patterns.
+
+Read that again, because it is important.
+
+ <linus> But recency ordering is totally useless for deciding how
+ to actually generate the deltas, so the delta ordering is
+ something else.
+
+ The delta ordering is (wait for it):
+ - first sort by the "basename" of the object, as defined by
+ the name the object was _first_ reached through when
+ generating the object list
+ - within the same basename, sort by size of the object
+ - but always sort different types separately (commits first).
+
+ That's not exactly it, but it's very close.
+
+ <njs`> The "_first_ reached" thing is not too important, just you
+ need some way to break ties since the same objects may be
+ reachable many ways, yes?
+
+And as if to clarify:
+
+ <linus> The point is that it's all really just any random
+ heuristic, and the ordering is totally unimportant for
+ correctness, but it helps a lot if the heuristic gives
+ "clumping" for things that are likely to delta well against
+ each other.
+
+It is an important point, so secretly, I did my own research and have
+included my results below. To be fair, it has changed some over time.
+And through the magic of Revisionistic History, I draw upon this entry
+from The Git IRC Logs on my father's birthday, March 1:
+
+ <gitster> The quote from the above linus should be rewritten a
+ bit (wait for it):
+ - first sort by type. Different objects never delta with
+ each other.
+ - then sort by filename/dirname. hash of the basename
+ occupies the top BITS_PER_INT-DIR_BITS bits, and bottom
+ DIR_BITS are for the hash of leading path elements.
+ - then if we are doing "thin" pack, the objects we are _not_
+ going to pack but we know about are sorted earlier than
+ other objects.
+ - and finally sort by size, larger to smaller.
+
+In one swell-foop, clarification and obscurification! Nonetheless,
+authoritative. Cryptic, yet concise. It even solicits notions of
+quotes from The Source Code. Clearly, more study is needed.
+
+ <gitster> That's the sort order. What this means is:
+ - we do not delta different object types.
+ - we prefer to delta the objects with the same full path, but
+ allow files with the same name from different directories.
+ - we always prefer to delta against objects we are not going
+ to send, if there are some.
+ - we prefer to delta against larger objects, so that we have
+ lots of removals.
+
+ The penultimate rule is for "thin" packs. It is used when
+ the other side is known to have such objects.
+
+There it is again. "Thin" packs. I'm thinking to myself, "What
+is a 'thin' pack?" So I ask:
+
+ <jdl> What is a "thin" pack?
+
+ <gitster> Use of --objects-edge to rev-list as the upstream of
+ pack-objects. The pack transfer protocol negotiates that.
+
+Woo hoo! Cleared that _right_ up!
+
+ <gitster> There are two directions - push and fetch.
+
+There! Did you see it? It is not '"push" and "pull"'! How often the
+confusion has started here. So casually mentioned, too!
+
+ <gitster> For push, git-send-pack invokes git-receive-pack on the
+ other end. The receive-pack says "I have up to these commits".
+ send-pack looks at them, and computes what are missing from
+ the other end. So "thin" could be the default there.
+
+ In the other direction, fetch, git-fetch-pack and
+ git-clone-pack invokes git-upload-pack on the other end
+ (via ssh or by talking to the daemon).
+
+ There are two cases: fetch-pack with -k and clone-pack is one,
+ fetch-pack without -k is the other. clone-pack and fetch-pack
+ with -k will keep the downloaded packfile without expanded, so
+ we do not use thin pack transfer. Otherwise, the generated
+ pack will have delta without base object in the same pack.
+
+ But fetch-pack without -k will explode the received pack into
+ individual objects, so we automatically ask upload-pack to
+ give us a thin pack if upload-pack supports it.
+
+OK then.
+
+Uh.
+
+Let's return to the previous conversation still in progress.
+
+ <njs`> and "basename" means something like "the tail of end of
+ path of file objects and dir objects, as per basename(3), and
+ we just declare all commit and tag objects to have the same
+ basename" or something?
+
+Luckily, that too is a point that gitster clarified for us!
+
+If I might add, the trick is to make files that _might_ be similar be
+located close to each other in the hash buckets based on their file
+names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and
+"Makefile" all landed in the same bucket due to their common basename,
+"Makefile". However, now they land in "close" buckets.
+
+The algorithm allows not just for the _same_ bucket, but for _close_
+buckets to be considered delta candidates. The rationale is
+essentially that files, like Makefiles, often have very similar
+content no matter what directory they live in.
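+
+If you want a feel for it, here is a toy Python sketch (emphatically
+not Git's real hash function) of the bucketing idea: the basename hash
+occupies the high bits and the leading directories only the low bits,
+so same-named files from different directories land in nearby buckets:
+
+    import zlib
+
+    def toy_delta_bucket(path, dir_bits=8, total_bits=32):
+        dirname, _, basename = path.rpartition("/")
+        base_bits = total_bits - dir_bits
+        base_hash = zlib.crc32(basename.encode()) & ((1 << base_bits) - 1)
+        dir_hash = zlib.crc32(dirname.encode()) & ((1 << dir_bits) - 1)
+        return (base_hash << dir_bits) | dir_hash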
+
+ <linus> I played around with different delta algorithms, and with
+ making the "delta window" bigger, but having too big of a
+ sliding window makes it very expensive to generate the pack:
+ you need to compare every object with a _ton_ of other objects.
+
+ There are a number of other trivial heuristics too, which
+ basically boil down to "don't bother even trying to delta this
+ pair" if we can tell before-hand that the delta isn't worth it
+ (due to size differences, where we can take a previous delta
+ result into account to decide that "ok, no point in trying
+ that one, it will be worse").
+
+ End result: packing is actually very size efficient. It's
+ somewhat CPU-wasteful, but on the other hand, since you're
+ really only supposed to do it maybe once a month (and you can
+ do it during the night), nobody really seems to care.
+
+Nice Engineering Touch, there. Find when it doesn't matter, and
+proclaim it a non-issue. Good style too!
+
+ <njs`> So, just to repeat to see if I'm following, we start by
+ getting a list of the objects we want to pack, we sort it by
+ this heuristic (basically lexicographically on the tuple
+ (type, basename, size)).
+
+ Then we walk through this list, and calculate a delta of
+ each object against the last n (tunable parameter) objects,
+ and pick the smallest of these deltas.
+
+Vastly simplified, but the essence is there!
+
+ <linus> Correct.
+
+ <njs`> And then once we have picked a delta or fulltext to
+ represent each object, we re-sort by recency, and write them
+ out in that order.
+
+ <linus> Yup. Some other small details:
+
+And of course there is the "Other Shoe" Factor too.
+
+ <linus> - We limit the delta depth to another magic value (right
+ now both the window and delta depth magic values are just "10")
+
+ <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO
+ patterns, because the things you want are near by, but to
+ actually reconstruct them you may have to jump all over in
+ random ways.
+
+ <linus> - When we write out a delta, and we haven't yet written
+ out the object it is a delta against, we write out the base
+ object first. And no, when we reconstruct them, we actually
+ get nice IO patterns, because:
+ - larger objects tend to be "more recent" (Linus' law: files grow)
+ - we actively try to generate deltas from a larger object to a
+ smaller one
+ - this means that the top-of-tree very seldom has deltas
+ (i.e. deltas in _practice_ are "backwards deltas")
+
+Again, we should reread that whole paragraph. Not just because
+Linus has slipped Linus's Law in there on us, but because it is
+important. Let's make sure we clarify some of the points here:
+
+ <njs`> So the point is just that in practice, delta order and
+ recency order match each other quite well.
+
+ <linus> Yes. There's another nice side to this (and yes, it was
+ designed that way ;):
+ - the reason we generate deltas against the larger object is
+ actually a big space saver too!
+
+ <njs`> Hmm, but your last comment (if "we haven't yet written out
+ the object it is a delta against, we write out the base object
+ first"), seems like it would make these facts mostly
+ irrelevant because even if in practice you would not have to
+ wander around much, in fact you just brute-force say that in
+ the cases where you might have to wander, don't do that :-)
+
+ <linus> Yes and no. Notice the rule: we only write out the base
+ object first if the delta against it was more recent. That
+ means that you can actually have deltas that refer to a base
+ object that is _not_ close to the delta object, but that only
+ happens when the delta is needed to generate an _old_ object.
+
+ <linus> See?
+
+Yeah, no. I missed that on the first two or three readings myself.
+
+ <linus> This keeps the front of the pack dense. The front of the
+ pack never contains data that isn't relevant to a "recent"
+ object. The size optimization comes from our use of xdelta
+ (but is true for many other delta algorithms): removing data
+ is cheaper (in size) than adding data.
+
+ When you remove data, you only need to say "copy bytes n--m".
+ In contrast, in a delta that _adds_ data, you have to say "add
+ these bytes: 'actual data goes here'"
+
+ *** njs` has quit: Read error: 104 (Connection reset by peer)
+
+ <linus> Uhhuh. I hope I didn't blow njs` mind.
+
+ *** njs` has joined channel #git
+
+ <pasky> :)
+
+The silent observers are amused. Of course.
+
+And as if njs` was expected to be omniscient:
+
+ <linus> njs - did you miss anything?
+
+OK, I'll spell it out. That's Geek Humor. If njs` was not actually
+connected for a little bit there, how would he know if he missed anything
+while he was disconnected? He's a benevolent dictator with a sense of
+humor! Well noted!
+
+ <njs`> Stupid router. Or gremlins, or whatever.
+
+It's a cheap shot at Cisco. Take 'em when you can.
+
+ <njs`> Yes and no. Notice the rule: we only write out the base
+ object first if the delta against it was more recent.
+
+ I'm getting lost in all these orders, let me re-read :-)
+ So the write-out order is from most recent to least recent?
+ (Conceivably it could be the opposite way too, I'm not sure if
+ we've said) though my connection back at home is logging, so I
+ can just read what you said there :-)
+
+And for those of you paying attention, the Omniscient Trick has just
+been detailed!
+
+ <linus> Yes, we always write out most recent first
+
+ <njs`> And, yeah, I got the part about deeper-in-history stuff
+ having worse IO characteristics, one sort of doesn't care.
+
+ <linus> With the caveat that if the "most recent" needs an older
+ object to delta against (hey, shrinking sometimes does
+ happen), we write out the old object with the delta.
+
+ <njs`> (if only it happened more...)
+
+ <linus> Anyway, the pack-file could easily be denser still, but
+ because it's used both for streaming (the Git protocol) and
+ for on-disk, it has a few pessimizations.
+
+Actually, it is a made-up word. But it is a made-up word being
+used as setup for a later optimization, which is a real word:
+
+ <linus> In particular, while the pack-file is then compressed,
+ it's compressed just one object at a time, so the actual
+ compression factor is less than it could be in theory. But it
+ means that it's all nice random-access with a simple index to
+ do "object name->location in packfile" translation.
+
+ <njs`> I'm assuming the real win for delta-ing large->small is
+ more homogeneous statistics for gzip to run over?
+
+ (You have to put the bytes in one place or another, but
+ putting them in a larger blob wins on compression)
+
+ Actually, what is the compression strategy -- each delta
+ individually gzipped, the whole file gzipped, somewhere in
+ between, no compression at all, ....?
+
+ Right.
+
+Reality IRC sets in. For example:
+
+ <pasky> I'll read the rest in the morning, I really have to go
+ sleep or there's no hope whatsoever for me at the today's
+ exam... g'nite all.
+
+Heh.
+
+ <linus> pasky: g'nite
+
+ <njs`> pasky: 'luck
+
+ <linus> Right: large->small matters exactly because of compression
+ behaviour. If it was non-compressed, it probably wouldn't make
+ any difference.
+
+ <njs`> yeah
+
+ <linus> Anyway: I'm not even trying to claim that the pack-files
+ are perfect, but they do tend to have a nice balance of
+ density vs ease-of use.
+
+Gasp! OK, saved. That's a fair Engineering trade off. Close call!
+In fact, Linus reflects on some Basic Engineering Fundamentals,
+design options, etc.
+
+ <linus> More importantly, they allow Git to still _conceptually_
+ never deal with deltas at all, and be a "whole object" store.
+
+ Which has some problems (we discussed bad huge-file
+ behaviour on the Git lists the other day), but it does mean
+ that the basic Git concepts are really really simple and
+ straightforward.
+
+ It's all been quite stable.
+
+ Which I think is very much a result of having very simple
+ basic ideas, so that there's never any confusion about what's
+ going on.
+
+ Bugs happen, but they are "simple" bugs. And bugs that
+ actually get some object store detail wrong are almost always
+ so obvious that they never go anywhere.
+
+ <njs`> Yeah.
+
+Nuff said.
+
+ <linus> Anyway. I'm off for bed. It's not 6AM here, but I've got
+ three kids, and have to get up early in the morning to send
+ them off. I need my beauty sleep.
+
+ <njs`> :-)
+
+ <njs`> appreciate the infodump, I really was failing to find the
+ details on Git packs :-)
+
+And now you know the rest of the story.
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
new file mode 100644
index 0000000000..c59ac9936a
--- /dev/null
+++ b/Documentation/technical/pack-protocol.txt
@@ -0,0 +1,607 @@
+Packfile transfer protocols
+===========================
+
+Git supports transferring data in packfiles over the ssh://, git://, http:// and
+file:// transports. There exist two sets of protocols, one for pushing
+data from a client to a server and another for fetching data from a
+server to a client. The three transports (ssh, git, file) use the same
+protocol to transfer data. http is documented in http-protocol.txt.
+
+The processes invoked in the canonical Git implementation are 'upload-pack'
+on the server side and 'fetch-pack' on the client side for fetching data;
+then 'receive-pack' on the server and 'send-pack' on the client for pushing
+data. The protocol functions to have a server tell a client what is
+currently on the server, then for the two to negotiate the smallest amount
+of data to send in order to fully update one or the other.
+
+pkt-line Format
+---------------
+
+The descriptions below build on the pkt-line format described in
+protocol-common.txt. When the grammar indicates `PKT-LINE(...)`, unless
+otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
+include a LF, but the receiver MUST NOT complain if it is not present.
+
+Transports
+----------
+There are three transports over which the packfile protocol is
+initiated. The Git transport is a simple, unauthenticated server that
+takes the command (almost always 'upload-pack', though Git
+servers can be configured to be globally writable, in which case
+'receive-pack' initiation is also allowed) with which the client
+wishes to communicate, executes it, and connects it to the
+requesting process.
+
+In the SSH transport, the client just runs the 'upload-pack'
+or 'receive-pack' process on the server over the SSH protocol and then
+communicates with that invoked process over the SSH connection.
+
+The file:// transport runs the 'upload-pack' or 'receive-pack'
+process locally and communicates with it over a pipe.
+
+Git Transport
+-------------
+
+The Git transport starts off by sending the command and repository
+on the wire using the pkt-line format, followed by a NUL byte and a
+hostname parameter, terminated by a NUL byte.
+
+ 0033git-upload-pack /project.git\0host=myserver.com\0
+
+--
+ git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+ request-command = "git-upload-pack" / "git-receive-pack" /
+ "git-upload-archive" ; case sensitive
+ pathname = *( %x01-ff ) ; exclude NUL
+ host-parameter = "host=" hostname [ ":" port ]
+--
+
+Only host-parameter is allowed in the git-proto-request. Clients
+MUST NOT attempt to send additional parameters. It is used for the
+git-daemon name-based virtual hosting. See the --interpolated-path
+option to git daemon, with the %H/%CH format characters.
+
+Basically what the Git client is doing to connect to an 'upload-pack'
+process on the server side over the Git protocol is this:
+
+ $ echo -e -n \
+ "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+ nc -v example.com 9418
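+
+The framing can also be computed programmatically; here is a hedged
+Python sketch (not part of Git) that builds such a request pkt-line
+from its parts:
+
+----
+def git_proto_request(command, path, host):
+    payload = ("%s %s\0host=%s\0" % (command, path, host)).encode()
+    # the 4-byte hex length counts itself plus the payload
+    return b"%04x" % (len(payload) + 4) + payload
+
+# e.g. git_proto_request("git-upload-pack", "/project.git", "myserver.com")
+----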
+
+If the server refuses the request for some reason, it could abort
+gracefully with an error message.
+
+----
+ error-line = PKT-LINE("ERR" SP explanation-text)
+----
+
+
+SSH Transport
+-------------
+
+Initiating the upload-pack or receive-pack processes over SSH is
+done by executing the binary on the server via SSH remote execution.
+It is basically equivalent to running this:
+
+ $ ssh git.example.com "git-upload-pack '/project.git'"
+
+For a server to support Git pushing and pulling for a given user over
+SSH, that user needs to be able to execute one or both of those
+commands via the SSH shell that they are provided on login. On some
+systems, that shell access is limited to only being able to run those
+two commands, or even just one of them.
+
+In an ssh:// format URI, the path is absolute in the URI, so the '/' after
+the host name (or port number) is sent as an argument, which is then
+read by the remote git-upload-pack exactly as is, so it's effectively
+an absolute path in the remote filesystem.
+
+ git clone ssh://user@example.com/project.git
+ |
+ v
+ ssh user@example.com "git-upload-pack '/project.git'"
+
+In a "user@host:path" format URI, its relative to the user's home
+directory, because the Git client will run:
+
+ git clone user@example.com:project.git
+ |
+ v
+ ssh user@example.com "git-upload-pack 'project.git'"
+
+The exception is if a '~' is used, in which case
+we execute it without the leading '/'.
+
+ ssh://user@example.com/~alice/project.git
+ |
+ v
+ ssh user@example.com "git-upload-pack '~alice/project.git'"
+
+A few things to remember here:
+
+- The "command name" is spelled with dash (e.g. git-upload-pack), but
+ this can be overridden by the client;
+
+- The repository path is always quoted with single quotes.
+
+Fetching Data From a Server
+---------------------------
+
+When one Git repository wants to get data that a second repository
+has, the first can 'fetch' from the second. This operation determines
+what data the server has that the client does not, and then streams
+that data down to the client in packfile format.
+
+
+Reference Discovery
+-------------------
+
+When the client initially connects the server will immediately respond
+with a listing of each reference it has (all branches and tags) along
+with the object name that each reference currently points to.
+
+ $ echo -e -n "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+ nc -v example.com 9418
+ 00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack
+ side-band side-band-64k ofs-delta shallow no-progress include-tag
+ 00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
+ 003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master
+ 003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9
+ 003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0
+ 003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
+ 0000
+
+The returned response is a pkt-line stream describing each ref and
+its current value. The stream MUST be sorted by name according to
+the C locale ordering.
+
+If HEAD is a valid ref, HEAD MUST appear as the first advertised
+ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the
+advertisement list at all, but other refs may still appear.
+
+The stream MUST include capability declarations behind a NUL on the
+first ref. The peeled value of a ref (that is "ref^{}") MUST be
+immediately after the ref itself, if presented. A conforming server
+MUST peel the ref if it's an annotated tag.
+
+----
+ advertised-refs = (no-refs / list-of-refs)
+ *shallow
+ flush-pkt
+
+ no-refs = PKT-LINE(zero-id SP "capabilities^{}"
+ NUL capability-list)
+
+ list-of-refs = first-ref *other-ref
+ first-ref = PKT-LINE(obj-id SP refname
+ NUL capability-list)
+
+ other-ref = PKT-LINE(other-tip / other-peeled)
+ other-tip = obj-id SP refname
+ other-peeled = obj-id SP refname "^{}"
+
+ shallow = PKT-LINE("shallow" SP obj-id)
+
+ capability-list = capability *(SP capability)
+ capability = 1*(LC_ALPHA / DIGIT / "-" / "_")
+ LC_ALPHA = %x61-7A
+----
+
+Server and client MUST use lowercase for obj-id; both MUST treat obj-id
+as case-insensitive.
+
+See protocol-capabilities.txt for a list of allowed server capabilities
+and descriptions.
+
+Packfile Negotiation
+--------------------
+After reference and capabilities discovery, the client can decide to
+terminate the connection by sending a flush-pkt, telling the server it can
+now gracefully terminate and disconnect, when it does not need any pack
+data. This can happen with the ls-remote command, and also can happen when
+the client is already up to date.
+
+Otherwise, it enters the negotiation phase, where the client and
+server determine what the minimal packfile necessary for transport is,
+by telling the server what objects it wants, its shallow objects
+(if any), and the maximum commit depth it wants (if any). The client
+will also send a list of the capabilities it wants to be in effect,
+out of what the server said it could do with the first 'want' line.
+
+----
+ upload-request = want-list
+ *shallow-line
+ *1depth-request
+ flush-pkt
+
+ want-list = first-want
+ *additional-want
+
+ shallow-line = PKT-LINE("shallow" SP obj-id)
+
+ depth-request = PKT-LINE("deepen" SP depth) /
+ PKT-LINE("deepen-since" SP timestamp) /
+ PKT-LINE("deepen-not" SP ref)
+
+ first-want = PKT-LINE("want" SP obj-id SP capability-list)
+ additional-want = PKT-LINE("want" SP obj-id)
+
+ depth = 1*DIGIT
+----
+
+Clients MUST send all the obj-ids they want from the reference
+discovery phase as 'want' lines. Clients MUST send at least one
+'want' command in the request body. Clients MUST NOT mention an
+obj-id in a 'want' command which did not appear in the response
+obtained through ref discovery.
+
+The client MUST write all obj-ids which it only has shallow copies
+of (meaning that it does not have the parents of a commit) as
+'shallow' lines so that the server is aware of the limitations of
+the client's history.
+
+The client now sends the maximum commit history depth it wants for
+this transaction, which is the number of commits it wants from the
+tip of the history, if any, as a 'deepen' line. A depth of 0 is the
+same as not making a depth request. The client does not want to receive
+any commits beyond this depth, nor does it want objects needed only to
+complete those commits. Commits whose parents are not received as a
+result are defined as shallow and marked as such in the server. This
+information is sent back to the client in the next step.
+
+Once all the 'want's and 'shallow's (and optional 'deepen') are
+transferred, the client MUST send a flush-pkt, to tell the server side
+that it is done sending the list.
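+
+For illustration, a hedged Python sketch (not part of Git) that
+composes such an upload-request following the grammar above:
+
+----
+def pkt_line(payload):
+    # 4-byte hex length (counting these 4 bytes) followed by the payload
+    return b"%04x" % (len(payload) + 4) + payload
+
+def upload_request(wants, capabilities, shallows=(), depth=0):
+    out = []
+    caps = " ".join(capabilities)
+    for i, oid in enumerate(wants):
+        line = "want %s %s\n" % (oid, caps) if i == 0 else "want %s\n" % oid
+        out.append(pkt_line(line.encode()))
+    for oid in shallows:
+        out.append(pkt_line(("shallow %s\n" % oid).encode()))
+    if depth:
+        out.append(pkt_line(("deepen %d\n" % depth).encode()))
+    out.append(b"0000")               # flush-pkt: done sending the list
+    return b"".join(out)
+----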
+
+Otherwise, if the client sent a positive depth request, the server
+will determine which commits will and will not be shallow and
+send this information to the client. If the client did not request
+a positive depth, this step is skipped.
+
+----
+ shallow-update = *shallow-line
+ *unshallow-line
+ flush-pkt
+
+ shallow-line = PKT-LINE("shallow" SP obj-id)
+
+ unshallow-line = PKT-LINE("unshallow" SP obj-id)
+----
+
+If the client has requested a positive depth, the server will compute
+the set of commits which are no deeper than the desired depth. The set
+of commits starts at the client's wants.
+
+The server writes 'shallow' lines for each
+commit whose parents will not be sent as a result. The server writes
+an 'unshallow' line for each commit which the client has indicated is
+shallow, but is no longer shallow at the currently requested depth
+(that is, its parents will now be sent). The server MUST NOT mark
+as unshallow anything which the client has not indicated was shallow.
+
+Now the client will send a list of the obj-ids it has using 'have'
+lines, so the server can make a packfile that only contains the objects
+that the client needs. In multi_ack mode, the canonical implementation
+will send up to 32 of these at a time, then will send a flush-pkt. The
+canonical implementation will skip ahead and send the next 32 immediately,
+so that there is always a block of 32 "in-flight on the wire" at a time.
+
+----
+ upload-haves = have-list
+ compute-end
+
+ have-list = *have-line
+ have-line = PKT-LINE("have" SP obj-id)
+ compute-end = flush-pkt / PKT-LINE("done")
+----
+
+If the server reads 'have' lines, it then will respond by ACKing any
+of the obj-ids the client said it had that the server also has. The
+server will ACK obj-ids differently depending on which ack mode is
+chosen by the client.
+
+In multi_ack mode:
+
+ * the server will respond with 'ACK obj-id continue' for any common
+ commits.
+
+ * once the server has found an acceptable common base commit and is
+ ready to make a packfile, it will blindly ACK all 'have' obj-ids
+ back to the client.
+
+ * the server will then send a 'NAK' and then wait for another response
+ from the client - either a 'done' or another list of 'have' lines.
+
+In multi_ack_detailed mode:
+
+ * the server will differentiate the ACKs where it is signaling
+ that it is ready to send data with 'ACK obj-id ready' lines, and
+ signals the identified common commits with 'ACK obj-id common' lines.
+
+Without either multi_ack or multi_ack_detailed:
+
+ * upload-pack sends "ACK obj-id" on the first common object it finds.
+ After that it says nothing until the client gives it a "done".
+
+ * upload-pack sends "NAK" on a flush-pkt if no common object
+ has been found yet. If one has been found, and thus an ACK
+ was already sent, it's silent on the flush-pkt.
+
+After the client has gotten enough ACK responses that it can determine
+that the server has enough information to send an efficient packfile
+(in the canonical implementation, this is determined when it has received
+enough ACKs that it can color everything left in the --date-order queue
+as common with the server, or the --date-order queue is empty), or the
+client determines that it wants to give up (in the canonical implementation,
+this is determined when the client sends 256 'have' lines without getting
+any of them ACKed by the server - meaning there is nothing in common and
+the server should just send all of its objects), then the client will send
+a 'done' command. The 'done' command signals to the server that the client
+is ready to receive its packfile data.
+
+However, the 256 limit *only* turns on in the canonical client
+implementation if we have received at least one "ACK %s continue"
+during a prior round. This helps to ensure that at least one common
+ancestor is found before we give up entirely.
+
+Once the 'done' line is read from the client, the server will either
+send a final 'ACK obj-id' or it will send a 'NAK'. 'obj-id' is the object
+name of the last commit determined to be common. The server only sends
+ACK after 'done' if there is at least one common base and multi_ack or
+multi_ack_detailed is enabled. The server always sends NAK after 'done'
+if there is no common base found.
+
+Then the server will start sending its packfile data.
+
+----
+ server-response = *ack_multi ack / nak
+ ack_multi = PKT-LINE("ACK" SP obj-id ack_status)
+ ack_status = "continue" / "common" / "ready"
+ ack = PKT-LINE("ACK" SP obj-id)
+ nak = PKT-LINE("NAK")
+----
+
+A simple clone may look like this (with no 'have' lines):
+
+----
+ C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
+ side-band-64k ofs-delta\n
+ C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+ C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+ C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+ C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
+ C: 0000
+ C: 0009done\n
+
+ S: 0008NAK\n
+ S: [PACKFILE]
+----
+
+An incremental update (fetch) response might look like this:
+
+----
+ C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \
+ side-band-64k ofs-delta\n
+ C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+ C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+ C: 0000
+ C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+ C: [30 more have lines]
+ C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n
+ C: 0000
+
+ S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n
+ S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n
+ S: 0008NAK\n
+
+ C: 0009done\n
+
+ S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n
+ S: [PACKFILE]
+----
+
+
+Packfile Data
+-------------
+
+Now that the client and server have finished negotiating the minimal
+amount of data that needs to be sent to the client, the server
+will construct and send the required data in packfile format.
+
+See pack-format.txt for what the packfile itself actually looks like.
+
+If 'side-band' or 'side-band-64k' capabilities have been specified by
+the client, the server will send the packfile data multiplexed.
+
+Each packet starts with the packet-line length of the amount of data
+that follows, followed by a single byte specifying the sideband the
+following data is coming in on.
+
+In 'side-band' mode, it will send up to 999 data bytes plus 1 control
+code, for a total of up to 1000 bytes in a pkt-line. In 'side-band-64k'
+mode it will send up to 65519 data bytes plus 1 control code, for a
+total of up to 65520 bytes in a pkt-line.
+
+The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain
+packfile data, sideband '2' will be used for progress information that the
+client will generally print to stderr and sideband '3' is used for error
+information.
+
+If no 'side-band' capability was specified, the server will stream the
+entire packfile without multiplexing.
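+
+A hedged Python sketch (not part of Git) of how a client might
+demultiplex such a stream, assuming a read_pkt_line() helper that
+returns one pkt-line payload at a time and None for a flush-pkt:
+
+----
+def demux_sideband(read_pkt_line, write_pack, show_progress):
+    while True:
+        payload = read_pkt_line()
+        if payload is None:           # flush-pkt: end of the stream
+            return
+        band, data = payload[0], payload[1:]
+        if band == 1:
+            write_pack(data)          # packfile data
+        elif band == 2:
+            show_progress(data)       # progress, usually shown on stderr
+        elif band == 3:
+            raise RuntimeError(data.decode("utf-8", "replace"))  # fatal error
+----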
+
+
+Pushing Data To a Server
+------------------------
+
+Pushing data to a server will invoke the 'receive-pack' process on the
+server, which will allow the client to tell it which references it should
+update and then send all the data the server will need for those new
+references to be complete. Once all the data is received and validated,
+the server will then update its references to what the client specified.
+
+Authentication
+--------------
+
+The protocol itself contains no authentication mechanisms. That is to be
+handled by the transport, such as SSH, before the 'receive-pack' process is
+invoked. If 'receive-pack' is configured over the Git transport, those
+repositories will be writable by anyone who can access that port (9418) as
+that transport is unauthenticated.
+
+Reference Discovery
+-------------------
+
+The reference discovery phase is done nearly the same way as it is in the
+fetching protocol. Each reference obj-id and name on the server is sent
+in packet-line format to the client, followed by a flush-pkt. The only
+real difference is that the capability listing is different - the only
+possible values are 'report-status', 'delete-refs', 'ofs-delta' and
+'push-options'.
+
+Reference Update Request and Packfile Transfer
+----------------------------------------------
+
+Once the client knows what the server's references point to, it can
+send a list of reference update requests. For each reference on the server
+that it wants to update, it sends a line listing the obj-id currently on
+the server, the obj-id the client would like to update it to and the name
+of the reference.
+
+This list is followed by a flush-pkt. Then the push options are transmitted
+one per packet followed by another flush-pkt. After that the packfile that
+should contain all the objects that the server will need to complete the new
+references will be sent.
+
+----
+ update-request = *shallow ( command-list | push-cert ) [packfile]
+
+ shallow = PKT-LINE("shallow" SP obj-id)
+
+ command-list = PKT-LINE(command NUL capability-list)
+ *PKT-LINE(command)
+ flush-pkt
+
+ command = create / delete / update
+ create = zero-id SP new-id SP name
+ delete = old-id SP zero-id SP name
+ update = old-id SP new-id SP name
+
+ old-id = obj-id
+ new-id = obj-id
+
+ push-cert = PKT-LINE("push-cert" NUL capability-list LF)
+ PKT-LINE("certificate version 0.1" LF)
+ PKT-LINE("pusher" SP ident LF)
+ PKT-LINE("pushee" SP url LF)
+ PKT-LINE("nonce" SP nonce LF)
+ PKT-LINE(LF)
+ *PKT-LINE(command LF)
+ *PKT-LINE(gpg-signature-lines LF)
+ PKT-LINE("push-cert-end" LF)
+
+ packfile = "PACK" 28*(OCTET)
+----
+
+If the receiving end does not support delete-refs, the sending end MUST
+NOT send a delete command.
+
+If the receiving end does not support push-cert, the sending end
+MUST NOT send a push-cert command. When a push-cert command is
+sent, command-list MUST NOT be sent; the commands recorded in the
+push certificate are used instead.
+
+The packfile MUST NOT be sent if the only command used is 'delete'.
+
+A packfile MUST be sent if either create or update command is used,
+even if the server already has all the necessary objects. In this
+case the client MUST send an empty packfile. The only time this
+is likely to happen is if the client is creating
+a new branch or a tag that points to an existing obj-id.
+
+The server will receive the packfile, unpack it, then validate that each
+reference being updated has not changed while the request was being
+processed (the obj-id is still the same as the old-id), and it will run
+any update hooks to make sure that the update is acceptable. If all of
+that is fine, the server will then update the references.
+
+Push Certificate
+----------------
+
+A push certificate begins with a set of header lines. After the
+header and an empty line, the protocol commands follow, one per
+line. Note that the trailing LF in push-cert PKT-LINEs is _not_
+optional; it must be present.
+
+Currently, the following header fields are defined:
+
+`pusher` ident::
+ Identify the GPG key in "Human Readable Name <email@address>"
+ format.
+
+`pushee` url::
+ The repository URL (anonymized, if the URL contains
+ authentication material) the user who ran `git push`
+ intended to push into.
+
+`nonce` nonce::
+ The 'nonce' string the receiving repository asked the
+ pushing user to include in the certificate, to prevent
+ replay attacks.
+
+The GPG signature lines are a detached signature for the contents
+recorded in the push certificate before the signature block begins.
+The detached signature is used to certify that the commands were
+given by the pusher, who must be the signer.
+
+Report Status
+-------------
+
+After receiving the pack data from the sender, the receiver sends a
+report if 'report-status' capability is in effect.
+It is a short listing of what happened in that update. It will first
+list the status of the packfile unpacking as either 'unpack ok' or
+'unpack [error]'. Then it will list the status for each of the references
+that it tried to update. Each line is either 'ok [refname]' if the
+update was successful, or 'ng [refname] [error]' if the update was not.
+
+----
+ report-status = unpack-status
+ 1*(command-status)
+ flush-pkt
+
+ unpack-status = PKT-LINE("unpack" SP unpack-result)
+ unpack-result = "ok" / error-msg
+
+ command-status = command-ok / command-fail
+ command-ok = PKT-LINE("ok" SP refname)
+ command-fail = PKT-LINE("ng" SP refname SP error-msg)
+
+ error-msg = 1*(OCTET) ; where not "ok"
+----
+
+Updates can be unsuccessful for a number of reasons. The reference can have
+changed since the reference advertisement was originally sent, meaning
+someone pushed in the meantime. The reference being pushed could be a
+non-fast-forward reference and the update hooks or configuration could be
+set to not allow that, etc. Also, some references can be updated while others
+can be rejected.
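+
+A hedged Python sketch (not part of Git) of parsing such a report,
+assuming the decoded pkt-line payloads are available as strings with
+the terminating flush-pkt already stripped:
+
+----
+def parse_report_status(lines):
+    unpack_ok = lines[0].strip() == "unpack ok"
+    results = []
+    for line in lines[1:]:            # "ok <refname>" or "ng <refname> <error>"
+        status, rest = line.rstrip("\n").split(" ", 1)
+        if status == "ok":
+            results.append((rest, None))
+        else:
+            refname, error = rest.split(" ", 1)
+            results.append((refname, error))
+    return unpack_ok, results
+----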
+
+An example client/server communication might look like this:
+
+----
+ S: 007c74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
+ S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
+ S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
+ S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
+ S: 0000
+
+ C: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
+ C: 003e74730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
+ C: 0000
+ C: [PACKDATA]
+
+ S: 000eunpack ok\n
+ S: 0018ok refs/heads/debug\n
+ S: 002ang refs/heads/master non-fast-forward\n
+----
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
new file mode 100644
index 0000000000..26dcc6f502
--- /dev/null
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -0,0 +1,311 @@
+Git Protocol Capabilities
+=========================
+
+Servers SHOULD support all capabilities defined in this document.
+
+On the very first line of the initial server response of either
+receive-pack or upload-pack, the first reference is followed by
+a NUL byte and then a list of space-delimited server capabilities.
+These allow the server to declare what it can and cannot support
+to the client.
+
+The client will then send a space-separated list of capabilities it wants
+to be in effect. The client MUST NOT ask for capabilities the server
+did not say it supports.
+
+The server MUST diagnose and abort if capabilities it does not
+understand were sent. The server MUST NOT ignore capabilities that
+the client requested and the server advertised. As a consequence of
+these rules, the server MUST NOT advertise capabilities it does not
+understand.
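+
+For illustration, a hedged Python sketch (not part of Git) of how a
+client might split the first advertised ref from its capability list
+and honor these rules:
+
+----
+def parse_first_ref(payload):
+    # payload looks like b"<obj-id> HEAD\0multi_ack thin-pack ...\n"
+    ref_part, _, caps = payload.rstrip(b"\n").partition(b"\0")
+    obj_id, refname = ref_part.split(b" ", 1)
+    advertised = set(caps.split(b" ")) if caps else set()
+    return obj_id, refname, advertised
+
+def request_capabilities(wanted, advertised):
+    # never ask for a capability the server did not advertise
+    return [c for c in wanted if c in advertised]
+----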
+
+The 'atomic', 'report-status', 'delete-refs', 'quiet', and 'push-cert'
+capabilities are sent and recognized by the receive-pack (push to server)
+process.
+
+The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
+by both upload-pack and receive-pack protocols. The 'agent' capability
+may optionally be sent in both protocols.
+
+All other capabilities are only recognized by the upload-pack (fetch
+from server) process.
+
+multi_ack
+---------
+
+The 'multi_ack' capability allows the server to return "ACK obj-id
+continue" as soon as it finds a commit that it can use as a common
+base between the client's wants and the client's have set.
+
+By sending this early, the server can potentially head off the client
+from walking any further down that particular branch of the client's
+repository history. The client may still need to walk down other
+branches, sending have lines for those, until the server has a
+complete cut across the DAG, or the client has said "done".
+
+Without multi_ack, a client sends have lines in --date-order until
+the server has found a common base. That means the client will send
+have lines that are already known by the server to be common, because
+they overlap in time with another branch that the server hasn't found
+a common base on yet.
+
+For example suppose the client has commits in caps that the server
+doesn't and the server has commits in lower case that the client
+doesn't, as in the following diagram:
+
+ +---- u ---------------------- x
+ / +----- y
+ / /
+ a -- b -- c -- d -- E -- F
+ \
+ +--- Q -- R -- S
+
+If the client wants x,y and starts out by saying have F,S, the server
+doesn't know what F,S is. Eventually the client says "have d" and
+the server sends "ACK d continue" to let the client know to stop
+walking down that line (so don't send c-b-a), but it's not done yet,
+it needs a base for x. The client keeps going with S-R-Q, until a
+gets reached, at which point the server has a clear base and it all
+ends.
+
+Without multi_ack the client would have sent that c-b-a chain anyway,
+interleaved with S-R-Q.
+
+multi_ack_detailed
+------------------
+This is an extension of multi_ack that permits the client to better
+understand the server's in-memory state. See pack-protocol.txt,
+section "Packfile Negotiation" for more information.
+
+no-done
+-------
+This capability should only be used with the smart HTTP protocol. If
+multi_ack_detailed and no-done are both present, then the sender is
+free to immediately send a pack following its first "ACK obj-id ready"
+message.
+
+Without no-done in the smart HTTP protocol, the server session would
+end and the client has to make another trip to send "done" before
+the server can send the pack. no-done removes the last round and
+thus slightly reduces latency.
+
+thin-pack
+---------
+
+A thin pack is one with deltas which reference base objects not
+contained within the pack (but are known to exist at the receiving
+end). This can reduce the network traffic significantly, but it
+requires the receiving end to know how to "thicken" these packs by
+adding the missing bases to the pack.
+
+The upload-pack server advertises 'thin-pack' when it can generate
+and send a thin pack. A client requests the 'thin-pack' capability
+when it understands how to "thicken" it, notifying the server that
+it can receive such a pack. A client MUST NOT request the
+'thin-pack' capability if it cannot turn a thin pack into a
+self-contained pack.
+
+Receive-pack, on the other hand, is assumed by default to be able to
+handle thin packs, but can ask the client not to use the feature by
+advertising the 'no-thin' capability. A client MUST NOT send a thin
+pack if the server advertises the 'no-thin' capability.
+
+The reasons for this asymmetry are historical. The receive-pack
+program did not exist until after the invention of thin packs, so
+historically the reference implementation of receive-pack always
+understood thin packs. Adding 'no-thin' later allowed receive-pack
+to disable the feature in a backwards-compatible manner.
+
+
+side-band, side-band-64k
+------------------------
+
+This capability means that the server can send, and the client can
+understand, multiplexed progress reports and error info interleaved
+with the packfile itself.
+
+These two options are mutually exclusive. A modern client always
+favors 'side-band-64k'.
+
+Either mode indicates that the packfile data will be streamed broken
+up into packets of up to either 1000 bytes in the case of 'side-band',
+or 65520 bytes in the case of 'side-band-64k'. Each packet is made up
+of a leading 4-byte pkt-line length of how much data is in the packet,
+followed by a 1-byte stream code, followed by the actual data.
+
+The stream code can be one of:
+
+ 1 - pack data
+ 2 - progress messages
+ 3 - fatal error message just before stream aborts
+
+The "side-band-64k" capability came about as a way for newer clients
+that can handle much larger packets to request packets that are
+actually crammed nearly full, while maintaining backward compatibility
+for the older clients.
+
+Further, with side-band and its up to 1000-byte messages, it's actually
+999 bytes of payload and 1 byte for the stream code. With side-band-64k,
+same deal, you have up to 65519 bytes of data and 1 byte for the stream
+code.
+
+The client MUST send at most one of "side-band" and "side-band-64k".
+The server MUST diagnose it as an error if the client requests both.
+
+ofs-delta
+---------
+
+The server can send, and the client can understand, PACKv2 with deltas
+referring to their base by position in the pack rather than by an obj-id.
+That is, they can send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
+
+agent
+-----
+
+The server may optionally send a capability of the form `agent=X` to
+notify the client that the server is running version `X`. The client may
+optionally return its own agent string by responding with an `agent=Y`
+capability (but it MUST NOT do so if the server did not mention the
+agent capability). The `X` and `Y` strings may contain any printable
+ASCII characters except space (i.e., the byte range 32 < x < 127), and
+are typically of the form "package/version" (e.g., "git/1.8.3.1"). The
+agent strings are purely informative for statistics and debugging
+purposes, and MUST NOT be used to programmatically assume the presence
+or absence of particular features.
+
+shallow
+-------
+
+This capability adds "deepen", "shallow" and "unshallow" commands to
+the fetch-pack/upload-pack protocol so clients can request shallow
+clones.
+
+deepen-since
+------------
+
+This capability adds "deepen-since" command to fetch-pack/upload-pack
+protocol so the client can request shallow clones that are cut at a
+specific time, instead of depth. Internally it's equivalent of doing
+"rev-list --max-age=<timestamp>" on the server side. "deepen-since"
+cannot be used with "deepen".
+
+deepen-not
+----------
+
+This capability adds "deepen-not" command to fetch-pack/upload-pack
+protocol so the client can request shallow clones that are cut at a
+specific revision, instead of depth. Internally it's equivalent of
+doing "rev-list --not <rev>" on the server side. "deepen-not"
+cannot be used with "deepen", but can be used with "deepen-since".
+
+deepen-relative
+---------------
+
+If this capability is requested by the client, the semantics of the
+"deepen" command change: the "depth" argument is the depth from
+the current shallow boundary, instead of the depth from remote refs.
+
+no-progress
+-----------
+
+The client was started with "git clone -q" or something, and doesn't
+want that side band 2. Basically the client just says "I do not
+wish to receive stream 2 on sideband, so do not send it to me, and if
+you did, I will drop it on the floor anyway". However, the sideband
+channel 3 is still used for error responses.
+
+include-tag
+-----------
+
+The 'include-tag' capability is about sending annotated tags if we are
+sending objects they point to. If we pack an object to the client, and
+a tag object points exactly at that object, we pack the tag object too.
+In general this allows a client to get all new annotated tags when it
+fetches a branch, in a single network connection.
+
+Clients MAY always send include-tag, hardcoding it into a request when
+the server advertises this capability. The decision for a client to
+request include-tag only has to do with the client's desires for tag
+data, whether or not a server had advertised objects in the
+refs/tags/* namespace.
+
+Servers MUST pack the tags if their referent is packed and the client
+has requested include-tag.
+
+Clients MUST be prepared for the case where a server has ignored
+include-tag and has not actually sent tags in the pack. In such
+cases the client SHOULD issue a subsequent fetch to acquire the tags
+that include-tag would have otherwise given the client.
+
+The server SHOULD send include-tag, if it supports it, regardless
+of whether or not there are tags available.
+
+report-status
+-------------
+
+The receive-pack process can receive a 'report-status' capability,
+which tells it that the client wants a report of what happened after
+a packfile upload and reference update. If the pushing client requests
+this capability, after unpacking and updating references the server
+will respond with whether the packfile unpacked successfully and if
+each reference was updated successfully. If any of those were not
+successful, it will send back an error message. See pack-protocol.txt
+for example messages.
+
+delete-refs
+-----------
+
+If the server sends back the 'delete-refs' capability, it means that
+it is capable of accepting a zero-id value as the target
+value of a reference update. It is not sent back by the client; it
+simply informs the client that zero-id values may be sent
+to delete references.
+
+quiet
+-----
+
+If the receive-pack server advertises the 'quiet' capability, it is
+capable of silencing human-readable progress output which otherwise may
+be shown when processing the received pack. A send-pack client should
+respond with the 'quiet' capability to suppress server-side progress
+reporting if the local progress reporting is also being suppressed
+(e.g., via `push -q`, or if stderr does not go to a tty).
+
+atomic
+------
+
+If the server sends the 'atomic' capability it is capable of accepting
+atomic pushes. If the pushing client requests this capability, the server
+will update the refs in one atomic transaction. Either all refs are
+updated or none.
+
+push-options
+------------
+
+If the server sends the 'push-options' capability it is able to accept
+push options after the update commands have been sent, but before the
+packfile is streamed. If the pushing client requests this capability,
+the server will pass the options to the pre- and post- receive hooks
+that process this push request.
+
+allow-tip-sha1-in-want
+----------------------
+
+If the upload-pack server advertises this capability, fetch-pack may
+send "want" lines with SHA-1s that exist at the server but are not
+advertised by upload-pack, provided they are the tips of refs on the
+server (for example, refs hidden from the advertisement).
+
+allow-reachable-sha1-in-want
+----------------------------
+
+If the upload-pack server advertises this capability, fetch-pack may
+send "want" lines with SHA-1s that exist at the server but are not
+advertised by upload-pack, provided they are reachable from some ref
+on the server.
+
+push-cert=<nonce>
+-----------------
+
+The receive-pack server that advertises this capability is willing
+to accept a signed push certificate, and asks the <nonce> to be
+included in the push certificate. A send-pack client MUST NOT
+send a push-cert packet unless the receive-pack server advertises
+this capability.
diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt
new file mode 100644
index 0000000000..ecedb34bba
--- /dev/null
+++ b/Documentation/technical/protocol-common.txt
@@ -0,0 +1,99 @@
+Documentation Common to Pack and Http Protocols
+===============================================
+
+ABNF Notation
+-------------
+
+ABNF notation as described by RFC 5234 is used within the protocol documents,
+except the following replacement core rules are used:
+----
+ HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
+----
+
+We also define the following common rules:
+----
+ NUL = %x00
+ zero-id = 40*"0"
+ obj-id = 40*(HEXDIG)
+
+ refname = "HEAD"
+ refname /= "refs/" <see discussion below>
+----
+
+A refname is a hierarchical octet string beginning with "refs/" and
+not violating the 'git-check-ref-format' command's validation rules.
+More specifically:
+
+. They can include slash `/` for hierarchical (directory)
+ grouping, but no slash-separated component can begin with a
+ dot `.`.
+
+. They must contain at least one `/`. This enforces the presence of a
+ category like `heads/`, `tags/` etc. but the actual names are not
+ restricted.
+
+. They cannot have two consecutive dots `..` anywhere.
+
+. They cannot have ASCII control characters (i.e. bytes whose
+ values are lower than \040, or \177 `DEL`), space, tilde `~`,
+ caret `^`, colon `:`, question-mark `?`, asterisk `*`,
+ or open bracket `[` anywhere.
+
+. They cannot end with a slash `/` or a dot `.`.
+
+. They cannot end with the sequence `.lock`.
+
+. They cannot contain a sequence `@{`.
+
+. They cannot contain a `\\`.
+
+
+pkt-line Format
+---------------
+
+Much (but not all) of the payload is described around pkt-lines.
+
+A pkt-line is a variable length binary string. The first four bytes
+of the line, the pkt-len, indicates the total length of the line,
+in hexadecimal. The pkt-len includes the 4 bytes used to contain
+the length's hexadecimal representation.
+
+A pkt-line MAY contain binary data, so implementors MUST ensure
+pkt-line parsing/formatting routines are 8-bit clean.
+
+A non-binary line SHOULD BE terminated by an LF, which if present
+MUST be included in the total length. Receivers MUST treat pkt-lines
+with non-binary data the same whether or not they contain the trailing
+LF (stripping the LF if present, and not complaining when it is
+missing).
+
+The maximum length of a pkt-line's data component is 65516 bytes.
+Implementations MUST NOT send a pkt-line whose length exceeds 65520
+(65516 bytes of payload + 4 bytes of length data).
+
+Implementations SHOULD NOT send an empty pkt-line ("0004").
+
+A pkt-line with a length field of 0 ("0000"), called a flush-pkt,
+is a special case and MUST be handled differently than an empty
+pkt-line ("0004").
+
+----
+ pkt-line = data-pkt / flush-pkt
+
+ data-pkt = pkt-len pkt-payload
+ pkt-len = 4*(HEXDIG)
+ pkt-payload = (pkt-len - 4)*(OCTET)
+
+ flush-pkt = "0000"
+----
+
+Examples (as C-style strings):
+
+----
+ pkt-line actual value
+ ---------------------------------
+ "0006a\n" "a\n"
+ "0005a" "a"
+ "000bfoobar\n" "foobar\n"
+ "0004" ""
+----
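+
+A hedged Python sketch (not part of Git) of writing and reading this
+framing, consistent with the examples above:
+
+----
+def pkt_line(payload):
+    if len(payload) > 65516:
+        raise ValueError("pkt-line payload is limited to 65516 bytes")
+    return b"%04x" % (len(payload) + 4) + payload
+
+def read_pkt_line(stream):
+    length = int(stream.read(4), 16)
+    if length == 0:
+        return None                   # flush-pkt ("0000")
+    return stream.read(length - 4)
+
+# pkt_line(b"a\n") == b"0006a\n" and pkt_line(b"") == b"0004"
+----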
diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt
new file mode 100644
index 0000000000..4a8be4d144
--- /dev/null
+++ b/Documentation/technical/racy-git.txt
@@ -0,0 +1,201 @@
+Use of index and Racy Git problem
+=================================
+
+Background
+----------
+
+The index is one of the most important data structures in Git.
+It represents a virtual working tree state by recording a list of
+paths and their object names and serves as a staging area to
+write out the next tree object to be committed. The state is
+"virtual" in the sense that it does not necessarily have to, and
+often does not, match the files in the working tree.
+
+There are cases where Git needs to examine the differences between the
+virtual working tree state in the index and the files in the
+working tree. The most obvious case is when the user asks `git
+diff` (or its low-level implementation, `git diff-files`) or
+`git-ls-files --modified`. In addition, Git internally checks
+if the files in the working tree are different from what is
+recorded in the index to avoid stomping on local changes in them
+during patch application, switching branches, and merging.
+
+In order to speed up this comparison between the files in the
+working tree and the index entries, the index entries record the
+information obtained from the filesystem via `lstat(2)` system
+call when they were last updated. When checking if they differ,
+Git first runs `lstat(2)` on the files and compares the result
+with this information (this is what was originally done by the
+`ce_match_stat()` function, but the current code does it in
+`ce_match_stat_basic()` function). If some of these "cached
+stat information" fields do not match, Git can tell that the
+files are modified without even looking at their contents.
+
+Note: not all members in `struct stat` obtained via `lstat(2)`
+are used for this comparison. For example, `st_atime` obviously
+is not useful. Currently, Git compares the file type (regular
+files vs symbolic links) and executable bits (only for regular
+files) from `st_mode` member, `st_mtime` and `st_ctime`
+timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members.
+With a `USE_STDEV` compile-time option, `st_dev` is also
+compared, but this is not enabled by default because this member
+is not stable on network filesystems. With the `USE_NSEC`
+compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
+members are also compared. On Linux, this is not enabled by default
+because in-core timestamps can have finer granularity than
+on-disk timestamps, resulting in meaningless changes when an
+inode is evicted from the inode cache. See commit 8ce13b0
+of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
+([PATCH] Sync in core time granularity with filesystems,
+2005-01-04). This patch is included in kernel 2.6.11 and newer, but
+only fixes the issue for file systems with exactly 1 ns or 1 s
+resolution. Other file systems are still broken in current Linux
+kernels (e.g. CEPH, CIFS, NTFS, UDF), see
+https://lkml.org/lkml/2015/6/9/714
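+
+To make this comparison concrete, here is a hedged sketch; the
+`cached_stat` structure and the function name are hypothetical
+simplifications, not Git's actual index entry layout or
+`ce_match_stat_basic()`:
+
+----
+#include <sys/types.h>
+#include <sys/stat.h>
+
+/* Simplified copy of the fields the index caches for one path. */
+struct cached_stat {
+	unsigned int mode;
+	time_t mtime, ctime;
+	uid_t uid;
+	gid_t gid;
+	ino_t ino;
+	off_t size;
+};
+
+/* Returns non-zero if a fresh lstat(2) result differs from the cache. */
+static int cached_stat_differs(const struct cached_stat *c,
+			       const struct stat *st)
+{
+	if ((c->mode & S_IFMT) != (st->st_mode & S_IFMT))
+		return 1;	/* file type changed */
+	if (S_ISREG(st->st_mode) && ((c->mode ^ st->st_mode) & 0100))
+		return 1;	/* executable bit changed */
+	return c->mtime != st->st_mtime ||
+	       c->ctime != st->st_ctime ||
+	       c->uid   != st->st_uid   ||
+	       c->gid   != st->st_gid   ||
+	       c->ino   != st->st_ino   ||
+	       c->size  != st->st_size;
+}
+----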
+
+Racy Git
+--------
+
+There is one slight problem with the optimization based on the
+cached stat information. Consider this sequence:
+
+ : modify 'foo'
+ $ git update-index 'foo'
+ : modify 'foo' again, in-place, without changing its size
+
+The first `update-index` computes the object name of the
+contents of file `foo` and updates the index entry for `foo`
+along with the `struct stat` information. If the modification
+that follows it happens very fast so that the file's `st_mtime`
+timestamp does not change, after this sequence, the cached stat
+information the index entry records still exactly matches what you
+would see in the filesystem, even though the file `foo` is now
+different.
+This way, Git can incorrectly think files in the working tree
+are unmodified even though they actually have been modified. This
+is called the "racy Git" problem (discovered by Pasky), and entries
+that appear clean because of this problem, even though they may not
+be, are called "racily clean".
+
+To avoid this problem, Git does two things:
+
+. When the cached stat information says the file has not been
+ modified, and the `st_mtime` is the same as (or newer than)
+ the timestamp of the index file itself (which is the time `git
+ update-index foo` finished running in the above example), it
+ also compares the contents with the object registered in the
+ index entry to make sure they match.
+
+. When writing out a new version of an index file that contains
+  racily clean entries, the cached `st_size` information of those
+  entries is truncated to zero first.
+
+Because the index file itself is written after collecting all
+the stat information from updated paths, its `st_mtime` timestamp
+is usually the same as or newer than that of any of the paths the
+index contains. And no matter how quickly the modification that
+follows `git update-index foo` finishes, the resulting
+`st_mtime` timestamp on `foo` cannot get a value earlier than
+that of the index file.
+racily clean are limited to the ones that have the same
+timestamp as the index file itself.
+
+The callers that want to check if an index entry matches the
+corresponding file in the working tree continue to call
+`ce_match_stat()`, but with this change, `ce_match_stat()` uses
+`ce_modified_check_fs()` to see if racily clean ones are
+actually clean after comparing the cached stat information using
+`ce_match_stat_basic()`.
+
+The problem the latter solves is this sequence:
+
+ $ git update-index 'foo'
+ : modify 'foo' in-place without changing its size
+ : wait for enough time
+ $ git update-index 'bar'
+
+Without the latter, the timestamp of the index file gets a newer
+value, and the falsely clean entry `foo` would no longer be caught
+by the timestamp comparison check done with the former logic.
+The latter makes sure that the cached stat information for `foo`
+would never match the file in the working tree, so later
+checks by `ce_match_stat_basic()` would report that the index entry
+does not match the file and Git does not have to fall back on the
+more expensive `ce_modified_check_fs()`.
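+
+The two safeguards can be pictured with a small sketch; the
+`index_entry` layout and the function names here are hypothetical
+simplifications, not Git's actual API:
+
+----
+#include <sys/types.h>
+#include <time.h>
+
+struct index_entry {
+	time_t mtime;	/* cached stat information */
+	off_t size;
+};
+
+/*
+ * Read-time safeguard: an entry whose cached mtime is not older than
+ * the index file itself may be racily clean, so its contents must
+ * also be compared with the object recorded in the index.
+ */
+static int possibly_racily_clean(const struct index_entry *ce,
+				 time_t index_mtime)
+{
+	return ce->mtime >= index_mtime;
+}
+
+/*
+ * Write-time safeguard: before writing out an index that contains
+ * racily clean entries, smudge their cached size so that a later
+ * stat comparison can never mistake them for clean.
+ */
+static void smudge_racily_clean_entry(struct index_entry *ce)
+{
+	ce->size = 0;
+}
+----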
+
+
+Runtime penalty
+---------------
+
+The runtime penalty of falling back to `ce_modified_check_fs()`
+from `ce_match_stat()` can be very high when there are many
+racily clean entries. An obvious way to artificially create
+this situation is to give the same timestamp to all the files in
+the working tree in a large project, run `git update-index` on
+them, and give the same timestamp to the index file:
+
+ $ date >.datestamp
+ $ git ls-files | xargs touch -r .datestamp
+ $ git ls-files | git update-index --stdin
+ $ touch -r .datestamp .git/index
+
+This will make all index entries racily clean. In the linux project,
+for example, there are over 20,000 files in the working tree. On my
+Athlon 64 X2 3800+, after the above:
+
+ $ /usr/bin/time git diff-files
+ 1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+ 0inputs+0outputs (0major+67111minor)pagefaults 0swaps
+ $ git update-index MAINTAINERS
+ $ /usr/bin/time git diff-files
+ 0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+ 0inputs+0outputs (0major+935minor)pagefaults 0swaps
+
+Running `git update-index` in the middle checked the racily
+clean entries, and left the cached `st_mtime` for all the paths
+intact because they were actually clean (so this step took about
+the same amount of time as the first `git diff-files`). After
+that, they are not racily clean anymore but are truly clean, so
+the second invocation of `git diff-files` fully took advantage
+of the cached stat information.
+
+
+Avoiding runtime penalty
+------------------------
+
+In order to avoid the above runtime penalty, post-1.4.2 Git used
+to have code that made sure the index file got a timestamp newer
+than the youngest file in the index: when there were many young
+files with the same timestamp that the resulting index file would
+otherwise have, it waited before finishing writing the index file
+out.
+
+I suspected that in practice the situation where many paths in the
+index are all racily clean was quite rare. The only code paths
+that can record a recent timestamp for a large number of paths are:
+
+. Initial `git add .` of a large project.
+
+. `git checkout` of a large project from an empty index into an
+ unpopulated working tree.
+
+Note: switching branches with `git checkout` keeps the cached
+stat information of existing working tree files that are the
+same between the current branch and the new branch, which are
+all older than the resulting index file, and they will not
+become racily clean. Only the files that are actually checked
+out can become racily clean.
+
+In a large project where raciness avoidance cost really matters,
+however, the initial computation of all object names in the
+index takes more than one second, and the index file is written
+out after all that happens. Therefore the timestamp of the
+index file will be more than one second later than the
+youngest file in the working tree. This means that in these
+cases there actually will not be any racily clean entry in
+the resulting index.
+
+Based on this discussion, the current code no longer uses the
+"workaround", since the runtime penalty it was meant to avoid does
+not exist in practice. This was done with commit 0fc82cff on Aug 15,
+2006.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
new file mode 100644
index 0000000000..00ad37986e
--- /dev/null
+++ b/Documentation/technical/repository-version.txt
@@ -0,0 +1,88 @@
+Git Repository Format Versions
+==============================
+
+Every git repository is marked with a numeric version in the
+`core.repositoryformatversion` key of its `config` file. This version
+specifies the rules for operating on the on-disk repository data. An
+implementation of git which does not understand a particular version
+advertised by an on-disk repository MUST NOT operate on that repository;
+doing so risks not only producing wrong results, but actually losing
+data.
+
+Because of this rule, version bumps should be kept to an absolute
+minimum. Instead, we generally prefer these strategies:
+
+ - bumping format version numbers of individual data files (e.g.,
+ index, packfiles, etc). This restricts the incompatibilities only to
+ those files.
+
+ - introducing new data that gracefully degrades when used by older
+ clients (e.g., pack bitmap files are ignored by older clients, which
+ simply do not take advantage of the optimization they provide).
+
+A whole-repository format version bump should only be part of a change
+that cannot be independently versioned. For instance, if one were to
+change the reachability rules for objects, or the rules for locking
+refs, that would require a bump of the repository format version.
+
+Note that this applies only to accessing the repository's disk contents
+directly. An older client which understands only format `0` may still
+connect via `git://` to a repository using format `1`, as long as the
+server process understands format `1`.
+
+The preferred strategy for rolling out a version bump (whether whole
+repository or for a single file) is to teach git to read the new format,
+and allow writing the new format with a config switch or command line
+option (for experimentation or for those who do not care about backwards
+compatibility with older gits). Then, after a period long enough for
+the reading capability to become common, we may switch to writing the
+new format by default.
+
+The currently defined format versions are:
+
+Version `0`
+-----------
+
+This is the format defined by the initial version of git, including but
+not limited to the format of the repository directory, the repository
+configuration file, and the object and ref storage. Specifying the
+complete behavior of git is beyond the scope of this document.
+
+Version `1`
+-----------
+
+This format is identical to version `0`, with the following exceptions:
+
+ 1. When reading the `core.repositoryformatversion` variable, a git
+ implementation which supports version 1 MUST also read any
+ configuration keys found in the `extensions` section of the
+ configuration file.
+
+ 2. If a version-1 repository specifies any `extensions.*` keys that
+ the running git has not implemented, the operation MUST NOT
+ proceed. Similarly, if the value of any known key is not understood
+ by the implementation, the operation MUST NOT proceed.
+
+Note that if no extensions are specified in the config file, then
+`core.repositoryformatversion` SHOULD be set to `0` (setting it to `1`
+provides no benefit, and makes the repository incompatible with older
+implementations of git).
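+
+For illustration, the reading-side rules above might be sketched like
+this; the `next_extension()` iterator is a hypothetical stand-in, not
+part of Git's config API:
+
+----
+#include <string.h>
+
+/* Hypothetical iterator over extensions.* keys; returns 0 when done. */
+extern int next_extension(const char **key, const char **value);
+
+/* Returns 0 if it is safe to operate on the repository, -1 otherwise. */
+static int check_repository_format(int version)
+{
+	const char *key, *value;
+
+	if (version > 1)
+		return -1;	/* unknown version: MUST NOT operate */
+	if (version < 1)
+		return 0;	/* version 0: extensions are not read */
+
+	while (next_extension(&key, &value)) {
+		if (!strcmp(key, "noop"))
+			continue;	/* defined, changes nothing */
+		if (!strcmp(key, "preciousobjects"))
+			continue;	/* caller must refuse to delete objects */
+		return -1;	/* unknown extension: MUST NOT proceed */
+	}
+	return 0;
+}
+----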
+
+This document will serve as the master list for extensions. Any
+implementation wishing to define a new extension should make a note of
+it here, in order to claim the name.
+
+The defined extensions are:
+
+`noop`
+~~~~~~
+
+This extension does not change git's behavior at all. It is useful only
+for testing format-1 compatibility.
+
+`preciousObjects`
+~~~~~~~~~~~~~~~~~
+
+When the config key `extensions.preciousObjects` is set to `true`,
+objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
+`git repack -d`).
diff --git a/Documentation/technical/send-pack-pipeline.txt b/Documentation/technical/send-pack-pipeline.txt
new file mode 100644
index 0000000000..9b5a0bc186
--- /dev/null
+++ b/Documentation/technical/send-pack-pipeline.txt
@@ -0,0 +1,63 @@
+Git-send-pack internals
+=======================
+
+Overall operation
+-----------------
+
+. Connects to the remote side and invokes git-receive-pack.
+
+. Learns what refs the remote has and what commit they point at.
+ Matches them to the refspecs we are pushing.
+
+. Checks if there are non-fast-forwards. Unlike fetch-pack,
+  the repository send-pack runs in is supposed to be a superset
+  of the recipient in fast-forward cases, so there is no need
+  for want/have exchanges, and the fast-forward check can be done
+  locally. Tells the result to the other end.
+
+. Calls pack_objects() which generates a packfile and sends it
+ over to the other end.
+
+. If the remote side is new enough (v1.1.0 or later), waits for
+  the unpack and hook status from the other end.
+
+. Exits with appropriate error codes.
+
+
+Pack_objects pipeline
+---------------------
+
+This function gets one file descriptor (`fd`) which is either a
+socket (over the network) or a pipe (local). What's written to
+this fd goes to git-receive-pack to be unpacked.
+
+ send-pack ---> fd ---> receive-pack
+
+The function pack_objects creates a pipe and then forks. The
+forked child execs pack-objects with --revs to receive revision
+parameters from its standard input. This process will write the
+packfile to the other end.
+
+ send-pack
+ |
+ pack_objects() ---> fd ---> receive-pack
+ | ^ (pipe)
+ v |
+ (child)
+
+The child dup2's so that its standard output goes back to
+the other end, and its standard input comes from the
+pipe. After that it exec's pack-objects. The parent process,
+on the other hand, before starting to feed the child pipeline,
+closes the reading side of the pipe and the fd to receive-pack.
+
+ send-pack
+ |
+ pack_objects(parent)
+ |
+ v [0]
+ pack-objects [0] ---> receive-pack
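+
+A hedged sketch of this plumbing (error handling omitted; this is an
+illustration of the fork/dup2/exec dance, not the actual
+pack_objects() code):
+
+----
+#include <unistd.h>
+
+/*
+ * Spawn "git pack-objects --revs --stdout" so that its output goes
+ * straight to receive-pack and its input comes from a pipe the parent
+ * will write revision parameters into.  Returns the child pid and
+ * stores the write end of that pipe in *revs_fd.
+ */
+static pid_t spawn_pack_objects(int fd_to_receive_pack, int *revs_fd)
+{
+	int pipefd[2];
+	pid_t pid;
+
+	pipe(pipefd);
+	pid = fork();
+	if (pid == 0) {
+		dup2(pipefd[0], 0);		/* stdin: revisions from parent */
+		dup2(fd_to_receive_pack, 1);	/* stdout: packfile to receive-pack */
+		close(pipefd[1]);
+		execlp("git", "git", "pack-objects", "--revs", "--stdout",
+		       (char *)NULL);
+		_exit(127);			/* exec failed */
+	}
+	close(pipefd[0]);		/* parent keeps only the write end */
+	close(fd_to_receive_pack);
+	*revs_fd = pipefd[1];
+	return pid;
+}
+----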
+
+
+[jc: the pipeline was much more complex and needed documentation before
+ I understood an earlier bug, but now it is trivial and straightforward.]
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
new file mode 100644
index 0000000000..5183b15422
--- /dev/null
+++ b/Documentation/technical/shallow.txt
@@ -0,0 +1,58 @@
+Shallow commits
+===============
+
+.Definition
+*********************************************************
+Shallow commits do have parents, but not in the shallow
+repo, and therefore grafts are introduced pretending that
+these commits have no parents.
+*********************************************************
+
+The basic idea is to write the SHA-1s of shallow commits into
+$GIT_DIR/shallow, and handle its contents like the contents
+of $GIT_DIR/info/grafts (with the difference that shallow
+cannot contain parent information).
+
+This information is stored in a new file instead of grafts, or
+even the config, since the user should not touch that file
+at all (even throughout development of the shallow clone, it
+was never manually edited!).
+
+Each line contains exactly one SHA-1. When read, a commit_graft
+will be constructed, which has nr_parent < 0 to make it easy to
+distinguish from user-provided grafts.
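+
+For illustration, a hedged sketch of reading the file this way; the
+commit_graft layout and the registration callback are hypothetical
+simplifications:
+
+----
+#include <stdio.h>
+#include <string.h>
+
+struct commit_graft {
+	char sha1_hex[41];
+	int nr_parent;		/* < 0 marks a shallow graft */
+};
+
+static int read_shallow_file(const char *path,
+			     void (*register_graft)(const struct commit_graft *))
+{
+	char line[64];
+	FILE *fp = fopen(path, "r");
+
+	if (!fp)
+		return -1;
+	while (fgets(line, sizeof(line), fp)) {
+		struct commit_graft graft;
+
+		line[strcspn(line, "\n")] = '\0';
+		if (strlen(line) != 40)
+			continue;	/* each line is exactly one SHA-1 */
+		strcpy(graft.sha1_hex, line);
+		graft.nr_parent = -1;	/* easy to tell apart from user grafts */
+		register_graft(&graft);
+	}
+	fclose(fp);
+	return 0;
+}
+----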
+
+Since fsck-objects relies on the library to read the objects,
+it honours shallow commits automatically.
+
+There are some loose ends in the whole shallow business:
+
+- maybe we have to force non-thin packs when fetching into a
+  shallow repo (at the moment they are forced non-thin).
+
+- A special handling of a shallow upstream is needed. At some
+ stage, upload-pack has to check if it sends a shallow commit,
+ and it should send that information early (or fail, if the
+ client does not support shallow repositories). There is no
+ support at all for this in this patch series.
+
+- Instead of locking $GIT_DIR/shallow at the start, just
+  its timestamp is noted, and when it comes to writing it,
+  a check is performed that the mtime is still the same, dying if
+  it is not.
+
+- It is unclear how "push into/from a shallow repo" should behave.
+
+- If you deepen a history, you'd want to get the tags of the
+ newly stored (but older!) commits. This does not work right now.
+
+To make a shallow clone, you can call "git-clone --depth 20 repo".
+The result contains only commit chains with a length of at most 20.
+It also writes an appropriate $GIT_DIR/shallow.
+
+You can deepen a shallow repository with "git-fetch --depth 20
+repo branch", which will fetch branch from repo, but stop at depth
+20, updating $GIT_DIR/shallow.
+
+The special depth 2147483647 (or 0x7fffffff, the largest positive
+number a signed 32-bit integer can contain) means infinite depth.
diff --git a/Documentation/technical/signature-format.txt b/Documentation/technical/signature-format.txt
new file mode 100644
index 0000000000..2c9406a56a
--- /dev/null
+++ b/Documentation/technical/signature-format.txt
@@ -0,0 +1,186 @@
+Git signature format
+====================
+
+== Overview
+
+Git uses cryptographic signatures in various places, currently objects (tags,
+commits, mergetags) and transactions (pushes). In every case, the command which
+is about to create an object or transaction determines the payload to be signed,
+calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and
+embeds the signature into the object or transaction.
+
+Signatures always begin with `-----BEGIN PGP SIGNATURE-----`
+and end with `-----END PGP SIGNATURE-----`, unless gpg is told to
+produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`.
+
+The signed payload and the way the signature is embedded depend
+on the type of the object or transaction.
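+
+For illustration, locating the embedded signature can be sketched as
+follows; `split_signature()` is a hypothetical helper, and a real
+implementation would also have to cope with the RFC1991 `MESSAGE`
+armor mentioned above:
+
+----
+#include <stddef.h>
+#include <string.h>
+
+/*
+ * Split a signed buffer into payload and detached signature.  Returns
+ * a pointer to the start of the signature block, or NULL if the
+ * buffer is unsigned; *payload_len is set to the payload size.
+ */
+static const char *split_signature(const char *buf, size_t *payload_len)
+{
+	static const char begin[] = "-----BEGIN PGP SIGNATURE-----";
+	const char *sig = strstr(buf, begin);
+
+	if (!sig) {
+		*payload_len = strlen(buf);
+		return NULL;
+	}
+	*payload_len = (size_t)(sig - buf);
+	return sig;	/* runs through the matching END line */
+}
+----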
+
+== Tag signatures
+
+- created by: `git tag -s`
+- payload: annotated tag object
+- embedding: append the signature to the unsigned tag object
+- example: tag `signedtag` with subject `signed tag`
+
+----
+object 04b871796dc0420f8e7561a895b52484b701d51a
+type commit
+tag signedtag
+tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+signed tag
+
+signed tag message body
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
+rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
+8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
+q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
+rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
+lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
+=jpXa
+-----END PGP SIGNATURE-----
+----
+
+- verify with: `git verify-tag [-v]` or `git tag -v`
+
+----
+gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg: There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
+object 04b871796dc0420f8e7561a895b52484b701d51a
+type commit
+tag signedtag
+tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+signed tag
+
+signed tag message body
+----
+
+== Commit signatures
+
+- created by: `git commit -S`
+- payload: commit object
+- embedding: header entry `gpgsig`
+ (content is preceded by a space)
+- example: commit with subject `signed commit`
+
+----
+tree eebfed94e75e7760540d1485c740902590a00332
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465981137 +0000
+committer C O Mitter <committer@example.com> 1465981137 +0000
+gpgsig -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1
+
+ iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
+ HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
+ DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
+ zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
+ HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
+ EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
+ =jKHM
+ -----END PGP SIGNATURE-----
+
+signed commit
+
+signed commit message body
+----
+
+- verify with: `git verify-commit [-v]` (or `git show --show-signature`)
+
+----
+gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg: There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
+tree eebfed94e75e7760540d1485c740902590a00332
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465981137 +0000
+committer C O Mitter <committer@example.com> 1465981137 +0000
+
+signed commit
+
+signed commit message body
+----
+
+== Mergetag signatures
+
+- created by: `git merge` on signed tag
+- payload/embedding: the whole signed tag object is embedded into
+ the (merge) commit object as header entry `mergetag`
+- example: merge of the signed tag `signedtag` as above
+
+----
+tree c7b1cff039a93f3600a1d18b82d26688668c7dea
+parent c33429be94b5f2d3ee9b0adad223f877f174b05d
+parent 04b871796dc0420f8e7561a895b52484b701d51a
+author A U Thor <author@example.com> 1465982009 +0000
+committer C O Mitter <committer@example.com> 1465982009 +0000
+mergetag object 04b871796dc0420f8e7561a895b52484b701d51a
+ type commit
+ tag signedtag
+ tagger C O Mitter <committer@example.com> 1465981006 +0000
+
+ signed tag
+
+ signed tag message body
+ -----BEGIN PGP SIGNATURE-----
+ Version: GnuPG v1
+
+ iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
+ rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
+ 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
+ q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
+ rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
+ lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
+ =jpXa
+ -----END PGP SIGNATURE-----
+
+Merge tag 'signedtag' into downstream
+
+signed tag
+
+signed tag message body
+
+# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
+# gpg: Good signature from "Eris Discordia <discord@example.net>"
+# gpg: WARNING: This key is not certified with a trusted signature!
+# gpg: There is no indication that the signature belongs to the owner.
+# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
+----
+
+- verify with: verification is embedded in the merge commit message by default,
+ alternatively with `git show --show-signature`:
+
+----
+commit 9863f0c76ff78712b6800e199a46aa56afbcbd49
+merged tag 'signedtag'
+gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
+gpg: Good signature from "Eris Discordia <discord@example.net>"
+gpg: WARNING: This key is not certified with a trusted signature!
+gpg: There is no indication that the signature belongs to the owner.
+Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
+Merge: c33429b 04b8717
+Author: A U Thor <author@example.com>
+Date: Wed Jun 15 09:13:29 2016 +0000
+
+ Merge tag 'signedtag' into downstream
+
+ signed tag
+
+ signed tag message body
+
+ # gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
+ # gpg: Good signature from "Eris Discordia <discord@example.net>"
+ # gpg: WARNING: This key is not certified with a trusted signature!
+ # gpg: There is no indication that the signature belongs to the owner.
+ # Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
+----
diff --git a/Documentation/technical/trivial-merge.txt b/Documentation/technical/trivial-merge.txt
new file mode 100644
index 0000000000..c79d4a7c47
--- /dev/null
+++ b/Documentation/technical/trivial-merge.txt
@@ -0,0 +1,121 @@
+Trivial merge rules
+===================
+
+This document describes the outcomes of the trivial merge logic in read-tree.
+
+One-way merge
+-------------
+
+This replaces the index with a different tree, keeping the stat info
+for entries that don't change, and allowing -u to make the minimum
+required changes to the working tree to have it match.
+
+Entries marked '+' have stat information. Positions marked '*' don't
+affect the result.
+
+ index tree result
+ -----------------------
+ * (empty) (empty)
+ (empty) tree tree
+ index+ tree tree
+ index+ index index+
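+
+Read as code, the table above amounts to something like the following
+hedged sketch; the `entry` type and the equality test are illustrative
+only, not read-tree's real data structures:
+
+----
+#include <string.h>
+
+struct entry {
+	const char *object_name;	/* NULL means (empty) */
+	int has_stat;			/* the '+' marker in the table */
+};
+
+static struct entry one_way_merge(struct entry index, struct entry tree)
+{
+	struct entry empty = { 0, 0 };
+
+	if (!tree.object_name)
+		return empty;		/* row 1: result is (empty) */
+	if (index.object_name &&
+	    !strcmp(index.object_name, tree.object_name))
+		return index;		/* row 4: keep the stat info */
+	tree.has_stat = 0;
+	return tree;			/* rows 2-3: take the tree */
+}
+----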
+
+Two-way merge
+-------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the old
+or the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result which changes the index is an error if the index is not empty
+and not up-to-date.
+
+Entries marked '+' have stat information. Positions marked '*' don't
+affect the result.
+
+ case index old new result
+ -------------------------------------
+ 0/2 (empty) * (empty) (empty)
+ 1/3 (empty) * new new
+ 4/5 index+ (empty) (empty) index+
+ 6/7 index+ (empty) index index+
+ 10 index+ index (empty) (empty)
+ 14/15 index+ old old index+
+ 18/19 index+ old index index+
+ 20 index+ index new new
+
+Three-way merge
+---------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the
+head or (if the merge is trivial) the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result of "no merge" means that index is left in stage 0, ancest in
+stage 1, head in stage 2, and remote in stage 3 (if any of these are
+empty, no entry is left for that stage). Otherwise, the given entry is
+left in stage 0, and there are no other entries.
+
+A result of "no merge" is an error if the index is not empty and not
+up-to-date.
+
+*empty* means that the tree must not have a directory-file conflict
+ with the entry.
+
+For multiple ancestors, a '+' means that this case applies even if
+only one ancestor or remote fits; a '^' means all of the ancestors
+must be the same.
+
+ case ancest head remote result
+ ----------------------------------------
+ 1 (empty)+ (empty) (empty) (empty)
+ 2ALT (empty)+ *empty* remote remote
+ 2 (empty)^ (empty) remote no merge
+ 3ALT (empty)+ head *empty* head
+ 3 (empty)^ head (empty) no merge
+ 4 (empty)^ head remote no merge
+ 5ALT * head head head
+ 6 ancest+ (empty) (empty) no merge
+ 8 ancest^ (empty) ancest no merge
+ 7 ancest+ (empty) remote no merge
+ 10 ancest^ ancest (empty) no merge
+ 9 ancest+ head (empty) no merge
+ 16 anc1/anc2 anc1 anc2 no merge
+ 13 ancest+ head ancest head
+ 14 ancest+ ancest remote remote
+ 11 ancest+ head remote no merge
+
+Only #2ALT and #3ALT use *empty*, because these are the only cases
+where there can be conflicts that didn't exist before. Note that we
+allow directory-file conflicts between things in different stages
+after the trivial merge.
+
+A possible alternative for #6 is (empty), which would make it like
+#1. This is not used, because this case most likely arises from
+moving the file to multiple different locations or from moving and
+deleting it in different branches.
+
+Case #1 is included for completeness, and also in case we decide to
+put on '+' markings; any path that is never mentioned at all isn't
+handled.
+
+Note that #16 is when both #13 and #14 apply; in this case, we refuse
+the trivial merge, because we can't tell from this data which is
+right. This is a case of a reverted patch (in some direction, maybe
+multiple times), and the right answer depends on looking at crossings
+of history or common ancestors of the ancestors.
+
+Note that, between #6, #7, #9, and #11, all cases not otherwise
+covered are handled in this table.
+
+For #8 and #10, there is alternative behavior, not currently
+implemented, where the result is (empty). As currently implemented,
+the automatic merge will generally give this effect.