34 files changed, 3963 insertions, 0 deletions
diff --git a/Documentation/technical/.gitignore b/Documentation/technical/.gitignore
new file mode 100644
index 0000000000..8aa891daee
--- /dev/null
+++ b/Documentation/technical/.gitignore
@@ -0,0 +1 @@
+api-index.txt
diff --git a/Documentation/technical/api-allocation-growing.txt b/Documentation/technical/api-allocation-growing.txt
new file mode 100644
index 0000000000..43dbe09f73
--- /dev/null
+++ b/Documentation/technical/api-allocation-growing.txt
@@ -0,0 +1,34 @@
+allocation growing API
+======================
+
+Dynamically growing an array using realloc() is error prone and boring.
+
+Define your array with:
+
+* a pointer (`ary`) that points at the array, initialized to `NULL`;
+
+* an integer variable (`alloc`) that keeps track of how big the current
+  allocation is, initialized to `0`;
+
+* another integer variable (`nr`) to keep track of how many elements the
+  array currently has, initialized to `0`.
+
+Then before adding `n`th element to the array, call `ALLOC_GROW(ary, n,
+alloc)`.  This ensures that the array can hold at least `n` elements by
+calling `realloc(3)` and adjusting `alloc` variable.
+
+------------
+sometype *ary;
+size_t nr;
+size_t alloc
+
+for (i = 0; i < nr; i++)
+	if (we like ary[i] already)
+		return;
+
+/* we did not like any existing one, so add one */
+ALLOC_GROW(ary, nr + 1, alloc);
+ary[nr++] = value you like;
+------------
+
+You are responsible for updating the `nr` variable.
diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt
new file mode 100644
index 0000000000..5cb2b0590a
--- /dev/null
+++ b/Documentation/technical/api-builtin.txt
@@ -0,0 +1,68 @@
+builtin API
+===========
+
+Adding a new built-in
+---------------------
+
+There are 4 things to do to add a built-in command implementation to
+git:
+
+. Define the implementation of the built-in command `foo` with
+  signature:
+
+	int cmd_foo(int argc, const char **argv, const char *prefix);
+
+. Add the external declaration for the function to `builtin.h`.
+
+. Add the command to `commands[]` table in `handle_internal_command()`,
+  defined in `git.c`.  The entry should look like:
+
+	{ "foo", cmd_foo, <options> },
++
+where options is the bitwise-or of:
+
+`RUN_SETUP`::
+
+	Make sure there is a git directory to work on, and if there is a
+	work tree, chdir to the top of it if the command was invoked
+	in a subdirectory.  If there is no work tree, no chdir() is
+	done.
+
+`USE_PAGER`::
+
+	If the standard output is connected to a tty, spawn a pager and
+	feed our output to it.
+
+`NEED_WORK_TREE`::
+
+	Make sure there is a work tree, i.e. the command cannot act
+	on bare repositories.
+	This only makes sense when `RUN_SETUP` is also set.
+
+. Add `builtin-foo.o` to `BUILTIN_OBJS` in `Makefile`.
+
+Additionally, if `foo` is a new command, there are 3 more things to do:
+
+. Add tests to `t/` directory.
+
+. Write documentation in `Documentation/git-foo.txt`.
+
+. Add an entry for `git-foo` to `command-list.txt`.
+
+
+How a built-in is called
+------------------------
+
+The implementation `cmd_foo()` takes three parameters, `argc`, `argv,
+and `prefix`.  The first two are similar to what `main()` of a
+standalone command would be called with.
+
+When `RUN_SETUP` is specified in the `commands[]` table, and when you
+were started from a subdirectory of the work tree, `cmd_foo()` is called
+after chdir(2) to the top of the work tree, and `prefix` gets the path
+to the subdirectory the command started from.  This allows you to
+convert a user-supplied pathname (typically relative to that directory)
+to a pathname relative to the top of the work tree.
+
+The return value from `cmd_foo()` becomes the exit status of the
+command.
diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
new file mode 100644
index 0000000000..1d52a6ce14
--- /dev/null
+++ b/Documentation/technical/api-decorate.txt
@@ -0,0 +1,6 @@
+decorate API
+============
+
+Talk about <decorate.h>
+
+(Linus)
diff --git a/Documentation/technical/api-diff.txt b/Documentation/technical/api-diff.txt
new file mode 100644
index 0000000000..20b0241d30
--- /dev/null
+++ b/Documentation/technical/api-diff.txt
@@ -0,0 +1,166 @@
+diff API
+========
+
+The diff API is for programs that compare two sets of files (e.g. two
+trees, one tree and the index) and present the found difference in
+various ways.  The calling program is responsible for feeding the API
+pairs of files, one from the "old" set and the corresponding one from
+"new" set, that are different.  The library called through this API is
+called diffcore, and is responsible for two things.
+
+* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and
+  changes that touch a string (`-S`), as specified by the caller.
+
+* outputting the differences in various formats, as specified by the
+  caller.
+
+Calling sequence
+----------------
+
+* Prepare `struct diff_options` to record the set of diff options, and
+  then call `diff_setup()` to initialize this structure.  This sets up
+  the vanilla default.
+
+* Fill in the options structure to specify desired output format, rename
+  detection, etc.  `diff_opt_parse()` can be used to parse options given
+  from the command line in a way consistent with existing git-diff
+  family of programs.
+
+* Call `diff_setup_done()`; this inspects the options set up so far for
+  internal consistency and make necessary tweaking to it (e.g. if
+  textual patch output was asked, recursive behaviour is turned on).
+
+* As you find different pairs of files, call `diff_change()` to feed
+  modified files, `diff_addremove()` to feed created or deleted files,
+  or `diff_unmerged()` to feed a file whose state is 'unmerged' to the
+  API.  These are thin wrappers to a lower-level `diff_queue()` function
+  that is flexible enough to record any of these kinds of changes.
+
+* Once you finish feeding the pairs of files, call `diffcore_std()`.
+  This will tell the diffcore library to go ahead and do its work.
+
+* Calling `diff_flush()` will produce the output.
+
+
+Data structures
+---------------
+
+* `struct diff_filespec`
+
+This is the internal representation for a single file (blob).  It
+records the blob object name (if known -- for a work tree file it
+typically is a NUL SHA-1), filemode and pathname.  This is what the
+`diff_addremove()`, `diff_change()` and `diff_unmerged()` synthesize and
+feed `diff_queue()` function with.
+
+* `struct diff_filepair`
+
+This records a pair of `struct diff_filespec`; the filespec for a file
+in the "old" set (i.e. preimage) is called `one`, and the filespec for a
+file in the "new" set (i.e. postimage) is called `two`.  A change that
+represents file creation has NULL in `one`, and file deletion has NULL
+in `two`.
+
+A `filepair` starts pointing at `one` and `two` that are from the same
+filename, but `diffcore_std()` can break pairs and match component
+filespecs with other filespecs from a different filepair to form new
+filepair.  This is called 'rename detection'.
+
+* `struct diff_queue`
+
+This is a collection of filepairs.  Notable members are:
+
+`queue`::
+
+	An array of pointers to `struct diff_filepair`.  This
+	dynamically grows as you add filepairs;
+
+`alloc`::
+
+	The allocated size of the `queue` array;
+
+`nr`::
+
+	The number of elements in the `queue` array.
+
+
+* `struct diff_options`
+
+This describes the set of options the calling program wants to affect
+the operation of diffcore library with.
+
+Notable members are:
+
+`output_format`::
+	The output format used when `diff_flush()` is run.
+
+`context`::
+	Number of context lines to generate in patch output.
+
+`break_opt`, `detect_rename`, `rename-score`, `rename_limit`::
+	Affects the way detection logic for complete rewrites, renames
+	and copies.
+
+`abbrev`::
+	Number of hexdigits to abbreviate raw format output to.
+
+`pickaxe`::
+	A constant string (can and typically does contain newlines to
+	look for a block of text, not just a single line) to filter out
+	the filepairs that do not change the number of strings contained
+	in its preimage and postimage of the diff_queue.
+
+`flags`::
+	This is mostly a collection of boolean options that affects the
+	operation, but some do not have anything to do with the diffcore
+	library.
+
+BINARY, TEXT;;
+	Affects the way how a file that is seemingly binary is treated.
+
+FULL_INDEX;;
+	Tells the patch output format not to use abbreviated object
+	names on the "index" lines.
+
+FIND_COPIES_HARDER;;
+	Tells the diffcore library that the caller is feeding unchanged
+	filepairs to allow copies from unmodified files be detected.
+
+COLOR_DIFF;;
+	Output should be colored.
+
+COLOR_DIFF_WORDS;;
+	Output is a colored word-diff.
+
+NO_INDEX;;
+	Tells diff-files that the input is not tracked files but files
+	in random locations on the filesystem.
+
+ALLOW_EXTERNAL;;
+	Tells output routine that it is Ok to call user specified patch
+	output routine.  Plumbing disables this to ensure stable output.
+
+QUIET;;
+	Do not show any output.
+
+REVERSE_DIFF;;
+	Tells the library that the calling program is feeding the
+	filepairs reversed; `one` is two, and `two` is one.
+
+EXIT_WITH_STATUS;;
+	For communication between the calling program and the options
+	parser; tell the calling program to signal the presence of
+	difference using program exit code.
+
+HAS_CHANGES;;
+	Internal; used for optimization to see if there is any change.
+
+SILENT_ON_REMOVE;;
+	Affects if diff-files shows removed files.
+
+RECURSIVE, TREE_IN_RECURSIVE;;
+	Tells if tree traversal done by tree-diff should recursively
+	descend into a tree object pair that are different in preimage
+	and postimage set.
+
+(JC)
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
new file mode 100644
index 0000000000..add6f435b5
--- /dev/null
+++ b/Documentation/technical/api-directory-listing.txt
@@ -0,0 +1,79 @@
+directory listing API
+=====================
+
+The directory listing API is used to enumerate paths in the work tree,
+optionally taking `.git/info/exclude` and `.gitignore` files per
+directory into account.
+
+Data structure
+--------------
+
+`struct dir_struct` structure is used to pass directory traversal
+options to the library and to record the paths discovered.  The notable
+options are:
+
+`exclude_per_dir`::
+
+	The name of the file to be read in each directory for excluded
+	files (typically `.gitignore`).
+
+`collect_ignored`::
+
+	Include paths that are to be excluded in the result.
+
+`show_ignored`::
+
+	The traversal is for finding just ignored files, not unignored
+	files.
+
+`show_other_directories`::
+
+	Include a directory that is not tracked.
+
+`hide_empty_directories`::
+
+	Do not include a directory that is not tracked and is empty.
+
+`no_gitlinks`::
+
+	If set, recurse into a directory that looks like a git
+	directory.  Otherwise it is shown as a directory.
+
+The result of the enumeration is left in these fields::
+
+`entries[]`::
+
+	An array of `struct dir_entry`, each element of which describes
+	a path.
+
+`nr`::
+
+	The number of members in `entries[]` array.
+
+`alloc`::
+
+	Internal use; keeps track of allocation of `entries[]` array.
+
+
+Calling sequence
+----------------
+
+Note: index may be looked at for .gitignore files that are CE_SKIP_WORKTREE
+marked. If you to exclude files, make sure you have loaded index first.
+
+* Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
+  sizeof(dir))`.
+
+* Call `add_exclude()` to add single exclude pattern,
+  `add_excludes_from_file()` to add patterns from a file
+  (e.g. `.git/info/exclude`), and/or set `dir.exclude_per_dir`.  A
+  short-hand function `setup_standard_excludes()` can be used to set up
+  the standard set of exclude settings.
+
+* Set options described in the Data Structure section above.
+
+* Call `read_directory()`.
+
+* Use `dir.entries[]`.
+
+(JC)
diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
new file mode 100644
index 0000000000..9d97eaa9de
--- /dev/null
+++ b/Documentation/technical/api-gitattributes.txt
@@ -0,0 +1,111 @@
+gitattributes API
+=================
+
+gitattributes mechanism gives a uniform way to associate various
+attributes to set of paths.
+
+
+Data Structure
+--------------
+
+`struct git_attr`::
+
+	An attribute is an opaque object that is identified by its name.
+	Pass the name and its length to `git_attr()` function to obtain
+	the object of this type.  The internal representation of this
+	structure is of no interest to the calling programs.
+
+`struct git_attr_check`::
+
+	This structure represents a set of attributes to check in a call
+	to `git_checkattr()` function, and receives the results.
+
+
+Calling Sequence
+----------------
+
+* Prepare an array of `struct git_attr_check` to define the list of
+  attributes you would want to check.  To populate this array, you would
+  need to define necessary attributes by calling `git_attr()` function.
+
+* Call git_checkattr() to check the attributes for the path.
+
+* Inspect `git_attr_check` structure to see how each of the attribute in
+  the array is defined for the path.
+
+
+Attribute Values
+----------------
+
+An attribute for a path can be in one of four states: Set, Unset,
+Unspecified or set to a string, and `.value` member of `struct
+git_attr_check` records it.  There are three macros to check these:
+
+`ATTR_TRUE()`::
+
+	Returns true if the attribute is Set for the path.
+
+`ATTR_FALSE()`::
+
+	Returns true if the attribute is Unset for the path.
+
+`ATTR_UNSET()`::
+
+	Returns true if the attribute is Unspecified for the path.
+
+If none of the above returns true, `.value` member points at a string
+value of the attribute for the path.
+
+
+Example
+-------
+
+To see how attributes "crlf" and "indent" are set for different paths.
+
+. Prepare an array of `struct git_attr_check` with two elements (because
+  we are checking two attributes).  Initialize their `attr` member with
+  pointers to `struct git_attr` obtained by calling `git_attr()`:
+
+------------
+static struct git_attr_check check[2];
+static void setup_check(void)
+{
+	if (check[0].attr)
+		return; /* already done */
+	check[0].attr = git_attr("crlf", 4);
+	check[1].attr = git_attr("ident", 5);
+}
+------------
+
+. Call `git_checkattr()` with the prepared array of `struct git_attr_check`:
+
+------------
+	const char *path;
+
+	setup_check();
+	git_checkattr(path, ARRAY_SIZE(check), check);
+------------
+
+. Act on `.value` member of the result, left in `check[]`:
+
+------------
+	const char *value = check[0].value;
+
+	if (ATTR_TRUE(value)) {
+		The attribute is Set, by listing only the name of the
+		attribute in the gitattributes file for the path.
+	} else if (ATTR_FALSE(value)) {
+		The attribute is Unset, by listing the name of the
+		attribute prefixed with a dash - for the path.
+	} else if (ATTR_UNSET(value)) {
+		The attribute is not set nor unset for the path.
+	} else if (!strcmp(value, "input")) {
+		If none of ATTR_TRUE(), ATTR_FALSE(), or ATTR_UNSET() is
+		true, the value is a string set in the gitattributes
+		file for the path by saying "attr=value".
+	} else if (... other check using value as string ...) {
+		...
+	}
+------------
+
+(JC)
diff --git a/Documentation/technical/api-grep.txt b/Documentation/technical/api-grep.txt
new file mode 100644
index 0000000000..a69cc8964d
--- /dev/null
+++ b/Documentation/technical/api-grep.txt
@@ -0,0 +1,8 @@
+grep API
+========
+
+Talk about <grep.h>, things like:
+
+* grep_buffer()
+
+(JC)
diff --git a/Documentation/technical/api-hash.txt b/Documentation/technical/api-hash.txt
new file mode 100644
index 0000000000..e5061e0677
--- /dev/null
+++ b/Documentation/technical/api-hash.txt
@@ -0,0 +1,52 @@
+hash API
+========
+
+The hash API is a collection of simple hash table functions. Users are expected
+to implement their own hashing.
+
+Data Structures
+---------------
+
+`struct hash_table`::
+
+	The hash table structure. The `array` member points to the hash table
+	entries. The `size` member counts the total number of valid and invalid
+	entries in the table. The `nr` member keeps track of the number of
+	valid entries.
+
+`struct hash_table_entry`::
+
+	An opaque structure representing an entry in the hash table. The `hash`
+	member is the entry's hash key and the `ptr` member is the entry's
+	value.
+
+Functions
+---------
+
+`init_hash`::
+
+	Initialize the hash table.
+
+`free_hash`::
+
+	Release memory associated with the hash table.
+
+`insert_hash`::
+
+	Insert a pointer into the hash table. If an entry with that hash
+	already exists, a pointer to the existing entry's value is returned.
+	Otherwise NULL is returned.  This allows callers to implement
+	chaining, etc.
+
+`lookup_hash`::
+
+	Lookup an entry in the hash table. If an entry with that hash exists
+	the entry's value is returned. Otherwise NULL is returned.
+
+`for_each_hash`::
+
+	Call a function for each entry in the hash table. The function is
+	expected to take the entry's value as its only argument and return an
+	int. If the function returns a negative int the loop is aborted
+	immediately.  Otherwise, the return value is accumulated and the sum
+	returned upon completion of the loop.
diff --git a/Documentation/technical/api-history-graph.txt b/Documentation/technical/api-history-graph.txt
new file mode 100644
index 0000000000..d6fc90ac7e
--- /dev/null
+++ b/Documentation/technical/api-history-graph.txt
@@ -0,0 +1,174 @@
+history graph API
+=================
+
+The graph API is used to draw a text-based representation of the commit
+history.  The API generates the graph in a line-by-line fashion.
+
+Functions
+---------
+
+Core functions:
+
+* `graph_init()` creates a new `struct git_graph`
+
+* `graph_update()` moves the graph to a new commit.
+
+* `graph_next_line()` outputs the next line of the graph into a strbuf.  It
+  does not add a terminating newline.
+
+* `graph_padding_line()` outputs a line of vertical padding in the graph.  It
+  is similar to `graph_next_line()`, but is guaranteed to never print the line
+  containing the current commit.  Where `graph_next_line()` would print the
+  commit line next, `graph_padding_line()` prints a line that simply extends
+  all branch lines downwards one row, leaving their positions unchanged.
+
+* `graph_is_commit_finished()` determines if the graph has output all lines
+  necessary for the current commit.  If `graph_update()` is called before all
+  lines for the current commit have been printed, the next call to
+  `graph_next_line()` will output an ellipsis, to indicate that a portion of
+  the graph was omitted.
+
+The following utility functions are wrappers around `graph_next_line()` and
+`graph_is_commit_finished()`.  They always print the output to stdout.
+They can all be called with a NULL graph argument, in which case no graph
+output will be printed.
+
+* `graph_show_commit()` calls `graph_next_line()` until it returns non-zero.
+  This prints all graph lines up to, and including, the line containing this
+  commit.  Output is printed to stdout.  The last line printed does not contain
+  a terminating newline.  This should not be called if the commit line has
+  already been printed, or it will loop forever.
+
+* `graph_show_oneline()` calls `graph_next_line()` and prints the result to
+  stdout.  The line printed does not contain a terminating newline.
+
+* `graph_show_padding()` calls `graph_padding_line()` and prints the result to
+  stdout.  The line printed does not contain a terminating newline.
+
+* `graph_show_remainder()` calls `graph_next_line()` until
+  `graph_is_commit_finished()` returns non-zero.  Output is printed to stdout.
+  The last line printed does not contain a terminating newline.  Returns 1 if
+  output was printed, and 0 if no output was necessary.
+
+* `graph_show_strbuf()` prints the specified strbuf to stdout, prefixing all
+  lines but the first with a graph line.  The caller is responsible for
+  ensuring graph output for the first line has already been printed to stdout.
+  (This can be done with `graph_show_commit()` or `graph_show_oneline()`.)  If
+  a NULL graph is supplied, the strbuf is printed as-is.
+
+* `graph_show_commit_msg()` is similar to `graph_show_strbuf()`, but it also
+  prints the remainder of the graph, if more lines are needed after the strbuf
+  ends.  It is better than directly calling `graph_show_strbuf()` followed by
+  `graph_show_remainder()` since it properly handles buffers that do not end in
+  a terminating newline.  The output printed by `graph_show_commit_msg()` will
+  end in a newline if and only if the strbuf ends in a newline.
+
+Data structure
+--------------
+`struct git_graph` is an opaque data type used to store the current graph
+state.
+
+Calling sequence
+----------------
+
+* Create a `struct git_graph` by calling `graph_init()`.  When using the
+  revision walking API, this is done automatically by `setup_revisions()` if
+  the '--graph' option is supplied.
+
+* Use the revision walking API to walk through a group of contiguous commits.
+  The `get_revision()` function automatically calls `graph_update()` each time
+  it is invoked.
+
+* For each commit, call `graph_next_line()` repeatedly, until
+  `graph_is_commit_finished()` returns non-zero.  Each call go
+  `graph_next_line()` will output a single line of the graph.  The resulting
+  lines will not contain any newlines.  `graph_next_line()` returns 1 if the
+  resulting line contains the current commit, or 0 if this is merely a line
+  needed to adjust the graph before or after the current commit.  This return
+  value can be used to determine where to print the commit summary information
+  alongside the graph output.
+
+Limitations
+-----------
+
+* `graph_update()` must be called with commits in topological order.  It should
+  not be called on a commit if it has already been invoked with an ancestor of
+  that commit, or the graph output will be incorrect.
+
+* `graph_update()` must be called on a contiguous group of commits.  If
+  `graph_update()` is called on a particular commit, it should later be called
+  on all parents of that commit.  Parents must not be skipped, or the graph
+  output will appear incorrect.
++
+`graph_update()` may be used on a pruned set of commits only if the parent list
+has been rewritten so as to include only ancestors from the pruned set.
+
+* The graph API does not currently support reverse commit ordering.  In
+  order to implement reverse ordering, the graphing API needs an
+  (efficient) mechanism to find the children of a commit.
+
+Sample usage
+------------
+
+------------
+struct commit *commit;
+struct git_graph *graph = graph_init(opts);
+
+while ((commit = get_revision(opts)) != NULL) {
+	graph_update(graph, commit);
+	while (!graph_is_commit_finished(graph))
+	{
+		struct strbuf sb;
+		int is_commit_line;
+
+		strbuf_init(&sb, 0);
+		is_commit_line = graph_next_line(graph, &sb);
+		fputs(sb.buf, stdout);
+
+		if (is_commit_line)
+			log_tree_commit(opts, commit);
+		else
+			putchar(opts->diffopt.line_termination);
+	}
+}
+------------
+
+Sample output
+-------------
+
+The following is an example of the output from the graph API.  This output does
+not include any commit summary information--callers are responsible for
+outputting that information, if desired.
+
+------------
+*
+*
+*
+|\
+* |
+| | *
+| \ \
+|  \ \
+*-. \ \
+|\ \ \ \
+| | * | |
+| | | | | *
+| | | | | *
+| | | | | *
+| | | | | |\
+| | | | | | *
+| * | | | | |
+| | | | | *  \
+| | | | | |\  |
+| | | | * | | |
+| | | | * | | |
+* | | | | | | |
+| |/ / / / / /
+|/| / / / / /
+* | | | | | |
+|/ / / / / /
+* | | | | |
+| | | | | *
+| | | | |/
+| | | | *
+------------
diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
new file mode 100644
index 0000000000..adbdbf5d75
--- /dev/null
+++ b/Documentation/technical/api-in-core-index.txt
@@ -0,0 +1,21 @@
+in-core index API
+=================
+
+Talk about <read-cache.c> and <cache-tree.c>, things like:
+
+* cache -> the_index macros
+* read_index()
+* write_index()
+* ie_match_stat() and ie_modified(); how they are different and when to
+  use which.
+* index_name_pos()
+* remove_index_entry_at()
+* remove_file_from_index()
+* add_file_to_index()
+* add_index_entry()
+* refresh_index()
+* discard_index()
+* cache_tree_invalidate_path()
+* cache_tree_update()
+
+(JC, Linus)
diff --git a/Documentation/technical/api-index-skel.txt b/Documentation/technical/api-index-skel.txt
new file mode 100644
index 0000000000..af7cc2e395
--- /dev/null
+++ b/Documentation/technical/api-index-skel.txt
@@ -0,0 +1,15 @@
+GIT API Documents
+=================
+
+GIT has grown a set of internal API over time.  This collection
+documents them.
+
+////////////////////////////////////////////////////////////////
+// table of contents begin
+////////////////////////////////////////////////////////////////
+
+////////////////////////////////////////////////////////////////
+// table of contents end
+////////////////////////////////////////////////////////////////
+
+2007-11-24
diff --git a/Documentation/technical/api-index.sh b/Documentation/technical/api-index.sh
new file mode 100755
index 0000000000..9c3f4131b8
--- /dev/null
+++ b/Documentation/technical/api-index.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+(
+	c=////////////////////////////////////////////////////////////////
+	skel=api-index-skel.txt
+	sed -e '/^\/\/ table of contents begin/q' "$skel"
+	echo "$c"
+
+	ls api-*.txt |
+	while read filename
+	do
+		case "$filename" in
+		api-index-skel.txt | api-index.txt) continue ;;
+		esac
+		title=$(sed -e 1q "$filename")
+		html=${filename%.txt}.html
+		echo "* link:$html[$title]"
+	done
+	echo "$c"
+	sed -n -e '/^\/\/ table of contents end/,$p' "$skel"
+) >api-index.txt+
+
+if test -f api-index.txt && cmp api-index.txt api-index.txt+ >/dev/null
+then
+	rm -f api-index.txt+
+else
+	mv api-index.txt+ api-index.txt
+fi
diff --git a/Documentation/technical/api-lockfile.txt b/Documentation/technical/api-lockfile.txt
new file mode 100644
index 0000000000..dd894043ae
--- /dev/null
+++ b/Documentation/technical/api-lockfile.txt
@@ -0,0 +1,74 @@
+lockfile API
+============
+
+The lockfile API serves two purposes:
+
+* Mutual exclusion.  When we write out a new index file, first
+  we create a new file `$GIT_DIR/index.lock`, write the new
+  contents into it, and rename it to the final destination
+  `$GIT_DIR/index`.  We try to create the `$GIT_DIR/index.lock`
+  file with O_EXCL so that we can notice and fail when somebody
+  else is already trying to update the index file.
+
+* Automatic cruft removal.  After we create the "lock" file, we
+  may decide to `die()`, and we would want to make sure that we
+  remove the file that has not been committed to its final
+  destination.  This is done by remembering the lockfiles we
+  created in a linked list and cleaning them up from an
+  `atexit(3)` handler.  Outstanding lockfiles are also removed
+  when the program dies on a signal.
+
+
+The functions
+-------------
+
+hold_lock_file_for_update::
+
+	Take a pointer to `struct lock_file`, the filename of
+	the final destination (e.g. `$GIT_DIR/index`) and a flag
+	`die_on_error`.  Attempt to create a lockfile for the
+	destination and return the file descriptor for writing
+	to the file.  If `die_on_error` flag is true, it dies if
+	a lock is already taken for the file; otherwise it
+	returns a negative integer to the caller on failure.
+
+commit_lock_file::
+
+	Take a pointer to the `struct lock_file` initialized
+	with an earlier call to `hold_lock_file_for_update()`,
+	close the file descriptor and rename the lockfile to its
+	final destination.  Returns 0 upon success, a negative
+	value on failure to close(2) or rename(2).
+
+rollback_lock_file::
+
+	Take a pointer to the `struct lock_file` initialized
+	with an earlier call to `hold_lock_file_for_update()`,
+	close the file descriptor and remove the lockfile.
+
+close_lock_file::
+	Take a pointer to the `struct lock_file` initialized
+	with an earlier call to `hold_lock_file_for_update()`,
+	and close the file descriptor.  Returns 0 upon success,
+	a negative value on failure to close(2).
+
+Because the structure is used in an `atexit(3)` handler, its
+storage has to stay throughout the life of the program.  It
+cannot be an auto variable allocated on the stack.
+
+Call `commit_lock_file()` or `rollback_lock_file()` when you are
+done writing to the file descriptor.  If you do not call either
+and simply `exit(3)` from the program, an `atexit(3)` handler
+will close and remove the lockfile.
+
+If you need to close the file descriptor you obtained from
+`hold_lock_file_for_update` function yourself, do so by calling
+`close_lock_file()`.  You should never call `close(2)` yourself!
+Otherwise the `struct
+lock_file` structure still remembers that the file descriptor
+needs to be closed, and a later call to `commit_lock_file()` or
+`rollback_lock_file()` will result in duplicate calls to
+`close(2)`.  Worse yet, if you `close(2)`, open another file
+descriptor for completely different purpose, and then call
+`commit_lock_file()` or `rollback_lock_file()`, they may close
+that unrelated file descriptor.
diff --git a/Documentation/technical/api-object-access.txt b/Documentation/technical/api-object-access.txt
new file mode 100644
index 0000000000..03bb0e950d
--- /dev/null
+++ b/Documentation/technical/api-object-access.txt
@@ -0,0 +1,15 @@
+object access API
+=================
+
+Talk about <sha1_file.c> and <object.h> family, things like
+
+* read_sha1_file()
+* read_object_with_reference()
+* has_sha1_file()
+* write_sha1_file()
+* pretend_sha1_file()
+* lookup_{object,commit,tag,blob,tree}
+* parse_{object,commit,tag,blob,tree}
+* Use of object flags
+
+(JC, Shawn, Daniel, Dscho, Linus)
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
new file mode 100644
index 0000000000..312e3b2e2b
--- /dev/null
+++ b/Documentation/technical/api-parse-options.txt
@@ -0,0 +1,263 @@
+parse-options API
+=================
+
+The parse-options API is used to parse and massage options in git
+and to provide a usage help with consistent look.
+
+Basics
+------
+
+The argument vector `argv[]` may usually contain mandatory or optional
+'non-option arguments', e.g. a filename or a branch, and 'options'.
+Options are optional arguments that start with a dash and
+that allow to change the behavior of a command.
+
+* There are basically three types of options:
+  'boolean' options,
+  options with (mandatory) 'arguments' and
+  options with 'optional arguments'
+  (i.e. a boolean option that can be adjusted).
+
+* There are basically two forms of options:
+  'Short options' consist of one dash (`-`) and one alphanumeric
+  character.
+  'Long options' begin with two dashes (`\--`) and some
+  alphanumeric characters.
+
+* Options are case-sensitive.
+  Please define 'lower-case long options' only.
+
+The parse-options API allows:
+
+* 'sticked' and 'separate form' of options with arguments.
+  `-oArg` is sticked, `-o Arg` is separate form.
+  `\--option=Arg` is sticked, `\--option Arg` is separate form.
+
+* Long options may be 'abbreviated', as long as the abbreviation
+  is unambiguous.
+
+* Short options may be bundled, e.g. `-a -b` can be specified as `-ab`.
+
+* Boolean long options can be 'negated' (or 'unset') by prepending
+  `no-`, e.g. `\--no-abbrev` instead of `\--abbrev`.
+
+* Options and non-option arguments can clearly be separated using the `\--`
+  option, e.g. `-a -b \--option \-- \--this-is-a-file` indicates that
+  `\--this-is-a-file` must not be processed as an option.
+
+Steps to parse options
+----------------------
+
+. `#include "parse-options.h"`
+
+. define a NULL-terminated
+  `static const char * const builtin_foo_usage[]` array
+  containing alternative usage strings
+
+. define `builtin_foo_options` array as described below
+  in section 'Data Structure'.
+
+. in `cmd_foo(int argc, const char **argv, const char *prefix)`
+  call
+
+	argc = parse_options(argc, argv, prefix, builtin_foo_options, builtin_foo_usage, flags);
++
+`parse_options()` will filter out the processed options of `argv[]` and leave the
+non-option arguments in `argv[]`.
+`argc` is updated appropriately because of the assignment.
++
+You can also pass NULL instead of a usage array as the fifth parameter of
+parse_options(), to avoid displaying a help screen with usage info and
+option list.  This should only be done if necessary, e.g. to implement
+a limited parser for only a subset of the options that needs to be run
+before the full parser, which in turn shows the full help message.
++
+Flags are the bitwise-or of:
+
+`PARSE_OPT_KEEP_DASHDASH`::
+	Keep the `\--` that usually separates options from
+	non-option arguments.
+
+`PARSE_OPT_STOP_AT_NON_OPTION`::
+	Usually the whole argument vector is massaged and reordered.
+	Using this flag, processing is stopped at the first non-option
+	argument.
+
+`PARSE_OPT_KEEP_ARGV0`::
+	Keep the first argument, which contains the program name.  It's
+	removed from argv[] by default.
+
+`PARSE_OPT_KEEP_UNKNOWN`::
+	Keep unknown arguments instead of erroring out.  This doesn't
+	work for all combinations of arguments as users might expect
+	it to do.  E.g. if the first argument in `--unknown --known`
+	takes a value (which we can't know), the second one is
+	mistakenly interpreted as a known option.  Similarly, if
+	`PARSE_OPT_STOP_AT_NON_OPTION` is set, the second argument in
+	`--unknown value` will be mistakenly interpreted as a
+	non-option, not as a value belonging to the unknown option,
+	the parser early.  That's why parse_options() errors out if
+	both options are set.
+
+`PARSE_OPT_NO_INTERNAL_HELP`::
+	By default, parse_options() handles `-h`, `--help` and
+	`--help-all` internally, by showing a help screen.  This option
+	turns it off and allows one to add custom handlers for these
+	options, or to just leave them unknown.
+
+Data Structure
+--------------
+
+The main data structure is an array of the `option` struct,
+say `static struct option builtin_add_options[]`.
+There are some macros to easily define options:
+
+`OPT__ABBREV(&int_var)`::
+	Add `\--abbrev[=<n>]`.
+
+`OPT__COLOR(&int_var, description)`::
+	Add `\--color[=<when>]` and `--no-color`.
+
+`OPT__DRY_RUN(&int_var)`::
+	Add `-n, \--dry-run`.
+
+`OPT__QUIET(&int_var)`::
+	Add `-q, \--quiet`.
+
+`OPT__VERBOSE(&int_var)`::
+	Add `-v, \--verbose`.
+
+`OPT_GROUP(description)`::
+	Start an option group. `description` is a short string that
+	describes the group or an empty string.
+	Start the description with an upper-case letter.
+
+`OPT_BOOLEAN(short, long, &int_var, description)`::
+	Introduce a boolean option.
+	`int_var` is incremented on each use.
+
+`OPT_BIT(short, long, &int_var, description, mask)`::
+	Introduce a boolean option.
+	If used, `int_var` is bitwise-ored with `mask`.
+
+`OPT_NEGBIT(short, long, &int_var, description, mask)`::
+	Introduce a boolean option.
+	If used, `int_var` is bitwise-anded with the inverted `mask`.
+
+`OPT_SET_INT(short, long, &int_var, description, integer)`::
+	Introduce a boolean option.
+	If used, set `int_var` to `integer`.
+
+`OPT_SET_PTR(short, long, &ptr_var, description, ptr)`::
+	Introduce a boolean option.
+	If used, set `ptr_var` to `ptr`.
+
+`OPT_STRING(short, long, &str_var, arg_str, description)`::
+	Introduce an option with string argument.
+	The string argument is put into `str_var`.
+
+`OPT_INTEGER(short, long, &int_var, description)`::
+	Introduce an option with integer argument.
+	The integer is put into `int_var`.
+
+`OPT_DATE(short, long, &int_var, description)`::
+	Introduce an option with date argument, see `approxidate()`.
+	The timestamp is put into `int_var`.
+
+`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`::
+	Introduce an option with argument.
+	The argument will be fed into the function given by `func_ptr`
+	and the result will be put into `var`.
+	See 'Option Callbacks' below for a more elaborate description.
+
+`OPT_FILENAME(short, long, &var, description)`::
+	Introduce an option with a filename argument.
+	The filename will be prefixed by passing the filename along with
+	the prefix argument of `parse_options()` to `prefix_filename()`.
+
+`OPT_ARGUMENT(long, description)`::
+	Introduce a long-option argument that will be kept in `argv[]`.
+
+`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`::
+	Recognize numerical options like -123 and feed the integer as
+	if it was an argument to the function given by `func_ptr`.
+	The result will be put into `var`.  There can be only one such
+	option definition.  It cannot be negated and it takes no
+	arguments.  Short options that happen to be digits take
+	precedence over it.
+
+`OPT_COLOR_FLAG(short, long, &int_var, description)`::
+	Introduce an option that takes an optional argument that can
+	have one of three values: "always", "never", or "auto".  If the
+	argument is not given, it defaults to "always".  The `--no-` form
+	works like `--long=never`; it cannot take an argument.  If
+	"always", set `int_var` to 1; if "never", set `int_var` to 0; if
+	"auto", set `int_var` to 1 if stdout is a tty or a pager,
+	0 otherwise.
+
+
+The last element of the array must be `OPT_END()`.
+
+If not stated otherwise, interpret the arguments as follows:
+
+* `short` is a character for the short option
+  (e.g. `\'e\'` for `-e`, use `0` to omit),
+
+* `long` is a string for the long option
+  (e.g. `"example"` for `\--example`, use `NULL` to omit),
+
+* `int_var` is an integer variable,
+
+* `str_var` is a string variable (`char *`),
+
+* `arg_str` is the string that is shown as argument
+  (e.g. `"branch"` will result in `<branch>`).
+  If set to `NULL`, three dots (`...`) will be displayed.
+
+* `description` is a short string to describe the effect of the option.
+  It shall begin with a lower-case letter and a full stop (`.`) shall be
+  omitted at the end.
+
+Option Callbacks
+----------------
+
+The function must be defined in this form:
+
+	int func(const struct option *opt, const char *arg, int unset)
+
+The callback mechanism is as follows:
+
+* Inside `func`, the only interesting member of the structure
+  given by `opt` is the void pointer `opt->value`.
+  `\*opt->value` will be the value that is saved into `var`, if you
+  use `OPT_CALLBACK()`.
+  For example, do `*(unsigned long *)opt->value = 42;` to get 42
+  into an `unsigned long` variable.
+
+* Return value `0` indicates success and non-zero return
+  value will invoke `usage_with_options()` and, thus, die.
+
+* If the user negates the option, `arg` is `NULL` and `unset` is 1.
+
+Sophisticated option parsing
+----------------------------
+
+If you need, for example, option callbacks with optional arguments
+or without arguments at all, or if you need other special cases,
+that are not handled by the macros above, you need to specify the
+members of the `option` structure manually.
+
+This is not covered in this document, but well documented
+in `parse-options.h` itself.
+
+Examples
+--------
+
+See `test-parse-options.c` and
+`builtin-add.c`,
+`builtin-clone.c`,
+`builtin-commit.c`,
+`builtin-fetch.c`,
+`builtin-fsck.c`,
+`builtin-rm.c`
+for real-world examples.
diff --git a/Documentation/technical/api-quote.txt b/Documentation/technical/api-quote.txt
new file mode 100644
index 0000000000..e8a1bce94e
--- /dev/null
+++ b/Documentation/technical/api-quote.txt
@@ -0,0 +1,10 @@
+quote API
+=========
+
+Talk about <quote.h>, things like
+
+* sq_quote and unquote
+* c_style quote and unquote
+* quoting for foreign languages
+
+(JC)
diff --git a/Documentation/technical/api-remote.txt b/Documentation/technical/api-remote.txt
new file mode 100644
index 0000000000..c54b17db69
--- /dev/null
+++ b/Documentation/technical/api-remote.txt
@@ -0,0 +1,127 @@
+Remotes configuration API
+=========================
+
+The API in remote.h gives access to the configuration related to
+remotes. It handles all three configuration mechanisms historically
+and currently used by git, and presents the information in a uniform
+fashion. Note that the code also handles plain URLs without any
+configuration, giving them just the default information.
+
+struct remote
+-------------
+
+`name`::
+
+	The user's nickname for the remote
+
+`url`::
+
+	An array of all of the url_nr URLs configured for the remote
+
+`pushurl`::
+
+	An array of all of the pushurl_nr push URLs configured for the remote
+
+`push`::
+
+	 An array of refspecs configured for pushing, with
+	 push_refspec being the literal strings, and push_refspec_nr
+	 being the quantity.
+
+`fetch`::
+
+	An array of refspecs configured for fetching, with
+	fetch_refspec being the literal strings, and fetch_refspec_nr
+	being the quantity.
+
+`fetch_tags`::
+
+	The setting for whether to fetch tags (as a separate rule from
+	the configured refspecs); -1 means never to fetch tags, 0
+	means to auto-follow tags based on the default heuristic, 1
+	means to always auto-follow tags, and 2 means to fetch all
+	tags.
+
+`receivepack`, `uploadpack`::
+
+	The configured helper programs to run on the remote side, for
+	git-native protocols.
+
+`http_proxy`::
+
+	The proxy to use for curl (http, https, ftp, etc.) URLs.
+
+struct remotes can be found by name with remote_get(), and iterated
+through with for_each_remote(). remote_get(NULL) will return the
+default remote, given the current branch and configuration.
+
+struct refspec
+--------------
+
+A struct refspec holds the parsed interpretation of a refspec. If it
+will force updates (starts with a '+'), force is true. If it is a
+pattern (sides end with '*') pattern is true. src and dest are the two
+sides (if a pattern, only the part outside of the wildcards); if there
+is only one side, it is src, and dst is NULL; if sides exist but are
+empty (i.e., the refspec either starts or ends with ':'), the
+corresponding side is "".
+
+This parsing can be done to an array of strings to give an array of
+struct refpsecs with parse_ref_spec().
+
+remote_find_tracking(), given a remote and a struct refspec with
+either src or dst filled out, will fill out the other such that the
+result is in the "fetch" specification for the remote (note that this
+evaluates patterns and returns a single result).
+
+struct branch
+-------------
+
+Note that this may end up moving to branch.h
+
+struct branch holds the configuration for a branch. It can be looked
+up with branch_get(name) for "refs/heads/{name}", or with
+branch_get(NULL) for HEAD.
+
+It contains:
+
+`name`::
+
+	The short name of the branch.
+
+`refname`::
+
+	The full path for the branch ref.
+
+`remote_name`::
+
+	The name of the remote listed in the configuration.
+
+`remote`::
+
+	The struct remote for that remote.
+
+`merge_name`::
+
+	An array of the "merge" lines in the configuration.
+
+`merge`::
+
+	An array of the struct refspecs used for the merge lines. That
+	is, merge[i]->dst is a local tracking ref which should be
+	merged into this branch by default.
+
+`merge_nr`::
+
+	The number of merge configurations
+
+branch_has_merge_config() returns true if the given branch has merge
+configuration given.
+
+Other stuff
+-----------
+
+There is other stuff in remote.h that is related, in general, to the
+process of interacting with remotes.
+
+(Daniel Barkalow)
diff --git a/Documentation/technical/api-revision-walking.txt b/Documentation/technical/api-revision-walking.txt
new file mode 100644
index 0000000000..996da0503a
--- /dev/null
+++ b/Documentation/technical/api-revision-walking.txt
@@ -0,0 +1,67 @@
+revision walking API
+====================
+
+The revision walking API offers functions to build a list of revisions
+and then iterate over that list.
+
+Calling sequence
+----------------
+
+The walking API has a given calling sequence: first you need to
+initialize a rev_info structure, then add revisions to control what kind
+of revision list do you want to get, finally you can iterate over the
+revision list.
+
+Functions
+---------
+
+`init_revisions`::
+
+	Initialize a rev_info structure with default values. The second
+	parameter may be NULL or can be prefix path, and then the `.prefix`
+	variable will be set to it. This is typically the first function you
+	want to call when you want to deal with a revision list. After calling
+	this function, you are free to customize options, like set
+	`.ignore_merges` to 0 if you don't want to ignore merges, and so on. See
+	`revision.h` for a complete list of available options.
+
+`add_pending_object`::
+
+	This function can be used if you want to add commit objects as revision
+	information. You can use the `UNINTERESTING` object flag to indicate if
+	you want to include or exclude the given commit (and commits reachable
+	from the given commit) from the revision list.
++
+NOTE: If you have the commits as a string list then you probably want to
+use setup_revisions(), instead of parsing each string and using this
+function.
+
+`setup_revisions`::
+
+	Parse revision information, filling in the `rev_info` structure, and
+	removing the used arguments from the argument list. Returns the number
+	of arguments left that weren't recognized, which are also moved to the
+	head of the argument list. The last parameter is used in case no
+	parameter given by the first two arguments.
+
+`prepare_revision_walk`::
+
+	Prepares the rev_info structure for a walk. You should check if it
+	returns any error (non-zero return code) and if it does not, you can
+	start using get_revision() to do the iteration.
+
+`get_revision`::
+
+	Takes a pointer to a `rev_info` structure and iterates over it,
+	returning a `struct commit *` each time you call it. The end of the
+	revision list is indicated by returning a NULL pointer.
+
+Data structures
+---------------
+
+Talk about <revision.h>, things like:
+
+* two diff_options, one for path limiting, another for output;
+* remaining functions;
+
+(Linus, JC, Dscho)
diff --git a/Documentation/technical/api-run-command.txt b/Documentation/technical/api-run-command.txt
new file mode 100644
index 0000000000..44876fa703
--- /dev/null
+++ b/Documentation/technical/api-run-command.txt
@@ -0,0 +1,242 @@
+run-command API
+===============
+
+The run-command API offers a versatile tool to run sub-processes with
+redirected input and output as well as with a modified environment
+and an alternate current directory.
+
+A similar API offers the capability to run a function asynchronously,
+which is primarily used to capture the output that the function
+produces in the caller in order to process it.
+
+
+Functions
+---------
+
+`start_command`::
+
+	Start a sub-process. Takes a pointer to a `struct child_process`
+	that specifies the details and returns pipe FDs (if requested).
+	See below for details.
+
+`finish_command`::
+
+	Wait for the completion of a sub-process that was started with
+	start_command().
+
+`run_command`::
+
+	A convenience function that encapsulates a sequence of
+	start_command() followed by finish_command(). Takes a pointer
+	to a `struct child_process` that specifies the details.
+
+`run_command_v_opt`, `run_command_v_opt_cd_env`::
+
+	Convenience functions that encapsulate a sequence of
+	start_command() followed by finish_command(). The argument argv
+	specifies the program and its arguments. The argument opt is zero
+	or more of the flags `RUN_COMMAND_NO_STDIN`, `RUN_GIT_CMD`,
+	`RUN_COMMAND_STDOUT_TO_STDERR`, or `RUN_SILENT_EXEC_FAILURE`
+	that correspond to the members .no_stdin, .git_cmd,
+	.stdout_to_stderr, .silent_exec_failure of `struct child_process`.
+	The argument dir corresponds the member .dir. The argument env
+	corresponds to the member .env.
+
+The functions above do the following:
+
+. If a system call failed, errno is set and -1 is returned. A diagnostic
+  is printed.
+
+. If the program was not found, then -1 is returned and errno is set to
+  ENOENT; a diagnostic is printed only if .silent_exec_failure is 0.
+
+. Otherwise, the program is run. If it terminates regularly, its exit
+  code is returned. No diagnostic is printed, even if the exit code is
+  non-zero.
+
+. If the program terminated due to a signal, then the return value is the
+  signal number - 128, ie. it is negative and so indicates an unusual
+  condition; a diagnostic is printed. This return value can be passed to
+  exit(2), which will report the same code to the parent process that a
+  POSIX shell's $? would report for a program that died from the signal.
+
+
+`start_async`::
+
+	Run a function asynchronously. Takes a pointer to a `struct
+	async` that specifies the details and returns a set of pipe FDs
+	for communication with the function. See below for details.
+
+`finish_async`::
+
+	Wait for the completion of an asynchronous function that was
+	started with start_async().
+
+`run_hook`::
+
+	Run a hook.
+	The first argument is a pathname to an index file, or NULL
+	if the hook uses the default index file or no index is needed.
+	The second argument is the name of the hook.
+	The further arguments correspond to the hook arguments.
+	The last argument has to be NULL to terminate the arguments list.
+	If the hook does not exist or is not executable, the return
+	value will be zero.
+	If it is executable, the hook will be executed and the exit
+	status of the hook is returned.
+	On execution, .stdout_to_stderr and .no_stdin will be set.
+	(See below.)
+
+
+Data structures
+---------------
+
+* `struct child_process`
+
+This describes the arguments, redirections, and environment of a
+command to run in a sub-process.
+
+The caller:
+
+1. allocates and clears (memset(&chld, 0, sizeof(chld));) a
+   struct child_process variable;
+2. initializes the members;
+3. calls start_command();
+4. processes the data;
+5. closes file descriptors (if necessary; see below);
+6. calls finish_command().
+
+The .argv member is set up as an array of string pointers (NULL
+terminated), of which .argv[0] is the program name to run (usually
+without a path). If the command to run is a git command, set argv[0] to
+the command name without the 'git-' prefix and set .git_cmd = 1.
+
+The members .in, .out, .err are used to redirect stdin, stdout,
+stderr as follows:
+
+. Specify 0 to request no special redirection. No new file descriptor
+  is allocated. The child process simply inherits the channel from the
+  parent.
+
+. Specify -1 to have a pipe allocated; start_command() replaces -1
+  by the pipe FD in the following way:
+
+	.in: Returns the writable pipe end into which the caller writes;
+		the readable end of the pipe becomes the child's stdin.
+
+	.out, .err: Returns the readable pipe end from which the caller
+		reads; the writable end of the pipe end becomes child's
+		stdout/stderr.
+
+  The caller of start_command() must close the so returned FDs
+  after it has completed reading from/writing to it!
+
+. Specify a file descriptor > 0 to be used by the child:
+
+	.in: The FD must be readable; it becomes child's stdin.
+	.out: The FD must be writable; it becomes child's stdout.
+	.err: The FD must be writable; it becomes child's stderr.
+
+  The specified FD is closed by start_command(), even if it fails to
+  run the sub-process!
+
+. Special forms of redirection are available by setting these members
+  to 1:
+
+	.no_stdin, .no_stdout, .no_stderr: The respective channel is
+		redirected to /dev/null.
+
+	.stdout_to_stderr: stdout of the child is redirected to its
+		stderr. This happens after stderr is itself redirected.
+		So stdout will follow stderr to wherever it is
+		redirected.
+
+To modify the environment of the sub-process, specify an array of
+string pointers (NULL terminated) in .env:
+
+. If the string is of the form "VAR=value", i.e. it contains '='
+  the variable is added to the child process's environment.
+
+. If the string does not contain '=', it names an environment
+  variable that will be removed from the child process's environment.
+
+To specify a new initial working directory for the sub-process,
+specify it in the .dir member.
+
+If the program cannot be found, the functions return -1 and set
+errno to ENOENT. Normally, an error message is printed, but if
+.silent_exec_failure is set to 1, no message is printed for this
+special error condition.
+
+
+* `struct async`
+
+This describes a function to run asynchronously, whose purpose is
+to produce output that the caller reads.
+
+The caller:
+
+1. allocates and clears (memset(&asy, 0, sizeof(asy));) a
+   struct async variable;
+2. initializes .proc and .data;
+3. calls start_async();
+4. processes communicates with proc through .in and .out;
+5. closes .in and .out;
+6. calls finish_async().
+
+The members .in, .out are used to provide a set of fd's for
+communication between the caller and the callee as follows:
+
+. Specify 0 to have no file descriptor passed.  The callee will
+  receive -1 in the corresponding argument.
+
+. Specify < 0 to have a pipe allocated; start_async() replaces
+  with the pipe FD in the following way:
+
+	.in: Returns the writable pipe end into which the caller
+	writes; the readable end of the pipe becomes the function's
+	in argument.
+
+	.out: Returns the readable pipe end from which the caller
+	reads; the writable end of the pipe becomes the function's
+	out argument.
+
+  The caller of start_async() must close the returned FDs after it
+  has completed reading from/writing from them.
+
+. Specify a file descriptor > 0 to be used by the function:
+
+	.in: The FD must be readable; it becomes the function's in.
+	.out: The FD must be writable; it becomes the function's out.
+
+  The specified FD is closed by start_async(), even if it fails to
+  run the function.
+
+The function pointer in .proc has the following signature:
+
+	int proc(int in, int out, void *data);
+
+. in, out specifies a set of file descriptors to which the function
+  must read/write the data that it needs/produces.  The function
+  *must* close these descriptors before it returns.  A descriptor
+  may be -1 if the caller did not configure a descriptor for that
+  direction.
+
+. data is the value that the caller has specified in the .data member
+  of struct async.
+
+. The return value of the function is 0 on success and non-zero
+  on failure. If the function indicates failure, finish_async() will
+  report failure as well.
+
+
+There are serious restrictions on what the asynchronous function can do
+because this facility is implemented by a pipe to a forked process on
+UNIX, but by a thread in the same address space on Windows:
+
+. It cannot change the program's state (global variables, environment,
+  etc.) in a way that the caller notices; in other words, .in and .out
+  are the only communication channels to the caller.
+
+. It must not change the program's state that the caller of the
+  facility also uses.
diff --git a/Documentation/technical/api-setup.txt b/Documentation/technical/api-setup.txt
new file mode 100644
index 0000000000..4f63a04d7d
--- /dev/null
+++ b/Documentation/technical/api-setup.txt
@@ -0,0 +1,13 @@
+setup API
+=========
+
+Talk about
+
+* setup_git_directory()
+* setup_git_directory_gently()
+* is_inside_git_dir()
+* is_inside_work_tree()
+* setup_work_tree()
+* get_pathspec()
+
+(Dscho)
diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
new file mode 100644
index 0000000000..afe2759951
--- /dev/null
+++ b/Documentation/technical/api-strbuf.txt
@@ -0,0 +1,272 @@
+strbuf API
+==========
+
+strbuf's are meant to be used with all the usual C string and memory
+APIs. Given that the length of the buffer is known, it's often better to
+use the mem* functions than a str* one (memchr vs. strchr e.g.).
+Though, one has to be careful about the fact that str* functions often
+stop on NULs and that strbufs may have embedded NULs.
+
+An strbuf is NUL terminated for convenience, but no function in the
+strbuf API actually relies on the string being free of NULs.
+
+strbufs has some invariants that are very important to keep in mind:
+
+. The `buf` member is never NULL, so it can be used in any usual C
+string operations safely. strbuf's _have_ to be initialized either by
+`strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
++
+Do *not* assume anything on what `buf` really is (e.g. if it is
+allocated memory or not), use `strbuf_detach()` to unwrap a memory
+buffer from its strbuf shell in a safe way. That is the sole supported
+way. This will give you a malloced buffer that you can later `free()`.
++
+However, it is totally safe to modify anything in the string pointed by
+the `buf` member, between the indices `0` and `len-1` (inclusive).
+
+. The `buf` member is a byte array that has at least `len + 1` bytes
+  allocated. The extra byte is used to store a `'\0'`, allowing the
+  `buf` member to be a valid C-string. Every strbuf function ensure this
+  invariant is preserved.
++
+NOTE: It is OK to "play" with the buffer directly if you work it this
+      way:
++
+----
+strbuf_grow(sb, SOME_SIZE); <1>
+strbuf_setlen(sb, sb->len + SOME_OTHER_SIZE);
+----
+<1> Here, the memory array starting at `sb->buf`, and of length
+`strbuf_avail(sb)` is all yours, and you can be sure that
+`strbuf_avail(sb)` is at least `SOME_SIZE`.
++
+NOTE: `SOME_OTHER_SIZE` must be smaller or equal to `strbuf_avail(sb)`.
++
+Doing so is safe, though if it has to be done in many places, adding the
+missing API to the strbuf module is the way to go.
++
+WARNING: Do _not_ assume that the area that is yours is of size `alloc
+- 1` even if it's true in the current implementation. Alloc is somehow a
+"private" member that should not be messed with. Use `strbuf_avail()`
+instead.
+
+Data structures
+---------------
+
+* `struct strbuf`
+
+This is the string buffer structure. The `len` member can be used to
+determine the current length of the string, and `buf` member provides access to
+the string itself.
+
+Functions
+---------
+
+* Life cycle
+
+`strbuf_init`::
+
+	Initialize the structure. The second parameter can be zero or a bigger
+	number to allocate memory, in case you want to prevent further reallocs.
+
+`strbuf_release`::
+
+	Release a string buffer and the memory it used. You should not use the
+	string buffer after using this function, unless you initialize it again.
+
+`strbuf_detach`::
+
+	Detach the string from the strbuf and returns it; you now own the
+	storage the string occupies and it is your responsibility from then on
+	to release it with `free(3)` when you are done with it.
+
+`strbuf_attach`::
+
+	Attach a string to a buffer. You should specify the string to attach,
+	the current length of the string and the amount of allocated memory.
+	The amount must be larger than the string length, because the string you
+	pass is supposed to be a NUL-terminated string.  This string _must_ be
+	malloc()ed, and after attaching, the pointer cannot be relied upon
+	anymore, and neither be free()d directly.
+
+`strbuf_swap`::
+
+	Swap the contents of two string buffers.
+
+* Related to the size of the buffer
+
+`strbuf_avail`::
+
+	Determine the amount of allocated but unused memory.
+
+`strbuf_grow`::
+
+	Ensure that at least this amount of unused memory is available after
+	`len`. This is used when you know a typical size for what you will add
+	and want to avoid repetitive automatic resizing of the underlying buffer.
+	This is never a needed operation, but can be critical for performance in
+	some cases.
+
+`strbuf_setlen`::
+
+	Set the length of the buffer to a given value. This function does *not*
+	allocate new memory, so you should not perform a `strbuf_setlen()` to a
+	length that is larger than `len + strbuf_avail()`. `strbuf_setlen()` is
+	just meant as a 'please fix invariants from this strbuf I just messed
+	with'.
+
+`strbuf_reset`::
+
+	Empty the buffer by setting the size of it to zero.
+
+* Related to the contents of the buffer
+
+`strbuf_rtrim`::
+
+	Strip whitespace from the end of a string.
+
+`strbuf_cmp`::
+
+	Compare two buffers. Returns an integer less than, equal to, or greater
+	than zero if the first buffer is found, respectively, to be less than,
+	to match, or be greater than the second buffer.
+
+* Adding data to the buffer
+
+NOTE: All of the functions in this section will grow the buffer as necessary.
+If they fail for some reason other than memory shortage and the buffer hadn't
+been allocated before (i.e. the `struct strbuf` was set to `STRBUF_INIT`),
+then they will free() it.
+
+`strbuf_addch`::
+
+	Add a single character to the buffer.
+
+`strbuf_insert`::
+
+	Insert data to the given position of the buffer. The remaining contents
+	will be shifted, not overwritten.
+
+`strbuf_remove`::
+
+	Remove given amount of data from a given position of the buffer.
+
+`strbuf_splice`::
+
+	Remove the bytes between `pos..pos+len` and replace it with the given
+	data.
+
+`strbuf_add`::
+
+	Add data of given length to the buffer.
+
+`strbuf_addstr`::
+
+Add a NUL-terminated string to the buffer.
++
+NOTE: This function will *always* be implemented as an inline or a macro
+that expands to:
++
+----
+strbuf_add(..., s, strlen(s));
+----
++
+Meaning that this is efficient to write things like:
++
+----
+strbuf_addstr(sb, "immediate string");
+----
+
+`strbuf_addbuf`::
+
+	Copy the contents of an other buffer at the end of the current one.
+
+`strbuf_adddup`::
+
+	Copy part of the buffer from a given position till a given length to the
+	end of the buffer.
+
+`strbuf_expand`::
+
+	This function can be used to expand a format string containing
+	placeholders. To that end, it parses the string and calls the specified
+	function for every percent sign found.
++
+The callback function is given a pointer to the character after the `%`
+and a pointer to the struct strbuf.  It is expected to add the expanded
+version of the placeholder to the strbuf, e.g. to add a newline
+character if the letter `n` appears after a `%`.  The function returns
+the length of the placeholder recognized and `strbuf_expand()` skips
+over it.
++
+The format `%%` is automatically expanded to a single `%` as a quoting
+mechanism; callers do not need to handle the `%` placeholder themselves,
+and the callback function will not be invoked for this placeholder.
++
+All other characters (non-percent and not skipped ones) are copied
+verbatim to the strbuf.  If the callback returned zero, meaning that the
+placeholder is unknown, then the percent sign is copied, too.
++
+In order to facilitate caching and to make it possible to give
+parameters to the callback, `strbuf_expand()` passes a context pointer,
+which can be used by the programmer of the callback as she sees fit.
+
+`strbuf_expand_dict_cb`::
+
+	Used as callback for `strbuf_expand()`, expects an array of
+	struct strbuf_expand_dict_entry as context, i.e. pairs of
+	placeholder and replacement string.  The array needs to be
+	terminated by an entry with placeholder set to NULL.
+
+`strbuf_addbuf_percentquote`::
+
+	Append the contents of one strbuf to another, quoting any
+	percent signs ("%") into double-percents ("%%") in the
+	destination. This is useful for literal data to be fed to either
+	strbuf_expand or to the *printf family of functions.
+
+`strbuf_addf`::
+
+	Add a formatted string to the buffer.
+
+`strbuf_fread`::
+
+	Read a given size of data from a FILE* pointer to the buffer.
++
+NOTE: The buffer is rewound if the read fails. If -1 is returned,
+`errno` must be consulted, like you would do for `read(3)`.
+`strbuf_read()`, `strbuf_read_file()` and `strbuf_getline()` has the
+same behaviour as well.
+
+`strbuf_read`::
+
+	Read the contents of a given file descriptor. The third argument can be
+	used to give a hint about the file size, to avoid reallocs.
+
+`strbuf_read_file`::
+
+	Read the contents of a file, specified by its path. The third argument
+	can be used to give a hint about the file size, to avoid reallocs.
+
+`strbuf_readlink`::
+
+	Read the target of a symbolic link, specified by its path.  The third
+	argument can be used to give a hint about the size, to avoid reallocs.
+
+`strbuf_getline`::
+
+	Read a line from a FILE* pointer. The second argument specifies the line
+	terminator character, typically `'\n'`.
+
+`stripspace`::
+
+	Strip whitespace from a buffer. The second parameter controls if
+	comments are considered contents to be removed or not.
+
+`launch_editor`::
+
+	Launch the user preferred editor to edit a file and fill the buffer
+	with the file's contents upon the user completing their editing. The
+	third argument can be used to set the environment which the editor is
+	run in. If the buffer is NULL the editor is launched as usual but the
+	file's contents are not read into the buffer upon completion.
diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt
new file mode 100644
index 0000000000..6d8c24bb1e
--- /dev/null
+++ b/Documentation/technical/api-string-list.txt
@@ -0,0 +1,132 @@
+string-list API
+===============
+
+The string_list API offers a data structure and functions to handle sorted
+and unsorted string lists.
+
+The 'string_list' struct used to be called 'path_list', but was renamed
+because it is not specific to paths.
+
+The caller:
+
+. Allocates and clears a `struct string_list` variable.
+
+. Initializes the members. You might want to set the flag `strdup_strings`
+  if the strings should be strdup()ed. For example, this is necessary
+  when you add something like git_path("..."), since that function returns
+  a static buffer that will change with the next call to git_path().
++
+If you need something advanced, you can manually malloc() the `items`
+member (you need this if you add things later) and you should set the
+`nr` and `alloc` members in that case, too.
+
+. Adds new items to the list, using `string_list_append` or
+  `string_list_insert`.
+
+. Can check if a string is in the list using `string_list_has_string` or
+  `unsorted_string_list_has_string` and get it from the list using
+  `string_list_lookup` for sorted lists.
+
+. Can sort an unsorted list using `sort_string_list`.
+
+. Finally it should free the list using `string_list_clear`.
+
+Example:
+
+----
+struct string_list list;
+int i;
+
+memset(&list, 0, sizeof(struct string_list));
+string_list_append("foo", &list);
+string_list_append("bar", &list);
+for (i = 0; i < list.nr; i++)
+	printf("%s\n", list.items[i].string)
+----
+
+NOTE: It is more efficient to build an unsorted list and sort it
+afterwards, instead of building a sorted list (`O(n log n)` instead of
+`O(n^2)`).
++
+However, if you use the list to check if a certain string was added
+already, you should not do that (using unsorted_string_list_has_string()),
+because the complexity would be quadratic again (but with a worse factor).
+
+Functions
+---------
+
+* General ones (works with sorted and unsorted lists as well)
+
+`print_string_list`::
+
+	Dump a string_list to stdout, useful mainly for debugging purposes. It
+	can take an optional header argument and it writes out the
+	string-pointer pairs of the string_list, each one in its own line.
+
+`string_list_clear`::
+
+	Free a string_list. The `string` pointer of the items will be freed in
+	case the `strdup_strings` member of the string_list is set. The second
+	parameter controls if the `util` pointer of the items should be freed
+	or not.
+
+* Functions for sorted lists only
+
+`string_list_has_string`::
+
+	Determine if the string_list has a given string or not.
+
+`string_list_insert`::
+
+	Insert a new element to the string_list. The returned pointer can be
+	handy if you want to write something to the `util` pointer of the
+	string_list_item containing the just added string.
++
+Since this function uses xrealloc() (which die()s if it fails) if the
+list needs to grow, it is safe not to check the pointer. I.e. you may
+write `string_list_insert(...)->util = ...;`.
+
+`string_list_lookup`::
+
+	Look up a given string in the string_list, returning the containing
+	string_list_item. If the string is not found, NULL is returned.
+
+* Functions for unsorted lists only
+
+`string_list_append`::
+
+	Append a new string to the end of the string_list.
+
+`sort_string_list`::
+
+	Make an unsorted list sorted.
+
+`unsorted_string_list_has_string`::
+
+	It's like `string_list_has_string()` but for unsorted lists.
+
+`unsorted_string_list_lookup`::
+
+	It's like `string_list_lookup()` but for unsorted lists.
++
+The above two functions need to look through all items, as opposed to their
+counterpart for sorted lists, which performs a binary search.
+
+Data structures
+---------------
+
+* `struct string_list_item`
+
+Represents an item of the list. The `string` member is a pointer to the
+string, and you may use the `util` member for any purpose, if you want.
+
+* `struct string_list`
+
+Represents the list itself.
+
+. The array of items are available via the `items` member.
+. The `nr` member contains the number of items stored in the list.
+. The `alloc` member is used to avoid reallocating at every insertion.
+  You should not tamper with it.
+. Setting the `strdup_strings` member to 1 will strdup() the strings
+  before adding them, see above.
diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt
new file mode 100644
index 0000000000..55b728632c
--- /dev/null
+++ b/Documentation/technical/api-tree-walking.txt
@@ -0,0 +1,145 @@
+tree walking API
+================
+
+The tree walking API is used to traverse and inspect trees.
+
+Data Structures
+---------------
+
+`struct name_entry`::
+
+	An entry in a tree. Each entry has a sha1 identifier, pathname, and
+	mode.
+
+`struct tree_desc`::
+
+	A semi-opaque data structure used to maintain the current state of the
+	walk.
++
+* `buffer` is a pointer into the memory representation of the tree. It always
+points at the current entry being visited.
+
+* `size` counts the number of bytes left in the `buffer`.
+
+* `entry` points to the current entry being visited.
+
+`struct traverse_info`::
+
+	A structure used to maintain the state of a traversal.
++
+* `prev` points to the traverse_info which was used to descend into the
+current tree. If this is the top-level tree `prev` will point to
+a dummy traverse_info.
+
+* `name` is the entry for the current tree (if the tree is a subtree).
+
+* `pathlen` is the length of the full path for the current tree.
+
+* `conflicts` can be used by callbacks to maintain directory-file conflicts.
+
+* `fn` is a callback called for each entry in the tree. See Traversing for more
+information.
+
+* `data` can be anything the `fn` callback would want to use.
+
+Initializing
+------------
+
+`init_tree_desc`::
+
+	Initialize a `tree_desc` and decode its first entry. The buffer and
+	size parameters are assumed to be the same as the buffer and size
+	members of `struct tree`.
+
+`fill_tree_descriptor`::
+
+	Initialize a `tree_desc` and decode its first entry given the sha1 of
+	a tree. Returns the `buffer` member if the sha1 is a valid tree
+	identifier and NULL otherwise.
+
+`setup_traverse_info`::
+
+	Initialize a `traverse_info` given the pathname of the tree to start
+	traversing from. The `base` argument is assumed to be the `path`
+	member of the `name_entry` being recursed into unless the tree is a
+	top-level tree in which case the empty string ("") is used.
+
+Walking
+-------
+
+`tree_entry`::
+
+	Visit the next entry in a tree. Returns 1 when there are more entries
+	left to visit and 0 when all entries have been visited. This is
+	commonly used in the test of a while loop.
+
+`tree_entry_len`::
+
+	Calculate the length of a tree entry's pathname. This utilizes the
+	memory structure of a tree entry to avoid the overhead of using a
+	generic strlen().
+
+`update_tree_entry`::
+
+	Walk to the next entry in a tree. This is commonly used in conjunction
+	with `tree_entry_extract` to inspect the current entry.
+
+`tree_entry_extract`::
+
+	Decode the entry currently being visited (the one pointed to by
+	`tree_desc's` `entry` member) and return the sha1 of the entry. The
+	`pathp` and `modep` arguments are set to the entry's pathname and mode
+	respectively.
+
+`get_tree_entry`::
+
+	Find an entry in a tree given a pathname and the sha1 of a tree to
+	search. Returns 0 if the entry is found and -1 otherwise. The third
+	and fourth parameters are set to the entry's sha1 and mode
+	respectively.
+
+Traversing
+----------
+
+`traverse_trees`::
+
+	Traverse `n` number of trees in parallel. The `fn` callback member of
+	`traverse_info` is called once for each tree entry.
+
+`traverse_callback_t`::
+	The arguments passed to the traverse callback are as follows:
++
+* `n` counts the number of trees being traversed.
+
+* `mask` has its nth bit set if something exists in the nth entry.
+
+* `dirmask` has its nth bit set if the nth tree's entry is a directory.
+
+* `entry` is an array of size `n` where the nth entry is from the nth tree.
+
+* `info` maintains the state of the traversal.
+
++
+Returning a negative value will terminate the traversal. Otherwise the
+return value is treated as an update mask. If the nth bit is set the nth tree
+will be updated and if the bit is not set the nth tree entry will be the
+same in the next callback invocation.
+
+`make_traverse_path`::
+
+	Generate the full pathname of a tree entry based from the root of the
+	traversal. For example, if the traversal has recursed into another
+	tree named "bar" the pathname of an entry "baz" in the "bar"
+	tree would be "bar/baz".
+
+`traverse_path_len`::
+
+	Calculate the length of a pathname returned by `make_traverse_path`.
+	This utilizes the memory structure of a tree entry to avoid the
+	overhead of using a generic strlen().
+
+Authors
+-------
+
+Written by Junio C Hamano <gitster@pobox.com> and Linus Torvalds
+<torvalds@linux-foundation.org>
diff --git a/Documentation/technical/api-xdiff-interface.txt b/Documentation/technical/api-xdiff-interface.txt
new file mode 100644
index 0000000000..6296ecad1d
--- /dev/null
+++ b/Documentation/technical/api-xdiff-interface.txt
@@ -0,0 +1,7 @@
+xdiff interface API
+===================
+
+Talk about our calling convention to xdiff library, including
+xdiff_emit_consume_fn.
+
+(Dscho, JC)
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
new file mode 100644
index 0000000000..1803e64e46
--- /dev/null
+++ b/Documentation/technical/pack-format.txt
@@ -0,0 +1,160 @@
+GIT pack format
+===============
+
+= pack-*.pack files have the following format:
+
+   - A header appears at the beginning and consists of the following:
+
+     4-byte signature:
+         The signature is: {'P', 'A', 'C', 'K'}
+
+     4-byte version number (network byte order):
+         GIT currently accepts version number 2 or 3 but
+         generates version 2 only.
+
+     4-byte number of objects contained in the pack (network byte order)
+
+     Observation: we cannot have more than 4G versions ;-) and
+     more than 4G objects in a pack.
+
+   - The header is followed by number of object entries, each of
+     which looks like this:
+
+     (undeltified representation)
+     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+     compressed data
+
+     (deltified representation)
+     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
+     20-byte base object name
+     compressed delta data
+
+     Observation: length of each object is encoded in a variable
+     length format and is not constrained to 32-bit or anything.
+
+  - The trailer records 20-byte SHA1 checksum of all of the above.
+
+= Original (version 1) pack-*.idx files have the following format:
+
+  - The header consists of 256 4-byte network byte order
+    integers.  N-th entry of this table records the number of
+    objects in the corresponding pack, the first byte of whose
+    object name is less than or equal to N.  This is called the
+    'first-level fan-out' table.
+
+  - The header is followed by sorted 24-byte entries, one entry
+    per object in the pack.  Each entry is:
+
+    4-byte network byte order integer, recording where the
+    object is stored in the packfile as the offset from the
+    beginning.
+
+    20-byte object name.
+
+  - The file is concluded with a trailer:
+
+    A copy of the 20-byte SHA1 checksum at the end of
+    corresponding packfile.
+
+    20-byte SHA1-checksum of all of the above.
+
+Pack Idx file:
+
+	--  +--------------------------------+
+fanout	    | fanout[0] = 2 (for example)    |-.
+table	    +--------------------------------+ |
+	    | fanout[1]                      | |
+	    +--------------------------------+ |
+	    | fanout[2]                      | |
+	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
+	    | fanout[255] = total objects    |---.
+	--  +--------------------------------+ | |
+main	    | offset                         | | |
+index	    | object name 00XXXXXXXXXXXXXXXX | | |
+table	    +--------------------------------+ | |
+	    | offset                         | | |
+	    | object name 00XXXXXXXXXXXXXXXX | | |
+	    +--------------------------------+<+ |
+	  .-| offset                         |   |
+	  | | object name 01XXXXXXXXXXXXXXXX |   |
+	  | +--------------------------------+   |
+	  | | offset                         |   |
+	  | | object name 01XXXXXXXXXXXXXXXX |   |
+	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
+	  | | offset                         |   |
+	  | | object name FFXXXXXXXXXXXXXXXX |   |
+	--| +--------------------------------+<--+
+trailer	  | | packfile checksum              |
+	  | +--------------------------------+
+	  | | idxfile checksum               |
+	  | +--------------------------------+
+          .-------.
+                  |
+Pack file entry: <+
+
+     packed object header:
+	1-byte size extension bit (MSB)
+	       type (next 3 bit)
+	       size0 (lower 4-bit)
+        n-byte sizeN (as long as MSB is set, each 7-bit)
+		size0..sizeN form 4+7+7+..+7 bit integer, size0
+		is the least significant part, and sizeN is the
+		most significant part.
+     packed object data:
+        If it is not DELTA, then deflated bytes (the size above
+		is the size before compression).
+	If it is REF_DELTA, then
+	  20-byte base object name SHA1 (the size above is the
+		size of the delta data that follows).
+          delta data, deflated.
+	If it is OFS_DELTA, then
+	  n-byte offset (see below) interpreted as a negative
+		offset from the type-byte of the header of the
+		ofs-delta entry (the size above is the size of
+		the delta data that follows).
+	  delta data, deflated.
+
+     offset encoding:
+	  n bytes with MSB set in all but the last one.
+	  The offset is then the number constructed by
+	  concatenating the lower 7 bit of each byte, and
+	  for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
+	  to the result.
+
+
+
+= Version 2 pack-*.idx files support packs larger than 4 GiB, and
+  have some other reorganizations.  They have the format:
+
+  - A 4-byte magic number '\377tOc' which is an unreasonable
+    fanout[0] value.
+
+  - A 4-byte version number (= 2)
+
+  - A 256-entry fan-out table just like v1.
+
+  - A table of sorted 20-byte SHA1 object names.  These are
+    packed together without offset values to reduce the cache
+    footprint of the binary search for a specific object name.
+
+  - A table of 4-byte CRC32 values of the packed object data.
+    This is new in v2 so compressed data can be copied directly
+    from pack to pack during repacking without undetected
+    data corruption.
+
+  - A table of 4-byte offset values (in network byte order).
+    These are usually 31-bit pack file offsets, but large
+    offsets are encoded as an index into the next table with
+    the msbit set.
+
+  - A table of 8-byte offset entries (empty for pack files less
+    than 2 GiB).  Pack files are organized with heavily used
+    objects toward the front, so most object references should
+    not need to refer to this table.
+
+  - The same trailer as a v1 pack file:
+
+    A copy of the 20-byte SHA1 checksum at the end of
+    corresponding packfile.
+
+    20-byte SHA1-checksum of all of the above.
diff --git a/Documentation/technical/pack-heuristics.txt b/Documentation/technical/pack-heuristics.txt
new file mode 100644
index 0000000000..103eb5d989
--- /dev/null
+++ b/Documentation/technical/pack-heuristics.txt
@@ -0,0 +1,466 @@
+        Concerning Git's Packing Heuristics
+        ===================================
+
+        Oh, here's a really stupid question:
+
+                  Where do I go
+               to learn the details
+	    of git's packing heuristics?
+
+Be careful what you ask!
+
+Followers of the git, please open the git IRC Log and turn to
+February 10, 2006.
+
+It's a rare occasion, and we are joined by the King Git Himself,
+Linus Torvalds (linus).  Nathaniel Smith, (njs`), has the floor
+and seeks enlightenment.  Others are present, but silent.
+
+Let's listen in!
+
+    <njs`> Oh, here's a really stupid question -- where do I go to
+        learn the details of git's packing heuristics?  google avails
+        me not, reading the source didn't help a lot, and wading
+        through the whole mailing list seems less efficient than any
+        of that.
+
+It is a bold start!  A plea for help combined with a simultaneous
+tri-part attack on some of the tried and true mainstays in the quest
+for enlightenment.  Brash accusations of google being useless. Hubris!
+Maligning the source.  Heresy!  Disdain for the mailing list archives.
+Woe.
+
+    <pasky> yes, the packing-related delta stuff is somewhat
+        mysterious even for me ;)
+
+Ah!  Modesty after all.
+
+    <linus> njs, I don't think the docs exist. That's something where
+	 I don't think anybody else than me even really got involved.
+	 Most of the rest of git others have been busy with (especially
+	 Junio), but packing nobody touched after I did it.
+
+It's cryptic, yet vague.  Linus in style for sure.  Wise men
+interpret this as an apology.  A few argue it is merely a
+statement of fact.
+
+    <njs`> I guess the next step is "read the source again", but I
+        have to build up a certain level of gumption first :-)
+
+Indeed!  On both points.
+
+    <linus> The packing heuristic is actually really really simple.
+
+Bait...
+
+    <linus> But strange.
+
+And switch.  That ought to do it!
+
+    <linus> Remember: git really doesn't follow files. So what it does is
+        - generate a list of all objects
+        - sort the list according to magic heuristics
+        - walk the list, using a sliding window, seeing if an object
+          can be diffed against another object in the window
+        - write out the list in recency order
+
+The traditional understatement:
+
+    <njs`> I suspect that what I'm missing is the precise definition of
+        the word "magic"
+
+The traditional insight:
+
+    <pasky> yes
+
+And Babel-like confusion flowed.
+
+    <njs`> oh, hmm, and I'm not sure what this sliding window means either
+
+    <pasky> iirc, it appeared to me to be just the sha1 of the object
+        when reading the code casually ...
+
+        ... which simply doesn't sound as a very good heuristics, though ;)
+
+    <njs`> .....and recency order.  okay, I think it's clear I didn't
+       even realize how much I wasn't realizing :-)
+
+Ah, grasshopper!  And thus the enlightenment begins anew.
+
+    <linus> The "magic" is actually in theory totally arbitrary.
+        ANY order will give you a working pack, but no, it's not
+        ordered by SHA1.
+
+        Before talking about the ordering for the sliding delta
+        window, let's talk about the recency order. That's more
+        important in one way.
+
+    <njs`> Right, but if all you want is a working way to pack things
+        together, you could just use cat and save yourself some
+        trouble...
+
+Waaait for it....
+
+    <linus> The recency ordering (which is basically: put objects
+        _physically_ into the pack in the order that they are
+        "reachable" from the head) is important.
+
+    <njs`> okay
+
+    <linus> It's important because that's the thing that gives packs
+        good locality. It keeps the objects close to the head (whether
+        they are old or new, but they are _reachable_ from the head)
+        at the head of the pack. So packs actually have absolutely
+        _wonderful_ IO patterns.
+
+Read that again, because it is important.
+
+    <linus> But recency ordering is totally useless for deciding how
+        to actually generate the deltas, so the delta ordering is
+        something else.
+
+        The delta ordering is (wait for it):
+        - first sort by the "basename" of the object, as defined by
+          the name the object was _first_ reached through when
+          generating the object list
+        - within the same basename, sort by size of the object
+        - but always sort different types separately (commits first).
+
+        That's not exactly it, but it's very close.
+
+    <njs`> The "_first_ reached" thing is not too important, just you
+        need some way to break ties since the same objects may be
+        reachable many ways, yes?
+
+And as if to clarify:
+
+    <linus> The point is that it's all really just any random
+        heuristic, and the ordering is totally unimportant for
+        correctness, but it helps a lot if the heuristic gives
+        "clumping" for things that are likely to delta well against
+        each other.
+
+It is an important point, so secretly, I did my own research and have
+included my results below.  To be fair, it has changed some over time.
+And through the magic of Revisionistic History, I draw upon this entry
+from The Git IRC Logs on my father's birthday, March 1:
+
+    <gitster> The quote from the above linus should be rewritten a
+        bit (wait for it):
+        - first sort by type.  Different objects never delta with
+	  each other.
+        - then sort by filename/dirname.  hash of the basename
+          occupies the top BITS_PER_INT-DIR_BITS bits, and bottom
+          DIR_BITS are for the hash of leading path elements.
+        - then if we are doing "thin" pack, the objects we are _not_
+          going to pack but we know about are sorted earlier than
+          other objects.
+        - and finally sort by size, larger to smaller.
+
+In one swell-foop, clarification and obscurification!  Nonetheless,
+authoritative.  Cryptic, yet concise.  It even solicits notions of
+quotes from The Source Code.  Clearly, more study is needed.
+
+    <gitster> That's the sort order.  What this means is:
+        - we do not delta different object types.
+	- we prefer to delta the objects with the same full path, but
+          allow files with the same name from different directories.
+	- we always prefer to delta against objects we are not going
+          to send, if there are some.
+	- we prefer to delta against larger objects, so that we have
+          lots of removals.
+
+        The penultimate rule is for "thin" packs.  It is used when
+        the other side is known to have such objects.
+
+There it is again. "Thin" packs.  I'm thinking to myself, "What
+is a 'thin' pack?"  So I ask:
+
+    <jdl> What is a "thin" pack?
+
+    <gitster> Use of --objects-edge to rev-list as the upstream of
+        pack-objects.  The pack transfer protocol negotiates that.
+
+Woo hoo!  Cleared that _right_ up!
+
+    <gitster> There are two directions - push and fetch.
+
+There!  Did you see it?  It is not '"push" and "pull"'!  How often the
+confusion has started here.  So casually mentioned, too!
+
+    <gitster> For push, git-send-pack invokes git-receive-pack on the
+        other end.  The receive-pack says "I have up to these commits".
+        send-pack looks at them, and computes what are missing from
+        the other end.  So "thin" could be the default there.
+
+        In the other direction, fetch, git-fetch-pack and
+        git-clone-pack invokes git-upload-pack on the other end
+	(via ssh or by talking to the daemon).
+
+	There are two cases: fetch-pack with -k and clone-pack is one,
+        fetch-pack without -k is the other.  clone-pack and fetch-pack
+        with -k will keep the downloaded packfile without expanded, so
+        we do not use thin pack transfer.  Otherwise, the generated
+        pack will have delta without base object in the same pack.
+
+        But fetch-pack without -k will explode the received pack into
+        individual objects, so we automatically ask upload-pack to
+        give us a thin pack if upload-pack supports it.
+
+OK then.
+
+Uh.
+
+Let's return to the previous conversation still in progress.
+
+    <njs`> and "basename" means something like "the tail of end of
+        path of file objects and dir objects, as per basename(3), and
+        we just declare all commit and tag objects to have the same
+        basename" or something?
+
+Luckily, that too is a point that gitster clarified for us!
+
+If I might add, the trick is to make files that _might_ be similar be
+located close to each other in the hash buckets based on their file
+names.  It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and
+"Makefile" all landed in the same bucket due to their common basename,
+"Makefile". However, now they land in "close" buckets.
+
+The algorithm allows not just for the _same_ bucket, but for _close_
+buckets to be considered delta candidates.  The rationale is
+essentially that files, like Makefiles, often have very similar
+content no matter what directory they live in.
+
+    <linus> I played around with different delta algorithms, and with
+        making the "delta window" bigger, but having too big of a
+        sliding window makes it very expensive to generate the pack:
+        you need to compare every object with a _ton_ of other objects.
+
+        There are a number of other trivial heuristics too, which
+        basically boil down to "don't bother even trying to delta this
+        pair" if we can tell before-hand that the delta isn't worth it
+        (due to size differences, where we can take a previous delta
+        result into account to decide that "ok, no point in trying
+        that one, it will be worse").
+
+        End result: packing is actually very size efficient. It's
+        somewhat CPU-wasteful, but on the other hand, since you're
+        really only supposed to do it maybe once a month (and you can
+        do it during the night), nobody really seems to care.
+
+Nice Engineering Touch, there.  Find when it doesn't matter, and
+proclaim it a non-issue.  Good style too!
+
+    <njs`> So, just to repeat to see if I'm following, we start by
+        getting a list of the objects we want to pack, we sort it by
+        this heuristic (basically lexicographically on the tuple
+        (type, basename, size)).
+
+        Then we walk through this list, and calculate a delta of
+        each object against the last n (tunable parameter) objects,
+        and pick the smallest of these deltas.
+
+Vastly simplified, but the essence is there!
+
+    <linus> Correct.
+
+    <njs`> And then once we have picked a delta or fulltext to
+        represent each object, we re-sort by recency, and write them
+        out in that order.
+
+    <linus> Yup. Some other small details:
+
+And of course there is the "Other Shoe" Factor too.
+
+    <linus> - We limit the delta depth to another magic value (right
+        now both the window and delta depth magic values are just "10")
+
+    <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO
+        patterns, because the things you want are near by, but to
+        actually reconstruct them you may have to jump all over in
+        random ways.
+
+    <linus> - When we write out a delta, and we haven't yet written
+        out the object it is a delta against, we write out the base
+        object first.  And no, when we reconstruct them, we actually
+        get nice IO patterns, because:
+        - larger objects tend to be "more recent" (Linus' law: files grow)
+        - we actively try to generate deltas from a larger object to a
+          smaller one
+        - this means that the top-of-tree very seldom has deltas
+          (i.e. deltas in _practice_ are "backwards deltas")
+
+Again, we should reread that whole paragraph.  Not just because
+Linus has slipped Linus's Law in there on us, but because it is
+important.  Let's make sure we clarify some of the points here:
+
+    <njs`> So the point is just that in practice, delta order and
+        recency order match each other quite well.
+
+    <linus> Yes. There's another nice side to this (and yes, it was
+	designed that way ;):
+        - the reason we generate deltas against the larger object is
+	  actually a big space saver too!
+
+    <njs`> Hmm, but your last comment (if "we haven't yet written out
+        the object it is a delta against, we write out the base object
+        first"), seems like it would make these facts mostly
+        irrelevant because even if in practice you would not have to
+        wander around much, in fact you just brute-force say that in
+        the cases where you might have to wander, don't do that :-)
+
+    <linus> Yes and no. Notice the rule: we only write out the base
+        object first if the delta against it was more recent.  That
+        means that you can actually have deltas that refer to a base
+        object that is _not_ close to the delta object, but that only
+        happens when the delta is needed to generate an _old_ object.
+
+    <linus> See?
+
+Yeah, no.  I missed that on the first two or three readings myself.
+
+    <linus> This keeps the front of the pack dense. The front of the
+        pack never contains data that isn't relevant to a "recent"
+        object.  The size optimization comes from our use of xdelta
+        (but is true for many other delta algorithms): removing data
+        is cheaper (in size) than adding data.
+
+        When you remove data, you only need to say "copy bytes n--m".
+	In contrast, in a delta that _adds_ data, you have to say "add
+        these bytes: 'actual data goes here'"
+
+    *** njs` has quit: Read error: 104 (Connection reset by peer)
+
+    <linus> Uhhuh. I hope I didn't blow njs` mind.
+
+    *** njs` has joined channel #git
+
+    <pasky> :)
+
+The silent observers are amused.  Of course.
+
+And as if njs` was expected to be omniscient:
+
+    <linus> njs - did you miss anything?
+
+OK, I'll spell it out.  That's Geek Humor.  If njs` was not actually
+connected for a little bit there, how would he know if missed anything
+while he was disconnected?  He's a benevolent dictator with a sense of
+humor!  Well noted!
+
+    <njs`> Stupid router.  Or gremlins, or whatever.
+
+It's a cheap shot at Cisco.  Take 'em when you can.
+
+    <njs`> Yes and no. Notice the rule: we only write out the base
+        object first if the delta against it was more recent.
+
+        I'm getting lost in all these orders, let me re-read :-)
+	So the write-out order is from most recent to least recent?
+        (Conceivably it could be the opposite way too, I'm not sure if
+        we've said) though my connection back at home is logging, so I
+        can just read what you said there :-)
+
+And for those of you paying attention, the Omniscient Trick has just
+been detailed!
+
+    <linus> Yes, we always write out most recent first
+
+For the other record:
+
+    <pasky> njs`: http://pastebin.com/547965
+
+The 'net never forgets, so that should be good until the end of time.
+
+    <njs`> And, yeah, I got the part about deeper-in-history stuff
+        having worse IO characteristics, one sort of doesn't care.
+
+    <linus> With the caveat that if the "most recent" needs an older
+        object to delta against (hey, shrinking sometimes does
+        happen), we write out the old object with the delta.
+
+    <njs`> (if only it happened more...)
+
+    <linus> Anyway, the pack-file could easily be denser still, but
+        because it's used both for streaming (the git protocol) and
+        for on-disk, it has a few pessimizations.
+
+Actually, it is a made-up word. But it is a made-up word being
+used as setup for a later optimization, which is a real word:
+
+    <linus> In particular, while the pack-file is then compressed,
+        it's compressed just one object at a time, so the actual
+        compression factor is less than it could be in theory. But it
+        means that it's all nice random-access with a simple index to
+        do "object name->location in packfile" translation.
+
+    <njs`> I'm assuming the real win for delta-ing large->small is
+        more homogeneous statistics for gzip to run over?
+
+        (You have to put the bytes in one place or another, but
+        putting them in a larger blob wins on compression)
+
+        Actually, what is the compression strategy -- each delta
+        individually gzipped, the whole file gzipped, somewhere in
+        between, no compression at all, ....?
+
+        Right.
+
+Reality IRC sets in.  For example:
+
+    <pasky> I'll read the rest in the morning, I really have to go
+        sleep or there's no hope whatsoever for me at the today's
+        exam... g'nite all.
+
+Heh.
+
+    <linus> pasky: g'nite
+
+    <njs`> pasky: 'luck
+
+    <linus> Right: large->small matters exactly because of compression
+        behaviour. If it was non-compressed, it probably wouldn't make
+        any difference.
+
+    <njs`> yeah
+
+    <linus> Anyway: I'm not even trying to claim that the pack-files
+        are perfect, but they do tend to have a nice balance of
+        density vs ease-of use.
+
+Gasp!  OK, saved.  That's a fair Engineering trade off.  Close call!
+In fact, Linus reflects on some Basic Engineering Fundamentals,
+design options, etc.
+
+    <linus> More importantly, they allow git to still _conceptually_
+        never deal with deltas at all, and be a "whole object" store.
+
+        Which has some problems (we discussed bad huge-file
+        behaviour on the git lists the other day), but it does mean
+        that the basic git concepts are really really simple and
+        straightforward.
+
+        It's all been quite stable.
+
+        Which I think is very much a result of having very simple
+        basic ideas, so that there's never any confusion about what's
+        going on.
+
+        Bugs happen, but they are "simple" bugs. And bugs that
+        actually get some object store detail wrong are almost always
+        so obvious that they never go anywhere.
+
+    <njs`> Yeah.
+
+Nuff said.
+
+    <linus> Anyway.  I'm off for bed. It's not 6AM here, but I've got
+	 three kids, and have to get up early in the morning to send
+	 them off. I need my beauty sleep.
+
+    <njs`> :-)
+
+    <njs`> appreciate the infodump, I really was failing to find the
+        details on git packs :-)
+
+And now you know the rest of the story.
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
new file mode 100644
index 0000000000..369f91d3b9
--- /dev/null
+++ b/Documentation/technical/pack-protocol.txt
@@ -0,0 +1,494 @@
+Packfile transfer protocols
+===========================
+
+Git supports transferring data in packfiles over the ssh://, git:// and
+file:// transports.  There exist two sets of protocols, one for pushing
+data from a client to a server and another for fetching data from a
+server to a client.  All three transports (ssh, git, file) use the same
+protocol to transfer data.
+
+The processes invoked in the canonical Git implementation are 'upload-pack'
+on the server side and 'fetch-pack' on the client side for fetching data;
+then 'receive-pack' on the server and 'send-pack' on the client for pushing
+data.  The protocol functions to have a server tell a client what is
+currently on the server, then for the two to negotiate the smallest amount
+of data to send in order to fully update one or the other.
+
+Transports
+----------
+There are three transports over which the packfile protocol is
+initiated.  The Git transport is a simple, unauthenticated server that
+takes the command (almost always 'upload-pack', though Git
+servers can be configured to be globally writable, in which 'receive-
+pack' initiation is also allowed) with which the client wishes to
+communicate and executes it and connects it to the requesting
+process.
+
+In the SSH transport, the client just runs the 'upload-pack'
+or 'receive-pack' process on the server over the SSH protocol and then
+communicates with that invoked process over the SSH connection.
+
+The file:// transport runs the 'upload-pack' or 'receive-pack'
+process locally and communicates with it over a pipe.
+
+Git Transport
+-------------
+
+The Git transport starts off by sending the command and repository
+on the wire using the pkt-line format, followed by a NUL byte and a
+hostname parameter, terminated by a NUL byte.
+
+   0032git-upload-pack /project.git\0host=myserver.com\0
+
+--
+   git-proto-request = request-command SP pathname NUL [ host-parameter NUL ]
+   request-command   = "git-upload-pack" / "git-receive-pack" /
+		       "git-upload-archive"   ; case sensitive
+   pathname          = *( %x01-ff ) ; exclude NUL
+   host-parameter    = "host=" hostname [ ":" port ]
+--
+
+Only host-parameter is allowed in the git-proto-request. Clients
+MUST NOT attempt to send additional parameters. It is used for the
+git-daemon name based virtual hosting.  See --interpolated-path
+option to git daemon, with the %H/%CH format characters.
+
+Basically what the Git client is doing to connect to an 'upload-pack'
+process on the server side over the Git protocol is this:
+
+   $ echo -e -n \
+     "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+     nc -v example.com 9418
+
+
+SSH Transport
+-------------
+
+Initiating the upload-pack or receive-pack processes over SSH is
+executing the binary on the server via SSH remote execution.
+It is basically equivalent to running this:
+
+   $ ssh git.example.com "git-upload-pack '/project.git'"
+
+For a server to support Git pushing and pulling for a given user over
+SSH, that user needs to be able to execute one or both of those
+commands via the SSH shell that they are provided on login.  On some
+systems, that shell access is limited to only being able to run those
+two commands, or even just one of them.
+
+In an ssh:// format URI, it's absolute in the URI, so the '/' after
+the host name (or port number) is sent as an argument, which is then
+read by the remote git-upload-pack exactly as is, so it's effectively
+an absolute path in the remote filesystem.
+
+       git clone ssh://user@example.com/project.git
+		    |
+		    v
+    ssh user@example.com "git-upload-pack '/project.git'"
+
+In a "user@host:path" format URI, its relative to the user's home
+directory, because the Git client will run:
+
+     git clone user@example.com:project.git
+		    |
+		    v
+  ssh user@example.com "git-upload-pack 'project.git'"
+
+The exception is if a '~' is used, in which case
+we execute it without the leading '/'.
+
+      ssh://user@example.com/~alice/project.git,
+		     |
+		     v
+   ssh user@example.com "git-upload-pack '~alice/project.git'"
+
+A few things to remember here:
+
+- The "command name" is spelled with dash (e.g. git-upload-pack), but
+  this can be overridden by the client;
+
+- The repository path is always quoted with single quotes.
+
+Fetching Data From a Server
+===========================
+
+When one Git repository wants to get data that a second repository
+has, the first can 'fetch' from the second.  This operation determines
+what data the server has that the client does not then streams that
+data down to the client in packfile format.
+
+
+Reference Discovery
+-------------------
+
+When the client initially connects the server will immediately respond
+with a listing of each reference it has (all branches and tags) along
+with the object name that each reference currently points to.
+
+   $ echo -e -n "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
+      nc -v example.com 9418
+   00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag
+   00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration
+   003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master
+   003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9
+   003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0
+   003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{}
+   0000
+
+Server SHOULD terminate each non-flush line using LF ("\n") terminator;
+client MUST NOT complain if there is no terminator.
+
+The returned response is a pkt-line stream describing each ref and
+its current value.  The stream MUST be sorted by name according to
+the C locale ordering.
+
+If HEAD is a valid ref, HEAD MUST appear as the first advertised
+ref.  If HEAD is not a valid ref, HEAD MUST NOT appear in the
+advertisement list at all, but other refs may still appear.
+
+The stream MUST include capability declarations behind a NUL on the
+first ref. The peeled value of a ref (that is "ref^{}") MUST be
+immediately after the ref itself, if presented. A conforming server
+MUST peel the ref if it's an annotated tag.
+
+----
+  advertised-refs  =  (no-refs / list-of-refs)
+		      flush-pkt
+
+  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
+		      NUL capability-list LF)
+
+  list-of-refs     =  first-ref *other-ref
+  first-ref        =  PKT-LINE(obj-id SP refname
+		      NUL capability-list LF)
+
+  other-ref        =  PKT-LINE(other-tip / other-peeled)
+  other-tip        =  obj-id SP refname LF
+  other-peeled     =  obj-id SP refname "^{}" LF
+
+  capability-list  =  capability *(SP capability)
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
+  LC_ALPHA         =  %x61-7A
+----
+
+Server and client MUST use lowercase for obj-id, both MUST treat obj-id
+as case-insensitive.
+
+See protocol-capabilities.txt for a list of allowed server capabilities
+and descriptions.
+
+Packfile Negotiation
+--------------------
+After reference and capabilities discovery, the client can decide
+to terminate the connection by sending a flush-pkt, telling the
+server it can now gracefully terminate (as happens with the ls-remote
+command) or it can enter the negotiation phase, where the client and
+server determine what the minimal packfile necessary for transport is.
+
+Once the client has the initial list of references that the server
+has, as well as the list of capabilities, it will begin telling the
+server what objects it wants and what objects it has, so the server
+can make a packfile that only contains the objects that the client needs.
+The client will also send a list of the capabilities it wants to be in
+effect, out of what the server said it could do with the first 'want' line.
+
+----
+  upload-request    =  want-list
+		       have-list
+		       compute-end
+
+  want-list         =  first-want
+		       *additional-want
+		       flush-pkt
+
+  first-want        =  PKT-LINE("want" SP obj-id SP capability-list LF)
+  additional-want   =  PKT-LINE("want" SP obj-id LF)
+
+  have-list         =  *have-line
+  have-line         =  PKT-LINE("have" SP obj-id LF)
+  compute-end       =  flush-pkt / PKT-LINE("done")
+----
+
+Clients MUST send all the obj-ids it wants from the reference
+discovery phase as 'want' lines. Clients MUST send at least one
+'want' command in the request body. Clients MUST NOT mention an
+obj-id in a 'want' command which did not appear in the response
+obtained through ref discovery.
+
+If client is requesting a shallow clone, it will now send a 'deepen'
+line with the depth it is requesting.
+
+Once all the "want"s (and optional 'deepen') are transferred,
+clients MUST send a flush-pkt. If the client has all the references
+on the server, client flushes and disconnects.
+
+TODO: shallow/unshallow response and document the deepen command in the ABNF.
+
+Now the client will send a list of the obj-ids it has using 'have'
+lines.  In multi_ack mode, the canonical implementation will send up
+to 32 of these at a time, then will send a flush-pkt.  The canonical
+implementation will skip ahead and send the next 32 immediately,
+so that there is always a block of 32 "in-flight on the wire" at a
+time.
+
+If the server reads 'have' lines, it then will respond by ACKing any
+of the obj-ids the client said it had that the server also has. The
+server will ACK obj-ids differently depending on which ack mode is
+chosen by the client.
+
+In multi_ack mode:
+
+  * the server will respond with 'ACK obj-id continue' for any common
+    commits.
+
+  * once the server has found an acceptable common base commit and is
+    ready to make a packfile, it will blindly ACK all 'have' obj-ids
+    back to the client.
+
+  * the server will then send a 'NACK' and then wait for another response
+    from the client - either a 'done' or another list of 'have' lines.
+
+In multi_ack_detailed mode:
+
+  * the server will differentiate the ACKs where it is signaling
+    that it is ready to send data with 'ACK obj-id ready' lines, and
+    signals the identified common commits with 'ACK obj-id common' lines.
+
+Without either multi_ack or multi_ack_detailed:
+
+ * upload-pack sends "ACK obj-id" on the first common object it finds.
+   After that it says nothing until the client gives it a "done".
+
+ * upload-pack sends "NAK" on a flush-pkt if no common object
+   has been found yet.  If one has been found, and thus an ACK
+   was already sent, it's silent on the flush-pkt.
+
+After the client has gotten enough ACK responses that it can determine
+that the server has enough information to send an efficient packfile
+(in the canonical implementation, this is determined when it has received
+enough ACKs that it can color everything left in the --date-order queue
+as common with the server, or the --date-order queue is empty), or the
+client determines that it wants to give up (in the canonical implementation,
+this is determined when the client sends 256 'have' lines without getting
+any of them ACKed by the server - meaning there is nothing in common and
+the server should just send all of its objects), then the client will send
+a 'done' command.  The 'done' command signals to the server that the client
+is ready to receive its packfile data.
+
+However, the 256 limit *only* turns on in the canonical client
+implementation if we have received at least one "ACK %s continue"
+during a prior round.  This helps to ensure that at least one common
+ancestor is found before we give up entirely.
+
+Once the 'done' line is read from the client, the server will either
+send a final 'ACK obj-id' or it will send a 'NAK'. The server only sends
+ACK after 'done' if there is at least one common base and multi_ack or
+multi_ack_detailed is enabled. The server always sends NAK after 'done'
+if there is no common base found.
+
+Then the server will start sending its packfile data.
+
+----
+  server-response = *ack_multi ack / nak
+  ack_multi       = PKT-LINE("ACK" SP obj-id ack_status LF)
+  ack_status      = "continue" / "common" / "ready"
+  ack             = PKT-LINE("ACK SP obj-id LF)
+  nak             = PKT-LINE("NAK" LF)
+----
+
+A simple clone may look like this (with no 'have' lines):
+
+----
+   C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d\0multi_ack \
+     side-band-64k ofs-delta\n
+   C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+   C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+   C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+   C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   C: 0000
+   C: 0009done\n
+
+   S: 0008NAK\n
+   S: [PACKFILE]
+----
+
+An incremental update (fetch) response might look like this:
+
+----
+   C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d\0multi_ack \
+     side-band-64k ofs-delta\n
+   C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n
+   C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n
+   C: 0000
+   C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n
+   C: [30 more have lines]
+   C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   C: 0000
+
+   S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n
+   S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n
+   S: 0008NAK\n
+
+   C: 0009done\n
+
+   S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n
+   S: [PACKFILE]
+----
+
+
+Packfile Data
+-------------
+
+Now that the client and server have finished negotiation about what
+the minimal amount of data that needs to be sent to the client is, the server
+will construct and send the required data in packfile format.
+
+See pack-format.txt for what the packfile itself actually looks like.
+
+If 'side-band' or 'side-band-64k' capabilities have been specified by
+the client, the server will send the packfile data multiplexed.
+
+Each packet starting with the packet-line length of the amount of data
+that follows, followed by a single byte specifying the sideband the
+following data is coming in on.
+
+In 'side-band' mode, it will send up to 999 data bytes plus 1 control
+code, for a total of up to 1000 bytes in a pkt-line.  In 'side-band-64k'
+mode it will send up to 65519 data bytes plus 1 control code, for a
+total of up to 65520 bytes in a pkt-line.
+
+The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain
+packfile data, sideband '2' will be used for progress information that the
+client will generally print to stderr and sideband '3' is used for error
+information.
+
+If no 'side-band' capability was specified, the server will stream the
+entire packfile without multiplexing.
+
+
+Pushing Data To a Server
+========================
+
+Pushing data to a server will invoke the 'receive-pack' process on the
+server, which will allow the client to tell it which references it should
+update and then send all the data the server will need for those new
+references to be complete.  Once all the data is received and validated,
+the server will then update its references to what the client specified.
+
+Authentication
+--------------
+
+The protocol itself contains no authentication mechanisms.  That is to be
+handled by the transport, such as SSH, before the 'receive-pack' process is
+invoked.  If 'receive-pack' is configured over the Git transport, those
+repositories will be writable by anyone who can access that port (9418) as
+that transport is unauthenticated.
+
+Reference Discovery
+-------------------
+
+The reference discovery phase is done nearly the same way as it is in the
+fetching protocol. Each reference obj-id and name on the server is sent
+in packet-line format to the client, followed by a flush-pkt.  The only
+real difference is that the capability listing is different - the only
+possible values are 'report-status', 'delete-refs' and 'ofs-delta'.
+
+Reference Update Request and Packfile Transfer
+----------------------------------------------
+
+Once the client knows what references the server is at, it can send a
+list of reference update requests.  For each reference on the server
+that it wants to update, it sends a line listing the obj-id currently on
+the server, the obj-id the client would like to update it to and the name
+of the reference.
+
+This list is followed by a flush-pkt and then the packfile that should
+contain all the objects that the server will need to complete the new
+references.
+
+----
+  update-request    =  command-list [pack-file]
+
+  command-list      =  PKT-LINE(command NUL capability-list LF)
+		       *PKT-LINE(command LF)
+		       flush-pkt
+
+  command           =  create / delete / update
+  create            =  zero-id SP new-id  SP name
+  delete            =  old-id  SP zero-id SP name
+  update            =  old-id  SP new-id  SP name
+
+  old-id            =  obj-id
+  new-id            =  obj-id
+
+  pack-file         = "PACK" 28*(OCTET)
+----
+
+If the receiving end does not support delete-refs, the sending end MUST
+NOT ask for delete command.
+
+The pack-file MUST NOT be sent if the only command used is 'delete'.
+
+A pack-file MUST be sent if either create or update command is used,
+even if the server already has all the necessary objects.  In this
+case the client MUST send an empty pack-file.   The only time this
+is likely to happen is if the client is creating
+a new branch or a tag that points to an existing obj-id.
+
+The server will receive the packfile, unpack it, then validate each
+reference that is being updated that it hasn't changed while the request
+was being processed (the obj-id is still the same as the old-id), and
+it will run any update hooks to make sure that the update is acceptable.
+If all of that is fine, the server will then update the references.
+
+Report Status
+-------------
+
+After receiving the pack data from the sender, the receiver sends a
+report if 'report-status' capability is in effect.
+It is a short listing of what happened in that update.  It will first
+list the status of the packfile unpacking as either 'unpack ok' or
+'unpack [error]'.  Then it will list the status for each of the references
+that it tried to update.  Each line is either 'ok [refname]' if the
+update was successful, or 'ng [refname] [error]' if the update was not.
+
+----
+  report-status     = unpack-status
+		      1*(command-status)
+		      flush-pkt
+
+  unpack-status     = PKT-LINE("unpack" SP unpack-result LF)
+  unpack-result     = "ok" / error-msg
+
+  command-status    = command-ok / command-fail
+  command-ok        = PKT-LINE("ok" SP refname LF)
+  command-fail      = PKT-LINE("ng" SP refname SP error-msg LF)
+
+  error-msg         = 1*(OCTECT) ; where not "ok"
+----
+
+Updates can be unsuccessful for a number of reasons.  The reference can have
+changed since the reference discovery phase was originally sent, meaning
+someone pushed in the meantime.  The reference being pushed could be a
+non-fast-forward reference and the update hooks or configuration could be
+set to not allow that, etc.  Also, some references can be updated while others
+can be rejected.
+
+An example client/server communication might look like this:
+
+----
+   S: 007c74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
+   S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
+   S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
+   S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
+   S: 0000
+
+   C: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
+   C: 003e74730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
+   C: 0000
+   C: [PACKDATA]
+
+   S: 000eunpack ok\n
+   S: 0018ok refs/heads/debug\n
+   S: 002ang refs/heads/master non-fast-forward\n
+----
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
new file mode 100644
index 0000000000..fd1a593149
--- /dev/null
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -0,0 +1,187 @@
+Git Protocol Capabilities
+=========================
+
+Servers SHOULD support all capabilities defined in this document.
+
+On the very first line of the initial server response of either
+receive-pack and upload-pack the first reference is followed by
+a NUL byte and then a list of space delimited server capabilities.
+These allow the server to declare what it can and cannot support
+to the client.
+
+Client will then send a space separated list of capabilities it wants
+to be in effect. The client MUST NOT ask for capabilities the server
+did not say it supports.
+
+Server MUST diagnose and abort if capabilities it does not understand
+was sent.  Server MUST NOT ignore capabilities that client requested
+and server advertised.  As a consequence of these rules, server MUST
+NOT advertise capabilities it does not understand.
+
+The 'report-status' and 'delete-refs' capabilities are sent and
+recognized by the receive-pack (push to server) process.
+
+The 'ofs-delta' capability is sent and recognized by both upload-pack
+and receive-pack protocols.
+
+All other capabilities are only recognized by the upload-pack (fetch
+from server) process.
+
+multi_ack
+---------
+
+The 'multi_ack' capability allows the server to return "ACK obj-id
+continue" as soon as it finds a commit that it can use as a common
+base, between the client's wants and the client's have set.
+
+By sending this early, the server can potentially head off the client
+from walking any further down that particular branch of the client's
+repository history.  The client may still need to walk down other
+branches, sending have lines for those, until the server has a
+complete cut across the DAG, or the client has said "done".
+
+Without multi_ack, a client sends have lines in --date-order until
+the server has found a common base.  That means the client will send
+have lines that are already known by the server to be common, because
+they overlap in time with another branch that the server hasn't found
+a common base on yet.
+
+For example suppose the client has commits in caps that the server
+doesn't and the server has commits in lower case that the client
+doesn't, as in the following diagram:
+
+       +---- u ---------------------- x
+      /              +----- y
+     /              /
+    a -- b -- c -- d -- E -- F
+       \
+	+--- Q -- R -- S
+
+If the client wants x,y and starts out by saying have F,S, the server
+doesn't know what F,S is.  Eventually the client says "have d" and
+the server sends "ACK d continue" to let the client know to stop
+walking down that line (so don't send c-b-a), but it's not done yet,
+it needs a base for x. The client keeps going with S-R-Q, until a
+gets reached, at which point the server has a clear base and it all
+ends.
+
+Without multi_ack the client would have sent that c-b-a chain anyway,
+interleaved with S-R-Q.
+
+thin-pack
+---------
+
+This capability means that the server can send a 'thin' pack, a pack
+which does not contain base objects; if those base objects are available
+on client side. Client requests 'thin-pack' capability when it
+understands how to "thicken" it by adding required delta bases making
+it self-contained.
+
+Client MUST NOT request 'thin-pack' capability if it cannot turn a thin
+pack into a self-contained pack.
+
+
+side-band, side-band-64k
+------------------------
+
+This capability means that server can send, and client understand multiplexed
+progress reports and error info interleaved with the packfile itself.
+
+These two options are mutually exclusive. A modern client always
+favors 'side-band-64k'.
+
+Either mode indicates that the packfile data will be streamed broken
+up into packets of up to either 1000 bytes in the case of 'side_band',
+or 65520 bytes in the case of 'side_band_64k'. Each packet is made up
+of a leading 4-byte pkt-line length of how much data is in the packet,
+followed by a 1-byte stream code, followed by the actual data.
+
+The stream code can be one of:
+
+ 1 - pack data
+ 2 - progress messages
+ 3 - fatal error message just before stream aborts
+
+The "side-band-64k" capability came about as a way for newer clients
+that can handle much larger packets to request packets that are
+actually crammed nearly full, while maintaining backward compatibility
+for the older clients.
+
+Further, with side-band and its up to 1000-byte messages, it's actually
+999 bytes of payload and 1 byte for the stream code. With side-band-64k,
+same deal, you have up to 65519 bytes of data and 1 byte for the stream
+code.
+
+The client MUST send only maximum of one of "side-band" and "side-
+band-64k".  Server MUST diagnose it as an error if client requests
+both.
+
+ofs-delta
+---------
+
+Server can send, and client understand PACKv2 with delta refering to
+its base by position in pack rather than by an obj-id.  That is, they can
+send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
+
+shallow
+-------
+
+This capability adds "deepen", "shallow" and "unshallow" commands to
+the  fetch-pack/upload-pack protocol so clients can request shallow
+clones.
+
+no-progress
+-----------
+
+The client was started with "git clone -q" or something, and doesn't
+want that side band 2.  Basically the client just says "I do not
+wish to receive stream 2 on sideband, so do not send it to me, and if
+you did, I will drop it on the floor anyway".  However, the sideband
+channel 3 is still used for error responses.
+
+include-tag
+-----------
+
+The 'include-tag' capability is about sending annotated tags if we are
+sending objects they point to.  If we pack an object to the client, and
+a tag object points exactly at that object, we pack the tag object too.
+In general this allows a client to get all new annotated tags when it
+fetches a branch, in a single network connection.
+
+Clients MAY always send include-tag, hardcoding it into a request when
+the server advertises this capability. The decision for a client to
+request include-tag only has to do with the client's desires for tag
+data, whether or not a server had advertised objects in the
+refs/tags/* namespace.
+
+Servers MUST pack the tags if their referrant is packed and the client
+has requested include-tags.
+
+Clients MUST be prepared for the case where a server has ignored
+include-tag and has not actually sent tags in the pack.  In such
+cases the client SHOULD issue a subsequent fetch to acquire the tags
+that include-tag would have otherwise given the client.
+
+The server SHOULD send include-tag, if it supports it, regardless
+of whether or not there are tags available.
+
+report-status
+-------------
+
+The upload-pack process can receive a 'report-status' capability,
+which tells it that the client wants a report of what happened after
+a packfile upload and reference update.  If the pushing client requests
+this capability, after unpacking and updating references the server
+will respond with whether the packfile unpacked successfully and if
+each reference was updated successfully.  If any of those were not
+successful, it will send back an error message.  See pack-protocol.txt
+for example messages.
+
+delete-refs
+-----------
+
+If the server sends back the 'delete-refs' capability, it means that
+it is capable of accepting a zero-id value as the target
+value of a reference update.  It is not sent back by the client, it
+simply informs the client that it can be sent zero-id values
+to delete references.
diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt
new file mode 100644
index 0000000000..d30a1b9510
--- /dev/null
+++ b/Documentation/technical/protocol-common.txt
@@ -0,0 +1,96 @@
+Documentation Common to Pack and Http Protocols
+===============================================
+
+ABNF Notation
+-------------
+
+ABNF notation as described by RFC 5234 is used within the protocol documents,
+except the following replacement core rules are used:
+----
+  HEXDIG    =  DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
+----
+
+We also define the following common rules:
+----
+  NUL       =  %x00
+  zero-id   =  40*"0"
+  obj-id    =  40*(HEXDIGIT)
+
+  refname  =  "HEAD"
+  refname /=  "refs/" <see discussion below>
+----
+
+A refname is a hierarchical octet string beginning with "refs/" and
+not violating the 'git-check-ref-format' command's validation rules.
+More specifically, they:
+
+. They can include slash `/` for hierarchical (directory)
+  grouping, but no slash-separated component can begin with a
+  dot `.`.
+
+. They must contain at least one `/`. This enforces the presence of a
+  category like `heads/`, `tags/` etc. but the actual names are not
+  restricted.
+
+. They cannot have two consecutive dots `..` anywhere.
+
+. They cannot have ASCII control characters (i.e. bytes whose
+  values are lower than \040, or \177 `DEL`), space, tilde `~`,
+  caret `{caret}`, colon `:`, question-mark `?`, asterisk `*`,
+  or open bracket `[` anywhere.
+
+. They cannot end with a slash `/` nor a dot `.`.
+
+. They cannot end with the sequence `.lock`.
+
+. They cannot contain a sequence `@{`.
+
+. They cannot contain a `\\`.
+
+
+pkt-line Format
+---------------
+
+Much (but not all) of the payload is described around pkt-lines.
+
+A pkt-line is a variable length binary string.  The first four bytes
+of the line, the pkt-len, indicates the total length of the line,
+in hexadecimal.  The pkt-len includes the 4 bytes used to contain
+the length's hexadecimal representation.
+
+A pkt-line MAY contain binary data, so implementors MUST ensure
+pkt-line parsing/formatting routines are 8-bit clean.
+
+A non-binary line SHOULD BE terminated by an LF, which if present
+MUST be included in the total length.
+
+The maximum length of a pkt-line's data component is 65520 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65524
+(65520 bytes of payload + 4 bytes of length data).
+
+Implementations SHOULD NOT send an empty pkt-line ("0004").
+
+A pkt-line with a length field of 0 ("0000"), called a flush-pkt,
+is a special case and MUST be handled differently than an empty
+pkt-line ("0004").
+
+----
+  pkt-line     =  data-pkt / flush-pkt
+
+  data-pkt     =  pkt-len pkt-payload
+  pkt-len      =  4*(HEXDIG)
+  pkt-payload  =  (pkt-len - 4)*(OCTET)
+
+  flush-pkt    = "0000"
+----
+
+Examples (as C-style strings):
+
+----
+  pkt-line          actual value
+  ---------------------------------
+  "0006a\n"         "a\n"
+  "0005a"           "a"
+  "000bfoobar\n"    "foobar\n"
+  "0004"            ""
+----
diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt
new file mode 100644
index 0000000000..53aa0c82c2
--- /dev/null
+++ b/Documentation/technical/racy-git.txt
@@ -0,0 +1,197 @@
+Use of index and Racy git problem
+=================================
+
+Background
+----------
+
+The index is one of the most important data structures in git.
+It represents a virtual working tree state by recording list of
+paths and their object names and serves as a staging area to
+write out the next tree object to be committed.  The state is
+"virtual" in the sense that it does not necessarily have to, and
+often does not, match the files in the working tree.
+
+There are cases git needs to examine the differences between the
+virtual working tree state in the index and the files in the
+working tree.  The most obvious case is when the user asks `git
+diff` (or its low level implementation, `git diff-files`) or
+`git-ls-files --modified`.  In addition, git internally checks
+if the files in the working tree are different from what are
+recorded in the index to avoid stomping on local changes in them
+during patch application, switching branches, and merging.
+
+In order to speed up this comparison between the files in the
+working tree and the index entries, the index entries record the
+information obtained from the filesystem via `lstat(2)` system
+call when they were last updated.  When checking if they differ,
+git first runs `lstat(2)` on the files and compares the result
+with this information (this is what was originally done by the
+`ce_match_stat()` function, but the current code does it in
+`ce_match_stat_basic()` function).  If some of these "cached
+stat information" fields do not match, git can tell that the
+files are modified without even looking at their contents.
+
+Note: not all members in `struct stat` obtained via `lstat(2)`
+are used for this comparison.  For example, `st_atime` obviously
+is not useful.  Currently, git compares the file type (regular
+files vs symbolic links) and executable bits (only for regular
+files) from `st_mode` member, `st_mtime` and `st_ctime`
+timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members.
+With a `USE_STDEV` compile-time option, `st_dev` is also
+compared, but this is not enabled by default because this member
+is not stable on network filesystems.  With `USE_NSEC`
+compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
+members are also compared, but this is not enabled by default
+because in-core timestamps can have finer granularity than
+on-disk timestamps, resulting in meaningless changes when an
+inode is evicted from the inode cache.  See commit 8ce13b0
+of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
+([PATCH] Sync in core time granuality with filesystems,
+2005-01-04).
+
+Racy git
+--------
+
+There is one slight problem with the optimization based on the
+cached stat information.  Consider this sequence:
+
+  : modify 'foo'
+  $ git update-index 'foo'
+  : modify 'foo' again, in-place, without changing its size
+
+The first `update-index` computes the object name of the
+contents of file `foo` and updates the index entry for `foo`
+along with the `struct stat` information.  If the modification
+that follows it happens very fast so that the file's `st_mtime`
+timestamp does not change, after this sequence, the cached stat
+information the index entry records still exactly match what you
+would see in the filesystem, even though the file `foo` is now
+different.
+This way, git can incorrectly think files in the working tree
+are unmodified even though they actually are.  This is called
+the "racy git" problem (discovered by Pasky), and the entries
+that appear clean when they may not be because of this problem
+are called "racily clean".
+
+To avoid this problem, git does two things:
+
+. When the cached stat information says the file has not been
+  modified, and the `st_mtime` is the same as (or newer than)
+  the timestamp of the index file itself (which is the time `git
+  update-index foo` finished running in the above example), it
+  also compares the contents with the object registered in the
+  index entry to make sure they match.
+
+. When the index file is updated that contains racily clean
+  entries, cached `st_size` information is truncated to zero
+  before writing a new version of the index file.
+
+Because the index file itself is written after collecting all
+the stat information from updated paths, `st_mtime` timestamp of
+it is usually the same as or newer than any of the paths the
+index contains.  And no matter how quick the modification that
+follows `git update-index foo` finishes, the resulting
+`st_mtime` timestamp on `foo` cannot get a value earlier
+than the index file.  Therefore, index entries that can be
+racily clean are limited to the ones that have the same
+timestamp as the index file itself.
+
+The callers that want to check if an index entry matches the
+corresponding file in the working tree continue to call
+`ce_match_stat()`, but with this change, `ce_match_stat()` uses
+`ce_modified_check_fs()` to see if racily clean ones are
+actually clean after comparing the cached stat information using
+`ce_match_stat_basic()`.
+
+The problem the latter solves is this sequence:
+
+  $ git update-index 'foo'
+  : modify 'foo' in-place without changing its size
+  : wait for enough time
+  $ git update-index 'bar'
+
+Without the latter, the timestamp of the index file gets a newer
+value, and falsely clean entry `foo` would not be caught by the
+timestamp comparison check done with the former logic anymore.
+The latter makes sure that the cached stat information for `foo`
+would never match with the file in the working tree, so later
+checks by `ce_match_stat_basic()` would report that the index entry
+does not match the file and git does not have to fall back on more
+expensive `ce_modified_check_fs()`.
+
+
+Runtime penalty
+---------------
+
+The runtime penalty of falling back to `ce_modified_check_fs()`
+from `ce_match_stat()` can be very expensive when there are many
+racily clean entries.  An obvious way to artificially create
+this situation is to give the same timestamp to all the files in
+the working tree in a large project, run `git update-index` on
+them, and give the same timestamp to the index file:
+
+  $ date >.datestamp
+  $ git ls-files | xargs touch -r .datestamp
+  $ git ls-files | git update-index --stdin
+  $ touch -r .datestamp .git/index
+
+This will make all index entries racily clean.  The linux-2.6
+project, for example, there are over 20,000 files in the working
+tree.  On my Athlon 64 X2 3800+, after the above:
+
+  $ /usr/bin/time git diff-files
+  1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+  0inputs+0outputs (0major+67111minor)pagefaults 0swaps
+  $ git update-index MAINTAINERS
+  $ /usr/bin/time git diff-files
+  0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
+  0inputs+0outputs (0major+935minor)pagefaults 0swaps
+
+Running `git update-index` in the middle checked the racily
+clean entries, and left the cached `st_mtime` for all the paths
+intact because they were actually clean (so this step took about
+the same amount of time as the first `git diff-files`).  After
+that, they are not racily clean anymore but are truly clean, so
+the second invocation of `git diff-files` fully took advantage
+of the cached stat information.
+
+
+Avoiding runtime penalty
+------------------------
+
+In order to avoid the above runtime penalty, post 1.4.2 git used
+to have a code that made sure the index file
+got timestamp newer than the youngest files in the index when
+there are many young files with the same timestamp as the
+resulting index file would otherwise would have by waiting
+before finishing writing the index file out.
+
+I suspected that in practice the situation where many paths in the
+index are all racily clean was quite rare.  The only code paths
+that can record recent timestamp for large number of paths are:
+
+. Initial `git add .` of a large project.
+
+. `git checkout` of a large project from an empty index into an
+  unpopulated working tree.
+
+Note: switching branches with `git checkout` keeps the cached
+stat information of existing working tree files that are the
+same between the current branch and the new branch, which are
+all older than the resulting index file, and they will not
+become racily clean.  Only the files that are actually checked
+out can become racily clean.
+
+In a large project where raciness avoidance cost really matters,
+however, the initial computation of all object names in the
+index takes more than one second, and the index file is written
+out after all that happens.  Therefore the timestamp of the
+index file will be more than one seconds later than the
+youngest file in the working tree.  This means that in these
+cases there actually will not be any racily clean entry in
+the resulting index.
+
+Based on this discussion, the current code does not use the
+"workaround" to avoid the runtime penalty that does not exist in
+practice anymore.  This was done with commit 0fc82cff on Aug 15,
+2006.
diff --git a/Documentation/technical/send-pack-pipeline.txt b/Documentation/technical/send-pack-pipeline.txt
new file mode 100644
index 0000000000..681efe4219
--- /dev/null
+++ b/Documentation/technical/send-pack-pipeline.txt
@@ -0,0 +1,63 @@
+git-send-pack
+=============
+
+Overall operation
+-----------------
+
+. Connects to the remote side and invokes git-receive-pack.
+
+. Learns what refs the remote has and what commit they point at.
+  Matches them to the refspecs we are pushing.
+
+. Checks if there are non-fast-forwards.  Unlike fetch-pack,
+  the repository send-pack runs in is supposed to be a superset
+  of the recipient in fast-forward cases, so there is no need
+  for want/have exchanges, and fast-forward check can be done
+  locally.  Tell the result to the other end.
+
+. Calls pack_objects() which generates a packfile and sends it
+  over to the other end.
+
+. If the remote side is new enough (v1.1.0 or later), wait for
+  the unpack and hook status from the other end.
+
+. Exit with appropriate error codes.
+
+
+Pack_objects pipeline
+---------------------
+
+This function gets one file descriptor (`fd`) which is either a
+socket (over the network) or a pipe (local).  What's written to
+this fd goes to git-receive-pack to be unpacked.
+
+    send-pack ---> fd ---> receive-pack
+
+The function pack_objects creates a pipe and then forks.  The
+forked child execs pack-objects with --revs to receive revision
+parameters from its standard input. This process will write the
+packfile to the other end.
+
+    send-pack
+       |
+       pack_objects() ---> fd ---> receive-pack
+          | ^ (pipe)
+	  v |
+         (child)
+
+The child dup2's to arrange its standard output to go back to
+the other end, and read its standard input to come from the
+pipe.  After that it exec's pack-objects.  On the other hand,
+the parent process, before starting to feed the child pipeline,
+closes the reading side of the pipe and fd to receive-pack.
+
+    send-pack
+       |
+       pack_objects(parent)
+          |
+	  v [0]
+         pack-objects [0] ---> receive-pack
+
+
+[jc: the pipeline was much more complex and needed documentation before
+ I understood an earlier bug, but now it is trivial and straightforward.]
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
new file mode 100644
index 0000000000..559263af48
--- /dev/null
+++ b/Documentation/technical/shallow.txt
@@ -0,0 +1,49 @@
+Def.: Shallow commits do have parents, but not in the shallow
+repo, and therefore grafts are introduced pretending that
+these commits have no parents.
+
+The basic idea is to write the SHA1s of shallow commits into
+$GIT_DIR/shallow, and handle its contents like the contents
+of $GIT_DIR/info/grafts (with the difference that shallow
+cannot contain parent information).
+
+This information is stored in a new file instead of grafts, or
+even the config, since the user should not touch that file
+at all (even throughout development of the shallow clone, it
+was never manually edited!).
+
+Each line contains exactly one SHA1. When read, a commit_graft
+will be constructed, which has nr_parent < 0 to make it easier
+to discern from user provided grafts.
+
+Since fsck-objects relies on the library to read the objects,
+it honours shallow commits automatically.
+
+There are some unfinished ends of the whole shallow business:
+
+- maybe we have to force non-thin packs when fetching into a
+  shallow repo (ATM they are forced non-thin).
+
+- A special handling of a shallow upstream is needed. At some
+  stage, upload-pack has to check if it sends a shallow commit,
+  and it should send that information early (or fail, if the
+  client does not support shallow repositories). There is no
+  support at all for this in this patch series.
+
+- Instead of locking $GIT_DIR/shallow at the start, just
+  the timestamp of it is noted, and when it comes to writing it,
+  a check is performed if the mtime is still the same, dying if
+  it is not.
+
+- It is unclear how "push into/from a shallow repo" should behave.
+
+- If you deepen a history, you'd want to get the tags of the
+  newly stored (but older!) commits. This does not work right now.
+
+To make a shallow clone, you can call "git-clone --depth 20 repo".
+The result contains only commit chains with a length of at most 20.
+It also writes an appropriate $GIT_DIR/shallow.
+
+You can deepen a shallow repository with "git-fetch --depth 20
+repo branch", which will fetch branch from repo, but stop at depth
+20, updating $GIT_DIR/shallow.
diff --git a/Documentation/technical/trivial-merge.txt b/Documentation/technical/trivial-merge.txt
new file mode 100644
index 0000000000..24c84100b0
--- /dev/null
+++ b/Documentation/technical/trivial-merge.txt
@@ -0,0 +1,121 @@
+Trivial merge rules
+===================
+
+This document describes the outcomes of the trivial merge logic in read-tree.
+
+One-way merge
+-------------
+
+This replaces the index with a different tree, keeping the stat info
+for entries that don't change, and allowing -u to make the minimum
+required changes to the working tree to have it match.
+
+Entries marked '+' have stat information. Spaces marked '*' don't
+affect the result.
+
+   index   tree    result
+   -----------------------
+   *       (empty) (empty)
+   (empty) tree    tree
+   index+  tree    tree
+   index+  index   index+
+
+Two-way merge
+-------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the old
+or the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result which changes the index is an error if the index is not empty
+and not up-to-date.
+
+Entries marked '+' have stat information. Spaces marked '*' don't
+affect the result.
+
+ case  index   old     new     result
+ -------------------------------------
+ 0/2   (empty) *       (empty) (empty)
+ 1/3   (empty) *       new     new
+ 4/5   index+  (empty) (empty) index+
+ 6/7   index+  (empty) index   index+
+ 10    index+  index   (empty) (empty)
+ 14/15 index+  old     old     index+
+ 18/19 index+  old     index   index+
+ 20    index+  index   new     new
+
+Three-way merge
+---------------
+
+It is permitted for the index to lack an entry; this does not prevent
+any case from applying.
+
+If the index exists, it is an error for it not to match either the
+head or (if the merge is trivial) the result.
+
+If multiple cases apply, the one used is listed first.
+
+A result of "no merge" means that index is left in stage 0, ancest in
+stage 1, head in stage 2, and remote in stage 3 (if any of these are
+empty, no entry is left for that stage). Otherwise, the given entry is
+left in stage 0, and there are no other entries.
+
+A result of "no merge" is an error if the index is not empty and not
+up-to-date.
+
+*empty* means that the tree must not have a directory-file conflict
+ with the entry.
+
+For multiple ancestors, a '+' means that this case applies even if
+only one ancestor or remote fits; a '^' means all of the ancestors
+must be the same.
+
+case  ancest    head    remote    result
+----------------------------------------
+1     (empty)+  (empty) (empty)   (empty)
+2ALT  (empty)+  *empty* remote    remote
+2     (empty)^  (empty) remote    no merge
+3ALT  (empty)+  head    *empty*   head
+3     (empty)^  head    (empty)   no merge
+4     (empty)^  head    remote    no merge
+5ALT  *         head    head      head
+6     ancest+   (empty) (empty)   no merge
+8     ancest^   (empty) ancest    no merge
+7     ancest+   (empty) remote    no merge
+10    ancest^   ancest  (empty)   no merge
+9     ancest+   head    (empty)   no merge
+16    anc1/anc2 anc1    anc2      no merge
+13    ancest+   head    ancest    head
+14    ancest+   ancest  remote    remote
+11    ancest+   head    remote    no merge
+
+Only #2ALT and #3ALT use *empty*, because these are the only cases
+where there can be conflicts that didn't exist before. Note that we
+allow directory-file conflicts between things in different stages
+after the trivial merge.
+
+A possible alternative for #6 is (empty), which would make it like
+#1. This is not used, due to the likelihood that it arises due to
+moving the file to multiple different locations or moving and deleting
+it in different branches.
+
+Case #1 is included for completeness, and also in case we decide to
+put on '+' markings; any path that is never mentioned at all isn't
+handled.
+
+Note that #16 is when both #13 and #14 apply; in this case, we refuse
+the trivial merge, because we can't tell from this data which is
+right. This is a case of a reverted patch (in some direction, maybe
+multiple times), and the right answer depends on looking at crossings
+of history or common ancestors of the ancestors.
+
+Note that, between #6, #7, #9, and #11, all cases not otherwise
+covered are handled in this table.
+
+For #8 and #10, there is alternative behavior, not currently
+implemented, where the result is (empty). As currently implemented,
+the automatic merge will generally give this effect.