Diffstat (limited to 'Documentation/technical')
11 files changed, 807 insertions, 663 deletions
diff --git a/Documentation/technical/api-argv-array.txt b/Documentation/technical/api-argv-array.txt
index cfc063018c..870c8edbfb 100644
--- a/Documentation/technical/api-argv-array.txt
+++ b/Documentation/technical/api-argv-array.txt
@@ -8,7 +8,7 @@ always NULL-terminated at the element pointed to by `argv[argc]`. This
makes the result suitable for passing to functions expecting to receive
argv from main(), or the link:api-run-command.html[run-command API].
-The link:api-string-list.html[string-list API] is similar, but cannot be
+The string-list API (documented in string-list.h) is similar, but cannot be
used for these purposes; instead of storing a straight string pointer,
it contains an item structure with a `util` field that is not compatible
with the traditional argv interface.
diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt
deleted file mode 100644
index 22a39b9299..0000000000
--- a/Documentation/technical/api-builtin.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-builtin API
-Adding a new built-in
-There are 4 things to do to add a built-in command implementation to
-Git:
-. Define the implementation of the built-in command `foo` with
- signature:
- int cmd_foo(int argc, const char **argv, const char *prefix);
-. Add the external declaration for the function to `builtin.h`.
-. Add the command to the `commands[]` table defined in `git.c`.
- The entry should look like:
- { "foo", cmd_foo, <options> },
-where options is the bitwise-or of:
- If there is not a Git directory to work on, abort. If there
- is a work tree, chdir to the top of it if the command was
- invoked in a subdirectory. If there is no work tree, no
- chdir() is done.
- If there is a Git directory, chdir as per RUN_SETUP, otherwise,
- don't chdir anywhere.
- If the standard output is connected to a tty, spawn a pager and
- feed our output to it.
- Make sure there is a work tree, i.e. the command cannot act
- on bare repositories.
- This only makes sense when `RUN_SETUP` is also set.
-. Add `builtin/foo.o` to `BUILTIN_OBJS` in `Makefile`.
-Additionally, if `foo` is a new command, there are 3 more things to do:
-. Add tests to `t/` directory.
-. Write documentation in `Documentation/git-foo.txt`.
-. Add an entry for `git-foo` to `command-list.txt`.
-. Add an entry for `/git-foo` to `.gitignore`.
-How a built-in is called
-The implementation `cmd_foo()` takes three parameters, `argc`, `argv,
-and `prefix`. The first two are similar to what `main()` of a
-standalone command would be called with.
-When `RUN_SETUP` is specified in the `commands[]` table, and when you
-were started from a subdirectory of the work tree, `cmd_foo()` is called
-after chdir(2) to the top of the work tree, and `prefix` gets the path
-to the subdirectory the command started from. This allows you to
-convert a user-supplied pathname (typically relative to that directory)
-to a pathname relative to the top of the work tree.
-The return value from `cmd_foo()` becomes the exit status of the
-command.
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index 20741f345e..9a778b0cad 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -186,7 +186,7 @@ parsing is successful, the return value is the result.
Same as `git_config_bool`, except that integers are returned as-is, and
an `is_bool` flag is unset.
Same as `git_config_bool`, except that it returns -1 on error rather
than dying.
diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
deleted file mode 100644
index ccc634bbd7..0000000000
--- a/Documentation/technical/api-hashmap.txt
+++ /dev/null
@@ -1,309 +0,0 @@
-hashmap API
-The hashmap API is a generic implementation of hash-based key-value mappings.
-Data Structures
-`struct hashmap`::
- The hash table structure. Members can be used as follows, but should
- not be modified directly:
-The `size` member keeps track of the total number of entries (0 means the
-hashmap is empty).
-`tablesize` is the allocated size of the hash table. A non-0 value indicates
-that the hashmap is initialized. It may also be useful for statistical purposes
-(i.e. `size / tablesize` is the current load factor).
-`cmpfn` stores the comparison function specified in `hashmap_init()`. In
-advanced scenarios, it may be useful to change this, e.g. to switch between
-case-sensitive and case-insensitive lookup.
-When `disallow_rehash` is set, automatic rehashes are prevented during inserts
-and deletes.
-`struct hashmap_entry`::
- An opaque structure representing an entry in the hash table, which must
- be used as first member of user data structures. Ideally it should be
- followed by an int-sized member to prevent unused memory on 64-bit
- systems due to alignment.
-The `hash` member is the entry's hash code and the `next` member points to the
-next entry in case of collisions (i.e. if multiple entries map to the same
-bucket).
-`struct hashmap_iter`::
- An iterator structure, to be used with hashmap_iter_* functions.
-`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
- User-supplied function to test two hashmap entries for equality. Shall
- return 0 if the entries are equal.
-This function is always called with non-NULL `entry` / `entry_or_key`
-parameters that have the same hash code. When looking up an entry, the `key`
-and `keydata` parameters to hashmap_get and hashmap_remove are always passed
-as second and third argument, respectively. Otherwise, `keydata` is NULL.
-`unsigned int strhash(const char *buf)`::
-`unsigned int strihash(const char *buf)`::
-`unsigned int memhash(const void *buf, size_t len)`::
-`unsigned int memihash(const void *buf, size_t len)`::
-`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::
- Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
-`strhash` and `strihash` take 0-terminated strings, while `memhash` and
-`memihash` operate on arbitrary-length memory.
-`strihash` and `memihash` are case insensitive versions.
-`memihash_cont` is a variant of `memihash` that allows a computation to be
-continued with another chunk of data.
-`unsigned int sha1hash(const unsigned char *sha1)`::
- Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
- for use in hash tables. Cryptographic hashes are supposed to have
- uniform distribution, so in contrast to `memhash()`, this just copies
- the first `sizeof(int)` bytes without shuffling any bits. Note that
- the results will be different on big-endian and little-endian
- platforms, so they should not be stored or transferred over the net.
-`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
- Initializes a hashmap structure.
-`map` is the hashmap to initialize.
-The `equals_function` can be specified to compare two entries for equality.
-If NULL, entries are considered equal if their hash codes are equal.
-If the total number of entries is known in advance, the `initial_size`
-parameter may be used to preallocate a sufficiently large table and thus
-prevent expensive resizing. If 0, the table is dynamically resized.
-`void hashmap_free(struct hashmap *map, int free_entries)`::
- Frees a hashmap structure and allocated memory.
-`map` is the hashmap to free.
-If `free_entries` is true, each hashmap_entry in the map is freed as well
-(using stdlib's free()).
-`void hashmap_entry_init(void *entry, unsigned int hash)`::
- Initializes a hashmap_entry structure.
-`entry` points to the entry to initialize.
-`hash` is the hash code of the entry.
-The hashmap_entry structure does not hold references to external resources,
-and it is safe to just discard it once you are done with it (i.e. if
-your structure was allocated with xmalloc(), you can just free(3) it,
-and if it is on stack, you can just let it go out of scope).
-`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
- Returns the hashmap entry for the specified key, or NULL if not found.
-`map` is the hashmap structure.
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-If an entry with matching hash code is found, `key` and `keydata` are passed
-to `hashmap_cmp_fn` to decide whether the entry matches the key.
-`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
- Returns the hashmap entry for the specified hash code and key data,
- or NULL if not found.
-`map` is the hashmap structure.
-`hash` is the hash code of the entry to look up.
-If an entry with matching hash code is found, `keydata` is passed to
-`hashmap_cmp_fn` to decide whether the entry matches the key. The
-`entry_or_key` parameter points to a bogus hashmap_entry structure that
-should not be used in the comparison.
-`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
- Returns the next equal hashmap entry, or NULL if not found. This can be
- used to iterate over duplicate entries (see `hashmap_add`).
-`map` is the hashmap structure.
-`entry` is the hashmap_entry to start the search from, obtained via a previous
-call to `hashmap_get` or `hashmap_get_next`.
-`void hashmap_add(struct hashmap *map, void *entry)`::
- Adds a hashmap entry. This allows to add duplicate entries (i.e.
- separate values with the same key according to hashmap_cmp_fn).
-`map` is the hashmap structure.
-`entry` is the entry to add.
-`void *hashmap_put(struct hashmap *map, void *entry)`::
- Adds or replaces a hashmap entry. If the hashmap contains duplicate
- entries equal to the specified entry, only one of them will be replaced.
-`map` is the hashmap structure.
-`entry` is the entry to add or replace.
-Returns the replaced entry, or NULL if not found (i.e. the entry was added).
-`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
- Removes a hashmap entry matching the specified key. If the hashmap
- contains duplicate entries equal to the specified key, only one of
- them will be removed.
-`map` is the hashmap structure.
-`key` is a hashmap_entry structure (or user data structure that starts with
-hashmap_entry) that has at least been initialized with the proper hash code
-(via `hashmap_entry_init`).
-If an entry with matching hash code is found, `key` and `keydata` are
-passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
-Returns the removed entry, or NULL if not found.
-`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::
- Disallow/allow automatic rehashing of the hashmap during inserts
- and deletes.
-This is useful if the caller knows that the hashmap will be accessed
-by multiple threads.
-The caller is still responsible for any necessary locking; this simply
-prevents unexpected rehashing. The caller is also responsible for properly
-sizing the initial hashmap to ensure good performance.
-A call to allow rehashing does not force a rehash; that might happen
-with the next insert or delete.
-`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
-`void *hashmap_iter_next(struct hashmap_iter *iter)`::
-`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
- Used to iterate over all entries of a hashmap. Note that it is
- not safe to add or remove entries to the hashmap while
- iterating.
-`hashmap_iter_init` initializes a `hashmap_iter` structure.
-`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
-more entries.
-`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
-and returns the first entry, if any).
-`const char *strintern(const char *string)`::
-`const void *memintern(const void *data, size_t len)`::
- Returns the unique, interned version of the specified string or data,
- similar to the `String.intern` API in Java and .NET, respectively.
- Interned strings remain valid for the entire lifetime of the process.
-Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
-strings / data must not be modified or freed.
-Interned strings are best used for short strings with high probability of
-Uses a hashmap to store the pool of interned strings.
-Usage example
-Here's a simple usage example that maps long keys to double values.
-struct hashmap map;
-struct long2double {
- struct hashmap_entry ent; /* must be the first member! */
- long key;
- double value;
-};
-static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
-{
- return !(e1->key == e2->key);
-}
-void long2double_init(void)
-{
- hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
-}
-void long2double_free(void)
-{
- hashmap_free(&map, 1);
-}
-static struct long2double *find_entry(long key)
-{
- struct long2double k;
- hashmap_entry_init(&k, memhash(&key, sizeof(long)));
- k.key = key;
- return hashmap_get(&map, &k, NULL);
-}
-double get_value(long key)
-{
- struct long2double *e = find_entry(key);
- return e ? e->value : 0;
-}
-void set_value(long key, double value)
-{
- struct long2double *e = find_entry(key);
- if (!e) {
-  e = malloc(sizeof(struct long2double));
-  hashmap_entry_init(e, memhash(&key, sizeof(long)));
-  e->key = key;
-  hashmap_add(&map, e);
- }
- e->value = value;
-}
-Using variable-sized keys
-The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
-`hashmap_entry` structure as key to find the correct entry. If the key data is
-variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
-to create a full-fledged entry structure on the heap and copy all the key data
-into the structure.
-In this case, the `keydata` parameter can be used to pass
-variable-sized key data directly to the comparison function, and the `key`
-parameter can be a stripped-down, fixed size entry structure allocated on the
-stack.
-See test-hashmap.c for an example using arbitrary-length strings as keys.
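Annotation (not part of the commit): the removed text above describes `strhash` as a ready-to-use FNV-1 hash for 0-terminated strings. As a rough illustration of what a 32-bit FNV-1 string hash looks like, here is a minimal sketch. The names `fnv1_str`, `FNV32_BASE`, and `FNV32_PRIME` are hypothetical and chosen for this example; the constants are the standard 32-bit FNV offset basis and prime. It is not a copy of Git's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV parameters (names are local to this sketch). */
#define FNV32_BASE  ((uint32_t)0x811c9dc5)
#define FNV32_PRIME ((uint32_t)0x01000193)

/*
 * Hypothetical helper: FNV-1 over a 0-terminated string.
 * FNV-1 multiplies by the prime first, then XORs in the next byte.
 */
static uint32_t fnv1_str(const char *s)
{
	uint32_t hash = FNV32_BASE;

	assert(s != NULL);
	while (*s)
		hash = (uint32_t)(hash * FNV32_PRIME) ^ (unsigned char)*s++;
	return hash;
}
```

A value computed this way could be passed as the `hash` argument when initializing an entry, in the same way the usage example above passes the result of `memhash`.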
diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
index 37379d8337..46c3d5c355 100644
--- a/Documentation/technical/api-ref-iteration.txt
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -32,11 +32,8 @@ Iteration functions
* `for_each_glob_ref_in()` the previous and `for_each_ref_in()` combined.
-* `head_ref_submodule()`, `for_each_ref_submodule()`,
- `for_each_ref_in_submodule()`, `for_each_tag_ref_submodule()`,
- `for_each_branch_ref_submodule()`, `for_each_remote_ref_submodule()`
- do the same as the functions described above but for a specified
- submodule.
+* Use `refs_` API for accessing submodules. The submodule ref store could
+ be obtained with `get_submodule_ref_store()`.
* `for_each_rawref()` can be used to learn about broken ref and symref.
diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt
deleted file mode 100644
index c08402b12e..0000000000
--- a/Documentation/technical/api-string-list.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-string-list API
-The string_list API offers a data structure and functions to handle
-sorted and unsorted string lists. A "sorted" list is one whose
-entries are sorted by string value in `strcmp()` order.
-The 'string_list' struct used to be called 'path_list', but was renamed
-because it is not specific to paths.
-The caller:
-. Allocates and clears a `struct string_list` variable.
-. Initializes the members. You might want to set the flag `strdup_strings`
- if the strings should be strdup()ed. For example, this is necessary
- when you add something like git_path("..."), since that function returns
- a static buffer that will change with the next call to git_path().
-If you need something advanced, you can manually malloc() the `items`
-member (you need this if you add things later) and you should set the
-`nr` and `alloc` members in that case, too.
-. Adds new items to the list, using `string_list_append`,
- `string_list_append_nodup`, `string_list_insert`,
- `string_list_split`, and/or `string_list_split_in_place`.
-. Can check if a string is in the list using `string_list_has_string` or
- `unsorted_string_list_has_string` and get it from the list using
- `string_list_lookup` for sorted lists.
-. Can sort an unsorted list using `string_list_sort`.
-. Can remove duplicate items from a sorted list using
- `string_list_remove_duplicates`.
-. Can remove individual items of an unsorted list using
- `unsorted_string_list_delete_item`.
-. Can remove items not matching a criterion from a sorted or unsorted
- list using `filter_string_list`, or remove empty strings using
- `string_list_remove_empty_items`.
-. Finally it should free the list using `string_list_clear`.
-struct string_list list = STRING_LIST_INIT_NODUP;
-int i;
-string_list_append(&list, "foo");
-string_list_append(&list, "bar");
-for (i = 0; i < list.nr; i++)
- printf("%s\n", list.items[i].string)
-NOTE: It is more efficient to build an unsorted list and sort it
-afterwards, instead of building a sorted list (`O(n log n)` instead of
-`O(n^2)`).
-However, if you use the list to check if a certain string was added
-already, you should not do that (using unsorted_string_list_has_string()),
-because the complexity would be quadratic again (but with a worse factor).
-* General ones (works with sorted and unsorted lists as well)
- Initialize the members of the string_list, set `strdup_strings`
- member according to the value of the second parameter.
- Apply a function to each item in a list, retaining only the
- items for which the function returns true. If free_util is
- true, call free() on the util members of any items that have
- to be deleted. Preserve the order of the items that are
- retained.
- Remove any empty strings from the list. If free_util is true,
- call free() on the util members of any items that have to be
- deleted. Preserve the order of the items that are retained.
- Dump a string_list to stdout, useful mainly for debugging purposes. It
- can take an optional header argument and it writes out the
- string-pointer pairs of the string_list, each one in its own line.
- Free a string_list. The `string` pointer of the items will be freed in
- case the `strdup_strings` member of the string_list is set. The second
- parameter controls if the `util` pointer of the items should be freed
- or not.
-* Functions for sorted lists only
- Determine if the string_list has a given string or not.
- Insert a new element to the string_list. The returned pointer can be
- handy if you want to write something to the `util` pointer of the
- string_list_item containing the just added string. If the given
- string already exists the insertion will be skipped and the
- pointer to the existing item returned.
-Since this function uses xrealloc() (which die()s if it fails) if the
-list needs to grow, it is safe not to check the pointer. I.e. you may
-write `string_list_insert(...)->util = ...;`.
- Look up a given string in the string_list, returning the containing
- string_list_item. If the string is not found, NULL is returned.
- Remove all but the first of consecutive entries that have the
- same string value. If free_util is true, call free() on the
- util members of any items that have to be deleted.
-* Functions for unsorted lists only
- Append a new string to the end of the string_list. If
- `strdup_string` is set, then the string argument is copied;
- otherwise the new `string_list_entry` refers to the input
- string.
- Append a new string to the end of the string_list. The new
- `string_list_entry` always refers to the input string, even if
- `strdup_string` is set. This function can be used to hand
- ownership of a malloc()ed string to a `string_list` that has
- `strdup_string` set.
- Sort the list's entries by string value in `strcmp()` order.
- It's like `string_list_has_string()` but for unsorted lists.
- It's like `string_list_lookup()` but for unsorted lists.
-The above two functions need to look through all items, as opposed to their
-counterpart for sorted lists, which performs a binary search.
- Remove an item from a string_list. The `string` pointer of the items
- will be freed in case the `strdup_strings` member of the string_list
- is set. The third parameter controls if the `util` pointer of the
- items should be freed or not.
- Split a string into substrings on a delimiter character and
- append the substrings to a `string_list`. If `maxsplit` is
- non-negative, then split at most `maxsplit` times. Return the
- number of substrings appended to the list.
-`string_list_split` requires a `string_list` that has `strdup_strings`
-set to true; it leaves the input string untouched and makes copies of
-the substrings in newly-allocated memory.
-`string_list_split_in_place` requires a `string_list` that has
-`strdup_strings` set to false; it splits the input string in place,
-overwriting the delimiter characters with NULs and creating new
-string_list_items that point into the original string (the original
-string must therefore not be modified or freed while the `string_list`
-is in use).
-Data structures
-* `struct string_list_item`
-Represents an item of the list. The `string` member is a pointer to the
-string, and you may use the `util` member for any purpose, if you want.
-* `struct string_list`
-Represents the list itself.
-. The array of items are available via the `items` member.
-. The `nr` member contains the number of items stored in the list.
-. The `alloc` member is used to avoid reallocating at every insertion.
- You should not tamper with it.
-. Setting the `strdup_strings` member to 1 will strdup() the strings
- before adding them, see above.
-. The `compare_strings_fn` member is used to specify a custom compare
- function, otherwise `strcmp()` is used as the default function.
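Annotation (not part of the commit): the removed text above explains that `string_list_split_in_place` overwrites delimiter characters with NULs and creates items that point into the original string. The sketch below shows that idea in isolation, using a plain array of pointers instead of a `string_list`. The function `split_in_place` and its `max_out` parameter are hypothetical and exist only for this example; in particular, `max_out` caps the number of pieces and does not reproduce the `maxsplit` semantics of the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's API: split `s` on `delim` in place.
 * Each delimiter found is overwritten with NUL and a pointer to the start
 * of each piece is stored in `out`. At most `max_out` pieces are stored;
 * any remaining input after the last stored piece is ignored.
 * Returns the number of pieces stored.
 */
static size_t split_in_place(char *s, char delim, char **out, size_t max_out)
{
	size_t n = 0;

	assert(s != NULL && out != NULL && delim != '\0');
	while (n < max_out) {
		char *end = strchr(s, delim);

		out[n++] = s;   /* piece starts inside the caller's buffer */
		if (!end)
			break;  /* no delimiter left: this was the last piece */
		*end = '\0';    /* terminate the piece where the delimiter was */
		s = end + 1;
	}
	return n;
}
```

Because the stored pointers alias the input buffer, the buffer must stay alive and unmodified while the pieces are in use, which is the same constraint the removed documentation states for `string_list_split_in_place`.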
diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt
deleted file mode 100644
index 793508cf3e..0000000000
--- a/Documentation/technical/api-sub-process.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-sub-process API
-The sub-process API makes it possible to run background sub-processes
-for the entire lifetime of a Git invocation. If Git needs to communicate
-with an external process multiple times, then this can reduces the process
-invocation overhead. Git and the sub-process communicate through stdin and
-stdout.
-The sub-processes are kept in a hashmap by command name and looked up
-via the subprocess_find_entry function. If an existing instance can not
-be found then a new process should be created and started. When the
-parent git command terminates, all sub-processes are also terminated.
-This API is based on the run-command API.
-Data structures
-* `struct subprocess_entry`
-The sub-process structure. Members should not be accessed directly.
-'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
- User-supplied function to initialize the sub-process. This is
- typically used to negotiate the interface version and capabilities.
- Function to test two subprocess hashmap entries for equality.
- Start a subprocess and add it to the subprocess hashmap.
- Kill a subprocess and remove it from the subprocess hashmap.
- Find a subprocess in the subprocess hashmap.
- Get the underlying `struct child_process` from a subprocess.
- Helper function to read packets looking for the last "status=<foo>"
- key/value pair.
diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt
index 14af37c3f1..bde18622a8 100644
--- a/Documentation/technical/api-tree-walking.txt
+++ b/Documentation/technical/api-tree-walking.txt
@@ -55,9 +55,9 @@ Initializing
- Initialize a `tree_desc` and decode its first entry given the sha1 of
- a tree. Returns the `buffer` member if the sha1 is a valid tree
- identifier and NULL otherwise.
+ Initialize a `tree_desc` and decode its first entry given the
+ object ID of a tree. Returns the `buffer` member if the latter
+ is a valid tree identifier and NULL otherwise.
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 0000000000..417ba491d0
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,797 @@
+Git hash function transition
+Migrate Git from SHA-1 to a stronger hash function.
+At its core, the Git version control system is a content addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+* Integrity checking is easy. Bit flips, for example, are easily
+ detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+Using a cryptographically secure hash function brings additional
+advantages:
+* Object names can be signed and third parties can trust the hash to
+ address the signed object and all objects it references.
+* Communication using Git protocol and out of band communication
+ methods have a short reliable string that can be used to reliably
+ address stored content.
+Over time some flaws in SHA-1 have been discovered by security
+researchers. demonstrated a practical SHA-1 hash
+collision. As a result, SHA-1 cannot be considered cryptographically
+secure any more. This impacts the communication of hash values because
+we cannot trust that a given hash value represents the known good
+version of content that the speaker intended.
+SHA-1 still possesses the other properties such as fast object lookup
+and safe error checking, but other hash functions are equally suitable
+that are believed to be cryptographically secure.
+Where NewHash is a strong 256-bit hash function to replace SHA-1 (see
+"Selection of a New Hash", below):
+1. The transition to NewHash can be done one local repository at a time.
+ a. Requiring no action by any other party.
+ b. A NewHash repository can communicate with SHA-1 Git servers
+ (push/fetch).
+ c. Users can use SHA-1 and NewHash identifiers for objects
+ interchangeably (see "Object names on the command line", below).
+ d. New signed objects make use of a stronger hash function than
+ SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+ a. Local metadata for SHA-1 compatibility can be removed from a
+ repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+ a. The object format is kept simple and consistent.
+ b. Creation of a generalized repository conversion tool.
+1. Add NewHash support to Git protocol. This is valuable and the
+ logical next step but it is out of scope for this initial design.
+2. Transparently improving the security of existing SHA-1 signed
+ objects.
+3. Intermixing objects using multiple hash functions in a single
+ repository.
+4. Taking the opportunity to fix other bugs in Git's formats and
+ protocols.
+5. Shallow clones and fetches into a NewHash repository. (This will
+ change when we add NewHash support to Git protocol.)
+6. Skip fetching some submodules of a project into a NewHash
+ repository. (This also depends on NewHash support in Git
+ protocol.)
+We introduce a new repository format extension. Repositories with this
+extension enabled use NewHash instead of SHA-1 to name their objects.
+This affects both object names and object content --- both the names
+of objects and all references to other objects within an object are
+switched to the new hash function.
+NewHash repositories cannot be read by older versions of Git.
+Alongside the packfile, a NewHash repository stores a bidirectional
+mapping between NewHash and SHA-1 object names. The mapping is generated
+locally and can be verified using "git fsck". Object lookups use this
+mapping to allow naming objects using either their SHA-1 and NewHash names
+interchangeably.
+"git cat-file" and "git hash-object" gain options to display an object
+in its sha1 form and write an object given its sha1 form. This
+requires all objects referenced by that object to be present in the
+object database so that they can be named using the appropriate name
+(using the bidirectional hash mapping).
+Fetches from a SHA-1 based server convert the fetched objects into
+NewHash form and record the mapping in the bidirectional mapping table
+(see below for details). Pushes to a SHA-1 based server convert the
+objects being pushed into sha1 form so the server does not have to be
+aware of the hash function the client is using.
+Detailed Design
+Repository format extension
+A NewHash repository uses repository format version `1` (see
+Documentation/technical/repository-version.txt) with extensions
+`objectFormat` and `compatObjectFormat`:
+ [core]
+ repositoryFormatVersion = 1
+ [extensions]
+ objectFormat = newhash
+ compatObjectFormat = sha1
+Specifying a repository format extension ensures that versions of Git
+not aware of NewHash do not try to operate on these repositories,
+instead producing an error message:
+ $ git status
+ fatal: unknown repository extensions found:
+ objectformat
+ compatobjectformat
+See the "Transition plan" section below for more details on these
+repository extensions.
+Object names
+Objects can be named by their 40 hexadecimal digit sha1-name or 64
+hexadecimal digit newhash-name, plus names derived from those (see
+The sha1-name of an object is the SHA-1 of the concatenation of its
+type, length, a nul byte, and the object's sha1-content. This is the
+traditional <sha1> used in Git to name objects.
+The newhash-name of an object is the NewHash of the concatenation of its
+type, length, a nul byte, and the object's newhash-content.
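Annotation (not part of the commit): both object names are defined as a hash over the concatenation of type, length, a nul byte, and the content. The sketch below only builds that byte sequence; it does not hash it, since the hash function (SHA-1 or NewHash) would come from a separate library. It assumes Git's existing header convention of the type name, one space, and the length in decimal ASCII, for example `blob 6` followed by a nul byte. The function name `object_preimage` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<type> <len>\0<content>" into `out`.
 * Returns the total number of bytes written, or 0 if `out` is too small.
 * The result is the input that would be fed to the hash function.
 */
static size_t object_preimage(const char *type, const void *content,
			      size_t len, unsigned char *out, size_t out_size)
{
	int hdr;

	assert(type != NULL && out != NULL);
	assert(content != NULL || len == 0);

	/* snprintf also writes the terminating NUL, which is the separator. */
	hdr = snprintf((char *)out, out_size, "%s %zu", type, len);
	if (hdr < 0 || (size_t)hdr >= out_size)
		return 0;
	if (len > out_size - (size_t)hdr - 1)
		return 0;
	if (len)
		memcpy(out + hdr + 1, content, len);
	return (size_t)hdr + 1 + len;
}
```

For a blob the content bytes are identical under both naming schemes, so only the hash applied to this buffer would differ. For trees, commits, and tags the content itself differs, because references to other objects are written with the corresponding names.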
+Object format
+The content as a byte sequence of a tag, commit, or tree object named
+by sha1 and newhash differ because an object named by newhash-name refers to
+other objects by their newhash-names and an object named by sha1-name
+refers to other objects by their sha1-names.
+The newhash-content of an object is the same as its sha1-content, except
+that objects referenced by the object are named using their newhash-names
+instead of sha1-names. Because a blob object does not refer to any
+other object, its sha1-content and newhash-content are the same.
+The format allows round-trip conversion between newhash-content and
+sha1-content.
+Object storage
+Loose objects use zlib compression and packed objects use the packed
+format described in Documentation/technical/pack-format.txt, just like
+today. The content that is compressed and stored uses newhash-content
+instead of sha1-content.
+Pack index
+Pack index (.idx) files use a new v3 format that supports multiple
+hash functions. They have the following format (all integers are in
+network byte order):
+- A header appears at the beginning and consists of the following:
+ - The 4-byte pack index signature: '\377t0c'
+ - 4-byte version number: 3
+ - 4-byte length