diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-argv-array.txt | 46 | ||||
-rw-r--r-- | Documentation/technical/api-builtin.txt | 2 | ||||
-rw-r--r-- | Documentation/technical/api-diff.txt | 4 | ||||
-rw-r--r-- | Documentation/technical/api-gitattributes.txt | 61 | ||||
-rw-r--r-- | Documentation/technical/api-merge.txt | 104 | ||||
-rw-r--r-- | Documentation/technical/api-parse-options.txt | 29 | ||||
-rw-r--r-- | Documentation/technical/api-ref-iteration.txt | 81 | ||||
-rw-r--r-- | Documentation/technical/api-run-command.txt | 57 | ||||
-rw-r--r-- | Documentation/technical/api-sha1-array.txt | 79 | ||||
-rw-r--r-- | Documentation/technical/api-sigchain.txt | 41 | ||||
-rw-r--r-- | Documentation/technical/api-string-list.txt | 20 | ||||
-rw-r--r-- | Documentation/technical/api-tree-walking.txt | 2 | ||||
-rw-r--r-- | Documentation/technical/index-format.txt | 186 | ||||
-rw-r--r-- | Documentation/technical/pack-protocol.txt | 113 | ||||
-rw-r--r-- | Documentation/technical/protocol-capabilities.txt | 2 |
15 files changed, 745 insertions, 82 deletions
diff --git a/Documentation/technical/api-argv-array.txt b/Documentation/technical/api-argv-array.txt new file mode 100644 index 0000000000..49b3d52952 --- /dev/null +++ b/Documentation/technical/api-argv-array.txt @@ -0,0 +1,46 @@ +argv-array API +============== + +The argv-array API allows one to dynamically build and store +NULL-terminated lists. An argv-array maintains the invariant that the +`argv` member always points to a non-NULL array, and that the array is +always NULL-terminated at the element pointed to by `argv[argc]`. This +makes the result suitable for passing to functions expecting to receive +argv from main(), or the link:api-run-command.html[run-command API]. + +The link:api-string-list.html[string-list API] is similar, but cannot be +used for these purposes; instead of storing a straight string pointer, +it contains an item structure with a `util` field that is not compatible +with the traditional argv interface. + +Each `argv_array` manages its own memory. Any strings pushed into the +array are duplicated, and all memory is freed by argv_array_clear(). + +Data Structures +--------------- + +`struct argv_array`:: + + A single array. This should be initialized by assignment from + `ARGV_ARRAY_INIT`, or by calling `argv_array_init`. The `argv` + member contains the actual array; the `argc` member contains the + number of elements in the array, not including the terminating + NULL. + +Functions +--------- + +`argv_array_init`:: + Initialize an array. This is no different than assigning from + `ARGV_ARRAY_INIT`. + +`argv_array_push`:: + Push a copy of a string onto the end of the array. + +`argv_array_pushf`:: + Format a string and push it onto the end of the array. This is a + convenience wrapper combining `strbuf_addf` and `argv_array_push`. + +`argv_array_clear`:: + Free all memory associated with the array and return it to the + initial, empty state. diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt index 5cb2b0590a..b0cafe87be 100644 --- a/Documentation/technical/api-builtin.txt +++ b/Documentation/technical/api-builtin.txt @@ -49,6 +49,8 @@ Additionally, if `foo` is a new command, there are 3 more things to do: . Add an entry for `git-foo` to `command-list.txt`. +. Add an entry for `/git-foo` to `.gitignore`. + How a built-in is called ------------------------ diff --git a/Documentation/technical/api-diff.txt b/Documentation/technical/api-diff.txt index 20b0241d30..2d2ebc04b7 100644 --- a/Documentation/technical/api-diff.txt +++ b/Documentation/technical/api-diff.txt @@ -32,7 +32,7 @@ Calling sequence * As you find different pairs of files, call `diff_change()` to feed modified files, `diff_addremove()` to feed created or deleted files, - or `diff_unmerged()` to feed a file whose state is 'unmerged' to the + or `diff_unmerge()` to feed a file whose state is 'unmerged' to the API. These are thin wrappers to a lower-level `diff_queue()` function that is flexible enough to record any of these kinds of changes. @@ -50,7 +50,7 @@ Data structures This is the internal representation for a single file (blob). It records the blob object name (if known -- for a work tree file it typically is a NUL SHA-1), filemode and pathname. This is what the -`diff_addremove()`, `diff_change()` and `diff_unmerged()` synthesize and +`diff_addremove()`, `diff_change()` and `diff_unmerge()` synthesize and feed `diff_queue()` function with. * `struct diff_filepair` diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt index 9d97eaa9de..ce363b6305 100644 --- a/Documentation/technical/api-gitattributes.txt +++ b/Documentation/technical/api-gitattributes.txt @@ -11,27 +11,15 @@ Data Structure `struct git_attr`:: An attribute is an opaque object that is identified by its name. - Pass the name and its length to `git_attr()` function to obtain - the object of this type. The internal representation of this - structure is of no interest to the calling programs. + Pass the name to `git_attr()` function to obtain the object of + this type. The internal representation of this structure is + of no interest to the calling programs. The name of the + attribute can be retrieved by calling `git_attr_name()`. `struct git_attr_check`:: This structure represents a set of attributes to check in a call - to `git_checkattr()` function, and receives the results. - - -Calling Sequence ----------------- - -* Prepare an array of `struct git_attr_check` to define the list of - attributes you would want to check. To populate this array, you would - need to define necessary attributes by calling `git_attr()` function. - -* Call git_checkattr() to check the attributes for the path. - -* Inspect `git_attr_check` structure to see how each of the attribute in - the array is defined for the path. + to `git_check_attr()` function, and receives the results. Attribute Values @@ -57,6 +45,19 @@ If none of the above returns true, `.value` member points at a string value of the attribute for the path. +Querying Specific Attributes +---------------------------- + +* Prepare an array of `struct git_attr_check` to define the list of + attributes you would want to check. To populate this array, you would + need to define necessary attributes by calling `git_attr()` function. + +* Call `git_check_attr()` to check the attributes for the path. + +* Inspect `git_attr_check` structure to see how each of the attribute in + the array is defined for the path. + + Example ------- @@ -72,18 +73,18 @@ static void setup_check(void) { if (check[0].attr) return; /* already done */ - check[0].attr = git_attr("crlf", 4); - check[1].attr = git_attr("ident", 5); + check[0].attr = git_attr("crlf"); + check[1].attr = git_attr("ident"); } ------------ -. Call `git_checkattr()` with the prepared array of `struct git_attr_check`: +. Call `git_check_attr()` with the prepared array of `struct git_attr_check`: ------------ const char *path; setup_check(); - git_checkattr(path, ARRAY_SIZE(check), check); + git_check_attr(path, ARRAY_SIZE(check), check); ------------ . Act on `.value` member of the result, left in `check[]`: @@ -108,4 +109,20 @@ static void setup_check(void) } ------------ -(JC) + +Querying All Attributes +----------------------- + +To get the values of all attributes associated with a file: + +* Call `git_all_attrs()`, which returns an array of `git_attr_check` + structures. + +* Iterate over the `git_attr_check` array to examine the attribute + names and values. The name of the attribute described by a + `git_attr_check` object can be retrieved via + `git_attr_name(check[i].attr)`. (Please note that no items will be + returned for unset attributes, so `ATTR_UNSET()` will return false + for all returned `git_array_check` objects.) + +* Free the `git_array_check` array. diff --git a/Documentation/technical/api-merge.txt b/Documentation/technical/api-merge.txt new file mode 100644 index 0000000000..9dc1bed768 --- /dev/null +++ b/Documentation/technical/api-merge.txt @@ -0,0 +1,104 @@ +merge API +========= + +The merge API helps a program to reconcile two competing sets of +improvements to some files (e.g., unregistered changes from the work +tree versus changes involved in switching to a new branch), reporting +conflicts if found. The library called through this API is +responsible for a few things. + + * determining which trees to merge (recursive ancestor consolidation); + + * lining up corresponding files in the trees to be merged (rename + detection, subtree shifting), reporting edge cases like add/add + and rename/rename conflicts to the user; + + * performing a three-way merge of corresponding files, taking + path-specific merge drivers (specified in `.gitattributes`) + into account. + +Data structures +--------------- + +* `mmbuffer_t`, `mmfile_t` + +These store data usable for use by the xdiff backend, for writing and +for reading, respectively. See `xdiff/xdiff.h` for the definitions +and `diff.c` for examples. + +* `struct ll_merge_options` + +This describes the set of options the calling program wants to affect +the operation of a low-level (single file) merge. Some options: + +`virtual_ancestor`:: + Behave as though this were part of a merge between common + ancestors in a recursive merge. + If a helper program is specified by the + `[merge "<driver>"] recursive` configuration, it will + be used (see linkgit:gitattributes[5]). + +`variant`:: + Resolve local conflicts automatically in favor + of one side or the other (as in 'git merge-file' + `--ours`/`--theirs`/`--union`). Can be `0`, + `XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`, or + `XDL_MERGE_FAVOR_UNION`. + +`renormalize`:: + Resmudge and clean the "base", "theirs" and "ours" files + before merging. Use this when the merge is likely to have + overlapped with a change in smudge/clean or end-of-line + normalization rules. + +Low-level (single file) merge +----------------------------- + +`ll_merge`:: + + Perform a three-way single-file merge in core. This is + a thin wrapper around `xdl_merge` that takes the path and + any merge backend specified in `.gitattributes` or + `.git/info/attributes` into account. Returns 0 for a + clean merge. + +Calling sequence: + +* Prepare a `struct ll_merge_options` to record options. + If you have no special requests, skip this and pass `NULL` + as the `opts` parameter to use the default options. + +* Allocate an mmbuffer_t variable for the result. + +* Allocate and fill variables with the file's original content + and two modified versions (using `read_mmfile`, for example). + +* Call `ll_merge()`. + +* Read the merged content from `result_buf.ptr` and `result_buf.size`. + +* Release buffers when finished. A simple + `free(ancestor.ptr); free(ours.ptr); free(theirs.ptr); + free(result_buf.ptr);` will do. + +If the modifications do not merge cleanly, `ll_merge` will return a +nonzero value and `result_buf` will generally include a description of +the conflict bracketed by markers such as the traditional `<<<<<<<` +and `>>>>>>>`. + +The `ancestor_label`, `our_label`, and `their_label` parameters are +used to label the different sides of a conflict if the merge driver +supports this. + +Everything else +--------------- + +Talk about <merge-recursive.h> and merge_file(): + + - merge_trees() to merge with rename detection + - merge_recursive() for ancestor consolidation + - try_merge_command() for other strategies + - conflict format + - merge options + +(Daniel, Miklos, Stephan, JC) diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt index 50f9e9ac17..f6a4a361bd 100644 --- a/Documentation/technical/api-parse-options.txt +++ b/Documentation/technical/api-parse-options.txt @@ -115,13 +115,19 @@ There are some macros to easily define options: `OPT__ABBREV(&int_var)`:: Add `\--abbrev[=<n>]`. -`OPT__DRY_RUN(&int_var)`:: +`OPT__COLOR(&int_var, description)`:: + Add `\--color[=<when>]` and `--no-color`. + +`OPT__DRY_RUN(&int_var, description)`:: Add `-n, \--dry-run`. -`OPT__QUIET(&int_var)`:: +`OPT__FORCE(&int_var, description)`:: + Add `-f, \--force`. + +`OPT__QUIET(&int_var, description)`:: Add `-q, \--quiet`. -`OPT__VERBOSE(&int_var)`:: +`OPT__VERBOSE(&int_var, description)`:: Add `-v, \--verbose`. `OPT_GROUP(description)`:: @@ -183,13 +189,22 @@ There are some macros to easily define options: arguments. Short options that happen to be digits take precedence over it. +`OPT_COLOR_FLAG(short, long, &int_var, description)`:: + Introduce an option that takes an optional argument that can + have one of three values: "always", "never", or "auto". If the + argument is not given, it defaults to "always". The `--no-` form + works like `--long=never`; it cannot take an argument. If + "always", set `int_var` to 1; if "never", set `int_var` to 0; if + "auto", set `int_var` to 1 if stdout is a tty or a pager, + 0 otherwise. + The last element of the array must be `OPT_END()`. If not stated otherwise, interpret the arguments as follows: * `short` is a character for the short option - (e.g. `\'e\'` for `-e`, use `0` to omit), + (e.g. `{apostrophe}e{apostrophe}` for `-e`, use `0` to omit), * `long` is a string for the long option (e.g. `"example"` for `\--example`, use `NULL` to omit), @@ -216,10 +231,10 @@ The function must be defined in this form: The callback mechanism is as follows: * Inside `func`, the only interesting member of the structure - given by `opt` is the void pointer `opt->value`. - `\*opt->value` will be the value that is saved into `var`, if you + given by `opt` is the void pointer `opt\->value`. + `\*opt\->value` will be the value that is saved into `var`, if you use `OPT_CALLBACK()`. - For example, do `*(unsigned long *)opt->value = 42;` to get 42 + For example, do `*(unsigned long *)opt\->value = 42;` to get 42 into an `unsigned long` variable. * Return value `0` indicates success and non-zero return diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt new file mode 100644 index 0000000000..dbbea95db7 --- /dev/null +++ b/Documentation/technical/api-ref-iteration.txt @@ -0,0 +1,81 @@ +ref iteration API +================= + + +Iteration of refs is done by using an iterate function which will call a +callback function for every ref. The callback function has this +signature: + + int handle_one_ref(const char *refname, const unsigned char *sha1, + int flags, void *cb_data); + +There are different kinds of iterate functions which all take a +callback of this type. The callback is then called for each found ref +until the callback returns nonzero. The returned value is then also +returned by the iterate function. + +Iteration functions +------------------- + +* `head_ref()` just iterates the head ref. + +* `for_each_ref()` iterates all refs. + +* `for_each_ref_in()` iterates all refs which have a defined prefix and + strips that prefix from the passed variable refname. + +* `for_each_tag_ref()`, `for_each_branch_ref()`, `for_each_remote_ref()`, + `for_each_replace_ref()` iterate refs from the respective area. + +* `for_each_glob_ref()` iterates all refs that match the specified glob + pattern. + +* `for_each_glob_ref_in()` the previous and `for_each_ref_in()` combined. + +* `head_ref_submodule()`, `for_each_ref_submodule()`, + `for_each_ref_in_submodule()`, `for_each_tag_ref_submodule()`, + `for_each_branch_ref_submodule()`, `for_each_remote_ref_submodule()` + do the same as the functions descibed above but for a specified + submodule. + +* `for_each_rawref()` can be used to learn about broken ref and symref. + +* `for_each_reflog()` iterates each reflog file. + +Submodules +---------- + +If you want to iterate the refs of a submodule you first need to add the +submodules object database. You can do this by a code-snippet like +this: + + const char *path = "path/to/submodule" + if (!add_submodule_odb(path)) + die("Error submodule '%s' not populated.", path); + +`add_submodule_odb()` will return an non-zero value on success. If you +do not do this you will get an error for each ref that it does not point +to a valid object. + +Note: As a side-effect of this you can not safely assume that all +objects you lookup are available in superproject. All submodule objects +will be available the same way as the superprojects objects. + +Example: +-------- + +---- +static int handle_remote_ref(const char *refname, + const unsigned char *sha1, int flags, void *cb_data) +{ + struct strbuf *output = cb_data; + strbuf_addf(output, "%s\n", refname); + return 0; +} + +... + + struct strbuf output = STRBUF_INIT; + for_each_remote_ref(handle_remote_ref, &output); + printf("%s", output.buf); +---- diff --git a/Documentation/technical/api-run-command.txt b/Documentation/technical/api-run-command.txt index 68bf4cad8b..f18b4f4817 100644 --- a/Documentation/technical/api-run-command.txt +++ b/Documentation/technical/api-run-command.txt @@ -64,8 +64,8 @@ The functions above do the following: `start_async`:: Run a function asynchronously. Takes a pointer to a `struct - async` that specifies the details and returns a pipe FD - from which the caller reads. See below for details. + async` that specifies the details and returns a set of pipe FDs + for communication with the function. See below for details. `finish_async`:: @@ -135,7 +135,7 @@ stderr as follows: .in: The FD must be readable; it becomes child's stdin. .out: The FD must be writable; it becomes child's stdout. - .err > 0 is not supported. + .err: The FD must be writable; it becomes child's stderr. The specified FD is closed by start_command(), even if it fails to run the sub-process! @@ -180,17 +180,47 @@ The caller: struct async variable; 2. initializes .proc and .data; 3. calls start_async(); -4. processes the data by reading from the fd in .out; -5. closes .out; +4. processes communicates with proc through .in and .out; +5. closes .in and .out; 6. calls finish_async(). +The members .in, .out are used to provide a set of fd's for +communication between the caller and the callee as follows: + +. Specify 0 to have no file descriptor passed. The callee will + receive -1 in the corresponding argument. + +. Specify < 0 to have a pipe allocated; start_async() replaces + with the pipe FD in the following way: + + .in: Returns the writable pipe end into which the caller + writes; the readable end of the pipe becomes the function's + in argument. + + .out: Returns the readable pipe end from which the caller + reads; the writable end of the pipe becomes the function's + out argument. + + The caller of start_async() must close the returned FDs after it + has completed reading from/writing from them. + +. Specify a file descriptor > 0 to be used by the function: + + .in: The FD must be readable; it becomes the function's in. + .out: The FD must be writable; it becomes the function's out. + + The specified FD is closed by start_async(), even if it fails to + run the function. + The function pointer in .proc has the following signature: - int proc(int fd, void *data); + int proc(int in, int out, void *data); -. fd specifies a writable file descriptor to which the function must - write the data that it produces. The function *must* close this - descriptor before it returns. +. in, out specifies a set of file descriptors to which the function + must read/write the data that it needs/produces. The function + *must* close these descriptors before it returns. A descriptor + may be -1 if the caller did not configure a descriptor for that + direction. . data is the value that the caller has specified in the .data member of struct async. @@ -201,12 +231,13 @@ The function pointer in .proc has the following signature: There are serious restrictions on what the asynchronous function can do -because this facility is implemented by a pipe to a forked process on -UNIX, but by a thread in the same address space on Windows: +because this facility is implemented by a thread in the same address +space on most platforms (when pthreads is available), but by a pipe to +a forked process otherwise: . It cannot change the program's state (global variables, environment, - etc.) in a way that the caller notices; in other words, .out is the - only communication channel to the caller. + etc.) in a way that the caller notices; in other words, .in and .out + are the only communication channels to the caller. . It must not change the program's state that the caller of the facility also uses. diff --git a/Documentation/technical/api-sha1-array.txt b/Documentation/technical/api-sha1-array.txt new file mode 100644 index 0000000000..4a4bae8109 --- /dev/null +++ b/Documentation/technical/api-sha1-array.txt @@ -0,0 +1,79 @@ +sha1-array API +============== + +The sha1-array API provides storage and manipulation of sets of SHA1 +identifiers. The emphasis is on storage and processing efficiency, +making them suitable for large lists. Note that the ordering of items is +not preserved over some operations. + +Data Structures +--------------- + +`struct sha1_array`:: + + A single array of SHA1 hashes. This should be initialized by + assignment from `SHA1_ARRAY_INIT`. The `sha1` member contains + the actual data. The `nr` member contains the number of items in + the set. The `alloc` and `sorted` members are used internally, + and should not be needed by API callers. + +Functions +--------- + +`sha1_array_append`:: + Add an item to the set. The sha1 will be placed at the end of + the array (but note that some operations below may lose this + ordering). + +`sha1_array_sort`:: + Sort the elements in the array. + +`sha1_array_lookup`:: + Perform a binary search of the array for a specific sha1. + If found, returns the offset (in number of elements) of the + sha1. If not found, returns a negative integer. If the array is + not sorted, this function has the side effect of sorting it. + +`sha1_array_clear`:: + Free all memory associated with the array and return it to the + initial, empty state. + +`sha1_array_for_each_unique`:: + Efficiently iterate over each unique element of the list, + executing the callback function for each one. If the array is + not sorted, this function has the side effect of sorting it. + +Examples +-------- + +----------------------------------------- +void print_callback(const unsigned char sha1[20], + void *data) +{ + printf("%s\n", sha1_to_hex(sha1)); +} + +void some_func(void) +{ + struct sha1_array hashes = SHA1_ARRAY_INIT; + unsigned char sha1[20]; + + /* Read objects into our set */ + while (read_object_from_stdin(sha1)) + sha1_array_append(&hashes, sha1); + + /* Check if some objects are in our set */ + while (read_object_from_stdin(sha1)) { + if (sha1_array_lookup(&hashes, sha1) >= 0) + printf("it's in there!\n"); + + /* + * Print the unique set of objects. We could also have + * avoided adding duplicate objects in the first place, + * but we would end up re-sorting the array repeatedly. + * Instead, this will sort once and then skip duplicates + * in linear time. + */ + sha1_array_for_each_unique(&hashes, print_callback, NULL); +} +----------------------------------------- diff --git a/Documentation/technical/api-sigchain.txt b/Documentation/technical/api-sigchain.txt new file mode 100644 index 0000000000..9e1189ef01 --- /dev/null +++ b/Documentation/technical/api-sigchain.txt @@ -0,0 +1,41 @@ +sigchain API +============ + +Code often wants to set a signal handler to clean up temporary files or +other work-in-progress when we die unexpectedly. For multiple pieces of +code to do this without conflicting, each piece of code must remember +the old value of the handler and restore it either when: + + 1. The work-in-progress is finished, and the handler is no longer + necessary. The handler should revert to the original behavior + (either another handler, SIG_DFL, or SIG_IGN). + + 2. The signal is received. We should then do our cleanup, then chain + to the next handler (or die if it is SIG_DFL). + +Sigchain is a tiny library for keeping a stack of handlers. Your handler +and installation code should look something like: + +------------------------------------------ + void clean_foo_on_signal(int sig) + { + clean_foo(); + sigchain_pop(sig); + raise(sig); + } + + void other_func() + { + sigchain_push_common(clean_foo_on_signal); + mess_up_foo(); + clean_foo(); + } +------------------------------------------ + +Handlers are given the typedef of sigchain_fun. This is the same type +that is given to signal() or sigaction(). It is perfectly reasonable to +push SIG_DFL or SIG_IGN onto the stack. + +You can sigchain_push and sigchain_pop individual signals. For +convenience, sigchain_push_common will push the handler onto the stack +for many common signals. diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt index 293bb15d20..ce24eb96f5 100644 --- a/Documentation/technical/api-string-list.txt +++ b/Documentation/technical/api-string-list.txt @@ -29,6 +29,9 @@ member (you need this if you add things later) and you should set the . Can sort an unsorted list using `sort_string_list`. +. Can remove individual items of an unsorted list using + `unsorted_string_list_delete_item`. + . Finally it should free the list using `string_list_clear`. Example: @@ -38,8 +41,8 @@ struct string_list list; int i; memset(&list, 0, sizeof(struct string_list)); -string_list_append("foo", &list); -string_list_append("bar", &list); +string_list_append(&list, "foo"); +string_list_append(&list, "bar"); for (i = 0; i < list.nr; i++) printf("%s\n", list.items[i].string) ---- @@ -104,10 +107,21 @@ write `string_list_insert(...)->util = ...;`. `unsorted_string_list_has_string`:: It's like `string_list_has_string()` but for unsorted lists. + +`unsorted_string_list_lookup`:: + + It's like `string_list_lookup()` but for unsorted lists. + -This function needs to look through all items, as opposed to its +The above two functions need to look through all items, as opposed to their counterpart for sorted lists, which performs a binary search. +`unsorted_string_list_delete_item`:: + + Remove an item from a string_list. The `string` pointer of the items + will be freed in case the `strdup_strings` member of the string_list + is set. The third parameter controls if the `util` pointer of the + items should be freed or not. + Data structures --------------- diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt index 55b728632c..14af37c3f1 100644 --- a/Documentation/technical/api-tree-walking.txt +++ b/Documentation/technical/api-tree-walking.txt @@ -42,6 +42,8 @@ information. * `data` can be anything the `fn` callback would want to use. +* `show_all_errors` tells whether to stop at the first error or not. + Initializing ------------ diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt new file mode 100644 index 0000000000..8930b3fabc --- /dev/null +++ b/Documentation/technical/index-format.txt @@ -0,0 +1,186 @@ +GIT index format +================ + += The git index file has the following format + + All binary numbers are in network byte order. Version 2 is described + here unless stated otherwise. + + - A 12-byte header consisting of + + 4-byte signature: + The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") + + 4-byte version number: + The current supported versions are 2 and 3. + + 32-bit number of index entries. + + - A number of sorted index entries (see below). + + - Extensions + + Extensions are identified by signature. Optional extensions can + be ignored if GIT does not understand them. + + GIT currently supports cached tree and resolve undo extensions. + + 4-byte extension signature. If the first byte is 'A'..'Z' the + extension is optional and can be ignored. + + 32-bit size of the extension + + Extension data + + - 160-bit SHA-1 over the content of the index file before this + checksum. + +== Index entry + + Index entries are sorted in ascending order on the name field, + interpreted as a string of unsigned bytes (i.e. memcmp() order, no + localization, no special casing of directory separator '/'). Entries + with the same name are sorted by their stage field. + + 32-bit ctime seconds, the last time a file's metadata changed + this is stat(2) data + + 32-bit ctime nanosecond fractions + this is stat(2) data + + 32-bit mtime seconds, the last time a file's data changed + this is stat(2) data + + 32-bit mtime nanosecond fractions + this is stat(2) data + + 32-bit dev + this is stat(2) data + + 32-bit ino + this is stat(2) data + + 32-bit mode, split into (high to low bits) + + 4-bit object type + valid values in binary are 1000 (regular file), 1010 (symbolic link) + and 1110 (gitlink) + + 3-bit unused + + 9-bit unix permission. Only 0755 and 0644 are valid for regular files. + Symbolic links and gitlinks have value 0 in this field. + + 32-bit uid + this is stat(2) data + + 32-bit gid + this is stat(2) data + + 32-bit file size + This is the on-disk size from stat(2), truncated to 32-bit. + + 160-bit SHA-1 for the represented object + + A 16-bit 'flags' field split into (high to low bits) + + 1-bit assume-valid flag + + 1-bit extended flag (must be zero in version 2) + + 2-bit stage (during merge) + + 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF + is stored in this field. + + (Version 3) A 16-bit field, only applicable if the "extended flag" + above is 1, split into (high to low bits). + + 1-bit reserved for future + + 1-bit skip-worktree flag (used by sparse checkout) + + 1-bit intent-to-add flag (used by "git add -N") + + 13-bit unused, must be zero + + Entry path name (variable length) relative to top level directory + (without leading slash). '/' is used as path separator. The special + path components ".", ".." and ".git" (without quotes) are disallowed. + Trailing slash is also disallowed. + + The exact encoding is undefined, but the '.' and '/' characters + are encoded in 7-bit ASCII and the encoding cannot contain a NUL + byte (iow, this is a UNIX pathname). + + 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes + while keeping the name NUL-terminated. + +== Extensions + +=== Cached tree + + Cached tree extension contains pre-computed hashes for trees that can + be derived from the index. It helps speed up tree object generation + from index for a new commit. + + When a path is updated in index, the path must be invalidated and + removed from tree cache. + + The signature for this extension is { 'T', 'R', 'E', 'E' }. + + A series of entries fill the entire extension; each of which + consists of: + + - NUL-terminated path component (relative to its parent directory); + + - ASCII decimal number of entries in the index that is covered by the + tree this entry represents (entry_count); + + - A space (ASCII 32); + + - ASCII decimal number that represents the number of subtrees this + tree has; + + - A newline (ASCII 10); and + + - 160-bit object name for the object that would result from writing + this span of index as a tree. + + An entry can be in an invalidated state and is represented by having + -1 in the entry_count field. In this case, there is no object name + and the next entry starts immediately after the newline. + + The entries are written out in the top-down, depth-first order. The + first entry represents the root level of the repository, followed by the + first subtree---let's call this A---of the root level (with its name + relative to the root level), followed by the first subtree of A (with + its name relative to A), ... + +=== Resolve undo + + A conflict is represented in the index as a set of higher stage entries. + When a conflict is resolved (e.g. with "git add path"), these higher + stage entries will be removed and a stage-0 entry with proper resoluton + is added. + + When these higher stage entries are removed, they are saved in the + resolve undo extension, so that conflicts can be recreated (e.g. with + "git checkout -m"), in case users want to redo a conflict resolution + from scratch. + + The signature for this extension is { 'R', 'E', 'U', 'C' }. + + A series of entries fill the entire extension; each of which + consists of: + + - NUL-terminated pathname the entry describes (relative to the root of + the repository, i.e. full pathname); + + - Three NUL-terminated ASCII octal numbers, entry mode of entries in + stage 1 to 3 (a missing stage is represented by "0" in this field); + and + + - At most three 160-bit object names of the entry in stages from 1 to 3 + (nothing is written for a missing stage). + diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index 9a5cdafa9c..a7004c63e7 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -36,7 +36,7 @@ Git Transport The Git transport starts off by sending the command and repository on the wire using the pkt-line format, followed by a NUL byte and a -hostname paramater, terminated by a NUL byte. +hostname parameter, terminated by a NUL byte. 0032git-upload-pack /project.git\0host=myserver.com\0 @@ -179,34 +179,36 @@ and descriptions. Packfile Negotiation -------------------- -After reference and capabilities discovery, the client can decide -to terminate the connection by sending a flush-pkt, telling the -server it can now gracefully terminate (as happens with the ls-remote -command) or it can enter the negotiation phase, where the client and -server determine what the minimal packfile necessary for transport is. - -Once the client has the initial list of references that the server -has, as well as the list of capabilities, it will begin telling the -server what objects it wants and what objects it has, so the server -can make a packfile that only contains the objects that the client needs. -The client will also send a list of the capabilities it wants to be in -effect, out of what the server said it could do with the first 'want' line. +After reference and capabilities discovery, the client can decide to +terminate the connection by sending a flush-pkt, telling the server it can +now gracefully terminate, and disconnect, when it does not need any pack +data. This can happen with the ls-remote command, and also can happen when +the client already is up-to-date. + +Otherwise, it enters the negotiation phase, where the client and +server determine what the minimal packfile necessary for transport is, +by telling the server what objects it wants, its shallow objects +(if any), and the maximum commit depth it wants (if any). The client +will also send a list of the capabilities it wants to be in effect, +out of what the server said it could do with the first 'want' line. ---- upload-request = want-list - have-list - compute-end + *shallow-line + *1depth-request + flush-pkt want-list = first-want *additional-want - flush-pkt + + shallow-line = PKT_LINE("shallow" SP obj-id) + + depth-request = PKT_LINE("deepen" SP depth) first-want = PKT-LINE("want" SP obj-id SP capability-list LF) additional-want = PKT-LINE("want" SP obj-id LF) - have-list = *have-line - have-line = PKT-LINE("have" SP obj-id LF) - compute-end = flush-pkt / PKT-LINE("done") + depth = 1*DIGIT ---- Clients MUST send all the obj-ids it wants from the reference @@ -215,21 +217,64 @@ discovery phase as 'want' lines. Clients MUST send at least one obj-id in a 'want' command which did not appear in the response obtained through ref discovery. -If client is requesting a shallow clone, it will now send a 'deepen' -line with the depth it is requesting. +The client MUST write all obj-ids which it only has shallow copies +of (meaning that it does not have the parents of a commit) as +'shallow' lines so that the server is aware of the limitations of +the client's history. Clients MUST NOT mention an obj-id which +it does not know exists on the server. + +The client now sends the maximum commit history depth it wants for +this transaction, which is the number of commits it wants from the +tip of the history, if any, as a 'deepen' line. A depth of 0 is the +same as not making a depth request. The client does not want to receive +any commits beyond this depth, nor objects needed only to complete +those commits. Commits whose parents are not received as a result are +defined as shallow and marked as such in the server. This information +is sent back to the client in the next step. + +Once all the 'want's and 'shallow's (and optional 'deepen') are +transferred, clients MUST send a flush-pkt, to tell the server side +that it is done sending the list. + +Otherwise, if the client sent a positive depth request, the server +will determine which commits will and will not be shallow and +send this information to the client. If the client did not request +a positive depth, this step is skipped. -Once all the "want"s (and optional 'deepen') are transferred, -clients MUST send a flush-pkt. If the client has all the references -on the server, client flushes and disconnects. +---- + shallow-update = *shallow-line + *unshallow-line + flush-pkt -TODO: shallow/unshallow response and document the deepen command in the ABNF. + shallow-line = PKT-LINE("shallow" SP obj-id) + + unshallow-line = PKT-LINE("unshallow" SP obj-id) +---- + +If the client has requested a positive depth, the server will compute +the set of commits which are no deeper than the desired depth, starting +at the client's wants. The server writes 'shallow' lines for each +commit whose parents will not be sent as a result. The server writes +an 'unshallow' line for each commit which the client has indicated is +shallow, but is no longer shallow at the currently requested depth +(that is, its parents will now be sent). The server MUST NOT mark +as unshallow anything which the client has not indicated was shallow. Now the client will send a list of the obj-ids it has using 'have' -lines. In multi_ack mode, the canonical implementation will send up -to 32 of these at a time, then will send a flush-pkt. The canonical -implementation will skip ahead and send the next 32 immediately, -so that there is always a block of 32 "in-flight on the wire" at a -time. +lines, so the server can make a packfile that only contains the objects +that the client needs. In multi_ack mode, the canonical implementation +will send up to 32 of these at a time, then will send a flush-pkt. The +canonical implementation will skip ahead and send the next 32 immediately, +so that there is always a block of 32 "in-flight on the wire" at a time. + +---- + upload-haves = have-list + compute-end + + have-list = *have-line + have-line = PKT-LINE("have" SP obj-id LF) + compute-end = flush-pkt / PKT-LINE("done") +---- If the server reads 'have' lines, it then will respond by ACKing any of the obj-ids the client said it had that the server also has. The @@ -331,7 +376,7 @@ An incremental update (fetch) response might look like this: C: 0009done\n - S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d\n + S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n S: [PACKFILE] ---- @@ -488,7 +533,7 @@ An example client/server communication might look like this: C: 0000 C: [PACKDATA] - S: 000aunpack ok\n - S: 0014ok refs/heads/debug\n - S: 0026ng refs/heads/master non-fast-forward\n + S: 000eunpack ok\n + S: 0018ok refs/heads/debug\n + S: 002ang refs/heads/master non-fast-forward\n ---- diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index fd1a593149..b15517fa06 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -119,7 +119,7 @@ both. ofs-delta --------- -Server can send, and client understand PACKv2 with delta refering to +Server can send, and client understand PACKv2 with delta referring to its base by position in pack rather than by an obj-id. That is, they can send/read OBJ_OFS_DELTA (aka type 6) in a packfile. |