diff options
Diffstat (limited to 'Documentation/technical')
25 files changed, 2577 insertions, 245 deletions
diff --git a/Documentation/technical/api-argv-array.txt b/Documentation/technical/api-argv-array.txt new file mode 100644 index 0000000000..1b7d8f140c --- /dev/null +++ b/Documentation/technical/api-argv-array.txt @@ -0,0 +1,51 @@ +argv-array API +============== + +The argv-array API allows one to dynamically build and store +NULL-terminated lists. An argv-array maintains the invariant that the +`argv` member always points to a non-NULL array, and that the array is +always NULL-terminated at the element pointed to by `argv[argc]`. This +makes the result suitable for passing to functions expecting to receive +argv from main(), or the link:api-run-command.html[run-command API]. + +The link:api-string-list.html[string-list API] is similar, but cannot be +used for these purposes; instead of storing a straight string pointer, +it contains an item structure with a `util` field that is not compatible +with the traditional argv interface. + +Each `argv_array` manages its own memory. Any strings pushed into the +array are duplicated, and all memory is freed by argv_array_clear(). + +Data Structures +--------------- + +`struct argv_array`:: + + A single array. This should be initialized by assignment from + `ARGV_ARRAY_INIT`, or by calling `argv_array_init`. The `argv` + member contains the actual array; the `argc` member contains the + number of elements in the array, not including the terminating + NULL. + +Functions +--------- + +`argv_array_init`:: + Initialize an array. This is no different than assigning from + `ARGV_ARRAY_INIT`. + +`argv_array_push`:: + Push a copy of a string onto the end of the array. + +`argv_array_pushl`:: + Push a list of strings onto the end of the array. The arguments + should be a list of `const char *` strings, terminated by a NULL + argument. + +`argv_array_pushf`:: + Format a string and push it onto the end of the array. This is a + convenience wrapper combining `strbuf_addf` and `argv_array_push`. + +`argv_array_clear`:: + Free all memory associated with the array and return it to the + initial, empty state. diff --git a/Documentation/technical/api-builtin.txt b/Documentation/technical/api-builtin.txt index 52cdb4c520..b0cafe87be 100644 --- a/Documentation/technical/api-builtin.txt +++ b/Documentation/technical/api-builtin.txt @@ -4,7 +4,7 @@ builtin API Adding a new built-in --------------------- -There are 4 things to do to add a bulit-in command implementation to +There are 4 things to do to add a built-in command implementation to git: . Define the implementation of the built-in command `foo` with @@ -18,8 +18,8 @@ git: defined in `git.c`. The entry should look like: { "foo", cmd_foo, <options> }, - - where options is the bitwise-or of: ++ +where options is the bitwise-or of: `RUN_SETUP`:: @@ -33,6 +33,12 @@ git: If the standard output is connected to a tty, spawn a pager and feed our output to it. +`NEED_WORK_TREE`:: + + Make sure there is a work tree, i.e. the command cannot act + on bare repositories. + This only makes sense when `RUN_SETUP` is also set. + . Add `builtin-foo.o` to `BUILTIN_OBJS` in `Makefile`. Additionally, if `foo` is a new command, there are 3 more things to do: @@ -41,8 +47,9 @@ Additionally, if `foo` is a new command, there are 3 more things to do: . Write documentation in `Documentation/git-foo.txt`. -. Add an entry for `git-foo` to the list at the end of - `Documentation/cmd-list.perl`. +. Add an entry for `git-foo` to `command-list.txt`. + +. Add an entry for `/git-foo` to `.gitignore`. How a built-in is called diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt new file mode 100644 index 0000000000..edf8dfb99b --- /dev/null +++ b/Documentation/technical/api-config.txt @@ -0,0 +1,140 @@ +config API +========== + +The config API gives callers a way to access git configuration files +(and files which have the same syntax). See linkgit:git-config[1] for a +discussion of the config file syntax. + +General Usage +------------- + +Config files are parsed linearly, and each variable found is passed to a +caller-provided callback function. The callback function is responsible +for any actions to be taken on the config option, and is free to ignore +some options. It is not uncommon for the configuration to be parsed +several times during the run of a git program, with different callbacks +picking out different variables useful to themselves. + +A config callback function takes three parameters: + +- the name of the parsed variable. This is in canonical "flat" form: the + section, subsection, and variable segments will be separated by dots, + and the section and variable segments will be all lowercase. E.g., + `core.ignorecase`, `diff.SomeType.textconv`. + +- the value of the found variable, as a string. If the variable had no + value specified, the value will be NULL (typically this means it + should be interpreted as boolean true). + +- a void pointer passed in by the caller of the config API; this can + contain callback-specific data + +A config callback should return 0 for success, or -1 if the variable +could not be parsed properly. + +Basic Config Querying +--------------------- + +Most programs will simply want to look up variables in all config files +that git knows about, using the normal precedence rules. To do this, +call `git_config` with a callback function and void data pointer. + +`git_config` will read all config sources in order of increasing +priority. Thus a callback should typically overwrite previously-seen +entries with new ones (e.g., if both the user-wide `~/.gitconfig` and +repo-specific `.git/config` contain `color.ui`, the config machinery +will first feed the user-wide one to the callback, and then the +repo-specific one; by overwriting, the higher-priority repo-specific +value is left at the end). + +The `git_config_with_options` function lets the caller examine config +while adjusting some of the default behavior of `git_config`. It should +almost never be used by "regular" git code that is looking up +configuration variables. It is intended for advanced callers like +`git-config`, which are intentionally tweaking the normal config-lookup +process. It takes two extra parameters: + +`filename`:: +If this parameter is non-NULL, it specifies the name of a file to +parse for configuration, rather than looking in the usual files. Regular +`git_config` defaults to `NULL`. + +`respect_includes`:: +Specify whether include directives should be followed in parsed files. +Regular `git_config` defaults to `1`. + +There is a special version of `git_config` called `git_config_early`. +This version takes an additional parameter to specify the repository +config, instead of having it looked up via `git_path`. This is useful +early in a git program before the repository has been found. Unless +you're working with early setup code, you probably don't want to use +this. + +Reading Specific Files +---------------------- + +To read a specific file in git-config format, use +`git_config_from_file`. This takes the same callback and data parameters +as `git_config`. + +Value Parsing Helpers +--------------------- + +To aid in parsing string values, the config API provides callbacks with +a number of helper functions, including: + +`git_config_int`:: +Parse the string to an integer, including unit factors. Dies on error; +otherwise, returns the parsed result. + +`git_config_ulong`:: +Identical to `git_config_int`, but for unsigned longs. + +`git_config_bool`:: +Parse a string into a boolean value, respecting keywords like "true" and +"false". Integer values are converted into true/false values (when they +are non-zero or zero, respectively). Other values cause a die(). If +parsing is successful, the return value is the result. + +`git_config_bool_or_int`:: +Same as `git_config_bool`, except that integers are returned as-is, and +an `is_bool` flag is unset. + +`git_config_maybe_bool`:: +Same as `git_config_bool`, except that it returns -1 on error rather +than dying. + +`git_config_string`:: +Allocates and copies the value string into the `dest` parameter; if no +string is given, prints an error message and returns -1. + +`git_config_pathname`:: +Similar to `git_config_string`, but expands `~` or `~user` into the +user's home directory when found at the beginning of the path. + +Include Directives +------------------ + +By default, the config parser does not respect include directives. +However, a caller can use the special `git_config_include` wrapper +callback to support them. To do so, you simply wrap your "real" callback +function and data pointer in a `struct config_include_data`, and pass +the wrapper to the regular config-reading functions. For example: + +------------------------------------------- +int read_file_with_include(const char *file, config_fn_t fn, void *data) +{ + struct config_include_data inc = CONFIG_INCLUDE_INIT; + inc.fn = fn; + inc.data = data; + return git_config_from_file(git_config_include, file, &inc); +} +------------------------------------------- + +`git_config` respects includes automatically. The lower-level +`git_config_from_file` does not. + +Writing Config Files +-------------------- + +TODO diff --git a/Documentation/technical/api-credentials.txt b/Documentation/technical/api-credentials.txt new file mode 100644 index 0000000000..21ca6a2553 --- /dev/null +++ b/Documentation/technical/api-credentials.txt @@ -0,0 +1,245 @@ +credentials API +=============== + +The credentials API provides an abstracted way of gathering username and +password credentials from the user (even though credentials in the wider +world can take many forms, in this document the word "credential" always +refers to a username and password pair). + +Data Structures +--------------- + +`struct credential`:: + + This struct represents a single username/password combination + along with any associated context. All string fields should be + heap-allocated (or NULL if they are not known or not applicable). + The meaning of the individual context fields is the same as + their counterparts in the helper protocol; see the section below + for a description of each field. ++ +The `helpers` member of the struct is a `string_list` of helpers. Each +string specifies an external helper which will be run, in order, to +either acquire or store credentials. See the section on credential +helpers below. ++ +This struct should always be initialized with `CREDENTIAL_INIT` or +`credential_init`. + + +Functions +--------- + +`credential_init`:: + + Initialize a credential structure, setting all fields to empty. + +`credential_clear`:: + + Free any resources associated with the credential structure, + returning it to a pristine initialized state. + +`credential_fill`:: + + Instruct the credential subsystem to fill the username and + password fields of the passed credential struct by first + consulting helpers, then asking the user. After this function + returns, the username and password fields of the credential are + guaranteed to be non-NULL. If an error occurs, the function will + die(). + +`credential_reject`:: + + Inform the credential subsystem that the provided credentials + have been rejected. This will cause the credential subsystem to + notify any helpers of the rejection (which allows them, for + example, to purge the invalid credentials from storage). It + will also free() the username and password fields of the + credential and set them to NULL (readying the credential for + another call to `credential_fill`). Any errors from helpers are + ignored. + +`credential_approve`:: + + Inform the credential subsystem that the provided credentials + were successfully used for authentication. This will cause the + credential subsystem to notify any helpers of the approval, so + that they may store the result to be used again. Any errors + from helpers are ignored. + +`credential_from_url`:: + + Parse a URL into broken-down credential fields. + +Example +------- + +The example below shows how the functions of the credential API could be +used to login to a fictitious "foo" service on a remote host: + +----------------------------------------------------------------------- +int foo_login(struct foo_connection *f) +{ + int status; + /* + * Create a credential with some context; we don't yet know the + * username or password. + */ + + struct credential c = CREDENTIAL_INIT; + c.protocol = xstrdup("foo"); + c.host = xstrdup(f->hostname); + + /* + * Fill in the username and password fields by contacting + * helpers and/or asking the user. The function will die if it + * fails. + */ + credential_fill(&c); + + /* + * Otherwise, we have a username and password. Try to use it. + */ + status = send_foo_login(f, c.username, c.password); + switch (status) { + case FOO_OK: + /* It worked. Store the credential for later use. */ + credential_accept(&c); + break; + case FOO_BAD_LOGIN: + /* Erase the credential from storage so we don't try it + * again. */ + credential_reject(&c); + break; + default: + /* + * Some other error occured. We don't know if the + * credential is good or bad, so report nothing to the + * credential subsystem. + */ + } + + /* Free any associated resources. */ + credential_clear(&c); + + return status; +} +----------------------------------------------------------------------- + + +Credential Helpers +------------------ + +Credential helpers are programs executed by git to fetch or save +credentials from and to long-term storage (where "long-term" is simply +longer than a single git process; e.g., credentials may be stored +in-memory for a few minutes, or indefinitely on disk). + +Each helper is specified by a single string. The string is transformed +by git into a command to be executed using these rules: + + 1. If the helper string begins with "!", it is considered a shell + snippet, and everything after the "!" becomes the command. + + 2. Otherwise, if the helper string begins with an absolute path, the + verbatim helper string becomes the command. + + 3. Otherwise, the string "git credential-" is prepended to the helper + string, and the result becomes the command. + +The resulting command then has an "operation" argument appended to it +(see below for details), and the result is executed by the shell. + +Here are some example specifications: + +---------------------------------------------------- +# run "git credential-foo" +foo + +# same as above, but pass an argument to the helper +foo --bar=baz + +# the arguments are parsed by the shell, so use shell +# quoting if necessary +foo --bar="whitespace arg" + +# you can also use an absolute path, which will not use the git wrapper +/path/to/my/helper --with-arguments + +# or you can specify your own shell snippet +!f() { echo "password=`cat $HOME/.secret`"; }; f +---------------------------------------------------- + +Generally speaking, rule (3) above is the simplest for users to specify. +Authors of credential helpers should make an effort to assist their +users by naming their program "git-credential-$NAME", and putting it in +the $PATH or $GIT_EXEC_PATH during installation, which will allow a user +to enable it with `git config credential.helper $NAME`. + +When a helper is executed, it will have one "operation" argument +appended to its command line, which is one of: + +`get`:: + + Return a matching credential, if any exists. + +`store`:: + + Store the credential, if applicable to the helper. + +`erase`:: + + Remove a matching credential, if any, from the helper's storage. + +The details of the credential will be provided on the helper's stdin +stream. The credential is split into a set of named attributes. +Attributes are provided to the helper, one per line. Each attribute is +specified by a key-value pair, separated by an `=` (equals) sign, +followed by a newline. The key may contain any bytes except `=`, +newline, or NUL. The value may contain any bytes except newline or NUL. +In both cases, all bytes are treated as-is (i.e., there is no quoting, +and one cannot transmit a value with newline or NUL in it). The list of +attributes is terminated by a blank line or end-of-file. + +Git will send the following attributes (but may not send all of +them for a given credential; for example, a `host` attribute makes no +sense when dealing with a non-network protocol): + +`protocol`:: + + The protocol over which the credential will be used (e.g., + `https`). + +`host`:: + + The remote hostname for a network credential. + +`path`:: + + The path with which the credential will be used. E.g., for + accessing a remote https repository, this will be the + repository's path on the server. + +`username`:: + + The credential's username, if we already have one (e.g., from a + URL, from the user, or from a previously run helper). + +`password`:: + + The credential's password, if we are asking it to be stored. + +For a `get` operation, the helper should produce a list of attributes +on stdout in the same format. A helper is free to produce a subset, or +even no values at all if it has nothing useful to provide. Any provided +attributes will overwrite those already known about by git. + +For a `store` or `erase` operation, the helper's output is ignored. +If it fails to perform the requested operation, it may complain to +stderr to inform the user. If it does not support the requested +operation (e.g., a read-only store), it should silently ignore the +request. + +If a helper receives any other operation, it should silently ignore the +request. This leaves room for future operations to be added (older +helpers will just ignore the new requests). diff --git a/Documentation/technical/api-diff.txt b/Documentation/technical/api-diff.txt index 20b0241d30..2d2ebc04b7 100644 --- a/Documentation/technical/api-diff.txt +++ b/Documentation/technical/api-diff.txt @@ -32,7 +32,7 @@ Calling sequence * As you find different pairs of files, call `diff_change()` to feed modified files, `diff_addremove()` to feed created or deleted files, - or `diff_unmerged()` to feed a file whose state is 'unmerged' to the + or `diff_unmerge()` to feed a file whose state is 'unmerged' to the API. These are thin wrappers to a lower-level `diff_queue()` function that is flexible enough to record any of these kinds of changes. @@ -50,7 +50,7 @@ Data structures This is the internal representation for a single file (blob). It records the blob object name (if known -- for a work tree file it typically is a NUL SHA-1), filemode and pathname. This is what the -`diff_addremove()`, `diff_change()` and `diff_unmerged()` synthesize and +`diff_addremove()`, `diff_change()` and `diff_unmerge()` synthesize and feed `diff_queue()` function with. * `struct diff_filepair` diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt index 5bbd18f020..add6f435b5 100644 --- a/Documentation/technical/api-directory-listing.txt +++ b/Documentation/technical/api-directory-listing.txt @@ -58,6 +58,9 @@ The result of the enumeration is left in these fields:: Calling sequence ---------------- +Note: index may be looked at for .gitignore files that are CE_SKIP_WORKTREE +marked. If you to exclude files, make sure you have loaded index first. + * Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0, sizeof(dir))`. diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt index 9d97eaa9de..ce363b6305 100644 --- a/Documentation/technical/api-gitattributes.txt +++ b/Documentation/technical/api-gitattributes.txt @@ -11,27 +11,15 @@ Data Structure `struct git_attr`:: An attribute is an opaque object that is identified by its name. - Pass the name and its length to `git_attr()` function to obtain - the object of this type. The internal representation of this - structure is of no interest to the calling programs. + Pass the name to `git_attr()` function to obtain the object of + this type. The internal representation of this structure is + of no interest to the calling programs. The name of the + attribute can be retrieved by calling `git_attr_name()`. `struct git_attr_check`:: This structure represents a set of attributes to check in a call - to `git_checkattr()` function, and receives the results. - - -Calling Sequence ----------------- - -* Prepare an array of `struct git_attr_check` to define the list of - attributes you would want to check. To populate this array, you would - need to define necessary attributes by calling `git_attr()` function. - -* Call git_checkattr() to check the attributes for the path. - -* Inspect `git_attr_check` structure to see how each of the attribute in - the array is defined for the path. + to `git_check_attr()` function, and receives the results. Attribute Values @@ -57,6 +45,19 @@ If none of the above returns true, `.value` member points at a string value of the attribute for the path. +Querying Specific Attributes +---------------------------- + +* Prepare an array of `struct git_attr_check` to define the list of + attributes you would want to check. To populate this array, you would + need to define necessary attributes by calling `git_attr()` function. + +* Call `git_check_attr()` to check the attributes for the path. + +* Inspect `git_attr_check` structure to see how each of the attribute in + the array is defined for the path. + + Example ------- @@ -72,18 +73,18 @@ static void setup_check(void) { if (check[0].attr) return; /* already done */ - check[0].attr = git_attr("crlf", 4); - check[1].attr = git_attr("ident", 5); + check[0].attr = git_attr("crlf"); + check[1].attr = git_attr("ident"); } ------------ -. Call `git_checkattr()` with the prepared array of `struct git_attr_check`: +. Call `git_check_attr()` with the prepared array of `struct git_attr_check`: ------------ const char *path; setup_check(); - git_checkattr(path, ARRAY_SIZE(check), check); + git_check_attr(path, ARRAY_SIZE(check), check); ------------ . Act on `.value` member of the result, left in `check[]`: @@ -108,4 +109,20 @@ static void setup_check(void) } ------------ -(JC) + +Querying All Attributes +----------------------- + +To get the values of all attributes associated with a file: + +* Call `git_all_attrs()`, which returns an array of `git_attr_check` + structures. + +* Iterate over the `git_attr_check` array to examine the attribute + names and values. The name of the attribute described by a + `git_attr_check` object can be retrieved via + `git_attr_name(check[i].attr)`. (Please note that no items will be + returned for unset attributes, so `ATTR_UNSET()` will return false + for all returned `git_array_check` objects.) + +* Free the `git_array_check` array. diff --git a/Documentation/technical/api-hash.txt b/Documentation/technical/api-hash.txt index c784d3edcb..e5061e0677 100644 --- a/Documentation/technical/api-hash.txt +++ b/Documentation/technical/api-hash.txt @@ -1,6 +1,52 @@ hash API ======== -Talk about <hash.h> +The hash API is a collection of simple hash table functions. Users are expected +to implement their own hashing. -(Linus) +Data Structures +--------------- + +`struct hash_table`:: + + The hash table structure. The `array` member points to the hash table + entries. The `size` member counts the total number of valid and invalid + entries in the table. The `nr` member keeps track of the number of + valid entries. + +`struct hash_table_entry`:: + + An opaque structure representing an entry in the hash table. The `hash` + member is the entry's hash key and the `ptr` member is the entry's + value. + +Functions +--------- + +`init_hash`:: + + Initialize the hash table. + +`free_hash`:: + + Release memory associated with the hash table. + +`insert_hash`:: + + Insert a pointer into the hash table. If an entry with that hash + already exists, a pointer to the existing entry's value is returned. + Otherwise NULL is returned. This allows callers to implement + chaining, etc. + +`lookup_hash`:: + + Lookup an entry in the hash table. If an entry with that hash exists + the entry's value is returned. Otherwise NULL is returned. + +`for_each_hash`:: + + Call a function for each entry in the hash table. The function is + expected to take the entry's value as its only argument and return an + int. If the function returns a negative int the loop is aborted + immediately. Otherwise, the return value is accumulated and the sum + returned upon completion of the loop. diff --git a/Documentation/technical/api-history-graph.txt b/Documentation/technical/api-history-graph.txt index e9559790a3..d6fc90ac7e 100644 --- a/Documentation/technical/api-history-graph.txt +++ b/Documentation/technical/api-history-graph.txt @@ -11,9 +11,6 @@ Core functions: * `graph_init()` creates a new `struct git_graph` -* `graph_release()` destroys a `struct git_graph`, and frees the memory - associated with it. - * `graph_update()` moves the graph to a new commit. * `graph_next_line()` outputs the next line of the graph into a strbuf. It @@ -134,8 +131,6 @@ while ((commit = get_revision(opts)) != NULL) { putchar(opts->diffopt.line_termination); } } - -graph_release(graph); ------------ Sample output @@ -148,22 +143,22 @@ outputting that information, if desired. ------------ * * -M +* |\ * | | | * | \ \ | \ \ -M-. \ \ +*-. \ \ |\ \ \ \ | | * | | | | | | | * | | | | | * -| | | | | M +| | | | | * | | | | | |\ | | | | | | * | * | | | | | -| | | | | M \ +| | | | | * \ | | | | | |\ | | | | | * | | | | | | | * | | | diff --git a/Documentation/technical/api-merge.txt b/Documentation/technical/api-merge.txt new file mode 100644 index 0000000000..9dc1bed768 --- /dev/null +++ b/Documentation/technical/api-merge.txt @@ -0,0 +1,104 @@ +merge API +========= + +The merge API helps a program to reconcile two competing sets of +improvements to some files (e.g., unregistered changes from the work +tree versus changes involved in switching to a new branch), reporting +conflicts if found. The library called through this API is +responsible for a few things. + + * determining which trees to merge (recursive ancestor consolidation); + + * lining up corresponding files in the trees to be merged (rename + detection, subtree shifting), reporting edge cases like add/add + and rename/rename conflicts to the user; + + * performing a three-way merge of corresponding files, taking + path-specific merge drivers (specified in `.gitattributes`) + into account. + +Data structures +--------------- + +* `mmbuffer_t`, `mmfile_t` + +These store data usable for use by the xdiff backend, for writing and +for reading, respectively. See `xdiff/xdiff.h` for the definitions +and `diff.c` for examples. + +* `struct ll_merge_options` + +This describes the set of options the calling program wants to affect +the operation of a low-level (single file) merge. Some options: + +`virtual_ancestor`:: + Behave as though this were part of a merge between common + ancestors in a recursive merge. + If a helper program is specified by the + `[merge "<driver>"] recursive` configuration, it will + be used (see linkgit:gitattributes[5]). + +`variant`:: + Resolve local conflicts automatically in favor + of one side or the other (as in 'git merge-file' + `--ours`/`--theirs`/`--union`). Can be `0`, + `XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`, or + `XDL_MERGE_FAVOR_UNION`. + +`renormalize`:: + Resmudge and clean the "base", "theirs" and "ours" files + before merging. Use this when the merge is likely to have + overlapped with a change in smudge/clean or end-of-line + normalization rules. + +Low-level (single file) merge +----------------------------- + +`ll_merge`:: + + Perform a three-way single-file merge in core. This is + a thin wrapper around `xdl_merge` that takes the path and + any merge backend specified in `.gitattributes` or + `.git/info/attributes` into account. Returns 0 for a + clean merge. + +Calling sequence: + +* Prepare a `struct ll_merge_options` to record options. + If you have no special requests, skip this and pass `NULL` + as the `opts` parameter to use the default options. + +* Allocate an mmbuffer_t variable for the result. + +* Allocate and fill variables with the file's original content + and two modified versions (using `read_mmfile`, for example). + +* Call `ll_merge()`. + +* Read the merged content from `result_buf.ptr` and `result_buf.size`. + +* Release buffers when finished. A simple + `free(ancestor.ptr); free(ours.ptr); free(theirs.ptr); + free(result_buf.ptr);` will do. + +If the modifications do not merge cleanly, `ll_merge` will return a +nonzero value and `result_buf` will generally include a description of +the conflict bracketed by markers such as the traditional `<<<<<<<` +and `>>>>>>>`. + +The `ancestor_label`, `our_label`, and `their_label` parameters are +used to label the different sides of a conflict if the merge driver +supports this. + +Everything else +--------------- + +Talk about <merge-recursive.h> and merge_file(): + + - merge_trees() to merge with rename detection + - merge_recursive() for ancestor consolidation + - try_merge_command() for other strategies + - conflict format + - merge options + +(Daniel, Miklos, Stephan, JC) diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt index b7cda94f54..3062389404 100644 --- a/Documentation/technical/api-parse-options.txt +++ b/Documentation/technical/api-parse-options.txt @@ -1,6 +1,278 @@ parse-options API ================= -Talk about <parse-options.h> +The parse-options API is used to parse and massage options in git +and to provide a usage help with consistent look. -(Pierre) +Basics +------ + +The argument vector `argv[]` may usually contain mandatory or optional +'non-option arguments', e.g. a filename or a branch, and 'options'. +Options are optional arguments that start with a dash and +that allow to change the behavior of a command. + +* There are basically three types of options: + 'boolean' options, + options with (mandatory) 'arguments' and + options with 'optional arguments' + (i.e. a boolean option that can be adjusted). + +* There are basically two forms of options: + 'Short options' consist of one dash (`-`) and one alphanumeric + character. + 'Long options' begin with two dashes (`--`) and some + alphanumeric characters. + +* Options are case-sensitive. + Please define 'lower-case long options' only. + +The parse-options API allows: + +* 'sticked' and 'separate form' of options with arguments. + `-oArg` is sticked, `-o Arg` is separate form. + `--option=Arg` is sticked, `--option Arg` is separate form. + +* Long options may be 'abbreviated', as long as the abbreviation + is unambiguous. + +* Short options may be bundled, e.g. `-a -b` can be specified as `-ab`. + +* Boolean long options can be 'negated' (or 'unset') by prepending + `no-`, e.g. `--no-abbrev` instead of `--abbrev`. Conversely, + options that begin with `no-` can be 'negated' by removing it. + +* Options and non-option arguments can clearly be separated using the `--` + option, e.g. `-a -b --option -- --this-is-a-file` indicates that + `--this-is-a-file` must not be processed as an option. + +Steps to parse options +---------------------- + +. `#include "parse-options.h"` + +. define a NULL-terminated + `static const char * const builtin_foo_usage[]` array + containing alternative usage strings + +. define `builtin_foo_options` array as described below + in section 'Data Structure'. + +. in `cmd_foo(int argc, const char **argv, const char *prefix)` + call + + argc = parse_options(argc, argv, prefix, builtin_foo_options, builtin_foo_usage, flags); ++ +`parse_options()` will filter out the processed options of `argv[]` and leave the +non-option arguments in `argv[]`. +`argc` is updated appropriately because of the assignment. ++ +You can also pass NULL instead of a usage array as the fifth parameter of +parse_options(), to avoid displaying a help screen with usage info and +option list. This should only be done if necessary, e.g. to implement +a limited parser for only a subset of the options that needs to be run +before the full parser, which in turn shows the full help message. ++ +Flags are the bitwise-or of: + +`PARSE_OPT_KEEP_DASHDASH`:: + Keep the `--` that usually separates options from + non-option arguments. + +`PARSE_OPT_STOP_AT_NON_OPTION`:: + Usually the whole argument vector is massaged and reordered. + Using this flag, processing is stopped at the first non-option + argument. + +`PARSE_OPT_KEEP_ARGV0`:: + Keep the first argument, which contains the program name. It's + removed from argv[] by default. + +`PARSE_OPT_KEEP_UNKNOWN`:: + Keep unknown arguments instead of erroring out. This doesn't + work for all combinations of arguments as users might expect + it to do. E.g. if the first argument in `--unknown --known` + takes a value (which we can't know), the second one is + mistakenly interpreted as a known option. Similarly, if + `PARSE_OPT_STOP_AT_NON_OPTION` is set, the second argument in + `--unknown value` will be mistakenly interpreted as a + non-option, not as a value belonging to the unknown option, + the parser early. That's why parse_options() errors out if + both options are set. + +`PARSE_OPT_NO_INTERNAL_HELP`:: + By default, parse_options() handles `-h`, `--help` and + `--help-all` internally, by showing a help screen. This option + turns it off and allows one to add custom handlers for these + options, or to just leave them unknown. + +Data Structure +-------------- + +The main data structure is an array of the `option` struct, +say `static struct option builtin_add_options[]`. +There are some macros to easily define options: + +`OPT__ABBREV(&int_var)`:: + Add `--abbrev[=<n>]`. + +`OPT__COLOR(&int_var, description)`:: + Add `--color[=<when>]` and `--no-color`. + +`OPT__DRY_RUN(&int_var, description)`:: + Add `-n, --dry-run`. + +`OPT__FORCE(&int_var, description)`:: + Add `-f, --force`. + +`OPT__QUIET(&int_var, description)`:: + Add `-q, --quiet`. + +`OPT__VERBOSE(&int_var, description)`:: + Add `-v, --verbose`. + +`OPT_GROUP(description)`:: + Start an option group. `description` is a short string that + describes the group or an empty string. + Start the description with an upper-case letter. + +`OPT_BOOL(short, long, &int_var, description)`:: + Introduce a boolean option. `int_var` is set to one with + `--option` and set to zero with `--no-option`. + +`OPT_COUNTUP(short, long, &int_var, description)`:: + Introduce a count-up option. + `int_var` is incremented on each use of `--option`, and + reset to zero with `--no-option`. + +`OPT_BIT(short, long, &int_var, description, mask)`:: + Introduce a boolean option. + If used, `int_var` is bitwise-ored with `mask`. + +`OPT_NEGBIT(short, long, &int_var, description, mask)`:: + Introduce a boolean option. + If used, `int_var` is bitwise-anded with the inverted `mask`. + +`OPT_SET_INT(short, long, &int_var, description, integer)`:: + Introduce an integer option. + `int_var` is set to `integer` with `--option`, and + reset to zero with `--no-option`. + +`OPT_SET_PTR(short, long, &ptr_var, description, ptr)`:: + Introduce a boolean option. + If used, set `ptr_var` to `ptr`. + +`OPT_STRING(short, long, &str_var, arg_str, description)`:: + Introduce an option with string argument. + The string argument is put into `str_var`. + +`OPT_INTEGER(short, long, &int_var, description)`:: + Introduce an option with integer argument. + The integer is put into `int_var`. + +`OPT_DATE(short, long, &int_var, description)`:: + Introduce an option with date argument, see `approxidate()`. + The timestamp is put into `int_var`. + +`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`:: + Introduce an option with argument. + The argument will be fed into the function given by `func_ptr` + and the result will be put into `var`. + See 'Option Callbacks' below for a more elaborate description. + +`OPT_FILENAME(short, long, &var, description)`:: + Introduce an option with a filename argument. + The filename will be prefixed by passing the filename along with + the prefix argument of `parse_options()` to `prefix_filename()`. + +`OPT_ARGUMENT(long, description)`:: + Introduce a long-option argument that will be kept in `argv[]`. + +`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`:: + Recognize numerical options like -123 and feed the integer as + if it was an argument to the function given by `func_ptr`. + The result will be put into `var`. There can be only one such + option definition. It cannot be negated and it takes no + arguments. Short options that happen to be digits take + precedence over it. + +`OPT_COLOR_FLAG(short, long, &int_var, description)`:: + Introduce an option that takes an optional argument that can + have one of three values: "always", "never", or "auto". If the + argument is not given, it defaults to "always". The `--no-` form + works like `--long=never`; it cannot take an argument. If + "always", set `int_var` to 1; if "never", set `int_var` to 0; if + "auto", set `int_var` to 1 if stdout is a tty or a pager, + 0 otherwise. + +`OPT_NOOP_NOARG(short, long)`:: + Introduce an option that has no effect and takes no arguments. + Use it to hide deprecated options that are still to be recognized + and ignored silently. + + +The last element of the array must be `OPT_END()`. + +If not stated otherwise, interpret the arguments as follows: + +* `short` is a character for the short option + (e.g. `'e'` for `-e`, use `0` to omit), + +* `long` is a string for the long option + (e.g. `"example"` for `--example`, use `NULL` to omit), + +* `int_var` is an integer variable, + +* `str_var` is a string variable (`char *`), + +* `arg_str` is the string that is shown as argument + (e.g. `"branch"` will result in `<branch>`). + If set to `NULL`, three dots (`...`) will be displayed. + +* `description` is a short string to describe the effect of the option. + It shall begin with a lower-case letter and a full stop (`.`) shall be + omitted at the end. + +Option Callbacks +---------------- + +The function must be defined in this form: + + int func(const struct option *opt, const char *arg, int unset) + +The callback mechanism is as follows: + +* Inside `func`, the only interesting member of the structure + given by `opt` is the void pointer `opt->value`. + `*opt->value` will be the value that is saved into `var`, if you + use `OPT_CALLBACK()`. + For example, do `*(unsigned long *)opt->value = 42;` to get 42 + into an `unsigned long` variable. + +* Return value `0` indicates success and non-zero return + value will invoke `usage_with_options()` and, thus, die. + +* If the user negates the option, `arg` is `NULL` and `unset` is 1. + +Sophisticated option parsing +---------------------------- + +If you need, for example, option callbacks with optional arguments +or without arguments at all, or if you need other special cases, +that are not handled by the macros above, you need to specify the +members of the `option` structure manually. + +This is not covered in this document, but well documented +in `parse-options.h` itself. + +Examples +-------- + +See `test-parse-options.c` and +`builtin-add.c`, +`builtin-clone.c`, +`builtin-commit.c`, +`builtin-fetch.c`, +`builtin-fsck.c`, +`builtin-rm.c` +for real-world examples. diff --git a/Documentation/technical/api-path-list.txt b/Documentation/technical/api-path-list.txt deleted file mode 100644 index 9dbedd0a67..0000000000 --- a/Documentation/technical/api-path-list.txt +++ /dev/null @@ -1,126 +0,0 @@ -path-list API -============= - -The path_list API offers a data structure and functions to handle sorted -and unsorted string lists. - -The name is a bit misleading, a path_list may store not only paths but -strings in general. - -The caller: - -. Allocates and clears a `struct path_list` variable. - -. Initializes the members. You might want to set the flag `strdup_paths` - if the strings should be strdup()ed. For example, this is necessary - when you add something like git_path("..."), since that function returns - a static buffer that will change with the next call to git_path(). -+ -If you need something advanced, you can manually malloc() the `items` -member (you need this if you add things later) and you should set the -`nr` and `alloc` members in that case, too. - -. Adds new items to the list, using `path_list_append` or `path_list_insert`. - -. Can check if a string is in the list using `path_list_has_path` or - `unsorted_path_list_has_path` and get it from the list using - `path_list_lookup` for sorted lists. - -. Can sort an unsorted list using `sort_path_list`. - -. Finally it should free the list using `path_list_clear`. - -Example: - ----- -struct path_list list; -int i; - -memset(&list, 0, sizeof(struct path_list)); -path_list_append("foo", &list); -path_list_append("bar", &list); -for (i = 0; i < list.nr; i++) - printf("%s\n", list.items[i].path) ----- - -NOTE: It is more efficient to build an unsorted list and sort it -afterwards, instead of building a sorted list (`O(n log n)` instead of -`O(n^2)`). -+ -However, if you use the list to check if a certain string was added -already, you should not do that (using unsorted_path_list_has_path()), -because the complexity would be quadratic again (but with a worse factor). - -Functions ---------- - -* General ones (works with sorted and unsorted lists as well) - -`print_path_list`:: - - Dump a path_list to stdout, useful mainly for debugging purposes. It - can take an optional header argument and it writes out the - string-pointer pairs of the path_list, each one in its own line. - -`path_list_clear`:: - - Free a path_list. The `path` pointer of the items will be freed in case - the `strdup_paths` member of the path_list is set. The second parameter - controls if the `util` pointer of the items should be freed or not. - -* Functions for sorted lists only - -`path_list_has_path`:: - - Determine if the path_list has a given string or not. - -`path_list_insert`:: - - Insert a new element to the path_list. The returned pointer can be handy - if you want to write something to the `util` pointer of the - path_list_item containing the just added string. -+ -Since this function uses xrealloc() (which die()s if it fails) if the -list needs to grow, it is safe not to check the pointer. I.e. you may -write `path_list_insert(...)->util = ...;`. - -`path_list_lookup`:: - - Look up a given string in the path_list, returning the containing - path_list_item. If the string is not found, NULL is returned. - -* Functions for unsorted lists only - -`path_list_append`:: - - Append a new string to the end of the path_list. - -`sort_path_list`:: - - Make an unsorted list sorted. - -`unsorted_path_list_has_path`:: - - It's like `path_list_has_path()` but for unsorted lists. -+ -This function needs to look through all items, as opposed to its -counterpart for sorted lists, which performs a binary search. - -Data structures ---------------- - -* `struct path_list_item` - -Represents an item of the list. The `path` member is a pointer to the -string, and you may use the `util` member for any purpose, if you want. - -* `struct path_list` - -Represents the list itself. - -. The array of items are available via the `items` member. -. The `nr` member contains the number of items stored in the list. -. The `alloc` member is used to avoid reallocating at every insertion. - You should not tamper with it. -. Setting the `strdup_paths` member to 1 will strdup() the strings - before adding them, see above. diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt new file mode 100644 index 0000000000..dbbea95db7 --- /dev/null +++ b/Documentation/technical/api-ref-iteration.txt @@ -0,0 +1,81 @@ +ref iteration API +================= + + +Iteration of refs is done by using an iterate function which will call a +callback function for every ref. The callback function has this +signature: + + int handle_one_ref(const char *refname, const unsigned char *sha1, + int flags, void *cb_data); + +There are different kinds of iterate functions which all take a +callback of this type. The callback is then called for each found ref +until the callback returns nonzero. The returned value is then also +returned by the iterate function. + +Iteration functions +------------------- + +* `head_ref()` just iterates the head ref. + +* `for_each_ref()` iterates all refs. + +* `for_each_ref_in()` iterates all refs which have a defined prefix and + strips that prefix from the passed variable refname. + +* `for_each_tag_ref()`, `for_each_branch_ref()`, `for_each_remote_ref()`, + `for_each_replace_ref()` iterate refs from the respective area. + +* `for_each_glob_ref()` iterates all refs that match the specified glob + pattern. + +* `for_each_glob_ref_in()` the previous and `for_each_ref_in()` combined. + +* `head_ref_submodule()`, `for_each_ref_submodule()`, + `for_each_ref_in_submodule()`, `for_each_tag_ref_submodule()`, + `for_each_branch_ref_submodule()`, `for_each_remote_ref_submodule()` + do the same as the functions descibed above but for a specified + submodule. + +* `for_each_rawref()` can be used to learn about broken ref and symref. + +* `for_each_reflog()` iterates each reflog file. + +Submodules +---------- + +If you want to iterate the refs of a submodule you first need to add the +submodules object database. You can do this by a code-snippet like +this: + + const char *path = "path/to/submodule" + if (!add_submodule_odb(path)) + die("Error submodule '%s' not populated.", path); + +`add_submodule_odb()` will return an non-zero value on success. If you +do not do this you will get an error for each ref that it does not point +to a valid object. + +Note: As a side-effect of this you can not safely assume that all +objects you lookup are available in superproject. All submodule objects +will be available the same way as the superprojects objects. + +Example: +-------- + +---- +static int handle_remote_ref(const char *refname, + const unsigned char *sha1, int flags, void *cb_data) +{ + struct strbuf *output = cb_data; + strbuf_addf(output, "%s\n", refname); + return 0; +} + +... + + struct strbuf output = STRBUF_INIT; + for_each_remote_ref(handle_remote_ref, &output); + printf("%s", output.buf); +---- diff --git a/Documentation/technical/api-remote.txt b/Documentation/technical/api-remote.txt index 073b22bd83..c54b17db69 100644 --- a/Documentation/technical/api-remote.txt +++ b/Documentation/technical/api-remote.txt @@ -18,6 +18,10 @@ struct remote An array of all of the url_nr URLs configured for the remote +`pushurl`:: + + An array of all of the pushurl_nr push URLs configured for the remote + `push`:: An array of refspecs configured for pushing, with diff --git a/Documentation/technical/api-run-command.txt b/Documentation/technical/api-run-command.txt index 3e1342acf4..f18b4f4817 100644 --- a/Documentation/technical/api-run-command.txt +++ b/Documentation/technical/api-run-command.txt @@ -30,28 +30,63 @@ Functions start_command() followed by finish_command(). Takes a pointer to a `struct child_process` that specifies the details. -`run_command_v_opt`, `run_command_v_opt_dir`, `run_command_v_opt_cd_env`:: +`run_command_v_opt`, `run_command_v_opt_cd_env`:: Convenience functions that encapsulate a sequence of start_command() followed by finish_command(). The argument argv specifies the program and its arguments. The argument opt is zero - or more of the flags `RUN_COMMAND_NO_STDIN`, `RUN_GIT_CMD`, or - `RUN_COMMAND_STDOUT_TO_STDERR` that correspond to the members - .no_stdin, .git_cmd, .stdout_to_stderr of `struct child_process`. + or more of the flags `RUN_COMMAND_NO_STDIN`, `RUN_GIT_CMD`, + `RUN_COMMAND_STDOUT_TO_STDERR`, or `RUN_SILENT_EXEC_FAILURE` + that correspond to the members .no_stdin, .git_cmd, + .stdout_to_stderr, .silent_exec_failure of `struct child_process`. The argument dir corresponds the member .dir. The argument env corresponds to the member .env. +The functions above do the following: + +. If a system call failed, errno is set and -1 is returned. A diagnostic + is printed. + +. If the program was not found, then -1 is returned and errno is set to + ENOENT; a diagnostic is printed only if .silent_exec_failure is 0. + +. Otherwise, the program is run. If it terminates regularly, its exit + code is returned. No diagnostic is printed, even if the exit code is + non-zero. + +. If the program terminated due to a signal, then the return value is the + signal number - 128, ie. it is negative and so indicates an unusual + condition; a diagnostic is printed. This return value can be passed to + exit(2), which will report the same code to the parent process that a + POSIX shell's $? would report for a program that died from the signal. + + `start_async`:: Run a function asynchronously. Takes a pointer to a `struct - async` that specifies the details and returns a pipe FD - from which the caller reads. See below for details. + async` that specifies the details and returns a set of pipe FDs + for communication with the function. See below for details. `finish_async`:: Wait for the completion of an asynchronous function that was started with start_async(). +`run_hook`:: + + Run a hook. + The first argument is a pathname to an index file, or NULL + if the hook uses the default index file or no index is needed. + The second argument is the name of the hook. + The further arguments correspond to the hook arguments. + The last argument has to be NULL to terminate the arguments list. + If the hook does not exist or is not executable, the return + value will be zero. + If it is executable, the hook will be executed and the exit + status of the hook is returned. + On execution, .stdout_to_stderr and .no_stdin will be set. + (See below.) + Data structures --------------- @@ -100,7 +135,7 @@ stderr as follows: .in: The FD must be readable; it becomes child's stdin. .out: The FD must be writable; it becomes child's stdout. - .err > 0 is not supported. + .err: The FD must be writable; it becomes child's stderr. The specified FD is closed by start_command(), even if it fails to run the sub-process! @@ -128,6 +163,11 @@ string pointers (NULL terminated) in .env: To specify a new initial working directory for the sub-process, specify it in the .dir member. +If the program cannot be found, the functions return -1 and set +errno to ENOENT. Normally, an error message is printed, but if +.silent_exec_failure is set to 1, no message is printed for this +special error condition. + * `struct async` @@ -140,17 +180,47 @@ The caller: struct async variable; 2. initializes .proc and .data; 3. calls start_async(); -4. processes the data by reading from the fd in .out; -5. closes .out; +4. processes communicates with proc through .in and .out; +5. closes .in and .out; 6. calls finish_async(). +The members .in, .out are used to provide a set of fd's for +communication between the caller and the callee as follows: + +. Specify 0 to have no file descriptor passed. The callee will + receive -1 in the corresponding argument. + +. Specify < 0 to have a pipe allocated; start_async() replaces + with the pipe FD in the following way: + + .in: Returns the writable pipe end into which the caller + writes; the readable end of the pipe becomes the function's + in argument. + + .out: Returns the readable pipe end from which the caller + reads; the writable end of the pipe becomes the function's + out argument. + + The caller of start_async() must close the returned FDs after it + has completed reading from/writing from them. + +. Specify a file descriptor > 0 to be used by the function: + + .in: The FD must be readable; it becomes the function's in. + .out: The FD must be writable; it becomes the function's out. + + The specified FD is closed by start_async(), even if it fails to + run the function. + The function pointer in .proc has the following signature: - int proc(int fd, void *data); + int proc(int in, int out, void *data); -. fd specifies a writable file descriptor to which the function must - write the data that it produces. The function *must* close this - descriptor before it returns. +. in, out specifies a set of file descriptors to which the function + must read/write the data that it needs/produces. The function + *must* close these descriptors before it returns. A descriptor + may be -1 if the caller did not configure a descriptor for that + direction. . data is the value that the caller has specified in the .data member of struct async. @@ -161,12 +231,13 @@ The function pointer in .proc has the following signature: There are serious restrictions on what the asynchronous function can do -because this facility is implemented by a pipe to a forked process on -UNIX, but by a thread in the same address space on Windows: +because this facility is implemented by a thread in the same address +space on most platforms (when pthreads is available), but by a pipe to +a forked process otherwise: . It cannot change the program's state (global variables, environment, - etc.) in a way that the caller notices; in other words, .out is the - only communication channel to the caller. + etc.) in a way that the caller notices; in other words, .in and .out + are the only communication channels to the caller. . It must not change the program's state that the caller of the facility also uses. diff --git a/Documentation/technical/api-sha1-array.txt b/Documentation/technical/api-sha1-array.txt new file mode 100644 index 0000000000..4a4bae8109 --- /dev/null +++ b/Documentation/technical/api-sha1-array.txt @@ -0,0 +1,79 @@ +sha1-array API +============== + +The sha1-array API provides storage and manipulation of sets of SHA1 +identifiers. The emphasis is on storage and processing efficiency, +making them suitable for large lists. Note that the ordering of items is +not preserved over some operations. + +Data Structures +--------------- + +`struct sha1_array`:: + + A single array of SHA1 hashes. This should be initialized by + assignment from `SHA1_ARRAY_INIT`. The `sha1` member contains + the actual data. The `nr` member contains the number of items in + the set. The `alloc` and `sorted` members are used internally, + and should not be needed by API callers. + +Functions +--------- + +`sha1_array_append`:: + Add an item to the set. The sha1 will be placed at the end of + the array (but note that some operations below may lose this + ordering). + +`sha1_array_sort`:: + Sort the elements in the array. + +`sha1_array_lookup`:: + Perform a binary search of the array for a specific sha1. + If found, returns the offset (in number of elements) of the + sha1. If not found, returns a negative integer. If the array is + not sorted, this function has the side effect of sorting it. + +`sha1_array_clear`:: + Free all memory associated with the array and return it to the + initial, empty state. + +`sha1_array_for_each_unique`:: + Efficiently iterate over each unique element of the list, + executing the callback function for each one. If the array is + not sorted, this function has the side effect of sorting it. + +Examples +-------- + +----------------------------------------- +void print_callback(const unsigned char sha1[20], + void *data) +{ + printf("%s\n", sha1_to_hex(sha1)); +} + +void some_func(void) +{ + struct sha1_array hashes = SHA1_ARRAY_INIT; + unsigned char sha1[20]; + + /* Read objects into our set */ + while (read_object_from_stdin(sha1)) + sha1_array_append(&hashes, sha1); + + /* Check if some objects are in our set */ + while (read_object_from_stdin(sha1)) { + if (sha1_array_lookup(&hashes, sha1) >= 0) + printf("it's in there!\n"); + + /* + * Print the unique set of objects. We could also have + * avoided adding duplicate objects in the first place, + * but we would end up re-sorting the array repeatedly. + * Instead, this will sort once and then skip duplicates + * in linear time. + */ + sha1_array_for_each_unique(&hashes, print_callback, NULL); +} +----------------------------------------- diff --git a/Documentation/technical/api-sigchain.txt b/Documentation/technical/api-sigchain.txt new file mode 100644 index 0000000000..9e1189ef01 --- /dev/null +++ b/Documentation/technical/api-sigchain.txt @@ -0,0 +1,41 @@ +sigchain API +============ + +Code often wants to set a signal handler to clean up temporary files or +other work-in-progress when we die unexpectedly. For multiple pieces of +code to do this without conflicting, each piece of code must remember +the old value of the handler and restore it either when: + + 1. The work-in-progress is finished, and the handler is no longer + necessary. The handler should revert to the original behavior + (either another handler, SIG_DFL, or SIG_IGN). + + 2. The signal is received. We should then do our cleanup, then chain + to the next handler (or die if it is SIG_DFL). + +Sigchain is a tiny library for keeping a stack of handlers. Your handler +and installation code should look something like: + +------------------------------------------ + void clean_foo_on_signal(int sig) + { + clean_foo(); + sigchain_pop(sig); + raise(sig); + } + + void other_func() + { + sigchain_push_common(clean_foo_on_signal); + mess_up_foo(); + clean_foo(); + } +------------------------------------------ + +Handlers are given the typedef of sigchain_fun. This is the same type +that is given to signal() or sigaction(). It is perfectly reasonable to +push SIG_DFL or SIG_IGN onto the stack. + +You can sigchain_push and sigchain_pop individual signals. For +convenience, sigchain_push_common will push the handler onto the stack +for many common signals. diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt index a9668e5f2d..95a8bf3846 100644 --- a/Documentation/technical/api-strbuf.txt +++ b/Documentation/technical/api-strbuf.txt @@ -12,7 +12,7 @@ strbuf API actually relies on the string being free of NULs. strbufs has some invariants that are very important to keep in mind: -. The `buf` member is never NULL, so you it can be used in any usual C +. The `buf` member is never NULL, so it can be used in any usual C string operations safely. strbuf's _have_ to be initialized either by `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though. + @@ -21,7 +21,7 @@ allocated memory or not), use `strbuf_detach()` to unwrap a memory buffer from its strbuf shell in a safe way. That is the sole supported way. This will give you a malloced buffer that you can later `free()`. + -However, it it totally safe to modify anything in the string pointed by +However, it is totally safe to modify anything in the string pointed by the `buf` member, between the indices `0` and `len-1` (inclusive). . The `buf` member is a byte array that has at least `len + 1` bytes @@ -55,7 +55,7 @@ Data structures * `struct strbuf` -This is string buffer structure. The `len` member can be used to +This is the string buffer structure. The `len` member can be used to determine the current length of the string, and `buf` member provides access to the string itself. @@ -133,8 +133,10 @@ Functions * Adding data to the buffer -NOTE: All of these functions in this section will grow the buffer as - necessary. +NOTE: All of the functions in this section will grow the buffer as necessary. +If they fail for some reason other than memory shortage and the buffer hadn't +been allocated before (i.e. the `struct strbuf` was set to `STRBUF_INIT`), +then they will free() it. `strbuf_addch`:: @@ -197,6 +199,10 @@ character if the letter `n` appears after a `%`. The function returns the length of the placeholder recognized and `strbuf_expand()` skips over it. + +The format `%%` is automatically expanded to a single `%` as a quoting +mechanism; callers do not need to handle the `%` placeholder themselves, +and the callback function will not be invoked for this placeholder. ++ All other characters (non-percent and not skipped ones) are copied verbatim to the strbuf. If the callback returned zero, meaning that the placeholder is unknown, then the percent sign is copied, too. @@ -205,6 +211,20 @@ In order to facilitate caching and to make it possible to give parameters to the callback, `strbuf_expand()` passes a context pointer, which can be used by the programmer of the callback as she sees fit. +`strbuf_expand_dict_cb`:: + + Used as callback for `strbuf_expand()`, expects an array of + struct strbuf_expand_dict_entry as context, i.e. pairs of + placeholder and replacement string. The array needs to be + terminated by an entry with placeholder set to NULL. + +`strbuf_addbuf_percentquote`:: + + Append the contents of one strbuf to another, quoting any + percent signs ("%") into double-percents ("%%") in the + destination. This is useful for literal data to be fed to either + strbuf_expand or to the *printf family of functions. + `strbuf_addf`:: Add a formatted string to the buffer. @@ -213,7 +233,7 @@ which can be used by the programmer of the callback as she sees fit. Read a given size of data from a FILE* pointer to the buffer. + -NOTE: The buffer is rewinded if the read fails. If -1 is returned, +NOTE: The buffer is rewound if the read fails. If -1 is returned, `errno` must be consulted, like you would do for `read(3)`. `strbuf_read()`, `strbuf_read_file()` and `strbuf_getline()` has the same behaviour as well. @@ -228,10 +248,31 @@ same behaviour as well. Read the contents of a file, specified by its path. The third argument can be used to give a hint about the file size, to avoid reallocs. +`strbuf_readlink`:: + + Read the target of a symbolic link, specified by its path. The third + argument can be used to give a hint about the size, to avoid reallocs. + `strbuf_getline`:: - Read a line from a FILE* pointer. The second argument specifies the line + Read a line from a FILE *, overwriting the existing contents + of the strbuf. The second argument specifies the line terminator character, typically `'\n'`. + Reading stops after the terminator or at EOF. The terminator + is removed from the buffer before returning. Returns 0 unless + there was nothing left before EOF, in which case it returns `EOF`. + +`strbuf_getwholeline`:: + + Like `strbuf_getline`, but keeps the trailing terminator (if + any) in the buffer. + +`strbuf_getwholeline_fd`:: + + Like `strbuf_getwholeline`, but operates on a file descriptor. + It reads one character at a time, so it is very slow. Do not + use it unless you need the correct position in the file + descriptor. `stripspace`:: @@ -239,3 +280,9 @@ same behaviour as well. comments are considered contents to be removed or not. `launch_editor`:: + + Launch the user preferred editor to edit a file and fill the buffer + with the file's contents upon the user completing their editing. The + third argument can be used to set the environment which the editor is + run in. If the buffer is NULL the editor is launched as usual but the + file's contents are not read into the buffer upon completion. diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt new file mode 100644 index 0000000000..5a0c14fceb --- /dev/null +++ b/Documentation/technical/api-string-list.txt @@ -0,0 +1,144 @@ +string-list API +=============== + +The string_list API offers a data structure and functions to handle sorted +and unsorted string lists. + +The 'string_list' struct used to be called 'path_list', but was renamed +because it is not specific to paths. + +The caller: + +. Allocates and clears a `struct string_list` variable. + +. Initializes the members. You might want to set the flag `strdup_strings` + if the strings should be strdup()ed. For example, this is necessary + when you add something like git_path("..."), since that function returns + a static buffer that will change with the next call to git_path(). ++ +If you need something advanced, you can manually malloc() the `items` +member (you need this if you add things later) and you should set the +`nr` and `alloc` members in that case, too. + +. Adds new items to the list, using `string_list_append` or + `string_list_insert`. + +. Can check if a string is in the list using `string_list_has_string` or + `unsorted_string_list_has_string` and get it from the list using + `string_list_lookup` for sorted lists. + +. Can sort an unsorted list using `sort_string_list`. + +. Can remove individual items of an unsorted list using + `unsorted_string_list_delete_item`. + +. Finally it should free the list using `string_list_clear`. + +Example: + +---- +struct string_list list; +int i; + +memset(&list, 0, sizeof(struct string_list)); +string_list_append(&list, "foo"); +string_list_append(&list, "bar"); +for (i = 0; i < list.nr; i++) + printf("%s\n", list.items[i].string) +---- + +NOTE: It is more efficient to build an unsorted list and sort it +afterwards, instead of building a sorted list (`O(n log n)` instead of +`O(n^2)`). ++ +However, if you use the list to check if a certain string was added +already, you should not do that (using unsorted_string_list_has_string()), +because the complexity would be quadratic again (but with a worse factor). + +Functions +--------- + +* General ones (works with sorted and unsorted lists as well) + +`print_string_list`:: + + Dump a string_list to stdout, useful mainly for debugging purposes. It + can take an optional header argument and it writes out the + string-pointer pairs of the string_list, each one in its own line. + +`string_list_clear`:: + + Free a string_list. The `string` pointer of the items will be freed in + case the `strdup_strings` member of the string_list is set. The second + parameter controls if the `util` pointer of the items should be freed + or not. + +* Functions for sorted lists only + +`string_list_has_string`:: + + Determine if the string_list has a given string or not. + +`string_list_insert`:: + + Insert a new element to the string_list. The returned pointer can be + handy if you want to write something to the `util` pointer of the + string_list_item containing the just added string. If the given + string already exists the insertion will be skipped and the + pointer to the existing item returned. ++ +Since this function uses xrealloc() (which die()s if it fails) if the +list needs to grow, it is safe not to check the pointer. I.e. you may +write `string_list_insert(...)->util = ...;`. + +`string_list_lookup`:: + + Look up a given string in the string_list, returning the containing + string_list_item. If the string is not found, NULL is returned. + +* Functions for unsorted lists only + +`string_list_append`:: + + Append a new string to the end of the string_list. + +`sort_string_list`:: + + Make an unsorted list sorted. + +`unsorted_string_list_has_string`:: + + It's like `string_list_has_string()` but for unsorted lists. + +`unsorted_string_list_lookup`:: + + It's like `string_list_lookup()` but for unsorted lists. ++ +The above two functions need to look through all items, as opposed to their +counterpart for sorted lists, which performs a binary search. + +`unsorted_string_list_delete_item`:: + + Remove an item from a string_list. The `string` pointer of the items + will be freed in case the `strdup_strings` member of the string_list + is set. The third parameter controls if the `util` pointer of the + items should be freed or not. + +Data structures +--------------- + +* `struct string_list_item` + +Represents an item of the list. The `string` member is a pointer to the +string, and you may use the `util` member for any purpose, if you want. + +* `struct string_list` + +Represents the list itself. + +. The array of items are available via the `items` member. +. The `nr` member contains the number of items stored in the list. +. The `alloc` member is used to avoid reallocating at every insertion. + You should not tamper with it. +. Setting the `strdup_strings` member to 1 will strdup() the strings + before adding them, see above. diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt index e3ddf91284..14af37c3f1 100644 --- a/Documentation/technical/api-tree-walking.txt +++ b/Documentation/technical/api-tree-walking.txt @@ -1,12 +1,147 @@ tree walking API ================ -Talk about <tree-walk.h>, things like +The tree walking API is used to traverse and inspect trees. -* struct tree_desc -* init_tree_desc -* tree_entry_extract -* update_tree_entry -* get_tree_entry +Data Structures +--------------- -(JC, Linus) +`struct name_entry`:: + + An entry in a tree. Each entry has a sha1 identifier, pathname, and + mode. + +`struct tree_desc`:: + + A semi-opaque data structure used to maintain the current state of the + walk. ++ +* `buffer` is a pointer into the memory representation of the tree. It always +points at the current entry being visited. + +* `size` counts the number of bytes left in the `buffer`. + +* `entry` points to the current entry being visited. + +`struct traverse_info`:: + + A structure used to maintain the state of a traversal. ++ +* `prev` points to the traverse_info which was used to descend into the +current tree. If this is the top-level tree `prev` will point to +a dummy traverse_info. + +* `name` is the entry for the current tree (if the tree is a subtree). + +* `pathlen` is the length of the full path for the current tree. + +* `conflicts` can be used by callbacks to maintain directory-file conflicts. + +* `fn` is a callback called for each entry in the tree. See Traversing for more +information. + +* `data` can be anything the `fn` callback would want to use. + +* `show_all_errors` tells whether to stop at the first error or not. + +Initializing +------------ + +`init_tree_desc`:: + + Initialize a `tree_desc` and decode its first entry. The buffer and + size parameters are assumed to be the same as the buffer and size + members of `struct tree`. + +`fill_tree_descriptor`:: + + Initialize a `tree_desc` and decode its first entry given the sha1 of + a tree. Returns the `buffer` member if the sha1 is a valid tree + identifier and NULL otherwise. + +`setup_traverse_info`:: + + Initialize a `traverse_info` given the pathname of the tree to start + traversing from. The `base` argument is assumed to be the `path` + member of the `name_entry` being recursed into unless the tree is a + top-level tree in which case the empty string ("") is used. + +Walking +------- + +`tree_entry`:: + + Visit the next entry in a tree. Returns 1 when there are more entries + left to visit and 0 when all entries have been visited. This is + commonly used in the test of a while loop. + +`tree_entry_len`:: + + Calculate the length of a tree entry's pathname. This utilizes the + memory structure of a tree entry to avoid the overhead of using a + generic strlen(). + +`update_tree_entry`:: + + Walk to the next entry in a tree. This is commonly used in conjunction + with `tree_entry_extract` to inspect the current entry. + +`tree_entry_extract`:: + + Decode the entry currently being visited (the one pointed to by + `tree_desc's` `entry` member) and return the sha1 of the entry. The + `pathp` and `modep` arguments are set to the entry's pathname and mode + respectively. + +`get_tree_entry`:: + + Find an entry in a tree given a pathname and the sha1 of a tree to + search. Returns 0 if the entry is found and -1 otherwise. The third + and fourth parameters are set to the entry's sha1 and mode + respectively. + +Traversing +---------- + +`traverse_trees`:: + + Traverse `n` number of trees in parallel. The `fn` callback member of + `traverse_info` is called once for each tree entry. + +`traverse_callback_t`:: + The arguments passed to the traverse callback are as follows: ++ +* `n` counts the number of trees being traversed. + +* `mask` has its nth bit set if something exists in the nth entry. + +* `dirmask` has its nth bit set if the nth tree's entry is a directory. + +* `entry` is an array of size `n` where the nth entry is from the nth tree. + +* `info` maintains the state of the traversal. + ++ +Returning a negative value will terminate the traversal. Otherwise the +return value is treated as an update mask. If the nth bit is set the nth tree +will be updated and if the bit is not set the nth tree entry will be the +same in the next callback invocation. + +`make_traverse_path`:: + + Generate the full pathname of a tree entry based from the root of the + traversal. For example, if the traversal has recursed into another + tree named "bar" the pathname of an entry "baz" in the "bar" + tree would be "bar/baz". + +`traverse_path_len`:: + + Calculate the length of a pathname returned by `make_traverse_path`. + This utilizes the memory structure of a tree entry to avoid the + overhead of using a generic strlen(). + +Authors +------- + +Written by Junio C Hamano <gitster@pobox.com> and Linus Torvalds +<torvalds@linux-foundation.org> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt new file mode 100644 index 0000000000..8930b3fabc --- /dev/null +++ b/Documentation/technical/index-format.txt @@ -0,0 +1,186 @@ +GIT index format +================ + += The git index file has the following format + + All binary numbers are in network byte order. Version 2 is described + here unless stated otherwise. + + - A 12-byte header consisting of + + 4-byte signature: + The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") + + 4-byte version number: + The current supported versions are 2 and 3. + + 32-bit number of index entries. + + - A number of sorted index entries (see below). + + - Extensions + + Extensions are identified by signature. Optional extensions can + be ignored if GIT does not understand them. + + GIT currently supports cached tree and resolve undo extensions. + + 4-byte extension signature. If the first byte is 'A'..'Z' the + extension is optional and can be ignored. + + 32-bit size of the extension + + Extension data + + - 160-bit SHA-1 over the content of the index file before this + checksum. + +== Index entry + + Index entries are sorted in ascending order on the name field, + interpreted as a string of unsigned bytes (i.e. memcmp() order, no + localization, no special casing of directory separator '/'). Entries + with the same name are sorted by their stage field. + + 32-bit ctime seconds, the last time a file's metadata changed + this is stat(2) data + + 32-bit ctime nanosecond fractions + this is stat(2) data + + 32-bit mtime seconds, the last time a file's data changed + this is stat(2) data + + 32-bit mtime nanosecond fractions + this is stat(2) data + + 32-bit dev + this is stat(2) data + + 32-bit ino + this is stat(2) data + + 32-bit mode, split into (high to low bits) + + 4-bit object type + valid values in binary are 1000 (regular file), 1010 (symbolic link) + and 1110 (gitlink) + + 3-bit unused + + 9-bit unix permission. Only 0755 and 0644 are valid for regular files. + Symbolic links and gitlinks have value 0 in this field. + + 32-bit uid + this is stat(2) data + + 32-bit gid + this is stat(2) data + + 32-bit file size + This is the on-disk size from stat(2), truncated to 32-bit. + + 160-bit SHA-1 for the represented object + + A 16-bit 'flags' field split into (high to low bits) + + 1-bit assume-valid flag + + 1-bit extended flag (must be zero in version 2) + + 2-bit stage (during merge) + + 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF + is stored in this field. + + (Version 3) A 16-bit field, only applicable if the "extended flag" + above is 1, split into (high to low bits). + + 1-bit reserved for future + + 1-bit skip-worktree flag (used by sparse checkout) + + 1-bit intent-to-add flag (used by "git add -N") + + 13-bit unused, must be zero + + Entry path name (variable length) relative to top level directory + (without leading slash). '/' is used as path separator. The special + path components ".", ".." and ".git" (without quotes) are disallowed. + Trailing slash is also disallowed. + + The exact encoding is undefined, but the '.' and '/' characters + are encoded in 7-bit ASCII and the encoding cannot contain a NUL + byte (iow, this is a UNIX pathname). + + 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes + while keeping the name NUL-terminated. + +== Extensions + +=== Cached tree + + Cached tree extension contains pre-computed hashes for trees that can + be derived from the index. It helps speed up tree object generation + from index for a new commit. + + When a path is updated in index, the path must be invalidated and + removed from tree cache. + + The signature for this extension is { 'T', 'R', 'E', 'E' }. + + A series of entries fill the entire extension; each of which + consists of: + + - NUL-terminated path component (relative to its parent directory); + + - ASCII decimal number of entries in the index that is covered by the + tree this entry represents (entry_count); + + - A space (ASCII 32); + + - ASCII decimal number that represents the number of subtrees this + tree has; + + - A newline (ASCII 10); and + + - 160-bit object name for the object that would result from writing + this span of index as a tree. + + An entry can be in an invalidated state and is represented by having + -1 in the entry_count field. In this case, there is no object name + and the next entry starts immediately after the newline. + + The entries are written out in the top-down, depth-first order. The + first entry represents the root level of the repository, followed by the + first subtree---let's call this A---of the root level (with its name + relative to the root level), followed by the first subtree of A (with + its name relative to A), ... + +=== Resolve undo + + A conflict is represented in the index as a set of higher stage entries. + When a conflict is resolved (e.g. with "git add path"), these higher + stage entries will be removed and a stage-0 entry with proper resoluton + is added. + + When these higher stage entries are removed, they are saved in the + resolve undo extension, so that conflicts can be recreated (e.g. with + "git checkout -m"), in case users want to redo a conflict resolution + from scratch. + + The signature for this extension is { 'R', 'E', 'U', 'C' }. + + A series of entries fill the entire extension; each of which + consists of: + + - NUL-terminated pathname the entry describes (relative to the root of + the repository, i.e. full pathname); + + - Three NUL-terminated ASCII octal numbers, entry mode of entries in + stage 1 to 3 (a missing stage is represented by "0" in this field); + and + + - At most three 160-bit object names of the entry in stages from 1 to 3 + (nothing is written for a missing stage). + diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index 9cd48b4859..49cdc571cd 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -1,41 +1,546 @@ -Pack transfer protocols -======================= - -There are two Pack push-pull protocols. - -upload-pack (S) | fetch/clone-pack (C) protocol: - - # Tell the puller what commits we have and what their names are - S: SHA1 name - S: ... - S: SHA1 name - S: # flush -- it's your turn - # Tell the pusher what commits we want, and what we have - C: want name - C: .. - C: want name - C: have SHA1 - C: have SHA1 - C: ... - C: # flush -- occasionally ask "had enough?" - S: NAK - C: have SHA1 - C: ... - C: have SHA1 - S: ACK - C: done - S: XXXXXXX -- packfile contents. - -send-pack | receive-pack protocol. - - # Tell the pusher what commits we have and what their names are - C: SHA1 name - C: ... - C: SHA1 name - C: # flush -- it's your turn - # Tell the puller what the pusher has - S: old-SHA1 new-SHA1 name - S: old-SHA1 new-SHA1 name - S: ... - S: # flush -- done with the list - S: XXXXXXX --- packfile contents. +Packfile transfer protocols +=========================== + +Git supports transferring data in packfiles over the ssh://, git:// and +file:// transports. There exist two sets of protocols, one for pushing +data from a client to a server and another for fetching data from a +server to a client. All three transports (ssh, git, file) use the same +protocol to transfer data. + +The processes invoked in the canonical Git implementation are 'upload-pack' +on the server side and 'fetch-pack' on the client side for fetching data; +then 'receive-pack' on the server and 'send-pack' on the client for pushing +data. The protocol functions to have a server tell a client what is +currently on the server, then for the two to negotiate the smallest amount +of data to send in order to fully update one or the other. + +Transports +---------- +There are three transports over which the packfile protocol is +initiated. The Git transport is a simple, unauthenticated server that +takes the command (almost always 'upload-pack', though Git +servers can be configured to be globally writable, in which 'receive- +pack' initiation is also allowed) with which the client wishes to +communicate and executes it and connects it to the requesting +process. + +In the SSH transport, the client just runs the 'upload-pack' +or 'receive-pack' process on the server over the SSH protocol and then +communicates with that invoked process over the SSH connection. + +The file:// transport runs the 'upload-pack' or 'receive-pack' +process locally and communicates with it over a pipe. + +Git Transport +------------- + +The Git transport starts off by sending the command and repository +on the wire using the pkt-line format, followed by a NUL byte and a +hostname parameter, terminated by a NUL byte. + + 0032git-upload-pack /project.git\0host=myserver.com\0 + +-- + git-proto-request = request-command SP pathname NUL [ host-parameter NUL ] + request-command = "git-upload-pack" / "git-receive-pack" / + "git-upload-archive" ; case sensitive + pathname = *( %x01-ff ) ; exclude NUL + host-parameter = "host=" hostname [ ":" port ] +-- + +Only host-parameter is allowed in the git-proto-request. Clients +MUST NOT attempt to send additional parameters. It is used for the +git-daemon name based virtual hosting. See --interpolated-path +option to git daemon, with the %H/%CH format characters. + +Basically what the Git client is doing to connect to an 'upload-pack' +process on the server side over the Git protocol is this: + + $ echo -e -n \ + "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" | + nc -v example.com 9418 + +If the server refuses the request for some reasons, it could abort +gracefully with an error message. + +---- + error-line = PKT-LINE("ERR" SP explanation-text) +---- + + +SSH Transport +------------- + +Initiating the upload-pack or receive-pack processes over SSH is +executing the binary on the server via SSH remote execution. +It is basically equivalent to running this: + + $ ssh git.example.com "git-upload-pack '/project.git'" + +For a server to support Git pushing and pulling for a given user over +SSH, that user needs to be able to execute one or both of those +commands via the SSH shell that they are provided on login. On some +systems, that shell access is limited to only being able to run those +two commands, or even just one of them. + +In an ssh:// format URI, it's absolute in the URI, so the '/' after +the host name (or port number) is sent as an argument, which is then +read by the remote git-upload-pack exactly as is, so it's effectively +an absolute path in the remote filesystem. + + git clone ssh://user@example.com/project.git + | + v + ssh user@example.com "git-upload-pack '/project.git'" + +In a "user@host:path" format URI, its relative to the user's home +directory, because the Git client will run: + + git clone user@example.com:project.git + | + v + ssh user@example.com "git-upload-pack 'project.git'" + +The exception is if a '~' is used, in which case +we execute it without the leading '/'. + + ssh://user@example.com/~alice/project.git, + | + v + ssh user@example.com "git-upload-pack '~alice/project.git'" + +A few things to remember here: + +- The "command name" is spelled with dash (e.g. git-upload-pack), but + this can be overridden by the client; + +- The repository path is always quoted with single quotes. + +Fetching Data From a Server +=========================== + +When one Git repository wants to get data that a second repository +has, the first can 'fetch' from the second. This operation determines +what data the server has that the client does not then streams that +data down to the client in packfile format. + + +Reference Discovery +------------------- + +When the client initially connects the server will immediately respond +with a listing of each reference it has (all branches and tags) along +with the object name that each reference currently points to. + + $ echo -e -n "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" | + nc -v example.com 9418 + 00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag + 00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration + 003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master + 003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9 + 003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0 + 003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{} + 0000 + +Server SHOULD terminate each non-flush line using LF ("\n") terminator; +client MUST NOT complain if there is no terminator. + +The returned response is a pkt-line stream describing each ref and +its current value. The stream MUST be sorted by name according to +the C locale ordering. + +If HEAD is a valid ref, HEAD MUST appear as the first advertised +ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the +advertisement list at all, but other refs may still appear. + +The stream MUST include capability declarations behind a NUL on the +first ref. The peeled value of a ref (that is "ref^{}") MUST be +immediately after the ref itself, if presented. A conforming server +MUST peel the ref if it's an annotated tag. + +---- + advertised-refs = (no-refs / list-of-refs) + flush-pkt + + no-refs = PKT-LINE(zero-id SP "capabilities^{}" + NUL capability-list LF) + + list-of-refs = first-ref *other-ref + first-ref = PKT-LINE(obj-id SP refname + NUL capability-list LF) + + other-ref = PKT-LINE(other-tip / other-peeled) + other-tip = obj-id SP refname LF + other-peeled = obj-id SP refname "^{}" LF + + capability-list = capability *(SP capability) + capability = 1*(LC_ALPHA / DIGIT / "-" / "_") + LC_ALPHA = %x61-7A +---- + +Server and client MUST use lowercase for obj-id, both MUST treat obj-id +as case-insensitive. + +See protocol-capabilities.txt for a list of allowed server capabilities +and descriptions. + +Packfile Negotiation +-------------------- +After reference and capabilities discovery, the client can decide to +terminate the connection by sending a flush-pkt, telling the server it can +now gracefully terminate, and disconnect, when it does not need any pack +data. This can happen with the ls-remote command, and also can happen when +the client already is up-to-date. + +Otherwise, it enters the negotiation phase, where the client and +server determine what the minimal packfile necessary for transport is, +by telling the server what objects it wants, its shallow objects +(if any), and the maximum commit depth it wants (if any). The client +will also send a list of the capabilities it wants to be in effect, +out of what the server said it could do with the first 'want' line. + +---- + upload-request = want-list + *shallow-line + *1depth-request + flush-pkt + + want-list = first-want + *additional-want + + shallow-line = PKT_LINE("shallow" SP obj-id) + + depth-request = PKT_LINE("deepen" SP depth) + + first-want = PKT-LINE("want" SP obj-id SP capability-list LF) + additional-want = PKT-LINE("want" SP obj-id LF) + + depth = 1*DIGIT +---- + +Clients MUST send all the obj-ids it wants from the reference +discovery phase as 'want' lines. Clients MUST send at least one +'want' command in the request body. Clients MUST NOT mention an +obj-id in a 'want' command which did not appear in the response +obtained through ref discovery. + +The client MUST write all obj-ids which it only has shallow copies +of (meaning that it does not have the parents of a commit) as +'shallow' lines so that the server is aware of the limitations of +the client's history. Clients MUST NOT mention an obj-id which +it does not know exists on the server. + +The client now sends the maximum commit history depth it wants for +this transaction, which is the number of commits it wants from the +tip of the history, if any, as a 'deepen' line. A depth of 0 is the +same as not making a depth request. The client does not want to receive +any commits beyond this depth, nor objects needed only to complete +those commits. Commits whose parents are not received as a result are +defined as shallow and marked as such in the server. This information +is sent back to the client in the next step. + +Once all the 'want's and 'shallow's (and optional 'deepen') are +transferred, clients MUST send a flush-pkt, to tell the server side +that it is done sending the list. + +Otherwise, if the client sent a positive depth request, the server +will determine which commits will and will not be shallow and +send this information to the client. If the client did not request +a positive depth, this step is skipped. + +---- + shallow-update = *shallow-line + *unshallow-line + flush-pkt + + shallow-line = PKT-LINE("shallow" SP obj-id) + + unshallow-line = PKT-LINE("unshallow" SP obj-id) +---- + +If the client has requested a positive depth, the server will compute +the set of commits which are no deeper than the desired depth, starting +at the client's wants. The server writes 'shallow' lines for each +commit whose parents will not be sent as a result. The server writes +an 'unshallow' line for each commit which the client has indicated is +shallow, but is no longer shallow at the currently requested depth +(that is, its parents will now be sent). The server MUST NOT mark +as unshallow anything which the client has not indicated was shallow. + +Now the client will send a list of the obj-ids it has using 'have' +lines, so the server can make a packfile that only contains the objects +that the client needs. In multi_ack mode, the canonical implementation +will send up to 32 of these at a time, then will send a flush-pkt. The +canonical implementation will skip ahead and send the next 32 immediately, +so that there is always a block of 32 "in-flight on the wire" at a time. + +---- + upload-haves = have-list + compute-end + + have-list = *have-line + have-line = PKT-LINE("have" SP obj-id LF) + compute-end = flush-pkt / PKT-LINE("done") +---- + +If the server reads 'have' lines, it then will respond by ACKing any +of the obj-ids the client said it had that the server also has. The +server will ACK obj-ids differently depending on which ack mode is +chosen by the client. + +In multi_ack mode: + + * the server will respond with 'ACK obj-id continue' for any common + commits. + + * once the server has found an acceptable common base commit and is + ready to make a packfile, it will blindly ACK all 'have' obj-ids + back to the client. + + * the server will then send a 'NACK' and then wait for another response + from the client - either a 'done' or another list of 'have' lines. + +In multi_ack_detailed mode: + + * the server will differentiate the ACKs where it is signaling + that it is ready to send data with 'ACK obj-id ready' lines, and + signals the identified common commits with 'ACK obj-id common' lines. + +Without either multi_ack or multi_ack_detailed: + + * upload-pack sends "ACK obj-id" on the first common object it finds. + After that it says nothing until the client gives it a "done". + + * upload-pack sends "NAK" on a flush-pkt if no common object + has been found yet. If one has been found, and thus an ACK + was already sent, it's silent on the flush-pkt. + +After the client has gotten enough ACK responses that it can determine +that the server has enough information to send an efficient packfile +(in the canonical implementation, this is determined when it has received +enough ACKs that it can color everything left in the --date-order queue +as common with the server, or the --date-order queue is empty), or the +client determines that it wants to give up (in the canonical implementation, +this is determined when the client sends 256 'have' lines without getting +any of them ACKed by the server - meaning there is nothing in common and +the server should just send all of its objects), then the client will send +a 'done' command. The 'done' command signals to the server that the client +is ready to receive its packfile data. + +However, the 256 limit *only* turns on in the canonical client +implementation if we have received at least one "ACK %s continue" +during a prior round. This helps to ensure that at least one common +ancestor is found before we give up entirely. + +Once the 'done' line is read from the client, the server will either +send a final 'ACK obj-id' or it will send a 'NAK'. The server only sends +ACK after 'done' if there is at least one common base and multi_ack or +multi_ack_detailed is enabled. The server always sends NAK after 'done' +if there is no common base found. + +Then the server will start sending its packfile data. + +---- + server-response = *ack_multi ack / nak + ack_multi = PKT-LINE("ACK" SP obj-id ack_status LF) + ack_status = "continue" / "common" / "ready" + ack = PKT-LINE("ACK SP obj-id LF) + nak = PKT-LINE("NAK" LF) +---- + +A simple clone may look like this (with no 'have' lines): + +---- + C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \ + side-band-64k ofs-delta\n + C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n + C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n + C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n + C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n + C: 0000 + C: 0009done\n + + S: 0008NAK\n + S: [PACKFILE] +---- + +An incremental update (fetch) response might look like this: + +---- + C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \ + side-band-64k ofs-delta\n + C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n + C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n + C: 0000 + C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n + C: [30 more have lines] + C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n + C: 0000 + + S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n + S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n + S: 0008NAK\n + + C: 0009done\n + + S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n + S: [PACKFILE] +---- + + +Packfile Data +------------- + +Now that the client and server have finished negotiation about what +the minimal amount of data that needs to be sent to the client is, the server +will construct and send the required data in packfile format. + +See pack-format.txt for what the packfile itself actually looks like. + +If 'side-band' or 'side-band-64k' capabilities have been specified by +the client, the server will send the packfile data multiplexed. + +Each packet starting with the packet-line length of the amount of data +that follows, followed by a single byte specifying the sideband the +following data is coming in on. + +In 'side-band' mode, it will send up to 999 data bytes plus 1 control +code, for a total of up to 1000 bytes in a pkt-line. In 'side-band-64k' +mode it will send up to 65519 data bytes plus 1 control code, for a +total of up to 65520 bytes in a pkt-line. + +The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain +packfile data, sideband '2' will be used for progress information that the +client will generally print to stderr and sideband '3' is used for error +information. + +If no 'side-band' capability was specified, the server will stream the +entire packfile without multiplexing. + + +Pushing Data To a Server +======================== + +Pushing data to a server will invoke the 'receive-pack' process on the +server, which will allow the client to tell it which references it should +update and then send all the data the server will need for those new +references to be complete. Once all the data is received and validated, +the server will then update its references to what the client specified. + +Authentication +-------------- + +The protocol itself contains no authentication mechanisms. That is to be +handled by the transport, such as SSH, before the 'receive-pack' process is +invoked. If 'receive-pack' is configured over the Git transport, those +repositories will be writable by anyone who can access that port (9418) as +that transport is unauthenticated. + +Reference Discovery +------------------- + +The reference discovery phase is done nearly the same way as it is in the +fetching protocol. Each reference obj-id and name on the server is sent +in packet-line format to the client, followed by a flush-pkt. The only +real difference is that the capability listing is different - the only +possible values are 'report-status', 'delete-refs' and 'ofs-delta'. + +Reference Update Request and Packfile Transfer +---------------------------------------------- + +Once the client knows what references the server is at, it can send a +list of reference update requests. For each reference on the server +that it wants to update, it sends a line listing the obj-id currently on +the server, the obj-id the client would like to update it to and the name +of the reference. + +This list is followed by a flush-pkt and then the packfile that should +contain all the objects that the server will need to complete the new +references. + +---- + update-request = command-list [pack-file] + + command-list = PKT-LINE(command NUL capability-list LF) + *PKT-LINE(command LF) + flush-pkt + + command = create / delete / update + create = zero-id SP new-id SP name + delete = old-id SP zero-id SP name + update = old-id SP new-id SP name + + old-id = obj-id + new-id = obj-id + + pack-file = "PACK" 28*(OCTET) +---- + +If the receiving end does not support delete-refs, the sending end MUST +NOT ask for delete command. + +The pack-file MUST NOT be sent if the only command used is 'delete'. + +A pack-file MUST be sent if either create or update command is used, +even if the server already has all the necessary objects. In this +case the client MUST send an empty pack-file. The only time this +is likely to happen is if the client is creating +a new branch or a tag that points to an existing obj-id. + +The server will receive the packfile, unpack it, then validate each +reference that is being updated that it hasn't changed while the request +was being processed (the obj-id is still the same as the old-id), and +it will run any update hooks to make sure that the update is acceptable. +If all of that is fine, the server will then update the references. + +Report Status +------------- + +After receiving the pack data from the sender, the receiver sends a +report if 'report-status' capability is in effect. +It is a short listing of what happened in that update. It will first +list the status of the packfile unpacking as either 'unpack ok' or +'unpack [error]'. Then it will list the status for each of the references +that it tried to update. Each line is either 'ok [refname]' if the +update was successful, or 'ng [refname] [error]' if the update was not. + +---- + report-status = unpack-status + 1*(command-status) + flush-pkt + + unpack-status = PKT-LINE("unpack" SP unpack-result LF) + unpack-result = "ok" / error-msg + + command-status = command-ok / command-fail + command-ok = PKT-LINE("ok" SP refname LF) + command-fail = PKT-LINE("ng" SP refname SP error-msg LF) + + error-msg = 1*(OCTECT) ; where not "ok" +---- + +Updates can be unsuccessful for a number of reasons. The reference can have +changed since the reference discovery phase was originally sent, meaning +someone pushed in the meantime. The reference being pushed could be a +non-fast-forward reference and the update hooks or configuration could be +set to not allow that, etc. Also, some references can be updated while others +can be rejected. + +An example client/server communication might look like this: + +---- + S: 007c74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n + S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n + S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n + S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n + S: 0000 + + C: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n + C: 003e74730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n + C: 0000 + C: [PACKDATA] + + S: 000eunpack ok\n + S: 0018ok refs/heads/debug\n + S: 002ang refs/heads/master non-fast-forward\n +---- diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt new file mode 100644 index 0000000000..b15517fa06 --- /dev/null +++ b/Documentation/technical/protocol-capabilities.txt @@ -0,0 +1,187 @@ +Git Protocol Capabilities +========================= + +Servers SHOULD support all capabilities defined in this document. + +On the very first line of the initial server response of either +receive-pack and upload-pack the first reference is followed by +a NUL byte and then a list of space delimited server capabilities. +These allow the server to declare what it can and cannot support +to the client. + +Client will then send a space separated list of capabilities it wants +to be in effect. The client MUST NOT ask for capabilities the server +did not say it supports. + +Server MUST diagnose and abort if capabilities it does not understand +was sent. Server MUST NOT ignore capabilities that client requested +and server advertised. As a consequence of these rules, server MUST +NOT advertise capabilities it does not understand. + +The 'report-status' and 'delete-refs' capabilities are sent and +recognized by the receive-pack (push to server) process. + +The 'ofs-delta' capability is sent and recognized by both upload-pack +and receive-pack protocols. + +All other capabilities are only recognized by the upload-pack (fetch +from server) process. + +multi_ack +--------- + +The 'multi_ack' capability allows the server to return "ACK obj-id +continue" as soon as it finds a commit that it can use as a common +base, between the client's wants and the client's have set. + +By sending this early, the server can potentially head off the client +from walking any further down that particular branch of the client's +repository history. The client may still need to walk down other +branches, sending have lines for those, until the server has a +complete cut across the DAG, or the client has said "done". + +Without multi_ack, a client sends have lines in --date-order until +the server has found a common base. That means the client will send +have lines that are already known by the server to be common, because +they overlap in time with another branch that the server hasn't found +a common base on yet. + +For example suppose the client has commits in caps that the server +doesn't and the server has commits in lower case that the client +doesn't, as in the following diagram: + + +---- u ---------------------- x + / +----- y + / / + a -- b -- c -- d -- E -- F + \ + +--- Q -- R -- S + +If the client wants x,y and starts out by saying have F,S, the server +doesn't know what F,S is. Eventually the client says "have d" and +the server sends "ACK d continue" to let the client know to stop +walking down that line (so don't send c-b-a), but it's not done yet, +it needs a base for x. The client keeps going with S-R-Q, until a +gets reached, at which point the server has a clear base and it all +ends. + +Without multi_ack the client would have sent that c-b-a chain anyway, +interleaved with S-R-Q. + +thin-pack +--------- + +This capability means that the server can send a 'thin' pack, a pack +which does not contain base objects; if those base objects are available +on client side. Client requests 'thin-pack' capability when it +understands how to "thicken" it by adding required delta bases making +it self-contained. + +Client MUST NOT request 'thin-pack' capability if it cannot turn a thin +pack into a self-contained pack. + + +side-band, side-band-64k +------------------------ + +This capability means that server can send, and client understand multiplexed +progress reports and error info interleaved with the packfile itself. + +These two options are mutually exclusive. A modern client always +favors 'side-band-64k'. + +Either mode indicates that the packfile data will be streamed broken +up into packets of up to either 1000 bytes in the case of 'side_band', +or 65520 bytes in the case of 'side_band_64k'. Each packet is made up +of a leading 4-byte pkt-line length of how much data is in the packet, +followed by a 1-byte stream code, followed by the actual data. + +The stream code can be one of: + + 1 - pack data + 2 - progress messages + 3 - fatal error message just before stream aborts + +The "side-band-64k" capability came about as a way for newer clients +that can handle much larger packets to request packets that are +actually crammed nearly full, while maintaining backward compatibility +for the older clients. + +Further, with side-band and its up to 1000-byte messages, it's actually +999 bytes of payload and 1 byte for the stream code. With side-band-64k, +same deal, you have up to 65519 bytes of data and 1 byte for the stream +code. + +The client MUST send only maximum of one of "side-band" and "side- +band-64k". Server MUST diagnose it as an error if client requests +both. + +ofs-delta +--------- + +Server can send, and client understand PACKv2 with delta referring to +its base by position in pack rather than by an obj-id. That is, they can +send/read OBJ_OFS_DELTA (aka type 6) in a packfile. + +shallow +------- + +This capability adds "deepen", "shallow" and "unshallow" commands to +the fetch-pack/upload-pack protocol so clients can request shallow +clones. + +no-progress +----------- + +The client was started with "git clone -q" or something, and doesn't +want that side band 2. Basically the client just says "I do not +wish to receive stream 2 on sideband, so do not send it to me, and if +you did, I will drop it on the floor anyway". However, the sideband +channel 3 is still used for error responses. + +include-tag +----------- + +The 'include-tag' capability is about sending annotated tags if we are +sending objects they point to. If we pack an object to the client, and +a tag object points exactly at that object, we pack the tag object too. +In general this allows a client to get all new annotated tags when it +fetches a branch, in a single network connection. + +Clients MAY always send include-tag, hardcoding it into a request when +the server advertises this capability. The decision for a client to +request include-tag only has to do with the client's desires for tag +data, whether or not a server had advertised objects in the +refs/tags/* namespace. + +Servers MUST pack the tags if their referrant is packed and the client +has requested include-tags. + +Clients MUST be prepared for the case where a server has ignored +include-tag and has not actually sent tags in the pack. In such +cases the client SHOULD issue a subsequent fetch to acquire the tags +that include-tag would have otherwise given the client. + +The server SHOULD send include-tag, if it supports it, regardless +of whether or not there are tags available. + +report-status +------------- + +The upload-pack process can receive a 'report-status' capability, +which tells it that the client wants a report of what happened after +a packfile upload and reference update. If the pushing client requests +this capability, after unpacking and updating references the server +will respond with whether the packfile unpacked successfully and if +each reference was updated successfully. If any of those were not +successful, it will send back an error message. See pack-protocol.txt +for example messages. + +delete-refs +----------- + +If the server sends back the 'delete-refs' capability, it means that +it is capable of accepting a zero-id value as the target +value of a reference update. It is not sent back by the client, it +simply informs the client that it can be sent zero-id values +to delete references. diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt new file mode 100644 index 0000000000..fb7ff084f8 --- /dev/null +++ b/Documentation/technical/protocol-common.txt @@ -0,0 +1,96 @@ +Documentation Common to Pack and Http Protocols +=============================================== + +ABNF Notation +------------- + +ABNF notation as described by RFC 5234 is used within the protocol documents, +except the following replacement core rules are used: +---- + HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f" +---- + +We also define the following common rules: +---- + NUL = %x00 + zero-id = 40*"0" + obj-id = 40*(HEXDIGIT) + + refname = "HEAD" + refname /= "refs/" <see discussion below> +---- + +A refname is a hierarchical octet string beginning with "refs/" and +not violating the 'git-check-ref-format' command's validation rules. +More specifically, they: + +. They can include slash `/` for hierarchical (directory) + grouping, but no slash-separated component can begin with a + dot `.`. + +. They must contain at least one `/`. This enforces the presence of a + category like `heads/`, `tags/` etc. but the actual names are not + restricted. + +. They cannot have two consecutive dots `..` anywhere. + +. They cannot have ASCII control characters (i.e. bytes whose + values are lower than \040, or \177 `DEL`), space, tilde `~`, + caret `^`, colon `:`, question-mark `?`, asterisk `*`, + or open bracket `[` anywhere. + +. They cannot end with a slash `/` nor a dot `.`. + +. They cannot end with the sequence `.lock`. + +. They cannot contain a sequence `@{`. + +. They cannot contain a `\\`. + + +pkt-line Format +--------------- + +Much (but not all) of the payload is described around pkt-lines. + +A pkt-line is a variable length binary string. The first four bytes +of the line, the pkt-len, indicates the total length of the line, +in hexadecimal. The pkt-len includes the 4 bytes used to contain +the length's hexadecimal representation. + +A pkt-line MAY contain binary data, so implementors MUST ensure +pkt-line parsing/formatting routines are 8-bit clean. + +A non-binary line SHOULD BE terminated by an LF, which if present +MUST be included in the total length. + +The maximum length of a pkt-line's data component is 65520 bytes. +Implementations MUST NOT send pkt-line whose length exceeds 65524 +(65520 bytes of payload + 4 bytes of length data). + +Implementations SHOULD NOT send an empty pkt-line ("0004"). + +A pkt-line with a length field of 0 ("0000"), called a flush-pkt, +is a special case and MUST be handled differently than an empty +pkt-line ("0004"). + +---- + pkt-line = data-pkt / flush-pkt + + data-pkt = pkt-len pkt-payload + pkt-len = 4*(HEXDIG) + pkt-payload = (pkt-len - 4)*(OCTET) + + flush-pkt = "0000" +---- + +Examples (as C-style strings): + +---- + pkt-line actual value + --------------------------------- + "0006a\n" "a\n" + "0005a" "a" + "000bfoobar\n" "foobar\n" + "0004" "" +---- diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt index 6bdf034b3a..53aa0c82c2 100644 --- a/Documentation/technical/racy-git.txt +++ b/Documentation/technical/racy-git.txt @@ -42,10 +42,12 @@ compared, but this is not enabled by default because this member is not stable on network filesystems. With `USE_NSEC` compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec` members are also compared, but this is not enabled by default -because the value of this member becomes meaningless once the -inode is evicted from the inode cache on filesystems that do not -store it on disk. - +because in-core timestamps can have finer granularity than +on-disk timestamps, resulting in meaningless changes when an +inode is evicted from the inode cache. See commit 8ce13b0 +of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git +([PATCH] Sync in core time granuality with filesystems, +2005-01-04). Racy git -------- @@ -135,7 +137,7 @@ them, and give the same timestamp to the index file: This will make all index entries racily clean. The linux-2.6 project, for example, there are over 20,000 files in the working -tree. On my Athron 64X2 3800+, after the above: +tree. On my Athlon 64 X2 3800+, after the above: $ /usr/bin/time git diff-files 1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k |