summaryrefslogtreecommitdiff
path: root/Documentation/technical
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/technical')
-rw-r--r--Documentation/technical/api-config.txt2
-rw-r--r--Documentation/technical/api-diff.txt4
-rw-r--r--Documentation/technical/api-directory-listing.txt6
-rw-r--r--Documentation/technical/api-oid-array.txt5
-rw-r--r--Documentation/technical/api-parse-options.txt8
-rw-r--r--Documentation/technical/api-ref-iteration.txt2
-rw-r--r--Documentation/technical/api-revision-walking.txt4
-rw-r--r--Documentation/technical/api-trace2.txt1378
-rw-r--r--Documentation/technical/api-tree-walking.txt8
-rw-r--r--Documentation/technical/commit-graph-format.txt15
-rw-r--r--Documentation/technical/commit-graph.txt210
-rw-r--r--Documentation/technical/directory-rename-detection.txt4
-rw-r--r--Documentation/technical/hash-function-transition.txt2
-rw-r--r--Documentation/technical/index-format.txt41
-rw-r--r--Documentation/technical/multi-pack-index.txt109
-rw-r--r--Documentation/technical/pack-format.txt77
-rw-r--r--Documentation/technical/pack-protocol.txt28
-rw-r--r--Documentation/technical/partial-clone.txt119
-rw-r--r--Documentation/technical/protocol-capabilities.txt18
-rw-r--r--Documentation/technical/protocol-v2.txt72
-rw-r--r--Documentation/technical/repository-version.txt26
-rw-r--r--Documentation/technical/rerere.txt186
22 files changed, 2197 insertions, 127 deletions
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index fa39ac9d71..7d20716c32 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -229,7 +229,7 @@ A `config_set` can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). For example,
----------------------------------------
+----------------------------------------
struct config_set gm_config;
git_configset_init(&gm_config);
int b;
diff --git a/Documentation/technical/api-diff.txt b/Documentation/technical/api-diff.txt
index 8b001de0db..30fc0e9c93 100644
--- a/Documentation/technical/api-diff.txt
+++ b/Documentation/technical/api-diff.txt
@@ -18,8 +18,8 @@ Calling sequence
----------------
* Prepare `struct diff_options` to record the set of diff options, and
- then call `diff_setup()` to initialize this structure. This sets up
- the vanilla default.
+ then call `repo_diff_setup()` to initialize this structure. This
+ sets up the vanilla default.
* Fill in the options structure to specify desired output format, rename
detection, etc. `diff_opt_parse()` can be used to parse options given
diff --git a/Documentation/technical/api-directory-listing.txt b/Documentation/technical/api-directory-listing.txt
index 5abb8e8b1f..76b6e4f71b 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -111,11 +111,11 @@ marked. If you to exclude files, make sure you have loaded index first.
* Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0,
sizeof(dir))`.
-* To add single exclude pattern, call `add_exclude_list()` and then
- `add_exclude()`.
+* To add single exclude pattern, call `add_pattern_list()` and then
+ `add_pattern()`.
* To add patterns from a file (e.g. `.git/info/exclude`), call
- `add_excludes_from_file()` , and/or set `dir.exclude_per_dir`. A
+ `add_patterns_from_file()` , and/or set `dir.exclude_per_dir`. A
short-hand function `setup_standard_excludes()` can be used to set
up the standard set of exclude settings.
diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
index 9febfb1d52..c97428c2c3 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -48,6 +48,11 @@ Functions
is not sorted, this function has the side effect of sorting
it.
+`oid_array_filter`::
+ Apply the callback function `want` to each entry in the array,
+ retaining only the entries for which the function returns true.
+ Preserve the order of the entries that are retained.
+
Examples
--------
diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
index 829b558110..2e2e7c10c6 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -183,10 +183,6 @@ There are some macros to easily define options:
scale the provided value by 1024, 1024^2 or 1024^3 respectively.
The scaled value is put into `unsigned_long_var`.
-`OPT_DATE(short, long, &timestamp_t_var, description)`::
- Introduce an option with date argument, see `approxidate()`.
- The timestamp is put into `timestamp_t_var`.
-
`OPT_EXPIRY_DATE(short, long, &timestamp_t_var, description)`::
Introduce an option with expiry date argument, see `parse_expiry_date()`.
The timestamp is put into `timestamp_t_var`.
@@ -202,8 +198,10 @@ There are some macros to easily define options:
The filename will be prefixed by passing the filename along with
the prefix argument of `parse_options()` to `prefix_filename()`.
-`OPT_ARGUMENT(long, description)`::
+`OPT_ARGUMENT(long, &int_var, description)`::
Introduce a long-option argument that will be kept in `argv[]`.
+ If this option was seen, `int_var` will be set to one (except
+ if a `NULL` pointer was passed).
`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`::
Recognize numerical options like -123 and feed the integer as
diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
index 46c3d5c355..ad9d019ff9 100644
--- a/Documentation/technical/api-ref-iteration.txt
+++ b/Documentation/technical/api-ref-iteration.txt
@@ -54,7 +54,7 @@ this:
do not do this you will get an error for each ref that it does not point
to a valid object.
-Note: As a side-effect of this you can not safely assume that all
+Note: As a side-effect of this you cannot safely assume that all
objects you lookup are available in superproject. All submodule objects
will be available the same way as the superprojects objects.
diff --git a/Documentation/technical/api-revision-walking.txt b/Documentation/technical/api-revision-walking.txt
index 55b878ade8..03f9ea6ac4 100644
--- a/Documentation/technical/api-revision-walking.txt
+++ b/Documentation/technical/api-revision-walking.txt
@@ -15,9 +15,9 @@ revision list.
Functions
---------
-`init_revisions`::
+`repo_init_revisions`::
- Initialize a rev_info structure with default values. The second
+ Initialize a rev_info structure with default values. The third
parameter may be NULL or can be prefix path, and then the `.prefix`
variable will be set to it. This is typically the first function you
want to call when you want to deal with a revision list. After calling
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
new file mode 100644
index 0000000000..71eb081fed
--- /dev/null
+++ b/Documentation/technical/api-trace2.txt
@@ -0,0 +1,1378 @@
+= Trace2 API
+
+The Trace2 API can be used to print debug, performance, and telemetry
+information to stderr or a file. The Trace2 feature is inactive unless
+explicitly enabled by enabling one or more Trace2 Targets.
+
+The Trace2 API is intended to replace the existing (Trace1)
+printf-style tracing provided by the existing `GIT_TRACE` and
+`GIT_TRACE_PERFORMANCE` facilities. During initial implementation,
+Trace2 and Trace1 may operate in parallel.
+
+The Trace2 API defines a set of high-level messages with known fields,
+such as (`start`: `argv`) and (`exit`: {`exit-code`, `elapsed-time`}).
+
+Trace2 instrumentation throughout the Git code base sends Trace2
+messages to the enabled Trace2 Targets. Targets transform these
+messages content into purpose-specific formats and write events to
+their data streams. In this manner, the Trace2 API can drive
+many different types of analysis.
+
+Targets are defined using a VTable allowing easy extension to other
+formats in the future. This might be used to define a binary format,
+for example.
+
+Trace2 is controlled using `trace2.*` config values in the system and
+global config files and `GIT_TRACE2*` environment variables. Trace2 does
+not read from repo local or worktree config files or respect `-c`
+command line config settings.
+
+== Trace2 Targets
+
+Trace2 defines the following set of Trace2 Targets.
+Format details are given in a later section.
+
+=== The Normal Format Target
+
+The normal format target is a tradition printf format and similar
+to GIT_TRACE format. This format is enabled with the `GIT_TRACE2`
+environment variable or the `trace2.normalTarget` system or global
+config setting.
+
+For example
+
+------------
+$ export GIT_TRACE2=~/log.normal
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+or
+
+------------
+$ git config --global trace2.normalTarget ~/log.normal
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+yields
+
+------------
+$ cat ~/log.normal
+12:28:42.620009 common-main.c:38 version 2.20.1.155.g426c96fcdb
+12:28:42.620989 common-main.c:39 start git version
+12:28:42.621101 git.c:432 cmd_name version (version)
+12:28:42.621215 git.c:662 exit elapsed:0.001227 code:0
+12:28:42.621250 trace2/tr2_tgt_normal.c:124 atexit elapsed:0.001265 code:0
+------------
+
+=== The Performance Format Target
+
+The performance format target (PERF) is a column-based format to
+replace GIT_TRACE_PERFORMANCE and is suitable for development and
+testing, possibly to complement tools like gprof. This format is
+enabled with the `GIT_TRACE2_PERF` environment variable or the
+`trace2.perfTarget` system or global config setting.
+
+For example
+
+------------
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+or
+
+------------
+$ git config --global trace2.perfTarget ~/log.perf
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+yields
+
+------------
+$ cat ~/log.perf
+12:28:42.620675 common-main.c:38 | d0 | main | version | | | | | 2.20.1.155.g426c96fcdb
+12:28:42.621001 common-main.c:39 | d0 | main | start | | 0.001173 | | | git version
+12:28:42.621111 git.c:432 | d0 | main | cmd_name | | | | | version (version)
+12:28:42.621225 git.c:662 | d0 | main | exit | | 0.001227 | | | code:0
+12:28:42.621259 trace2/tr2_tgt_perf.c:211 | d0 | main | atexit | | 0.001265 | | | code:0
+------------
+
+=== The Event Format Target
+
+The event format target is a JSON-based format of event data suitable
+for telemetry analysis. This format is enabled with the `GIT_TRACE2_EVENT`
+environment variable or the `trace2.eventTarget` system or global config
+setting.
+
+For example
+
+------------
+$ export GIT_TRACE2_EVENT=~/log.event
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+or
+
+------------
+$ git config --global trace2.eventTarget ~/log.event
+$ git version
+git version 2.20.1.155.g426c96fcdb
+------------
+
+yields
+
+------------
+$ cat ~/log.event
+{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"}
+{"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]}
+{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"}
+{"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0}
+{"event":"atexit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621268Z","file":"trace2/tr2_tgt_event.c","line":163,"t_abs":0.001265,"code":0}
+------------
+
+=== Enabling a Target
+
+To enable a target, set the corresponding environment variable or
+system or global config value to one of the following:
+
+include::../trace2-target-values.txt[]
+
+If the target already exists and is a directory, the traces will be
+written to files (one per process) underneath the given directory. They
+will be named according to the last component of the SID (optionally
+followed by a counter to avoid filename collisions).
+
+== Trace2 API
+
+All public Trace2 functions and macros are defined in `trace2.h` and
+`trace2.c`. All public symbols are prefixed with `trace2_`.
+
+There are no public Trace2 data structures.
+
+The Trace2 code also defines a set of private functions and data types
+in the `trace2/` directory. These symbols are prefixed with `tr2_`
+and should only be used by functions in `trace2.c`.
+
+== Conventions for Public Functions and Macros
+
+The functions defined by the Trace2 API are declared and documented
+in `trace2.h`. It defines the API functions and wrapper macros for
+Trace2.
+
+Some functions have a `_fl()` suffix to indicate that they take `file`
+and `line-number` arguments.
+
+Some functions have a `_va_fl()` suffix to indicate that they also
+take a `va_list` argument.
+
+Some functions have a `_printf_fl()` suffix to indicate that they also
+take a varargs argument.
+
+There are CPP wrapper macros and ifdefs to hide most of these details.
+See `trace2.h` for more details. The following discussion will only
+describe the simplified forms.
+
+== Public API
+
+All Trace2 API functions send a messsage to all of the active
+Trace2 Targets. This section describes the set of available
+messages.
+
+It helps to divide these functions into groups for discussion
+purposes.
+
+=== Basic Command Messages
+
+These are concerned with the lifetime of the overall git process.
+
+`void trace2_initialize_clock()`::
+
+ Initialize the Trace2 start clock and nothing else. This should
+ be called at the very top of main() to capture the process start
+ time and reduce startup order dependencies.
+
+`void trace2_initialize()`::
+
+ Determines if any Trace2 Targets should be enabled and
+ initializes the Trace2 facility. This includes setting up the
+ Trace2 thread local storage (TLS).
++
+This function emits a "version" message containing the version of git
+and the Trace2 protocol.
++
+This function should be called from `main()` as early as possible in
+the life of the process after essential process initialization.
+
+`int trace2_is_enabled()`::
+
+ Returns 1 if Trace2 is enabled (at least one target is
+ active).
+
+`void trace2_cmd_start(int argc, const char **argv)`::
+
+ Emits a "start" message containing the process command line
+ arguments.
+
+`int trace2_cmd_exit(int exit_code)`::
+
+ Emits an "exit" message containing the process exit-code and
+ elapsed time.
++
+Returns the exit-code.
+
+`void trace2_cmd_error(const char *fmt, va_list ap)`::
+
+ Emits an "error" message containing a formatted error message.
+
+`void trace2_cmd_path(const char *pathname)`::
+
+ Emits a "cmd_path" message with the full pathname of the
+ current process.
+
+=== Command Detail Messages
+
+These are concerned with describing the specific Git command
+after the command line, config, and environment are inspected.
+
+`void trace2_cmd_name(const char *name)`::
+
+ Emits a "cmd_name" message with the canonical name of the
+ command, for example "status" or "checkout".
+
+`void trace2_cmd_mode(const char *mode)`::
+
+ Emits a "cmd_mode" message with a qualifier name to further
+ describe the current git command.
++
+This message is intended to be used with git commands having multiple
+major modes. For example, a "checkout" command can checkout a new
+branch or it can checkout a single file, so the checkout code could
+emit a cmd_mode message of "branch" or "file".
+
+`void trace2_cmd_alias(const char *alias, const char **argv_expansion)`::
+
+ Emits an "alias" message containing the alias used and the
+ argument expansion.
+
+`void trace2_def_param(const char *parameter, const char *value)`::
+
+ Emits a "def_param" message containing a key/value pair.
++
+This message is intended to report some global aspect of the current
+command, such as a configuration setting or command line switch that
+significantly affects program performance or behavior, such as
+`core.abbrev`, `status.showUntrackedFiles`, or `--no-ahead-behind`.
+
+`void trace2_cmd_list_config()`::
+
+ Emits a "def_param" messages for "important" configuration
+ settings.
++
+The environment variable `GIT_TRACE2_CONFIG_PARAMS` or the `trace2.configParams`
+config value can be set to a
+list of patterns of important configuration settings, for example:
+`core.*,remote.*.url`. This function will iterate over all config
+settings and emit a "def_param" message for each match.
+
+`void trace2_cmd_set_config(const char *key, const char *value)`::
+
+ Emits a "def_param" message for a new or updated key/value
+ pair IF `key` is considered important.
++
+This is used to hook into `git_config_set()` and catch any
+configuration changes and update a value previously reported by
+`trace2_cmd_list_config()`.
+
+`void trace2_def_repo(struct repository *repo)`::
+
+ Registers a repository with the Trace2 layer. Assigns a
+ unique "repo-id" to `repo->trace2_repo_id`.
++
+Emits a "worktree" messages containing the repo-id and the worktree
+pathname.
++
+Region and data messages (described later) may refer to this repo-id.
++
+The main/top-level repository will have repo-id value 1 (aka "r1").
++
+The repo-id field is in anticipation of future in-proc submodule
+repositories.
+
+=== Child Process Messages
+
+These are concerned with the various spawned child processes,
+including shell scripts, git commands, editors, pagers, and hooks.
+
+`void trace2_child_start(struct child_process *cmd)`::
+
+ Emits a "child_start" message containing the "child-id",
+ "child-argv", and "child-classification".
++
+Before calling this, set `cmd->trace2_child_class` to a name
+describing the type of child process, for example "editor".
++
+This function assigns a unique "child-id" to `cmd->trace2_child_id`.
+This field is used later during the "child_exit" message to associate
+it with the "child_start" message.
++
+This function should be called before spawning the child process.
+
+`void trace2_child_exit(struct child_proess *cmd, int child_exit_code)`::
+
+ Emits a "child_exit" message containing the "child-id",
+ the child's elapsed time and exit-code.
++
+The reported elapsed time includes the process creation overhead and
+time spend waiting for it to exit, so it may be slightly longer than
+the time reported by the child itself.
++
+This function should be called after reaping the child process.
+
+`int trace2_exec(const char *exe, const char **argv)`::
+
+ Emits a "exec" message containing the "exec-id" and the
+ argv of the new process.
++
+This function should be called before calling one of the `exec()`
+variants, such as `execvp()`.
++
+This function returns a unique "exec-id". This value is used later
+if the exec() fails and a "exec-result" message is necessary.
+
+`void trace2_exec_result(int exec_id, int error_code)`::
+
+ Emits a "exec_result" message containing the "exec-id"
+ and the error code.
++
+On Unix-based systems, `exec()` does not return if successful.
+This message is used to indicate that the `exec()` failed and
+that the current program is continuing.
+
+=== Git Thread Messages
+
+These messages are concerned with Git thread usage.
+
+`void trace2_thread_start(const char *thread_name)`::
+
+ Emits a "thread_start" message.
++
+The `thread_name` field should be a descriptive name, such as the
+unique name of the thread-proc. A unique "thread-id" will be added
+to the name to uniquely identify thread instances.
++
+Region and data messages (described later) may refer to this thread
+name.
++
+This function must be called by the thread-proc of the new thread
+(so that TLS data is properly initialized) and not by the caller
+of `pthread_create()`.
+
+`void trace2_thread_exit()`::
+
+ Emits a "thread_exit" message containing the thread name
+ and the thread elapsed time.
++
+This function must be called by the thread-proc before it returns
+(so that the coorect TLS data is used and cleaned up. It should
+not be called by the caller of `pthread_join()`.
+
+=== Region and Data Messages
+
+These are concerned with recording performance data
+over regions or spans of code.
+
+`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`::
+
+`void trace2_region_enter_printf(const char *category, const char *label, const struct repository *repo, const char *fmt, ...)`::
+
+`void trace2_region_enter_printf_va(const char *category, const char *label, const struct repository *repo, const char *fmt, va_list ap)`::
+
+ Emits a thread-relative "region_enter" message with optional
+ printf string.
++
+This function pushes a new region nesting stack level on the current
+thread and starts a clock for the new stack frame.
++
+The `category` field is an arbitrary category name used to classify
+regions by feature area, such as "status" or "index". At this time
+it is only just printed along with the rest of the message. It may
+be used in the future to filter messages.
++
+The `label` field is an arbitrary label used to describe the activity
+being started, such as "read_recursive" or "do_read_index".
++
+The `repo` field, if set, will be used to get the "repo-id", so that
+recursive oerations can be attributed to the correct repository.
+
+`void trace2_region_leave(const char *category, const char *label, const struct repository *repo)`::
+
+`void trace2_region_leave_printf(const char *category, const char *label, const struct repository *repo, const char *fmt, ...)`::
+
+`void trace2_region_leave_printf_va(const char *category, const char *label, const struct repository *repo, const char *fmt, va_list ap)`::
+
+ Emits a thread-relative "region_leave" message with optional
+ printf string.
++
+This function pops the region nesting stack on the current thread
+and reports the elapsed time of the stack frame.
++
+The `category`, `label`, and `repo` fields are the same as above.
+The `category` and `label` do not need to match the correpsonding
+"region_enter" message, but it makes the data stream easier to
+understand.
+
+`void trace2_data_string(const char *category, const struct repository *repo, const char *key, const char * value)`::
+
+`void trace2_data_intmax(const char *category, const struct repository *repo, const char *key, intmax value)`::
+
+`void trace2_data_json(const char *category, const struct repository *repo, const char *key, const struct json_writer *jw)`::
+
+ Emits a region- and thread-relative "data" or "data_json" message.
++
+This is a key/value pair message containing information about the
+current thread, region stack, and repository. This could be used
+to print the number of files in a directory during a multi-threaded
+recursive tree walk.
+
+`void trace2_printf(const char *fmt, ...)`::
+
+`void trace2_printf_va(const char *fmt, va_list ap)`::
+
+ Emits a region- and thread-relative "printf" message.
+
+== Trace2 Target Formats
+
+=== NORMAL Format
+
+Events are written as lines of the form:
+
+------------
+[<time> SP <filename>:<line> SP+] <event-name> [[SP] <event-message>] LF
+------------
+
+`<event-name>`::
+
+ is the event name.
+
+`<event-message>`::
+ is a free-form printf message intended for human consumption.
++
+Note that this may contain embedded LF or CRLF characters that are
+not escaped, so the event may spill across multiple lines.
+
+If `GIT_TRACE2_BRIEF` or `trace2.normalBrief` is true, the `time`, `filename`,
+and `line` fields are omitted.
+
+This target is intended to be more of a summary (like GIT_TRACE) and
+less detailed than the other targets. It ignores thread, region, and
+data messages, for example.
+
+=== PERF Format
+
+Events are written as lines of the form:
+
+------------
+[<time> SP <filename>:<line> SP+
+ BAR SP] d<depth> SP
+ BAR SP <thread-name> SP+
+ BAR SP <event-name> SP+
+ BAR SP [r<repo-id>] SP+
+ BAR SP [<t_abs>] SP+
+ BAR SP [<t_rel>] SP+
+ BAR SP [<category>] SP+
+ BAR SP DOTS* <perf-event-message>
+ LF
+------------
+
+`<depth>`::
+ is the git process depth. This is the number of parent
+ git processes. A top-level git command has depth value "d0".
+ A child of it has depth value "d1". A second level child
+ has depth value "d2" and so on.
+
+`<thread-name>`::
+ is a unique name for the thread. The primary thread
+ is called "main". Other thread names are of the form "th%d:%s"
+ and include a unique number and the name of the thread-proc.
+
+`<event-name>`::
+ is the event name.
+
+`<repo-id>`::
+ when present, is a number indicating the repository
+ in use. A `def_repo` event is emitted when a repository is
+ opened. This defines the repo-id and associated worktree.
+ Subsequent repo-specific events will reference this repo-id.
++
+Currently, this is always "r1" for the main repository.
+This field is in anticipation of in-proc submodules in the future.
+
+`<t_abs>`::
+ when present, is the absolute time in seconds since the
+ program started.
+
+`<t_rel>`::
+ when present, is time in seconds relative to the start of
+ the current region. For a thread-exit event, it is the elapsed
+ time of the thread.
+
+`<category>`::
+ is present on region and data events and is used to
+ indicate a broad category, such as "index" or "status".
+
+`<perf-event-message>`::
+ is a free-form printf message intended for human consumption.
+
+------------
+15:33:33.532712 wt-status.c:2310 | d0 | main | region_enter | r1 | 0.126064 | | status | label:print
+15:33:33.532712 wt-status.c:2331 | d0 | main | region_leave | r1 | 0.127568 | 0.001504 | status | label:print
+------------
+
+If `GIT_TRACE2_PERF_BRIEF` or `trace2.perfBrief` is true, the `time`, `file`,
+and `line` fields are omitted.
+
+------------
+d0 | main | region_leave | r1 | 0.011717 | 0.009122 | index | label:preload
+------------
+
+The PERF target is intended for interactive performance analysis
+during development and is quite noisy.
+
+=== EVENT Format
+
+Each event is a JSON-object containing multiple key/value pairs
+written as a single line and followed by a LF.
+
+------------
+'{' <key> ':' <value> [',' <key> ':' <value>]* '}' LF
+------------
+
+Some key/value pairs are common to all events and some are
+event-specific.
+
+==== Common Key/Value Pairs
+
+The following key/value pairs are common to all events:
+
+------------
+{
+ "event":"version",
+ "sid":"20190408T191827.272759Z-H9b68c35f-P00003510",
+ "thread":"main",
+ "time":"2019-04-08T19:18:27.282761Z",
+ "file":"common-main.c",
+ "line":42,
+ ...
+}
+------------
+
+`"event":<event>`::
+ is the event name.
+
+`"sid":<sid>`::
+ is the session-id. This is a unique string to identify the
+ process instance to allow all events emitted by a process to
+ be identified. A session-id is used instead of a PID because
+ PIDs are recycled by the OS. For child git processes, the
+ session-id is prepended with the session-id of the parent git
+ process to allow parent-child relationships to be identified
+ during post-processing.
+
+`"thread":<thread>`::
+ is the thread name.
+
+`"time":<time>`::
+ is the UTC time of the event.
+
+`"file":<filename>`::
+ is source file generating the event.
+
+`"line":<line-number>`::
+ is the integer source line number generating the event.
+
+`"repo":<repo-id>`::
+ when present, is the integer repo-id as described previously.
+
+If `GIT_TRACE2_EVENT_BRIEF` or `trace2.eventBrief` is true, the `file`
+and `line` fields are omitted from all events and the `time` field is
+only present on the "start" and "atexit" events.
+
+==== Event-Specific Key/Value Pairs
+
+`"version"`::
+ This event gives the version of the executable and the EVENT format.
++
+------------
+{
+ "event":"version",
+ ...
+ "evt":"1", # EVENT format version
+ "exe":"2.20.1.155.g426c96fcdb" # git version
+}
+------------
+
+`"start"`::
+ This event contains the complete argv received by main().
++
+------------
+{
+ "event":"start",
+ ...
+ "t_abs":0.001227, # elapsed time in seconds
+ "argv":["git","version"]
+}
+------------
+
+`"exit"`::
+ This event is emitted when git calls `exit()`.
++
+------------
+{
+ "event":"exit",
+ ...
+ "t_abs":0.001227, # elapsed time in seconds
+ "code":0 # exit code
+}
+------------
+
+`"atexit"`::
+ This event is emitted by the Trace2 `atexit` routine during
+ final shutdown. It should be the last event emitted by the
+ process.
++
+(The elapsed time reported here is greater than the time reported in
+the "exit" event because it runs after all other atexit tasks have
+completed.)
++
+------------
+{
+ "event":"atexit",
+ ...
+ "t_abs":0.001227, # elapsed time in seconds
+ "code":0 # exit code
+}
+------------
+
+`"signal"`::
+ This event is emitted when the program is terminated by a user
+ signal. Depending on the platform, the signal event may
+ prevent the "atexit" event from being generated.
++
+------------
+{
+ "event":"signal",
+ ...
+ "t_abs":0.001227, # elapsed time in seconds
+ "signo":13 # SIGTERM, SIGINT, etc.
+}
+------------
+
+`"error"`::
+ This event is emitted when one of the `error()`, `die()`,
+ or `usage()` functions are called.
++
+------------
+{
+ "event":"error",
+ ...
+ "msg":"invalid option: --cahced", # formatted error message
+ "fmt":"invalid option: %s" # error format string
+}
+------------
++
+The error event may be emitted more than once. The format string
+allows post-processors to group errors by type without worrying
+about specific error arguments.
+
+`"cmd_path"`::
+ This event contains the discovered full path of the git
+ executable (on platforms that are configured to resolve it).
++
+------------
+{
+ "event":"cmd_path",
+ ...
+ "path":"C:/work/gfw/git.exe"
+}
+------------
+
+`"cmd_name"`::
+ This event contains the command name for this git process
+ and the hierarchy of commands from parent git processes.
++
+------------
+{
+ "event":"cmd_name",
+ ...
+ "name":"pack-objects",
+ "hierarchy":"push/pack-objects"
+}
+------------
++
+Normally, the "name" field contains the canonical name of the
+command. When a canonical name is not available, one of
+these special values are used:
++
+------------
+"_query_" # "git --html-path"
+"_run_dashed_" # when "git foo" tries to run "git-foo"
+"_run_shell_alias_" # alias expansion to a shell command
+"_run_git_alias_" # alias expansion to a git command
+"_usage_" # usage error
+------------
+
+`"cmd_mode"`::
+ This event, when present, describes the command variant This
+ event may be emitted more than once.
++
+------------
+{
+ "event":"cmd_mode",
+ ...
+ "name":"branch"
+}
+------------
++
+The "name" field is an arbitrary string to describe the command mode.
+For example, checkout can checkout a branch or an individual file.
+And these variations typically have different performance
+characteristics that are not comparable.
+
+`"alias"`::
+ This event is present when an alias is expanded.
++
+------------
+{
+ "event":"alias",
+ ...
+ "alias":"l", # registered alias
+ "argv":["log","--graph"] # alias expansion
+}
+------------
+
+`"child_start"`::
+ This event describes a child process that is about to be
+ spawned.
++
+------------
+{
+ "event":"child_start",
+ ...
+ "child_id":2,
+ "child_class":"?",
+ "use_shell":false,
+ "argv":["git","rev-list","--objects","--stdin","--not","--all","--quiet"]
+
+ "hook_name":"<hook_name>" # present when child_class is "hook"
+ "cd":"<path>" # present when cd is required
+}
+------------
++
+The "child_id" field can be used to match this child_start with the
+corresponding child_exit event.
++
+The "child_class" field is a rough classification, such as "editor",
+"pager", "transport/*", and "hook". Unclassified children are classified
+with "?".
+
+`"child_exit"`::
+ This event is generated after the current process has returned
+ from the waitpid() and collected the exit information from the
+ child.
++
+------------
+{
+ "event":"child_exit",
+ ...
+ "child_id":2,
+ "pid":14708, # child PID
+ "code":0, # child exit-code
+ "t_rel":0.110605 # observed run-time of child process
+}
+------------
++
+Note that the session-id of the child process is not available to
+the current/spawning process, so the child's PID is reported here as
+a hint for post-processing. (But it is only a hint because the child
+proces may be a shell script which doesn't have a session-id.)
++
+Note that the `t_rel` field contains the observed run time in seconds
+for the child process (starting before the fork/exec/spawn and
+stopping after the waitpid() and includes OS process creation overhead).
+So this time will be slightly larger than the atexit time reported by
+the child process itself.
+
+`"exec"`::
+ This event is generated before git attempts to `exec()`
+ another command rather than starting a child process.
++
+------------
+{
+ "event":"exec",
+ ...
+ "exec_id":0,
+ "exe":"git",
+ "argv":["foo", "bar"]
+}
+------------
++
+The "exec_id" field is a command-unique id and is only useful if the
+`exec()` fails and a corresponding exec_result event is generated.
+
+`"exec_result"`::
+ This event is generated if the `exec()` fails and control
+ returns to the current git command.
++
+------------
+{
+ "event":"exec_result",
+ ...
+ "exec_id":0,
+ "code":1 # error code (errno) from exec()
+}
+------------
+
+`"thread_start"`::
+ This event is generated when a thread is started. It is
+ generated from *within* the new thread's thread-proc (for TLS
+ reasons).
++
+------------
+{
+ "event":"thread_start",
+ ...
+ "thread":"th02:preload_thread" # thread name
+}
+------------
+
+`"thread_exit"`::
+ This event is generated when a thread exits. It is generated
+ from *within* the thread's thread-proc (for TLS reasons).
++
+------------
+{
+ "event":"thread_exit",
+ ...
+ "thread":"th02:preload_thread", # thread name
+ "t_rel":0.007328 # thread elapsed time
+}
+------------
+
+`"def_param"`::
+ This event is generated to log a global parameter.
++
+------------
+{
+ "event":"def_param",
+ ...
+ "param":"core.abbrev",
+ "value":"7"
+}
+------------
+
+`"def_repo"`::
+ This event defines a repo-id and associates it with the root
+ of the worktree.
++
+------------
+{
+ "event":"def_repo",
+ ...
+ "repo":1,
+ "worktree":"/Users/jeffhost/work/gfw"
+}
+------------
++
+As stated earlier, the repo-id is currently always 1, so there will
+only be one def_repo event. Later, if in-proc submodules are
+supported, a def_repo event should be emitted for each submodule
+visited.
+
+`"region_enter"`::
+ This event is generated when entering a region.
++
+------------
+{
+ "event":"region_enter",
+ ...
+ "repo":1, # optional
+ "nesting":1, # current region stack depth
+ "category":"index", # optional
+ "label":"do_read_index", # optional
+ "msg":".git/index" # optional
+}
+------------
++
+The `category` field may be used in a future enhancement to
+do category-based filtering.
++
+`GIT_TRACE2_EVENT_NESTING` or `trace2.eventNesting` can be used to
+filter deeply nested regions and data events. It defaults to "2".
+
+`"region_leave"`::
+ This event is generated when leaving a region.
++
+------------
+{
+ "event":"region_leave",
+ ...
+ "repo":1, # optional
+ "t_rel":0.002876, # time spent in region in seconds
+ "nesting":1, # region stack depth
+ "category":"index", # optional
+ "label":"do_read_index", # optional
+ "msg":".git/index" # optional
+}
+------------
+
+`"data"`::
+ This event is generated to log a thread- and region-local
+ key/value pair.
++
+------------
+{
+ "event":"data",
+ ...
+ "repo":1, # optional
+ "t_abs":0.024107, # absolute elapsed time
+ "t_rel":0.001031, # elapsed time in region/thread
+ "nesting":2, # region stack depth
+ "category":"index",
+ "key":"read/cache_nr",
+ "value":"3552"
+}
+------------
++
+The "value" field may be an integer or a string.
+
+`"data-json"`::
+ This event is generated to log a pre-formatted JSON string
+ containing structured data.
++
+------------
+{
+ "event":"data_json",
+ ...
+ "repo":1, # optional
+ "t_abs":0.015905,
+ "t_rel":0.015905,
+ "nesting":1,
+ "category":"process",
+ "key":"windows/ancestry",
+ "value":["bash.exe","bash.exe"]
+}
+------------
+
+== Example Trace2 API Usage
+
+Here is a hypothetical usage of the Trace2 API showing the intended
+usage (without worrying about the actual Git details).
+
+Initialization::
+
+ Initialization happens in `main()`. Behind the scenes, an
+ `atexit` and `signal` handler are registered.
++
+----------------
+int main(int argc, const char **argv)
+{
+ int exit_code;
+
+ trace2_initialize();
+ trace2_cmd_start(argv);
+
+ exit_code = cmd_main(argc, argv);
+
+ trace2_cmd_exit(exit_code);
+
+ return exit_code;
+}
+----------------
+
+Command Details::
+
+ After the basics are established, additional command
+ information can be sent to Trace2 as it is discovered.
++
+----------------
+int cmd_checkout(int argc, const char **argv)
+{
+ trace2_cmd_name("checkout");
+ trace2_cmd_mode("branch");
+ trace2_def_repo(the_repository);
+
+ // emit "def_param" messages for "interesting" config settings.
+ trace2_cmd_list_config();
+
+ if (do_something())
+ trace2_cmd_error("Path '%s': cannot do something", path);
+
+ return 0;
+}
+----------------
+
+Child Processes::
+
+ Wrap code spawning child processes.
++
+----------------
+void run_child(...)
+{
+ int child_exit_code;
+ struct child_process cmd = CHILD_PROCESS_INIT;
+ ...
+ cmd.trace2_child_class = "editor";
+
+ trace2_child_start(&cmd);
+ child_exit_code = spawn_child_and_wait_for_it();
+ trace2_child_exit(&cmd, child_exit_code);
+}
+----------------
++
+For example, the following fetch command spawned ssh, index-pack,
+rev-list, and gc. This example also shows that fetch took
+5.199 seconds and of that 4.932 was in ssh.
++
+----------------
+$ export GIT_TRACE2_BRIEF=1
+$ export GIT_TRACE2=~/log.normal
+$ git fetch origin
+...
+----------------
++
+----------------
+$ cat ~/log.normal
+version 2.20.1.vfs.1.1.47.g534dbe1ad1
+start git fetch origin
+worktree /Users/jeffhost/work/gfw
+cmd_name fetch (fetch)
+child_start[0] ssh git@github.com ...
+child_start[1] git index-pack ...
+... (Trace2 events from child processes omitted)
+child_exit[1] pid:14707 code:0 elapsed:0.076353
+child_exit[0] pid:14706 code:0 elapsed:4.931869
+child_start[2] git rev-list ...
+... (Trace2 events from child process omitted)
+child_exit[2] pid:14708 code:0 elapsed:0.110605
+child_start[3] git gc --auto
+... (Trace2 events from child process omitted)
+child_exit[3] pid:14709 code:0 elapsed:0.006240
+exit elapsed:5.198503 code:0
+atexit elapsed:5.198541 code:0
+----------------
++
+When a git process is a (direct or indirect) child of another
+git process, it inherits Trace2 context information. This
+allows the child to print the command hierarchy. This example
+shows gc as child[3] of fetch. When the gc process reports
+its name as "gc", it also reports the hierarchy as "fetch/gc".
+(In this example, trace2 messages from the child process is
+indented for clarity.)
++
+----------------
+$ export GIT_TRACE2_BRIEF=1
+$ export GIT_TRACE2=~/log.normal
+$ git fetch origin
+...
+----------------
++
+----------------
+$ cat ~/log.normal
+version 2.20.1.160.g5676107ecd.dirty
+start git fetch official
+worktree /Users/jeffhost/work/gfw
+cmd_name fetch (fetch)
+...
+child_start[3] git gc --auto
+ version 2.20.1.160.g5676107ecd.dirty
+ start /Users/jeffhost/work/gfw/git gc --auto
+ worktree /Users/jeffhost/work/gfw
+ cmd_name gc (fetch/gc)
+ exit elapsed:0.001959 code:0
+ atexit elapsed:0.001997 code:0
+child_exit[3] pid:20303 code:0 elapsed:0.007564
+exit elapsed:3.868938 code:0
+atexit elapsed:3.868970 code:0
+----------------
+
+Regions::
+
+ Regions can be use to time an interesting section of code.
++
+----------------
+void wt_status_collect(struct wt_status *s)
+{
+ trace2_region_enter("status", "worktrees", s->repo);
+ wt_status_collect_changes_worktree(s);
+ trace2_region_leave("status", "worktrees", s->repo);
+
+ trace2_region_enter("status", "index", s->repo);
+ wt_status_collect_changes_index(s);
+ trace2_region_leave("status", "index", s->repo);
+
+ trace2_region_enter("status", "untracked", s->repo);
+ wt_status_collect_untracked(s);
+ trace2_region_leave("status", "untracked", s->repo);
+}
+
+void wt_status_print(struct wt_status *s)
+{
+ trace2_region_enter("status", "print", s->repo);
+ switch (s->status_format) {
+ ...
+ }
+ trace2_region_leave("status", "print", s->repo);
+}
+----------------
++
+In this example, scanning for untracked files ran from +0.012568 to
++0.027149 (since the process started) and took 0.014581 seconds.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+
+$ cat ~/log.perf
+d0 | main | version | | | | | 2.20.1.160.g5676107ecd.dirty
+d0 | main | start | | 0.001173 | | | git status
+d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw
+d0 | main | cmd_name | | | | | status (status)
+...
+d0 | main | region_enter | r1 | 0.010988 | | status | label:worktrees
+d0 | main | region_leave | r1 | 0.011236 | 0.000248 | status | label:worktrees
+d0 | main | region_enter | r1 | 0.011260 | | status | label:index
+d0 | main | region_leave | r1 | 0.012542 | 0.001282 | status | label:index
+d0 | main | region_enter | r1 | 0.012568 | | status | label:untracked
+d0 | main | region_leave | r1 | 0.027149 | 0.014581 | status | label:untracked
+d0 | main | region_enter | r1 | 0.027411 | | status | label:print
+d0 | main | region_leave | r1 | 0.028741 | 0.001330 | status | label:print
+d0 | main | exit | | 0.028778 | | | code:0
+d0 | main | atexit | | 0.028809 | | | code:0
+----------------
++
+Regions may be nested. This causes messages to be indented in the
+PERF target, for example.
+Elapsed times are relative to the start of the correpsonding nesting
+level as expected. For example, if we add region message to:
++
+----------------
+static enum path_treatment read_directory_recursive(struct dir_struct *dir,
+ struct index_state *istate, const char *base, int baselen,
+ struct untracked_cache_dir *untracked, int check_only,
+ int stop_at_first_file, const struct pathspec *pathspec)
+{
+ enum path_treatment state, subdir_state, dir_state = path_none;
+
+ trace2_region_enter_printf("dir", "read_recursive", NULL, "%.*s", baselen, base);
+ ...
+ trace2_region_leave_printf("dir", "read_recursive", NULL, "%.*s", baselen, base);
+ return dir_state;
+}
+----------------
++
+We can further investigate the time spent scanning for untracked files.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+d0 | main | version | | | | | 2.20.1.162.gb4ccea44db.dirty
+d0 | main | start | | 0.001173 | | | git status
+d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw
+d0 | main | cmd_name | | | | | status (status)
+...
+d0 | main | region_enter | r1 | 0.015047 | | status | label:untracked
+d0 | main | region_enter | | 0.015132 | | dir | ..label:read_recursive
+d0 | main | region_enter | | 0.016341 | | dir | ....label:read_recursive vcs-svn/
+d0 | main | region_leave | | 0.016422 | 0.000081 | dir | ....label:read_recursive vcs-svn/
+d0 | main | region_enter | | 0.016446 | | dir | ....label:read_recursive xdiff/
+d0 | main | region_leave | | 0.016522 | 0.000076 | dir | ....label:read_recursive xdiff/
+d0 | main | region_enter | | 0.016612 | | dir | ....label:read_recursive git-gui/
+d0 | main | region_enter | | 0.016698 | | dir | ......label:read_recursive git-gui/po/
+d0 | main | region_enter | | 0.016810 | | dir | ........label:read_recursive git-gui/po/glossary/
+d0 | main | region_leave | | 0.016863 | 0.000053 | dir | ........label:read_recursive git-gui/po/glossary/
+...
+d0 | main | region_enter | | 0.031876 | | dir | ....label:read_recursive builtin/
+d0 | main | region_leave | | 0.032270 | 0.000394 | dir | ....label:read_recursive builtin/
+d0 | main | region_leave | | 0.032414 | 0.017282 | dir | ..label:read_recursive
+d0 | main | region_leave | r1 | 0.032454 | 0.017407 | status | label:untracked
+...
+d0 | main | exit | | 0.034279 | | | code:0
+d0 | main | atexit | | 0.034322 | | | code:0
+----------------
++
+Trace2 regions are similar to the existing trace_performance_enter()
+and trace_performance_leave() routines, but are thread safe and
+maintain per-thread stacks of timers.
+
+Data Messages::
+
+ Data messages added to a region.
++
+----------------
+int read_index_from(struct index_state *istate, const char *path,
+ const char *gitdir)
+{
+ trace2_region_enter_printf("index", "do_read_index", the_repository, "%s", path);
+
+ ...
+
+ trace2_data_intmax("index", the_repository, "read/version", istate->version);
+ trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr);
+
+ trace2_region_leave_printf("index", "do_read_index", the_repository, "%s", path);
+}
+----------------
++
+This example shows that the index contained 3552 entries.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+d0 | main | version | | | | | 2.20.1.156.gf9916ae094.dirty
+d0 | main | start | | 0.001173 | | | git status
+d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw
+d0 | main | cmd_name | | | | | status (status)
+d0 | main | region_enter | r1 | 0.001791 | | index | label:do_read_index .git/index
+d0 | main | data | r1 | 0.002494 | 0.000703 | index | ..read/version:2
+d0 | main | data | r1 | 0.002520 | 0.000729 | index | ..read/cache_nr:3552
+d0 | main | region_leave | r1 | 0.002539 | 0.000748 | index | label:do_read_index .git/index
+...
+----------------
+
+Thread Events::
+
+ Thread messages added to a thread-proc.
++
+For example, the multithreaded preload-index code can be
+instrumented with a region around the thread pool and then
+per-thread start and exit events within the threadproc.
++
+----------------
+static void *preload_thread(void *_data)
+{
+ // start the per-thread clock and emit a message.
+ trace2_thread_start("preload_thread");
+
+ // report which chunk of the array this thread was assigned.
+ trace2_data_intmax("index", the_repository, "offset", p->offset);
+ trace2_data_intmax("index", the_repository, "count", nr);
+
+ do {
+ ...
+ } while (--nr > 0);
+ ...
+
+ // report elapsed time taken by this thread.
+ trace2_thread_exit();
+ return NULL;
+}
+
+void preload_index(struct index_state *index,
+ const struct pathspec *pathspec,
+ unsigned int refresh_flags)
+{
+ trace2_region_enter("index", "preload", the_repository);
+
+ for (i = 0; i < threads; i++) {
+ ... /* create thread */
+ }
+
+ for (i = 0; i < threads; i++) {
+ ... /* join thread */
+ }
+
+ trace2_region_leave("index", "preload", the_repository);
+}
+----------------
++
+In this example preload_index() was executed by the `main` thread
+and started the `preload` region. Seven threads, named
+`th01:preload_thread` through `th07:preload_thread`, were started.
+Events from each thread are atomically appended to the shared target
+stream as they occur so they may appear in random order with respect
+other threads. Finally, the main thread waits for the threads to
+finish and leaves the region.
++
+Data events are tagged with the active thread name. They are used
+to report the per-thread parameters.
++
+----------------
+$ export GIT_TRACE2_PERF_BRIEF=1
+$ export GIT_TRACE2_PERF=~/log.perf
+$ git status
+...
+$ cat ~/log.perf
+...
+d0 | main | region_enter | r1 | 0.002595 | | index | label:preload
+d0 | th01:preload_thread | thread_start | | 0.002699 | | |
+d0 | th02:preload_thread | thread_start | | 0.002721 | | |
+d0 | th01:preload_thread | data | r1 | 0.002736 | 0.000037 | index | offset:0
+d0 | th02:preload_thread | data | r1 | 0.002751 | 0.000030 | index | offset:2032
+d0 | th03:preload_thread | thread_start | | 0.002711 | | |
+d0 | th06:preload_thread | thread_start | | 0.002739 | | |
+d0 | th01:preload_thread | data | r1 | 0.002766 | 0.000067 | index | count:508
+d0 | th06:preload_thread | data | r1 | 0.002856 | 0.000117 | index | offset:2540
+d0 | th03:preload_thread | data | r1 | 0.002824 | 0.000113 | index | offset:1016
+d0 | th04:preload_thread | thread_start | | 0.002710 | | |
+d0 | th02:preload_thread | data | r1 | 0.002779 | 0.000058 | index | count:508
+d0 | th06:preload_thread | data | r1 | 0.002966 | 0.000227 | index | count:508
+d0 | th07:preload_thread | thread_start | | 0.002741 | | |
+d0 | th07:preload_thread | data | r1 | 0.003017 | 0.000276 | index | offset:3048
+d0 | th05:preload_thread | thread_start | | 0.002712 | | |
+d0 | th05:preload_thread | data | r1 | 0.003067 | 0.000355 | index | offset:1524
+d0 | th05:preload_thread | data | r1 | 0.003090 | 0.000378 | index | count:508
+d0 | th07:preload_thread | data | r1 | 0.003037 | 0.000296 | index | count:504
+d0 | th03:preload_thread | data | r1 | 0.002971 | 0.000260 | index | count:508
+d0 | th04:preload_thread | data | r1 | 0.002983 | 0.000273 | index | offset:508
+d0 | th04:preload_thread | data | r1 | 0.007311 | 0.004601 | index | count:508
+d0 | th05:preload_thread | thread_exit | | 0.008781 | 0.006069 | |
+d0 | th01:preload_thread | thread_exit | | 0.009561 | 0.006862 | |
+d0 | th03:preload_thread | thread_exit | | 0.009742 | 0.007031 | |
+d0 | th06:preload_thread | thread_exit | | 0.009820 | 0.007081 | |
+d0 | th02:preload_thread | thread_exit | | 0.010274 | 0.007553 | |
+d0 | th07:preload_thread | thread_exit | | 0.010477 | 0.007736 | |
+d0 | th04:preload_thread | thread_exit | | 0.011657 | 0.008947 | |
+d0 | main | region_leave | r1 | 0.011717 | 0.009122 | index | label:preload
+...
+d0 | main | exit | | 0.029996 | | | code:0
+d0 | main | atexit | | 0.030027 | | | code:0
+----------------
++
+In this example, the preload region took 0.009122 seconds. The 7 threads
+took between 0.006069 and 0.008947 seconds to work on their portion of
+the index. Thread "th01" worked on 508 items at offset 0. Thread "th02"
+worked on 508 items at offset 2032. Thread "th04" worked on 508 itemts
+at offset 508.
++
+This example also shows that thread names are assigned in a racy manner
+as each thread starts and allocates TLS storage.
+
+== Future Work
+
+=== Relationship to the Existing Trace Api (api-trace.txt)
+
+There are a few issues to resolve before we can completely
+switch to Trace2.
+
+* Updating existing tests that assume GIT_TRACE format messages.
+
+* How to best handle custom GIT_TRACE_<key> messages?
+
+** The GIT_TRACE_<key> mechanism allows each <key> to write to a
+different file (in addition to just stderr).
+
+** Do we want to maintain that ability or simply write to the existing
+Trace2 targets (and convert <key> to a "category").
diff --git a/Documentation/technical/api-tree-walking.txt b/Documentation/technical/api-tree-walking.txt
index bde18622a8..7962e32854 100644
--- a/Documentation/technical/api-tree-walking.txt
+++ b/Documentation/technical/api-tree-walking.txt
@@ -62,9 +62,7 @@ Initializing
`setup_traverse_info`::
Initialize a `traverse_info` given the pathname of the tree to start
- traversing from. The `base` argument is assumed to be the `path`
- member of the `name_entry` being recursed into unless the tree is a
- top-level tree in which case the empty string ("") is used.
+ traversing from.
Walking
-------
@@ -140,6 +138,10 @@ same in the next callback invocation.
This utilizes the memory structure of a tree entry to avoid the
overhead of using a generic strlen().
+`strbuf_make_traverse_path`::
+
+ Convenience wrapper to `make_traverse_path` into a strbuf.
+
Authors
-------
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index cc0474ba3e..a4f17441ae 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -44,8 +44,9 @@ HEADER:
1-byte number (C) of "chunks"
- 1-byte (reserved for later use)
- Current clients should ignore this value.
+ 1-byte number (B) of base commit-graphs
+ We infer the length (H*B) of the Base Graphs chunk
+ from this value.
CHUNK LOOKUP:
@@ -76,7 +77,7 @@ CHUNK DATA:
of the ith commit. Stores value 0x7000000 if no parent in that
position. If there are more than two parents, the second value
has its most-significant bit on and the other bits store an array
- position into the Large Edge List chunk.
+ position into the Extra Edge List chunk.
* The next 8 bytes store the generation number of the commit and
the commit time in seconds since EPOCH. The generation number
uses the higher 30 bits of the first 4 bytes, while the commit
@@ -84,7 +85,7 @@ CHUNK DATA:
2 bits of the lowest byte, storing the 33rd and 34th bit of the
commit time.
- Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+ Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
This list of 4-byte values store the second through nth parents for
all octopus merges. The second parent value in the commit data stores
an array position within this list along with the most-significant bit
@@ -92,6 +93,12 @@ CHUNK DATA:
positions for the parents until reaching a value with the most-significant
bit on. The other bits correspond to the position of the last parent.
+ Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
+ This list of H-byte hashes describe a set of B commit-graph files that
+ form a commit-graph chain. The graph position for the ith commit in this
+ file's OID Lookup chunk is equal to i plus the number of commits in all
+ base graphs. If B is non-zero, this chunk must exist.
+
TRAILER:
H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index 7805b0968c..729fbcb32f 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,22 +127,196 @@ Design Details
helpful for these clones, anyway. The commit-graph will not be read or
written when shallow commits are present.
-Future Work
------------
-
-- After computing and storing generation numbers, we must make graph
- walks aware of generation numbers to gain the performance benefits they
- enable. This will mostly be accomplished by swapping a commit-date-ordered
- priority queue with one ordered by generation number. The following
- operations are important candidates:
-
- - 'log --topo-order'
- - 'tag --merged'
-
-- A server could provide a commit-graph file as part of the network protocol
- to avoid extra calculations by clients. This feature is only of benefit if
- the user is willing to trust the file, because verifying the file is correct
- is as hard as computing it from scratch.
+Commit Graphs Chains
+--------------------
+
+Typically, repos grow with near-constant velocity (commits per day). Over time,
+the number of commits added by a fetch operation is much smaller than the
+number of commits in the full history. By creating a "chain" of commit-graphs,
+we enable fast writes of new commit data without rewriting the entire commit
+history -- at least, most of the time.
+
+## File Layout
+
+A commit-graph chain uses multiple files, and we use a fixed naming convention
+to organize these files. Each commit-graph file has a name
+`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
+valued hash stored in the footer of that file (which is a hash of the file's
+contents before that hash). For a chain of commit-graph files, a plain-text
+file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
+hashes for the files in order from "lowest" to "highest".
+
+For example, if the `commit-graph-chain` file contains the lines
+
+```
+ {hash0}
+ {hash1}
+ {hash2}
+```
+
+then the commit-graph chain looks like the following diagram:
+
+ +-----------------------+
+ | graph-{hash2}.graph |
+ +-----------------------+
+ |
+ +-----------------------+
+ | |
+ | graph-{hash1}.graph |
+ | |
+ +-----------------------+
+ |
+ +-----------------------+
+ | |
+ | |
+ | |
+ | graph-{hash0}.graph |
+ | |
+ | |
+ | |
+ +-----------------------+
+
+Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
+commits in `graph-{hash1}.graph`, and X2 be the number of commits in
+`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
+then we interpret this as being the commit in position (X0 + X1 + i), and that
+will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
+positions to refer to their parents, which may be in `graph-{hash1}.graph` or
+`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
+its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
+X2).
+
+Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data
+specifying the hashes of all files in the lower layers. In the above example,
+`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
+`{hash0}` and `{hash1}`.
+
+## Merging commit-graph files
+
+If we only added a new commit-graph file on every write, we would run into a
+linear search problem through many commit-graph files. Instead, we use a merge
+strategy to decide when the stack should collapse some number of levels.
+
+The diagram below shows such a collapse. As a set of new commits are added, it
+is determined by the merge strategy that the files should collapse to
+`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and
+the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
+file.
+
+ +---------------------+
+ | |
+ | (new commits) |
+ | |
+ +---------------------+
+ | |
+ +-----------------------+ +---------------------+
+ | graph-{hash2} |->| |
+ +-----------------------+ +---------------------+
+ | | |
+ +-----------------------+ +---------------------+
+ | | | |
+ | graph-{hash1} |->| |
+ | | | |
+ +-----------------------+ +---------------------+
+ | tmp_graphXXX
+ +-----------------------+
+ | |
+ | |
+ | |
+ | graph-{hash0} |
+ | |
+ | |
+ | |
+ +-----------------------+
+
+During this process, the commits to write are combined, sorted and we write the
+contents to a temporary file, all while holding a `commit-graph-chain.lock`
+lock-file. When the file is flushed, we rename it to `graph-{hash3}`
+according to the computed `{hash3}`. Finally, we write the new chain data to
+`commit-graph-chain.lock`:
+
+```
+ {hash3}
+ {hash0}
+```
+
+We then close the lock-file.
+
+## Merge Strategy
+
+When writing a set of commits that do not exist in the commit-graph stack of
+height N, we default to creating a new file at level N + 1. We then decide to
+merge with the Nth level if one of two conditions hold:
+
+ 1. `--size-multiple=<X>` is specified or X = 2, and the number of commits in
+ level N is less than X times the number of commits in level N + 1.
+
+ 2. `--max-commits=<C>` is specified with non-zero C and the number of commits
+ in level N + 1 is more than C commits.
+
+This decision cascades down the levels: when we merge a level we create a new
+set of commits that then compares to the next level.
+
+The first condition bounds the number of levels to be logarithmic in the total
+number of commits. The second condition bounds the total number of commits in
+a `graph-{hashN}` file and not in the `commit-graph` file, preventing
+significant performance issues when the stack merges and another process only
+partially reads the previous stack.
+
+The merge strategy values (2 for the size multiple, 64,000 for the maximum
+number of commits) could be extracted into config settings for full
+flexibility.
+
+## Deleting graph-{hash} files
+
+After a new tip file is written, some `graph-{hash}` files may no longer
+be part of a chain. It is important to remove these files from disk, eventually.
+The main reason to delay removal is that another process could read the
+`commit-graph-chain` file before it is rewritten, but then look for the
+`graph-{hash}` files after they are deleted.
+
+To allow holding old split commit-graphs for a while after they are unreferenced,
+we update the modified times of the files when they become unreferenced. Then,
+we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}`
+files whose modified times are older than a given expiry window. This window
+defaults to zero, but can be changed using command-line arguments or a config
+setting.
+
+## Chains across multiple object directories
+
+In a repo with alternates, we look for the `commit-graph-chain` file starting
+in the local object directory and then in each alternate. The first file that
+exists defines our chain. As we look for the `graph-{hash}` files for
+each `{hash}` in the chain file, we follow the same pattern for the host
+directories.
+
+This allows commit-graphs to be split across multiple forks in a fork network.
+The typical case is a large "base" repo with many smaller forks.
+
+As the base repo advances, it will likely update and merge its commit-graph
+chain more frequently than the forks. If a fork updates their commit-graph after
+the base repo, then it should "reparent" the commit-graph chain onto the new
+chain in the base repo. When reading each `graph-{hash}` file, we track
+the object directory containing it. During a write of a new commit-graph file,
+we check for any changes in the source object directory and read the
+`commit-graph-chain` file for that source and create a new file based on those
+files. During this "reparent" operation, we necessarily need to collapse all
+levels in the fork, as all of the files are invalid against the new base file.
+
+It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph`
+files in this scenario. It falls to the user to define the proper settings for
+their custom environment:
+
+ 1. When merging levels in the base repo, the unreferenced files may still be
+ referenced by chains from fork repos.
+
+ 2. The expiry time should be set to a length of time such that every fork has
+ time to recompute their commit-graph chain to "reparent" onto the new base
+ file(s).
+
+ 3. If the commit-graph chain is updated in the base, the fork will not have
+ access to the new chain until its chain is updated to reference those files.
+ (This may change in the future [5].)
Related Links
-------------
@@ -170,3 +344,7 @@ Related Links
[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
A patch to remove the ahead-behind calculation from 'status'.
+
+[5] https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c877@gmail.com/
+ A discussion of a "two-dimensional graph position" that can allow reading
+ multiple commit-graph chains at the same time.
diff --git a/Documentation/technical/directory-rename-detection.txt b/Documentation/technical/directory-rename-detection.txt
index 1c0086e287..844629c8c4 100644
--- a/Documentation/technical/directory-rename-detection.txt
+++ b/Documentation/technical/directory-rename-detection.txt
@@ -20,8 +20,8 @@ More interesting possibilities exist, though, such as:
* one side of history renames x -> z, and the other renames some file to
x/e, causing the need for the merge to do a transitive rename.
- * one side of history renames x -> z, but also renames all files within
- x. For example, x/a -> z/alpha, x/b -> z/bravo, etc.
+ * one side of history renames x -> z, but also renames all files within x.
+ For example, x/a -> z/alpha, x/b -> z/bravo, etc.
* both 'x' and 'y' being merged into a single directory 'z', with a
directory rename being detected for both x->z and y->z.
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index bc2ace2a6e..2ae8fa470a 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -456,7 +456,7 @@ packfile marked as UNREACHABLE_GARBAGE (using the PSRC field; see
below). To avoid the race when writing new objects referring to an
about-to-be-deleted object, code paths that write new objects will
need to copy any objects from UNREACHABLE_GARBAGE packs that they
-refer to to new, non-UNREACHABLE_GARBAGE packs (or loose objects).
+refer to new, non-UNREACHABLE_GARBAGE packs (or loose objects).
UNREACHABLE_GARBAGE are then safe to delete if their creation time (as
indicated by the file's mtime) is long enough ago.
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index db3572626b..7c4d67aa6a 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -314,3 +314,44 @@ The remaining data of each directory block is grouped by type:
- An ewah bitmap, the n-th bit indicates whether the n-th index entry
is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+ The End of Index Entry (EOIE) is used to locate the end of the variable
+ length index entries and the begining of the extensions. Code can take
+ advantage of this to quickly locate the index extensions without having
+ to parse through all of the index entries.
+
+ Because it must be able to be loaded before the variable length cache
+ entries and other index extensions, this extension must be written last.
+ The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+ The extension consists of:
+
+ - 32-bit offset to the end of the index entries
+
+ - 160-bit SHA-1 over the extension types and their sizes (but not
+ their contents). E.g. if we have "TREE" extension that is N-bytes
+ long, "REUC" extension that is M-bytes long, followed by "EOIE",
+ then the hash would be:
+
+ SHA-1("TREE" + <binary representation of N> +
+ "REUC" + <binary representation of M>)
+
+== Index Entry Offset Table
+
+ The Index Entry Offset Table (IEOT) is used to help address the CPU
+ cost of loading the index by enabling multi-threading the process of
+ converting cache entries from the on-disk format to the in-memory format.
+ The signature for this extension is { 'I', 'E', 'O', 'T' }.
+
+ The extension consists of:
+
+ - 32-bit version (currently 1)
+
+ - A number of index offset entries each consisting of:
+
+ - 32-bit offset from the begining of the file to the first cache entry
+ in this block of entries.
+
+ - 32-bit count of cache entries in this block
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
new file mode 100644
index 0000000000..d7e57639f7
--- /dev/null
+++ b/Documentation/technical/multi-pack-index.txt
@@ -0,0 +1,109 @@
+Multi-Pack-Index (MIDX) Design Notes
+====================================
+
+The Git object directory contains a 'pack' directory containing
+packfiles (with suffix ".pack") and pack-indexes (with suffix
+".idx"). The pack-indexes provide a way to lookup objects and
+navigate to their offset within the pack, but these must come
+in pairs with the packfiles. This pairing depends on the file
+names, as the pack-index differs only in suffix with its pack-
+file. While the pack-indexes provide fast lookup per packfile,
+this performance degrades as the number of packfiles increases,
+because abbreviations need to inspect every packfile and we are
+more likely to have a miss on our most-recently-used packfile.
+For some large repositories, repacking into a single packfile
+is not feasible due to storage space or excessive repack times.
+
+The multi-pack-index (MIDX for short) stores a list of objects
+and their offsets into multiple packfiles. It contains:
+
+- A list of packfile names.
+- A sorted list of object IDs.
+- A list of metadata for the ith object ID including:
+ - A value j referring to the jth packfile.
+ - An offset within the jth packfile for the object.
+- If large offsets are required, we use another list of large
+ offsets similar to version 2 pack-indexes.
+
+Thus, we can provide O(log N) lookup time for any number
+of packfiles.
+
+Design Details
+--------------
+
+- The MIDX is stored in a file named 'multi-pack-index' in the
+ .git/objects/pack directory. This could be stored in the pack
+ directory of an alternate. It refers only to packfiles in that
+ same directory.
+
+- The pack.multiIndex config setting must be on to consume MIDX files.
+
+- The file format includes parameters for the object ID hash
+ function, so a future change of hash algorithm does not require
+ a change in format.
+
+- The MIDX keeps only one record per object ID. If an object appears
+ in multiple packfiles, then the MIDX selects the copy in the most-
+ recently modified packfile.
+
+- If there exist packfiles in the pack directory not registered in
+ the MIDX, then those packfiles are loaded into the `packed_git`
+ list and `packed_git_mru` cache.
+
+- The pack-indexes (.idx files) remain in the pack directory so we
+ can delete the MIDX file, set core.midx to false, or downgrade
+ without any loss of information.
+
+- The MIDX file format uses a chunk-based approach (similar to the
+ commit-graph file) that allows optional data to be added.
+
+Future Work
+-----------
+
+- Add a 'verify' subcommand to the 'git midx' builtin to verify the
+ contents of the multi-pack-index file match the offsets listed in
+ the corresponding pack-indexes.
+
+- The multi-pack-index allows many packfiles, especially in a context
+ where repacking is expensive (such as a very large repo), or
+ unexpected maintenance time is unacceptable (such as a high-demand
+ build machine). However, the multi-pack-index needs to be rewritten
+ in full every time. We can extend the format to be incremental, so
+ writes are fast. By storing a small "tip" multi-pack-index that
+ points to large "base" MIDX files, we can keep writes fast while
+ still reducing the number of binary searches required for object
+ lookups.
+
+- The reachability bitmap is currently paired directly with a single
+ packfile, using the pack-order as the object order to hopefully
+ compress the bitmaps well using run-length encoding. This could be
+ extended to pair a reachability bitmap with a multi-pack-index. If
+ the multi-pack-index is extended to store a "stable object order"
+ (a function Order(hash) = integer that is constant for a given hash,
+ even as the multi-pack-index is updated) then a reachability bitmap
+ could point to a multi-pack-index and be updated independently.
+
+- Packfiles can be marked as "special" using empty files that share
+ the initial name but replace ".pack" with ".keep" or ".promisor".
+ We can add an optional chunk of data to the multi-pack-index that
+ records flags of information about the packfiles. This allows new
+ states, such as 'repacked' or 'redeltified', that can help with
+ pack maintenance in a multi-pack environment. It may also be
+ helpful to organize packfiles by object type (commit, tree, blob,
+ etc.) and use this metadata to help that maintenance.
+
+- The partial clone feature records special "promisor" packs that
+ may point to objects that are not stored locally, but available
+ on request to a server. The multi-pack-index does not currently
+ track these promisor packs.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=6
+ Chromium work item for: Multi-Pack Index (MIDX)
+
+[1] https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/
+ An earlier RFC for the multi-pack-index feature
+
+[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
+ Git Merge 2018 Contributor's summit notes (includes discussion of MIDX)
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 70a99fd142..cab5bdd2ff 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -252,3 +252,80 @@ Pack file entry: <+
corresponding packfile.
20-byte SHA-1-checksum of all of the above.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+ 4-byte signature:
+ The signature is: {'M', 'I', 'D', 'X'}
+
+ 1-byte version number:
+ Git only writes or recognizes version 1.
+
+ 1-byte Object Id Version
+ Git only writes or recognizes version 1 (SHA1).
+
+ 1-byte number of "chunks"
+
+ 1-byte number of base multi-pack-index files:
+ This value is currently always zero.
+
+ 4-byte number of pack files
+
+CHUNK LOOKUP:
+
+ (C + 1) * 12 bytes providing the chunk offsets:
+ First 4 bytes describe chunk id. Value 0 is a terminating label.
+ Other 8 bytes provide offset in current file for chunk to start.
+ (Chunks are provided in file-order, so you can infer the length
+ using the next chunk position if necessary.)
+
+ The remaining data in the body is described one chunk at a time, and
+ these chunks may be given in any order. Chunks are required unless
+ otherwise specified.
+
+CHUNK DATA:
+
+ Packfile Names (ID: {'P', 'N', 'A', 'M'})
+ Stores the packfile names as concatenated, null-terminated strings.
+ Packfiles must be listed in lexicographic order for fast lookups by
+ name. This is the only chunk not guaranteed to be a multiple of four
+ bytes in length, so should be the last chunk for alignment reasons.
+
+ OID Fanout (ID: {'O', 'I', 'D', 'F'})
+ The ith entry, F[i], stores the number of OIDs with first
+ byte at most i. Thus F[255] stores the total
+ number of objects.
+
+ OID Lookup (ID: {'O', 'I', 'D', 'L'})
+ The OIDs for all objects in the MIDX are stored in lexicographic
+ order in this chunk.
+
+ Object Offsets (ID: {'O', 'O', 'F', 'F'})
+ Stores two 4-byte values for every object.
+ 1: The pack-int-id for the pack storing this object.
+ 2: The offset within the pack.
+ If all offsets are less than 2^31, then the large offset chunk
+ will not exist and offsets are stored as in IDX v1.
+ If there is at least one offset value larger than 2^32-1, then
+ the large offset chunk must exist. If the large offset chunk
+ exists and the 31st bit is on, then removing that bit reveals
+ the row in the large offsets containing the 8-byte offset of
+ this object.
+
+ [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+ 8-byte offsets into large packfiles.
+
+TRAILER:
+
+ 20-byte SHA1-checksum of the above contents.
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 6ac774d5f6..c73e72de0e 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -22,6 +22,16 @@ protocol-common.txt. When the grammar indicate `PKT-LINE(...)`, unless
otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
include a LF, but the receiver MUST NOT complain if it is not present.
+An error packet is a special pkt-line that contains an error string.
+
+----
+ error-line = PKT-LINE("ERR" SP explanation-text)
+----
+
+Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
+be sent. Once this packet is sent by a client or a server, the data transfer
+process defined in this protocol is terminated.
+
Transports
----------
There are three transports over which the packfile protocol is
@@ -89,13 +99,6 @@ process on the server side over the Git protocol is this:
"0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
nc -v example.com 9418
-If the server refuses the request for some reasons, it could abort
-gracefully with an error message.
-
-----
- error-line = PKT-LINE("ERR" SP explanation-text)
-----
-
SSH Transport
-------------
@@ -398,12 +401,11 @@ from the client).
Then the server will start sending its packfile data.
----
- server-response = *ack_multi ack / nak / error-line
+ server-response = *ack_multi ack / nak
ack_multi = PKT-LINE("ACK" SP obj-id ack_status)
ack_status = "continue" / "common" / "ready"
ack = PKT-LINE("ACK" SP obj-id)
nak = PKT-LINE("NAK")
- error-line = PKT-LINE("ERR" SP explanation-text)
----
A simple clone may look like this (with no 'have' lines):
@@ -655,14 +657,14 @@ can be rejected.
An example client/server communication might look like this:
----
- S: 007c74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
+ S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n
S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n
S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n
- S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
+ S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n
S: 0000
- C: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
- C: 003e74730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
+ C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n
+ C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n
C: 0000
C: [PACKDATA]
diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index 1ef66bd788..210373e258 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -30,12 +30,20 @@ advance* during clone and fetch operations and thereby reduce download
times and disk usage. Missing objects can later be "demand fetched"
if/when needed.
+A remote that can later provide the missing objects is called a
+promisor remote, as it promises to send the objects when
+requested. Initialy Git supported only one promisor remote, the origin
+remote from which the user cloned and that was configured in the
+"extensions.partialClone" config option. Later support for more than
+one promisor remote has been implemented.
+
Use of partial clone requires that the user be online and the origin
-remote be available for on-demand fetching of missing objects. This may
-or may not be problematic for the user. For example, if the user can
-stay within the pre-selected subset of the source tree, they may not
-encounter any missing objects. Alternatively, the user could try to
-pre-fetch various objects if they know that they are going offline.
+remote or other promisor remotes be available for on-demand fetching
+of missing objects. This may or may not be problematic for the user.
+For example, if the user can stay within the pre-selected subset of
+the source tree, they may not encounter any missing objects.
+Alternatively, the user could try to pre-fetch various objects if they
+know that they are going offline.
Non-Goals
@@ -100,21 +108,21 @@ or commits that reference missing trees.
Handling Missing Objects
------------------------
-- An object may be missing due to a partial clone or fetch, or missing due
- to repository corruption. To differentiate these cases, the local
- repository specially indicates such filtered packfiles obtained from the
- promisor remote as "promisor packfiles".
+- An object may be missing due to a partial clone or fetch, or missing
+ due to repository corruption. To differentiate these cases, the
+ local repository specially indicates such filtered packfiles
+ obtained from promisor remotes as "promisor packfiles".
+
These promisor packfiles consist of a "<name>.promisor" file with
arbitrary contents (like the "<name>.keep" files), in addition to
their "<name>.pack" and "<name>.idx" files.
- The local repository considers a "promisor object" to be an object that
- it knows (to the best of its ability) that the promisor remote has promised
- that it has, either because the local repository has that object in one of
+ it knows (to the best of its ability) that promisor remotes have promised
+ that they have, either because the local repository has that object in one of
its promisor packfiles, or because another promisor object refers to it.
+
-When Git encounters a missing object, Git can see if it a promisor object
+When Git encounters a missing object, Git can see if it is a promisor object
and handle it appropriately. If not, Git can report a corruption.
+
This means that there is no need for the client to explicitly maintain an
@@ -123,12 +131,12 @@ expensive-to-modify list of missing objects.[a]
- Since almost all Git code currently expects any referenced object to be
present locally and because we do not want to force every command to do
a dry-run first, a fallback mechanism is added to allow Git to attempt
- to dynamically fetch missing objects from the promisor remote.
+ to dynamically fetch missing objects from promisor remotes.
+
When the normal object lookup fails to find an object, Git invokes
-fetch-object to try to get the object from the server and then retry
-the object lookup. This allows objects to be "faulted in" without
-complicated prediction algorithms.
+promisor_remote_get_direct() to try to get the object from a promisor
+remote and then retry the object lookup. This allows objects to be
+"faulted in" without complicated prediction algorithms.
+
For efficiency reasons, no check as to whether the missing object is
actually a promisor object is performed.
@@ -157,8 +165,7 @@ and prefetch those objects in bulk.
+
We are not happy with this global variable and would like to remove it,
but that requires significant refactoring of the object code to pass an
-additional flag. We hope that concurrent efforts to add an ODB API can
-encompass this.
+additional flag.
Fetching Missing Objects
@@ -182,21 +189,63 @@ has been updated to not use any object flags when the corresponding argument
though they are not necessary.
+Using many promisor remotes
+---------------------------
+
+Many promisor remotes can be configured and used.
+
+This allows for example a user to have multiple geographically-close
+cache servers for fetching missing blobs while continuing to do
+filtered `git-fetch` commands from the central server.
+
+When fetching objects, promisor remotes are tried one after the other
+until all the objects have been fetched.
+
+Remotes that are considered "promisor" remotes are those specified by
+the following configuration variables:
+
+- `extensions.partialClone = <name>`
+
+- `remote.<name>.promisor = true`
+
+- `remote.<name>.partialCloneFilter = ...`
+
+Only one promisor remote can be configured using the
+`extensions.partialClone` config variable. This promisor remote will
+be the last one tried when fetching objects.
+
+We decided to make it the last one we try, because it is likely that
+someone using many promisor remotes is doing so because the other
+promisor remotes are better for some reason (maybe they are closer or
+faster for some kind of objects) than the origin, and the origin is
+likely to be the remote specified by extensions.partialClone.
+
+This justification is not very strong, but one choice had to be made,
+and anyway the long term plan should be to make the order somehow
+fully configurable.
+
+For now though the other promisor remotes will be tried in the order
+they appear in the config file.
+
Current Limitations
-------------------
-- The remote used for a partial clone (or the first partial fetch
- following a regular clone) is marked as the "promisor remote".
+- It is not possible to specify the order in which the promisor
+ remotes are tried in other ways than the order in which they appear
+ in the config file.
+
-We are currently limited to a single promisor remote and only that
-remote may be used for subsequent partial fetches.
+It is also not possible to specify an order to be used when fetching
+from one remote and a different order when fetching from another
+remote.
+
+- It is not possible to push only specific objects to a promisor
+ remote.
+
-We accept this limitation because we believe initial users of this
-feature will be using it on repositories with a strong single central
-server.
+It is not possible to push at the same time to multiple promisor
+remote in a specific order.
-- Dynamic object fetching will only ask the promisor remote for missing
- objects. We assume that the promisor remote has a complete view of the
+- Dynamic object fetching will only ask promisor remotes for missing
+ objects. We assume that promisor remotes have a complete view of the
repository and can satisfy all such requests.
- Repack essentially treats promisor and non-promisor packfiles as 2
@@ -218,15 +267,17 @@ server.
Future Work
-----------
-- Allow more than one promisor remote and define a strategy for fetching
- missing objects from specific promisor remotes or of iterating over the
- set of promisor remotes until a missing object is found.
+- Improve the way to specify the order in which promisor remotes are
+ tried.
+
-A user might want to have multiple geographically-close cache servers
-for fetching missing blobs while continuing to do filtered `git-fetch`
-commands from the central server, for example.
+For example this could allow to specify explicitly something like:
+"When fetching from this remote, I want to use these promisor remotes
+in this order, though, when pushing or fetching to that remote, I want
+to use those promisor remotes in that order."
+
+- Allow pushing to promisor remotes.
+
-Or the user might want to work in a triangular work flow with multiple
+The user might want to work in a triangular work flow with multiple
promisor remotes that each have an incomplete view of the repository.
- Allow repack to work on promisor packfiles (while keeping them distinct
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 332d209b58..2b267c0da6 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -1,6 +1,10 @@
Git Protocol Capabilities
=========================
+NOTE: this document describes capabilities for versions 0 and 1 of the pack
+protocol. For version 2, please refer to the link:protocol-v2.html[protocol-v2]
+doc.
+
Servers SHOULD support all capabilities defined in this document.
On the very first line of the initial server response of either
@@ -172,6 +176,20 @@ agent strings are purely informative for statistics and debugging
purposes, and MUST NOT be used to programmatically assume the presence
or absence of particular features.
+symref
+------
+
+This parameterized capability is used to inform the receiver which symbolic ref
+points to which ref; for example, "symref=HEAD:refs/heads/master" tells the
+receiver that HEAD points to master. This capability can be repeated to
+represent multiple symrefs.
+
+Servers SHOULD include this capability for the HEAD symref if it is one of the
+refs being sent.
+
+Clients MAY use the parameters from this capability to select the proper initial
+branch when cloning a repository.
+
shallow
-------
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 09e4e0273f..40f91f6b1e 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -1,5 +1,5 @@
- Git Wire Protocol, Version 2
-==============================
+Git Wire Protocol, Version 2
+============================
This document presents a specification for a version 2 of Git's wire
protocol. Protocol v2 will improve upon v1 in the following ways:
@@ -22,8 +22,8 @@ will be commands which a client can request be executed. Once a command
has completed, a client can reuse the connection and request that other
commands be executed.
- Packet-Line Framing
----------------------
+Packet-Line Framing
+-------------------
All communication is done using packet-line framing, just as in v1. See
`Documentation/technical/pack-protocol.txt` and
@@ -34,8 +34,8 @@ In protocol v2 these special packets will have the following semantics:
* '0000' Flush Packet (flush-pkt) - indicates the end of a message
* '0001' Delimiter Packet (delim-pkt) - separates sections of a message
- Initial Client Request
-------------------------
+Initial Client Request
+----------------------
In general a client can request to speak protocol v2 by sending
`version=2` through the respective side-channel for the transport being
@@ -43,22 +43,22 @@ used which inevitably sets `GIT_PROTOCOL`. More information can be
found in `pack-protocol.txt` and `http-protocol.txt`. In all cases the
response from the server is the capability advertisement.
- Git Transport
-~~~~~~~~~~~~~~~
+Git Transport
+~~~~~~~~~~~~~
When using the git:// transport, you can request to use protocol v2 by
sending "version=2" as an extra parameter:
003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
- SSH and File Transport
-~~~~~~~~~~~~~~~~~~~~~~~~
+SSH and File Transport
+~~~~~~~~~~~~~~~~~~~~~~
When using either the ssh:// or file:// transport, the GIT_PROTOCOL
environment variable must be set explicitly to include "version=2".
- HTTP Transport
-~~~~~~~~~~~~~~~~
+HTTP Transport
+~~~~~~~~~~~~~~
When using the http:// or https:// transport a client makes a "smart"
info/refs request as described in `http-protocol.txt` and requests that
@@ -79,8 +79,8 @@ A v2 server would reply:
Subsequent requests are then made directly to the service
`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
- Capability Advertisement
---------------------------
+Capability Advertisement
+------------------------
A server which decides to communicate (based on a request from a client)
using protocol version 2, notifies the client by sending a version string
@@ -101,8 +101,8 @@ to be executed by the client.
key = 1*(ALPHA | DIGIT | "-_")
value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;")
- Command Request
------------------
+Command Request
+---------------
After receiving the capability advertisement, a client can then issue a
request to select the command it wants with any particular capabilities
@@ -137,11 +137,11 @@ command be executed or can terminate the connection. A client may
optionally send an empty request consisting of just a flush-pkt to
indicate that no more requests will be made.
- Capabilities
---------------
+Capabilities
+------------
There are two different types of capabilities: normal capabilities,
-which can be used to to convey information or alter the behavior of a
+which can be used to convey information or alter the behavior of a
request, and commands, which are the core actions that a client wants to
perform (fetch, push, etc).
@@ -153,8 +153,8 @@ management on the server side in order to function correctly. This
permits simple round-robin load-balancing on the server side, without
needing to worry about state management.
- agent
-~~~~~~~
+agent
+~~~~~
The server can advertise the `agent` capability with a value `X` (in the
form `agent=X`) to notify the client that the server is running version
@@ -168,8 +168,8 @@ printable ASCII characters except space (i.e., the byte range 32 < x <
and debugging purposes, and MUST NOT be used to programmatically assume
the presence or absence of particular features.
- ls-refs
-~~~~~~~~~
+ls-refs
+~~~~~~~
`ls-refs` is the command used to request a reference advertisement in v2.
Unlike the current reference advertisement, ls-refs takes in arguments
@@ -199,8 +199,8 @@ The output of ls-refs is as follows:
symref = "symref-target:" symref-target
peeled = "peeled:" obj-id
- fetch
-~~~~~~~
+fetch
+~~~~~
`fetch` is the command used to fetch a packfile in v2. It can be looked
at as a modified version of the v1 fetch where the ref-advertisement is
@@ -296,7 +296,13 @@ included in the client's request:
Request that various objects from the packfile be omitted
using one of several filtering techniques. These are intended
for use with partial clone and partial fetch operations. See
- `rev-list` for possible "filter-spec" values.
+ `rev-list` for possible "filter-spec" values. When communicating
+ with other processes, senders SHOULD translate scaled integers
+ (e.g. "1k") into a fully-expanded form (e.g. "1024") to aid
+ interoperability with older receivers that may not understand
+ newly-invented scaling suffixes. However, receivers SHOULD
+ accept the following suffixes: 'k', 'm', and 'g' for 1024,
+ 1048576, and 1073741824, respectively.
If the 'ref-in-want' feature is advertised, the following argument can
be included in the client's request as well as the potential addition of
@@ -307,6 +313,16 @@ the 'wanted-refs' section in the server's response as explained below.
particular ref, where <ref> is the full name of a ref on the
server.
+If the 'sideband-all' feature is advertised, the following argument can be
+included in the client's request:
+
+ sideband-all
+ Instruct the server to send the whole response multiplexed, not just
+ the packfile section. All non-flush and non-delim PKT-LINE in the
+ response (not only in the packfile section) will then start with a byte
+ indicating its sideband (1, 2, or 3), and the server may send "0005\2"
+ (a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
+
The response of `fetch` is broken into a number of sections separated by
delimiter packets (0001), with each section beginning with its section
header.
@@ -428,8 +444,8 @@ header.
2 - progress messages
3 - fatal error message just before stream aborts
- server-option
-~~~~~~~~~~~~~~~
+server-option
+~~~~~~~~~~~~~
If advertised, indicates that any number of server specific options can be
included in a request. This is done by sending each option as a
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index e03eaccebc..7844ef30ff 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -1,5 +1,4 @@
-Git Repository Format Versions
-==============================
+== Git Repository Format Versions
Every git repository is marked with a numeric version in the
`core.repositoryformatversion` key of its `config` file. This version
@@ -40,16 +39,14 @@ format by default.
The currently defined format versions are:
-Version `0`
------------
+=== Version `0`
This is the format defined by the initial version of git, including but
not limited to the format of the repository directory, the repository
configuration file, and the object and ref storage. Specifying the
complete behavior of git is beyond the scope of this document.
-Version `1`
------------
+=== Version `1`
This format is identical to version `0`, with the following exceptions:
@@ -74,21 +71,18 @@ it here, in order to claim the name.
The defined extensions are:
-`noop`
-~~~~~~
+==== `noop`
This extension does not change git's behavior at all. It is useful only
for testing format-1 compatibility.
-`preciousObjects`
-~~~~~~~~~~~~~~~~~
+==== `preciousObjects`
When the config key `extensions.preciousObjects` is set to `true`,
objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
`git repack -d`).
-`partialclone`
-~~~~~~~~~~~~~~
+==== `partialclone`
When the config key `extensions.partialclone` is set, it indicates
that the repo was created with a partial clone (or later performed
@@ -98,3 +92,11 @@ and it promises that all such omitted objects can be fetched from it
in the future.
The value of this key is the name of the promisor remote.
+
+==== `worktreeConfig`
+
+If set, by default "git config" reads from both "config" and
+"config.worktree" file from GIT_DIR in that order. In
+multiple working directory mode, "config" file is shared while
+"config.worktree" is per-working directory (i.e., it's in
+GIT_COMMON_DIR/worktrees/<id>/config.worktree)
diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..aa22d7ace8
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,186 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing the common ancestor
+version from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks. More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Removing the common ancestor version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2. The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document). In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+ <<<<<<< HEAD
+ B
+ =======
+ C
+ >>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+ <<<<<<< HEAD
+ B
+ ||||||| merged common ancestors
+ A
+ =======
+ C
+ >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+ After examining what branches AB and AC did, I believe that making
+ line A into line D is the best thing to do that is compatible with
+ what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same. To do this, the labels on the conflict
+markers are stripped, and the common ancestor version is removed. The above
+examples would both result in the following normalized conflict:
+
+ <<<<<<<
+ B
+ =======
+ C
+ >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part. And then four branches
+are forked that do these things:
+
+ - AB: changes A to B
+ - AC: changes A to C
+ - XY: changes X to Y
+ - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order. The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+ After examining what branches AB and AC did, I believe that
+ making line A into line D is the best thing to do that is
+ compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean. We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z". We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part becames X or Y", and whenever
+any of these four patterns appear, and we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters. The conflict markers are stripped out
+before the sha1 is calculated. So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line C, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Similar to simple conflicts, the conflict is first normalized by
+stripping the labels from conflict markers, stripping the common ancestor
+version, and the sorting the conflict hunks, both for the outer and the
+inner conflict. This is done recursively, so any number of nested
+conflicts can be handled.
+
+Note that this only works for conflict markers that "cleanly nest". If
+there are any unmatched conflict markers, rerere will fail to handle
+the conflict and record a conflict resolution.
+
+The only difference is in how the conflict ID is calculated. For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+ <<<<<<< HEAD
+ 1
+ =======
+ <<<<<<< HEAD
+ 3
+ =======
+ 2
+ >>>>>>> branch-2
+ >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, and sorting
+the hunks, the conflict would look as follows:
+
+ <<<<<<<
+ 1
+ =======
+ <<<<<<<
+ 2
+ =======
+ 3
+ >>>>>>>
+ >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`