diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-config.txt | 2 | ||||
-rw-r--r-- | Documentation/technical/api-parse-options.txt | 4 | ||||
-rw-r--r-- | Documentation/technical/api-ref-iteration.txt | 2 | ||||
-rw-r--r-- | Documentation/technical/api-trace2.txt | 211 | ||||
-rw-r--r-- | Documentation/technical/commit-graph-format.txt | 11 | ||||
-rw-r--r-- | Documentation/technical/commit-graph.txt | 210 | ||||
-rw-r--r-- | Documentation/technical/directory-rename-detection.txt | 4 | ||||
-rw-r--r-- | Documentation/technical/pack-protocol.txt | 8 | ||||
-rw-r--r-- | Documentation/technical/protocol-v2.txt | 52 |
9 files changed, 360 insertions, 144 deletions
diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt index fa39ac9d71..7d20716c32 100644 --- a/Documentation/technical/api-config.txt +++ b/Documentation/technical/api-config.txt @@ -229,7 +229,7 @@ A `config_set` can be used to construct an in-memory cache for config-like files that the caller specifies (i.e., files like `.gitmodules`, `~/.gitconfig` etc.). For example, ---------------------------------------- +---------------------------------------- struct config_set gm_config; git_configset_init(&gm_config); int b; diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt index 2b036d7838..2e2e7c10c6 100644 --- a/Documentation/technical/api-parse-options.txt +++ b/Documentation/technical/api-parse-options.txt @@ -198,8 +198,10 @@ There are some macros to easily define options: The filename will be prefixed by passing the filename along with the prefix argument of `parse_options()` to `prefix_filename()`. -`OPT_ARGUMENT(long, description)`:: +`OPT_ARGUMENT(long, &int_var, description)`:: Introduce a long-option argument that will be kept in `argv[]`. + If this option was seen, `int_var` will be set to one (except + if a `NULL` pointer was passed). `OPT_NUMBER_CALLBACK(&var, description, func_ptr)`:: Recognize numerical options like -123 and feed the integer as diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt index 46c3d5c355..ad9d019ff9 100644 --- a/Documentation/technical/api-ref-iteration.txt +++ b/Documentation/technical/api-ref-iteration.txt @@ -54,7 +54,7 @@ this: do not do this you will get an error for each ref that it does not point to a valid object. -Note: As a side-effect of this you can not safely assume that all +Note: As a side-effect of this you cannot safely assume that all objects you lookup are available in superproject. All submodule objects will be available the same way as the superprojects objects. diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt index 2de565fa3d..71eb081fed 100644 --- a/Documentation/technical/api-trace2.txt +++ b/Documentation/technical/api-trace2.txt @@ -22,21 +22,41 @@ Targets are defined using a VTable allowing easy extension to other formats in the future. This might be used to define a binary format, for example. +Trace2 is controlled using `trace2.*` config values in the system and +global config files and `GIT_TRACE2*` environment variables. Trace2 does +not read from repo local or worktree config files or respect `-c` +command line config settings. + == Trace2 Targets Trace2 defines the following set of Trace2 Targets. Format details are given in a later section. -`GIT_TR2` (NORMAL):: +=== The Normal Format Target + +The normal format target is a tradition printf format and similar +to GIT_TRACE format. This format is enabled with the `GIT_TRACE2` +environment variable or the `trace2.normalTarget` system or global +config setting. + +For example - a simple printf format like GIT_TRACE. -+ ------------ -$ export GIT_TR2=~/log.normal +$ export GIT_TRACE2=~/log.normal $ git version git version 2.20.1.155.g426c96fcdb ------------ -+ + +or + +------------ +$ git config --global trace2.normalTarget ~/log.normal +$ git version +git version 2.20.1.155.g426c96fcdb +------------ + +yields + ------------ $ cat ~/log.normal 12:28:42.620009 common-main.c:38 version 2.20.1.155.g426c96fcdb @@ -46,76 +66,86 @@ $ cat ~/log.normal 12:28:42.621250 trace2/tr2_tgt_normal.c:124 atexit elapsed:0.001265 code:0 ------------ -`GIT_TR2_PERF` (PERF):: +=== The Performance Format Target + +The performance format target (PERF) is a column-based format to +replace GIT_TRACE_PERFORMANCE and is suitable for development and +testing, possibly to complement tools like gprof. This format is +enabled with the `GIT_TRACE2_PERF` environment variable or the +`trace2.perfTarget` system or global config setting. + +For example - a column-based format to replace GIT_TRACE_PERFORMANCE suitable for - development and testing, possibly to complement tools like gprof. -+ ------------ -$ export GIT_TR2_PERF=~/log.perf +$ export GIT_TRACE2_PERF=~/log.perf $ git version git version 2.20.1.155.g426c96fcdb ------------ -+ + +or + +------------ +$ git config --global trace2.perfTarget ~/log.perf +$ git version +git version 2.20.1.155.g426c96fcdb +------------ + +yields + ------------ $ cat ~/log.perf 12:28:42.620675 common-main.c:38 | d0 | main | version | | | | | 2.20.1.155.g426c96fcdb -12:28:42.621001 common-main.c:39 | d0 | main | start | | | | | git version +12:28:42.621001 common-main.c:39 | d0 | main | start | | 0.001173 | | | git version 12:28:42.621111 git.c:432 | d0 | main | cmd_name | | | | | version (version) 12:28:42.621225 git.c:662 | d0 | main | exit | | 0.001227 | | | code:0 12:28:42.621259 trace2/tr2_tgt_perf.c:211 | d0 | main | atexit | | 0.001265 | | | code:0 ------------ -`GIT_TR2_EVENT` (EVENT):: +=== The Event Format Target + +The event format target is a JSON-based format of event data suitable +for telemetry analysis. This format is enabled with the `GIT_TRACE2_EVENT` +environment variable or the `trace2.eventTarget` system or global config +setting. + +For example - a JSON-based format of event data suitable for telemetry analysis. -+ ------------ -$ export GIT_TR2_EVENT=~/log.event +$ export GIT_TRACE2_EVENT=~/log.event $ git version git version 2.20.1.155.g426c96fcdb ------------ -+ ------------- -$ cat ~/log.event -{"event":"version","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.620713","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"} -{"event":"start","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621027","file":"common-main.c","line":39,"argv":["git","version"]} -{"event":"cmd_name","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621122","file":"git.c","line":432,"name":"version","hierarchy":"version"} -{"event":"exit","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621236","file":"git.c","line":662,"t_abs":0.001227,"code":0} -{"event":"atexit","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621268","file":"trace2/tr2_tgt_event.c","line":163,"t_abs":0.001265,"code":0} ------------- -== Enabling a Target +or -A Trace2 Target is enabled when the corresponding environment variable -(`GIT_TR2`, `GIT_TR2_PERF`, or `GIT_TR2_EVENT`) is set. The following -values are recognized. - -`0`:: -`false`:: - - Disables the target. - -`1`:: -`true`:: - - Enables the target and writes stream to `STDERR`. +------------ +$ git config --global trace2.eventTarget ~/log.event +$ git version +git version 2.20.1.155.g426c96fcdb +------------ -`[2-9]`:: +yields - Enables the target and writes to the already opened file descriptor. +------------ +$ cat ~/log.event +{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"} +{"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]} +{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"} +{"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0} +{"event":"atexit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621268Z","file":"trace2/tr2_tgt_event.c","line":163,"t_abs":0.001265,"code":0} +------------ -`<absolute-pathname>`:: +=== Enabling a Target - Enables the target, opens and writes to the file in append mode. +To enable a target, set the corresponding environment variable or +system or global config value to one of the following: -`af_unix:[<socket_type>:]<absolute-pathname>`:: +include::../trace2-target-values.txt[] - Enables the target, opens and writes to a Unix Domain Socket - (on platforms that support them). -+ -Socket type can be either `stream` or `dgram`. If the socket type is -omitted, Git will try both. +If the target already exists and is a directory, the traces will be +written to files (one per process) underneath the given directory. They +will be named according to the last component of the SID (optionally +followed by a counter to avoid filename collisions). == Trace2 API @@ -160,17 +190,23 @@ purposes. These are concerned with the lifetime of the overall git process. +`void trace2_initialize_clock()`:: + + Initialize the Trace2 start clock and nothing else. This should + be called at the very top of main() to capture the process start + time and reduce startup order dependencies. + `void trace2_initialize()`:: Determines if any Trace2 Targets should be enabled and - initializes the Trace2 facility. This includes starting the - elapsed time clocks and thread local storage (TLS). + initializes the Trace2 facility. This includes setting up the + Trace2 thread local storage (TLS). + This function emits a "version" message containing the version of git and the Trace2 protocol. + This function should be called from `main()` as early as possible in -the life of the process. +the life of the process after essential process initialization. `int trace2_is_enabled()`:: @@ -237,15 +273,16 @@ significantly affects program performance or behavior, such as Emits a "def_param" messages for "important" configuration settings. + -The environment variable `GIT_TR2_CONFIG_PARAMS` can be set to a +The environment variable `GIT_TRACE2_CONFIG_PARAMS` or the `trace2.configParams` +config value can be set to a list of patterns of important configuration settings, for example: `core.*,remote.*.url`. This function will iterate over all config settings and emit a "def_param" message for each match. `void trace2_cmd_set_config(const char *key, const char *value)`:: - Emits a "def_param" message for a specific configuration - setting IFF it matches the `GIT_TR2_CONFIG_PARAMS` pattern. + Emits a "def_param" message for a new or updated key/value + pair IF `key` is considered important. + This is used to hook into `git_config_set()` and catch any configuration changes and update a value previously reported by @@ -412,9 +449,6 @@ recursive tree walk. === NORMAL Format -NORMAL format is enabled when the `GIT_TR2` environment variable is -set. - Events are written as lines of the form: ------------ @@ -431,8 +465,8 @@ Events are written as lines of the form: Note that this may contain embedded LF or CRLF characters that are not escaped, so the event may spill across multiple lines. -If `GIT_TR2_BRIEF` is true, the `time`, `filename`, and `line` fields -are omitted. +If `GIT_TRACE2_BRIEF` or `trace2.normalBrief` is true, the `time`, `filename`, +and `line` fields are omitted. This target is intended to be more of a summary (like GIT_TRACE) and less detailed than the other targets. It ignores thread, region, and @@ -440,9 +474,6 @@ data messages, for example. === PERF Format -PERF format is enabled when the `GIT_TR2_PERF` environment variable -is set. - Events are written as lines of the form: ------------ @@ -502,8 +533,8 @@ This field is in anticipation of in-proc submodules in the future. 15:33:33.532712 wt-status.c:2331 | d0 | main | region_leave | r1 | 0.127568 | 0.001504 | status | label:print ------------ -If `GIT_TR2_PERF_BRIEF` is true, the `time`, `file`, and `line` -fields are omitted. +If `GIT_TRACE2_PERF_BRIEF` or `trace2.perfBrief` is true, the `time`, `file`, +and `line` fields are omitted. ------------ d0 | main | region_leave | r1 | 0.011717 | 0.009122 | index | label:preload @@ -514,9 +545,6 @@ during development and is quite noisy. === EVENT Format -EVENT format is enabled when the `GIT_TR2_EVENT` environment -variable is set. - Each event is a JSON-object containing multiple key/value pairs written as a single line and followed by a LF. @@ -534,11 +562,11 @@ The following key/value pairs are common to all events: ------------ { "event":"version", - "sid":"1547659722619736-11614", + "sid":"20190408T191827.272759Z-H9b68c35f-P00003510", "thread":"main", - "time":"2019-01-16 17:28:42.620713", + "time":"2019-04-08T19:18:27.282761Z", "file":"common-main.c", - "line":38, + "line":42, ... } ------------ @@ -570,9 +598,9 @@ The following key/value pairs are common to all events: `"repo":<repo-id>`:: when present, is the integer repo-id as described previously. -If `GIT_TR2_EVENT_BRIEF` is true, the `file` and `line` fields are omitted -from all events and the `time` field is only present on the "start" and -"atexit" events. +If `GIT_TRACE2_EVENT_BRIEF` or `trace2.eventBrief` is true, the `file` +and `line` fields are omitted from all events and the `time` field is +only present on the "start" and "atexit" events. ==== Event-Specific Key/Value Pairs @@ -595,6 +623,7 @@ from all events and the `time` field is only present on the "start" and { "event":"start", ... + "t_abs":0.001227, # elapsed time in seconds "argv":["git","version"] } ------------ @@ -639,7 +668,7 @@ completed.) "event":"signal", ... "t_abs":0.001227, # elapsed time in seconds - "signal":13 # SIGTERM, SIGINT, etc. + "signo":13 # SIGTERM, SIGINT, etc. } ------------ @@ -882,7 +911,7 @@ visited. The `category` field may be used in a future enhancement to do category-based filtering. + -The `GIT_TR2_EVENT_NESTING` environment variable can be used to +`GIT_TRACE2_EVENT_NESTING` or `trace2.eventNesting` can be used to filter deeply nested regions and data events. It defaults to "2". `"region_leave"`:: @@ -1010,8 +1039,8 @@ rev-list, and gc. This example also shows that fetch took 5.199 seconds and of that 4.932 was in ssh. + ---------------- -$ export GIT_TR2_BRIEF=1 -$ export GIT_TR2=~/log.normal +$ export GIT_TRACE2_BRIEF=1 +$ export GIT_TRACE2=~/log.normal $ git fetch origin ... ---------------- @@ -1046,8 +1075,8 @@ its name as "gc", it also reports the hierarchy as "fetch/gc". indented for clarity.) + ---------------- -$ export GIT_TR2_BRIEF=1 -$ export GIT_TR2=~/log.normal +$ export GIT_TRACE2_BRIEF=1 +$ export GIT_TRACE2=~/log.normal $ git fetch origin ... ---------------- @@ -1105,14 +1134,14 @@ In this example, scanning for untracked files ran from +0.012568 to +0.027149 (since the process started) and took 0.014581 seconds. + ---------------- -$ export GIT_TR2_PERF_BRIEF=1 -$ export GIT_TR2_PERF=~/log.perf +$ export GIT_TRACE2_PERF_BRIEF=1 +$ export GIT_TRACE2_PERF=~/log.perf $ git status ... $ cat ~/log.perf d0 | main | version | | | | | 2.20.1.160.g5676107ecd.dirty -d0 | main | start | | | | | git status +d0 | main | start | | 0.001173 | | | git status d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw d0 | main | cmd_name | | | | | status (status) ... @@ -1151,13 +1180,13 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, We can further investigate the time spent scanning for untracked files. + ---------------- -$ export GIT_TR2_PERF_BRIEF=1 -$ export GIT_TR2_PERF=~/log.perf +$ export GIT_TRACE2_PERF_BRIEF=1 +$ export GIT_TRACE2_PERF=~/log.perf $ git status ... $ cat ~/log.perf d0 | main | version | | | | | 2.20.1.162.gb4ccea44db.dirty -d0 | main | start | | | | | git status +d0 | main | start | | 0.001173 | | | git status d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw d0 | main | cmd_name | | | | | status (status) ... @@ -1207,13 +1236,13 @@ int read_index_from(struct index_state *istate, const char *path, This example shows that the index contained 3552 entries. + ---------------- -$ export GIT_TR2_PERF_BRIEF=1 -$ export GIT_TR2_PERF=~/log.perf +$ export GIT_TRACE2_PERF_BRIEF=1 +$ export GIT_TRACE2_PERF=~/log.perf $ git status ... $ cat ~/log.perf d0 | main | version | | | | | 2.20.1.156.gf9916ae094.dirty -d0 | main | start | | | | | git status +d0 | main | start | | 0.001173 | | | git status d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw d0 | main | cmd_name | | | | | status (status) d0 | main | region_enter | r1 | 0.001791 | | index | label:do_read_index .git/index @@ -1281,8 +1310,8 @@ Data events are tagged with the active thread name. They are used to report the per-thread parameters. + ---------------- -$ export GIT_TR2_PERF_BRIEF=1 -$ export GIT_TR2_PERF=~/log.perf +$ export GIT_TRACE2_PERF_BRIEF=1 +$ export GIT_TRACE2_PERF=~/log.perf $ git status ... $ cat ~/log.perf diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 16452a0504..a4f17441ae 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -44,8 +44,9 @@ HEADER: 1-byte number (C) of "chunks" - 1-byte (reserved for later use) - Current clients should ignore this value. + 1-byte number (B) of base commit-graphs + We infer the length (H*B) of the Base Graphs chunk + from this value. CHUNK LOOKUP: @@ -92,6 +93,12 @@ CHUNK DATA: positions for the parents until reaching a value with the most-significant bit on. The other bits correspond to the position of the last parent. + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] + This list of H-byte hashes describe a set of B commit-graph files that + form a commit-graph chain. The graph position for the ith commit in this + file's OID Lookup chunk is equal to i plus the number of commits in all + base graphs. If B is non-zero, this chunk must exist. + TRAILER: H-byte HASH-checksum of all of the above. diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 7805b0968c..729fbcb32f 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -127,22 +127,196 @@ Design Details helpful for these clones, anyway. The commit-graph will not be read or written when shallow commits are present. -Future Work ------------ - -- After computing and storing generation numbers, we must make graph - walks aware of generation numbers to gain the performance benefits they - enable. This will mostly be accomplished by swapping a commit-date-ordered - priority queue with one ordered by generation number. The following - operations are important candidates: - - - 'log --topo-order' - - 'tag --merged' - -- A server could provide a commit-graph file as part of the network protocol - to avoid extra calculations by clients. This feature is only of benefit if - the user is willing to trust the file, because verifying the file is correct - is as hard as computing it from scratch. +Commit Graphs Chains +-------------------- + +Typically, repos grow with near-constant velocity (commits per day). Over time, +the number of commits added by a fetch operation is much smaller than the +number of commits in the full history. By creating a "chain" of commit-graphs, +we enable fast writes of new commit data without rewriting the entire commit +history -- at least, most of the time. + +## File Layout + +A commit-graph chain uses multiple files, and we use a fixed naming convention +to organize these files. Each commit-graph file has a name +`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex- +valued hash stored in the footer of that file (which is a hash of the file's +contents before that hash). For a chain of commit-graph files, a plain-text +file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the +hashes for the files in order from "lowest" to "highest". + +For example, if the `commit-graph-chain` file contains the lines + +``` + {hash0} + {hash1} + {hash2} +``` + +then the commit-graph chain looks like the following diagram: + + +-----------------------+ + | graph-{hash2}.graph | + +-----------------------+ + | + +-----------------------+ + | | + | graph-{hash1}.graph | + | | + +-----------------------+ + | + +-----------------------+ + | | + | | + | | + | graph-{hash0}.graph | + | | + | | + | | + +-----------------------+ + +Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of +commits in `graph-{hash1}.graph`, and X2 be the number of commits in +`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`, +then we interpret this as being the commit in position (X0 + X1 + i), and that +will be used as its "graph position". The commits in `graph-{hash2}.graph` use these +positions to refer to their parents, which may be in `graph-{hash1}.graph` or +`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking +its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + +X2). + +Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data +specifying the hashes of all files in the lower layers. In the above example, +`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains +`{hash0}` and `{hash1}`. + +## Merging commit-graph files + +If we only added a new commit-graph file on every write, we would run into a +linear search problem through many commit-graph files. Instead, we use a merge +strategy to decide when the stack should collapse some number of levels. + +The diagram below shows such a collapse. As a set of new commits are added, it +is determined by the merge strategy that the files should collapse to +`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and +the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}` +file. + + +---------------------+ + | | + | (new commits) | + | | + +---------------------+ + | | + +-----------------------+ +---------------------+ + | graph-{hash2} |->| | + +-----------------------+ +---------------------+ + | | | + +-----------------------+ +---------------------+ + | | | | + | graph-{hash1} |->| | + | | | | + +-----------------------+ +---------------------+ + | tmp_graphXXX + +-----------------------+ + | | + | | + | | + | graph-{hash0} | + | | + | | + | | + +-----------------------+ + +During this process, the commits to write are combined, sorted and we write the +contents to a temporary file, all while holding a `commit-graph-chain.lock` +lock-file. When the file is flushed, we rename it to `graph-{hash3}` +according to the computed `{hash3}`. Finally, we write the new chain data to +`commit-graph-chain.lock`: + +``` + {hash3} + {hash0} +``` + +We then close the lock-file. + +## Merge Strategy + +When writing a set of commits that do not exist in the commit-graph stack of +height N, we default to creating a new file at level N + 1. We then decide to +merge with the Nth level if one of two conditions hold: + + 1. `--size-multiple=<X>` is specified or X = 2, and the number of commits in + level N is less than X times the number of commits in level N + 1. + + 2. `--max-commits=<C>` is specified with non-zero C and the number of commits + in level N + 1 is more than C commits. + +This decision cascades down the levels: when we merge a level we create a new +set of commits that then compares to the next level. + +The first condition bounds the number of levels to be logarithmic in the total +number of commits. The second condition bounds the total number of commits in +a `graph-{hashN}` file and not in the `commit-graph` file, preventing +significant performance issues when the stack merges and another process only +partially reads the previous stack. + +The merge strategy values (2 for the size multiple, 64,000 for the maximum +number of commits) could be extracted into config settings for full +flexibility. + +## Deleting graph-{hash} files + +After a new tip file is written, some `graph-{hash}` files may no longer +be part of a chain. It is important to remove these files from disk, eventually. +The main reason to delay removal is that another process could read the +`commit-graph-chain` file before it is rewritten, but then look for the +`graph-{hash}` files after they are deleted. + +To allow holding old split commit-graphs for a while after they are unreferenced, +we update the modified times of the files when they become unreferenced. Then, +we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}` +files whose modified times are older than a given expiry window. This window +defaults to zero, but can be changed using command-line arguments or a config +setting. + +## Chains across multiple object directories + +In a repo with alternates, we look for the `commit-graph-chain` file starting +in the local object directory and then in each alternate. The first file that +exists defines our chain. As we look for the `graph-{hash}` files for +each `{hash}` in the chain file, we follow the same pattern for the host +directories. + +This allows commit-graphs to be split across multiple forks in a fork network. +The typical case is a large "base" repo with many smaller forks. + +As the base repo advances, it will likely update and merge its commit-graph +chain more frequently than the forks. If a fork updates their commit-graph after +the base repo, then it should "reparent" the commit-graph chain onto the new +chain in the base repo. When reading each `graph-{hash}` file, we track +the object directory containing it. During a write of a new commit-graph file, +we check for any changes in the source object directory and read the +`commit-graph-chain` file for that source and create a new file based on those +files. During this "reparent" operation, we necessarily need to collapse all +levels in the fork, as all of the files are invalid against the new base file. + +It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph` +files in this scenario. It falls to the user to define the proper settings for +their custom environment: + + 1. When merging levels in the base repo, the unreferenced files may still be + referenced by chains from fork repos. + + 2. The expiry time should be set to a length of time such that every fork has + time to recompute their commit-graph chain to "reparent" onto the new base + file(s). + + 3. If the commit-graph chain is updated in the base, the fork will not have + access to the new chain until its chain is updated to reference those files. + (This may change in the future [5].) Related Links ------------- @@ -170,3 +344,7 @@ Related Links [4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u A patch to remove the ahead-behind calculation from 'status'. + +[5] https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c877@gmail.com/ + A discussion of a "two-dimensional graph position" that can allow reading + multiple commit-graph chains at the same time. diff --git a/Documentation/technical/directory-rename-detection.txt b/Documentation/technical/directory-rename-detection.txt index 1c0086e287..844629c8c4 100644 --- a/Documentation/technical/directory-rename-detection.txt +++ b/Documentation/technical/directory-rename-detection.txt @@ -20,8 +20,8 @@ More interesting possibilities exist, though, such as: * one side of history renames x -> z, and the other renames some file to x/e, causing the need for the merge to do a transitive rename. - * one side of history renames x -> z, but also renames all files within - x. For example, x/a -> z/alpha, x/b -> z/bravo, etc. + * one side of history renames x -> z, but also renames all files within x. + For example, x/a -> z/alpha, x/b -> z/bravo, etc. * both 'x' and 'y' being merged into a single directory 'z', with a directory rename being detected for both x->z and y->z. diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index 7a2375a55d..c73e72de0e 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -657,14 +657,14 @@ can be rejected. An example client/server communication might look like this: ---- - S: 007c74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n + S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n - S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n + S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n S: 0000 - C: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n - C: 003e74730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n + C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n + C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n C: 0000 C: [PACKDATA] diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index ead85ce35c..03264c7d9a 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -1,5 +1,5 @@ - Git Wire Protocol, Version 2 -============================== +Git Wire Protocol, Version 2 +============================ This document presents a specification for a version 2 of Git's wire protocol. Protocol v2 will improve upon v1 in the following ways: @@ -22,8 +22,8 @@ will be commands which a client can request be executed. Once a command has completed, a client can reuse the connection and request that other commands be executed. - Packet-Line Framing ---------------------- +Packet-Line Framing +------------------- All communication is done using packet-line framing, just as in v1. See `Documentation/technical/pack-protocol.txt` and @@ -34,8 +34,8 @@ In protocol v2 these special packets will have the following semantics: * '0000' Flush Packet (flush-pkt) - indicates the end of a message * '0001' Delimiter Packet (delim-pkt) - separates sections of a message - Initial Client Request ------------------------- +Initial Client Request +---------------------- In general a client can request to speak protocol v2 by sending `version=2` through the respective side-channel for the transport being @@ -43,22 +43,22 @@ used which inevitably sets `GIT_PROTOCOL`. More information can be found in `pack-protocol.txt` and `http-protocol.txt`. In all cases the response from the server is the capability advertisement. - Git Transport -~~~~~~~~~~~~~~~ +Git Transport +~~~~~~~~~~~~~ When using the git:// transport, you can request to use protocol v2 by sending "version=2" as an extra parameter: 003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0 - SSH and File Transport -~~~~~~~~~~~~~~~~~~~~~~~~ +SSH and File Transport +~~~~~~~~~~~~~~~~~~~~~~ When using either the ssh:// or file:// transport, the GIT_PROTOCOL environment variable must be set explicitly to include "version=2". - HTTP Transport -~~~~~~~~~~~~~~~~ +HTTP Transport +~~~~~~~~~~~~~~ When using the http:// or https:// transport a client makes a "smart" info/refs request as described in `http-protocol.txt` and requests that @@ -79,8 +79,8 @@ A v2 server would reply: Subsequent requests are then made directly to the service `$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack). - Capability Advertisement --------------------------- +Capability Advertisement +------------------------ A server which decides to communicate (based on a request from a client) using protocol version 2, notifies the client by sending a version string @@ -101,8 +101,8 @@ to be executed by the client. key = 1*(ALPHA | DIGIT | "-_") value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;") - Command Request ------------------ +Command Request +--------------- After receiving the capability advertisement, a client can then issue a request to select the command it wants with any particular capabilities @@ -137,8 +137,8 @@ command be executed or can terminate the connection. A client may optionally send an empty request consisting of just a flush-pkt to indicate that no more requests will be made. - Capabilities --------------- +Capabilities +------------ There are two different types of capabilities: normal capabilities, which can be used to to convey information or alter the behavior of a @@ -153,8 +153,8 @@ management on the server side in order to function correctly. This permits simple round-robin load-balancing on the server side, without needing to worry about state management. - agent -~~~~~~~ +agent +~~~~~ The server can advertise the `agent` capability with a value `X` (in the form `agent=X`) to notify the client that the server is running version @@ -168,8 +168,8 @@ printable ASCII characters except space (i.e., the byte range 32 < x < and debugging purposes, and MUST NOT be used to programmatically assume the presence or absence of particular features. - ls-refs -~~~~~~~~~ +ls-refs +~~~~~~~ `ls-refs` is the command used to request a reference advertisement in v2. Unlike the current reference advertisement, ls-refs takes in arguments @@ -199,8 +199,8 @@ The output of ls-refs is as follows: symref = "symref-target:" symref-target peeled = "peeled:" obj-id - fetch -~~~~~~~ +fetch +~~~~~ `fetch` is the command used to fetch a packfile in v2. It can be looked at as a modified version of the v1 fetch where the ref-advertisement is @@ -444,8 +444,8 @@ header. 2 - progress messages 3 - fatal error message just before stream aborts - server-option -~~~~~~~~~~~~~~~ +server-option +~~~~~~~~~~~~~ If advertised, indicates that any number of server specific options can be included in a request. This is done by sending each option as a |