summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLibravatar Jiang Xin <worldhello.net@gmail.com>2021-03-04 22:40:13 +0800
committerLibravatar Jiang Xin <worldhello.net@gmail.com>2021-03-04 22:40:13 +0800
commit4dd8469336cc06923acceb3ea4ae02ea00461c5d (patch)
tree92aa9b355242c8904510bb9b61ddfa94b4d1ee47 /Documentation
parentl10n: tr: v2.31.0-rc0 (diff)
parentGit 2.31-rc1 (diff)
downloadtgif-4dd8469336cc06923acceb3ea4ae02ea00461c5d.tar.xz
Merge branch 'master' of github.com:git/git
* 'master' of github.com:git/git: (63 commits) Git 2.31-rc1 Hopefully the last batch before -rc1 Revert "commit-graph: when incompatible with graphs, indicate why" read-cache: make the index write buffer size 128K dir: fix malloc of root untracked_cache_dir commit-graph.c: display correct number of chunks when writing doc/reftable: document how to handle windows fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RelNotes/2.31.0.txt31
-rw-r--r--Documentation/git-for-each-ref.txt8
-rw-r--r--Documentation/git-http-fetch.txt10
-rw-r--r--Documentation/git-index-pack.txt7
-rw-r--r--Documentation/gitdiffcore.txt20
-rw-r--r--Documentation/technical/chunk-format.txt116
-rw-r--r--Documentation/technical/commit-graph-format.txt3
-rw-r--r--Documentation/technical/pack-format.txt3
-rw-r--r--Documentation/technical/reftable.txt42
9 files changed, 216 insertions, 24 deletions
diff --git a/Documentation/RelNotes/2.31.0.txt b/Documentation/RelNotes/2.31.0.txt
index ef8b0d158e..04bd5b70a9 100644
--- a/Documentation/RelNotes/2.31.0.txt
+++ b/Documentation/RelNotes/2.31.0.txt
@@ -197,6 +197,31 @@ Performance, Internal Implementation, Development Support etc.
* The code to implement "git merge-base --independent" was poorly
done and was kept from the very beginning of the feature.
+ * Preliminary changes to fsmonitor integration.
+
+ * Performance optimization work on the rename detection continues.
+
+ * The common code to deal with "chunked file format" that is shared
+ by the multi-pack-index and commit-graph files have been factored
+ out, to help codepaths for both filetypes to become more robust.
+
+ * The approach to "fsck" the incoming objects in "index-pack" is
+ attractive for performance reasons (we have them already in core,
+ inflated and ready to be inspected), but fundamentally cannot be
+ applied fully when we receive more than one pack stream, as a tree
+ object in one pack may refer to a blob object in another pack as
+ ".gitmodules", when we want to inspect blobs that are used as
+ ".gitmodules" file, for example. Teach "index-pack" to emit
+ objects that must be inspected later and check them in the calling
+ "fetch-pack" process.
+
+ * The logic to handle "trailer" related placeholders in the
+ "--format=" mechanisms in the "log" family and "for-each-ref"
+ family is getting unified.
+
+ * Raise the buffer size used when writing the index file out from
+ (obviously too small) 8kB to (clearly sufficiently large) 128kB.
+
Fixes since v2.30
-----------------
@@ -318,6 +343,12 @@ Fixes since v2.30
corrected.
(merge 20e416409f jc/push-delete-nothing later to maint).
+ * Test script modernization.
+ (merge 488acf15df sv/t7001-modernize later to maint).
+
+ * An under-allocation for the untracked cache data has been corrected.
+ (merge 6347d649bc jh/untracked-cache-fix later to maint).
+
* Other code cleanup, docfix, build fix, etc.
(merge e3f5da7e60 sg/t7800-difftool-robustify later to maint).
(merge 9d336655ba js/doc-proto-v2-response-end later to maint).
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2962f85a50..2ae2478de7 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -260,11 +260,9 @@ contents:lines=N::
The first `N` lines of the message.
Additionally, the trailers as interpreted by linkgit:git-interpret-trailers[1]
-are obtained as `trailers` (or by using the historical alias
-`contents:trailers`). Non-trailer lines from the trailer block can be omitted
-with `trailers:only`. Whitespace-continuations can be removed from trailers so
-that each trailer appears on a line by itself with its full content with
-`trailers:unfold`. Both can be used together as `trailers:unfold,only`.
+are obtained as `trailers[:options]` (or by using the historical alias
+`contents:trailers[:options]`). For valid [:option] values see `trailers`
+section of linkgit:git-log[1].
For sorting purposes, fields with numeric values sort in numeric order
(`objectsize`, `authordate`, `committerdate`, `creatordate`, `taggerdate`).
diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
index 4deb4893f5..9fa17b60e4 100644
--- a/Documentation/git-http-fetch.txt
+++ b/Documentation/git-http-fetch.txt
@@ -41,11 +41,17 @@ commit-id::
<commit-id>['\t'<filename-as-in--w>]
--packfile=<hash>::
- Instead of a commit id on the command line (which is not expected in
+ For internal use only. Instead of a commit id on the command
+ line (which is not expected in
this case), 'git http-fetch' fetches the packfile directly at the given
URL and uses index-pack to generate corresponding .idx and .keep files.
The hash is used to determine the name of the temporary file and is
- arbitrary. The output of index-pack is printed to stdout.
+ arbitrary. The output of index-pack is printed to stdout. Requires
+ --index-pack-args.
+
+--index-pack-args=<args>::
+ For internal use only. The command to run on the contents of the
+ downloaded pack. Arguments are URL-encoded separated by spaces.
--recover::
Verify that everything reachable from target is fetched. Used after
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 69ba904d44..7fa74b9e79 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -86,7 +86,12 @@ OPTIONS
Die if the pack contains broken links. For internal use only.
--fsck-objects::
- Die if the pack contains broken objects. For internal use only.
+ For internal use only.
++
+Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").
--threads=<n>::
Specifies the number of threads to spawn when resolving
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 2bd1220477..1c7269655f 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -169,6 +169,26 @@ a similarity score different from the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).
+Note that when rename detection is on but both copy and break
+detection are off, rename detection adds a preliminary step that first
+checks if files are moved across directories while keeping their
+filename the same. If there is a file added to a directory whose
+contents is sufficiently similar to a file with the same name that got
+deleted from a different directory, it will mark them as renames and
+exclude them from the later quadratic step (the one that pairwise
+compares all unmatched files to find the "best" matches, determined by
+the highest content similarity). So, for example, if a deleted
+docs/ext.txt and an added docs/config/ext.txt are similar enough, they
+will be marked as a rename and prevent an added docs/ext.md that may
+be even more similar to the deleted docs/ext.txt from being considered
+as the rename destination in the later step. For this reason, the
+preliminary "match same filename" step uses a bit higher threshold to
+mark a file pair as a rename and stop considering other candidates for
+better matches. At most, one comparison is done per file in this
+preliminary pass; so if there are several remaining ext.txt files
+throughout the directory hierarchy after exact rename detection, this
+preliminary step will be skipped for those files.
+
Note. When the "-C" option is used with `--find-copies-harder`
option, 'git diff-{asterisk}' commands feed unmodified filepairs to
diffcore mechanism as well as modified ones. This lets the copy
diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt
new file mode 100644
index 0000000000..593614fced
--- /dev/null
+++ b/Documentation/technical/chunk-format.txt
@@ -0,0 +1,116 @@
+Chunk-based file formats
+========================
+
+Some file formats in Git use a common concept of "chunks" to describe
+sections of the file. This allows structured access to a large file by
+scanning a small "table of contents" for the remaining data. This common
+format is used by the `commit-graph` and `multi-pack-index` files. See
+link:technical/pack-format.html[the `multi-pack-index` format] and
+link:technical/commit-graph-format.html[the `commit-graph` format] for
+how they use the chunks to describe structured data.
+
+A chunk-based file format begins with some header information custom to
+that format. That header should include enough information to identify
+the file type, format version, and number of chunks in the file. From this
+information, that file can determine the start of the chunk-based region.
+
+The chunk-based region starts with a table of contents describing where
+each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
+where C is the number of chunks. Consider the following table:
+
+ | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
+ |--------------------|------------------------|
+ | ID[0] | OFFSET[0] |
+ | ... | ... |
+ | ID[C] | OFFSET[C] |
+ | 0x0000 | OFFSET[C+1] |
+
+Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
+Each integer is stored in network-byte order.
+
+The chunk identifier `ID[i]` is a label for the data stored within this
+fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
+size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
+and `OFFSET[i]`. This requires that the chunk data appears contiguously
+in the same order as the table of contents.
+
+The final entry in the table of contents must be four zero bytes. This
+confirms that the table of contents is ending and provides the offset for
+the end of the chunk-based data.
+
+Note: The chunk-based format expects that the file contains _at least_ a
+trailing hash after `OFFSET[C+1]`.
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these methods provide extra checks that assist
+developers when creating new file formats.
+
+Writing chunk-based file formats
+--------------------------------
+
+To write a chunk-based file format, create a `struct chunkfile` by
+calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
+caller is responsible for opening the `hashfile` and writing header
+information so the file format is identifiable before the chunk-based
+format begins.
+
+Then, call `add_chunk()` for each chunk that is intended for write. This
+populates the `chunkfile` with information about the order and size of
+each chunk to write. Provide a `chunk_write_fn` function pointer to
+perform the write of the chunk data upon request.
+
+Call `write_chunkfile()` to write the table of contents to the `hashfile`
+followed by each of the chunks. This will verify that each chunk wrote
+the expected amount of data so the table of contents is correct.
+
+Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
+caller is responsible for finalizing the `hashfile` by writing the trailing
+hash and closing the file.
+
+Reading chunk-based file formats
+--------------------------------
+
+To read a chunk-based file format, the file must be opened as a
+memory-mapped region. The chunk-format API expects that the entire file
+is mapped as a contiguous memory region.
+
+Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
+
+After reading the header information from the beginning of the file,
+including the chunk count, call `read_table_of_contents()` to populate
+the `struct chunkfile` with the list of chunks, their offsets, and their
+sizes.
+
+Extract the data information for each chunk using `pair_chunk()` or
+`read_chunk()`:
+
+* `pair_chunk()` assigns a given pointer with the location inside the
+ memory-mapped file corresponding to that chunk's offset. If the chunk
+ does not exist, then the pointer is not modified.
+
+* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
+ with the appropriate initial pointer and size information. The function
+ is not called if the chunk does not exist. Use this method to read chunks
+ if you need to perform immediate parsing or if you need to execute logic
+ based on the size of the chunk.
+
+After calling these methods, call `free_chunkfile()` to clear the
+`struct chunkfile` data. This will not close the memory-mapped region.
+Callers are expected to own that data for the timeframe the pointers into
+the region are needed.
+
+Examples
+--------
+
+These file formats use the chunk-format API, and can be used as examples
+for future formats:
+
+* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
+ in `commit-graph.c` for how the chunk-format API is used to write and
+ parse the commit-graph file format documented in
+ link:technical/commit-graph-format.html[the commit-graph file format].
+
+* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
+ in `midx.c` for how the chunk-format API is used to write and
+ parse the multi-pack-index file format documented in
+ link:technical/pack-format.html[the multi-pack-index file format].
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index b6658eff18..87971c27dd 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -61,6 +61,9 @@ CHUNK LOOKUP:
the length using the next chunk position if necessary.) Each chunk
ID appears at most once.
+ The CHUNK LOOKUP matches the table of contents from
+ link:technical/chunk-format.html[the chunk-based file format].
+
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8833b71c8b..1faa949bf6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -336,6 +336,9 @@ CHUNK LOOKUP:
(Chunks are provided in file-order, so you can infer the length
using the next chunk position if necessary.)
+ The CHUNK LOOKUP matches the table of contents from
+ link:technical/chunk-format.html[the chunk-based file format].
+
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 8095ab2590..3ef169af27 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -872,17 +872,11 @@ A repository must set its `$GIT_DIR/config` to configure reftable:
Layout
^^^^^^
-A collection of reftable files are stored in the `$GIT_DIR/reftable/`
-directory:
-
-....
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
-....
-
-where reftable files are named by a unique name such as produced by the
-function `${min_update_index}-${max_update_index}.ref`.
+A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
+Their names should have a random element, such that each filename is globally
+unique; this helps avoid spurious failures on Windows, where open files cannot
+be removed or overwritten. It suggested to use
+`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
Log-only files use the `.log` extension, while ref-only and mixed ref
and log files use `.ref`. extension.
@@ -893,9 +887,9 @@ current files, one per line, in order, from oldest (base) to newest
....
$ cat .git/reftable/tables.list
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
+00000001-00000001-RANDOM1.log
+00000002-00000002-RANDOM2.ref
+00000003-00000003-RANDOM3.ref
....
Readers must read `$GIT_DIR/reftable/tables.list` to determine which
@@ -940,7 +934,7 @@ new reftable and atomically appending it to the stack:
3. Select `update_index` to be most recent file's
`max_update_index + 1`.
4. Prepare temp reftable `tmp_XXXXXX`, including log entries.
-5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
6. Copy `tables.list` to `tables.list.lock`, appending file from (5).
7. Rename `tables.list.lock` to `tables.list`.
@@ -993,7 +987,7 @@ prevents other processes from trying to compact these files.
should always be the case, assuming that other processes are adhering to
the locking protocol.
7. Rename `${min_update_index}-${max_update_index}_XXXXXX` to
-`${min_update_index}-${max_update_index}.ref`.
+`${min_update_index}-${max_update_index}-${random}.ref`.
8. Write the new stack to `tables.list.lock`, replacing `B` and `C`
with the file from (4).
9. Rename `tables.list.lock` to `tables.list`.
@@ -1005,6 +999,22 @@ This strategy permits compactions to proceed independently of updates.
Each reftable (compacted or not) is uniquely identified by its name, so
open reftables can be cached by their name.
+Windows
+^^^^^^^
+
+On windows, and other systems that do not allow deleting or renaming to open
+files, compaction may succeed, but other readers may prevent obsolete tables
+from being deleted.
+
+On these platforms, the following strategy can be followed: on closing a
+reftable stack, reload `tables.list`, and delete any tables no longer mentioned
+in `tables.list`.
+
+Irregular program exit may still leave about unused files. In this case, a
+cleanup operation can read `tables.list`, note its modification timestamp, and
+delete any unreferenced `*.ref` files that are older.
+
+
Alternatives considered
~~~~~~~~~~~~~~~~~~~~~~~