summaryrefslogtreecommitdiff
path: root/t/t5318-commit-graph.sh
AgeCommit message (Collapse)AuthorFilesLines
2022-03-01commit-graph: start parsing generation v2 (again)Libravatar Derrick Stolee1-1/+1
The 'read_generation_data' member of 'struct commit_graph' was introduced by 1fdc383c5 (commit-graph: use generation v2 only if entire chain does, 2021-01-16). The intention was to avoid using corrected commit dates if not all layers of a commit-graph had that data stored. The logic in validate_mixed_generation_chain() at that point incorrectly initialized read_generation_data to 1 if and only if the tip commit-graph contained the Corrected Commit Date chunk. This was "fixed" in 448a39e65 (commit-graph: validate layers for generation data, 2021-02-02) to validate that read_generation_data was either non-zero for all layers, or it would set read_generation_data to zero for all layers. The problem here is that read_generation_data is not initialized to be non-zero anywhere! This change initializes read_generation_data immediately after the chunk is parsed, so each layer will have its value present as soon as possible. The read_generation_data member is used in fill_commit_graph_info() to determine if we should use the corrected commit date or the topological levels stored in the Commit Data chunk. Due to this bug, all previous versions of Git were defaulting to topological levels in all cases! This can be measured with some performance tests. Using the Linux kernel as a testbed, I generated a complete commit-graph containing corrected commit dates and tested the 'new' version against the previous, 'old' version. First, rev-list with --topo-order demonstrates a 26% improvement using corrected commit dates: hyperfine \ -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \ -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 57.1 ms ± 3.1 ms Range (min … max): 52.9 ms … 62.0 ms 55 runs Benchmark 2: new Time (mean ± σ): 45.5 ms ± 3.3 ms Range (min … max): 39.9 ms … 51.7 ms 59 runs Summary 'new' ran 1.26 ± 0.11 times faster than 'old' These performance improvements are due to the algorithmic improvements given by walking fewer commits due to the higher cutoffs from corrected commit dates. However, this comes at a cost. The additional I/O cost of parsing the corrected commit dates is visible in case of merge-base commands that do not reduce the overall number of walked commits. hyperfine \ -n "old" "$OLD_GIT merge-base v4.8 v4.9" \ -n "new" "$NEW_GIT merge-base v4.8 v4.9" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 110.4 ms ± 6.4 ms Range (min … max): 96.0 ms … 118.3 ms 25 runs Benchmark 2: new Time (mean ± σ): 150.7 ms ± 1.1 ms Range (min … max): 149.3 ms … 153.4 ms 19 runs Summary 'old' ran 1.36 ± 0.08 times faster than 'new' Performance issues like this are what motivated 702110aac (commit-graph: use config to specify generation type, 2021-02-25). In the future, we could fix this performance problem by inserting the corrected commit date offsets into the Commit Date chunk instead of having that data in an extra chunk. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01commit-graph: fix ordering bug in generation numbersLibravatar Derrick Stolee1-2/+2
When computing the generation numbers for a commit-graph, we compute the corrected commit dates and then check if their offsets from the actual dates is too large to fit in the 32-bit Generation Data chunk. However, there is a problem with this approach: if we have parsed the generation data from the previous commit-graph, then we continue the loop because the corrected commit date is already computed. This causes an under-count in the number of overflow values. It is incorrect to add an increment to num_generation_data_overflows next to this 'continue' statement, because we might start double-counting commits that are computed because of the depth-first search walk from a commit with an earlier OID. Instead, iterate over the full commit list at the end, checking the offsets to see how many grow beyond the maximum value. Create a new t5328-commit-graph-64-bit-time.sh test script to handle special cases of testing 64-bit timestamps. This helps demonstrate this bug in more cases. It still won't hit all potential cases until the next change, which reenables reading generation numbers. Use the skip_all trick from 0a2bfccb9c8 (t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04) to make the output clean when run on a 32-bit system. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01t5318: extract helpers to lib-commit-graph.shLibravatar Derrick Stolee1-48/+2
The graph_git_behavior helper is useful for testing that certain Git commands behave the same when using the commit-graph and when not using the commit-graph. Extract it to a new lib-commit-graph.sh file for use in new test scripts that will split out from t5318. While doing this extraction, also extract graph_read_expect and the logic for priming the test_oid_cache. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01test-read-graph: include extra post-parse infoLibravatar Derrick Stolee1-0/+1
It can be helpful to verify that the 'struct commit_graph' that results from parsing a commit-graph is correctly structured. The existence of different chunks is not enough to verify that all of the optional features are correctly enabled. Update 'test-tool read-graph' to output an "options:" line that includes information for different parts of the struct commit_graph. In particular, this change demonstrates that the read_generation_data option is never being enabled, which will be fixed in a later change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13t5000-t5999: detect and signal failure within loopLibravatar Eric Sunshine1-3/+3
Failures within `for` and `while` loops can go unnoticed if not detected and signaled manually since the loop itself does not abort when a contained command fails, nor will a failure necessarily be detected when the loop finishes since the loop returns the exit code of the last command it ran on the final iteration, which may not be the command which failed. Therefore, detect and signal failures manually within loops using the idiom `|| return 1` (or `|| exit 1` within subshells). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15fsck: verify commit graph when implicitly enabledLibravatar Glen Choo1-1/+22
Change fsck to check the "core_commit_graph" variable set in "repo-settings.c" instead of reading the "core.commitGraph" variable. This fixes a bug where we wouldn't verify the commit-graph if the config key was missing. This bug was introduced in 31b1de6a09 (commit-graph: turn on commit-graph by default, 2019-08-13), where core.commitGraph was turned on by default. Add tests to "t5318-commit-graph.sh" to verify that fsck checks the commit-graph as expected for the 3 values of core.commitGraph. Also, disable GIT_TEST_COMMIT_GRAPH in t/t0410-partial-clone.sh because some test cases use fsck in ways that assume that commit-graph checking is disabled. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15Merge branch 'ab/ignore-replace-while-working-on-commit-graph' into ↵Libravatar Junio C Hamano1-2/+3
gc/use-repo-settings * ab/ignore-replace-while-working-on-commit-graph: commit-graph: don't consider "replace" objects with "verify" commit-graph tests: fix another graph_git_two_modes() helper commit-graph tests: fix error-hiding graph_git_two_modes() helper
2021-10-15commit-graph: don't consider "replace" objects with "verify"Libravatar Ævar Arnfjörð Bjarmason1-0/+1
Extend the code added in d6538246d3d (commit-graph: not compatible with replace objects, 2018-08-20) which ignored replace objects in the "write" command to ignore it in the "verify" command too. We can just move this assignment to the cmd_commit_graph(), it dispatches to "write" and "verify", and we're unlikely to ever get a sub-command that would like to consider replace refs. This will make tests added in eddc1f556cd (mktag tests: test update-ref and reachable fsck, 2021-06-17) pass in combination with the "GIT_TEST_COMMIT_GRAPH" mode added in 859fdc0c3cf (commit-graph: define GIT_TEST_COMMIT_GRAPH, 2018-08-29), except that mode is currently broken (but is being fixed concurrently). See the discussion starting at [1]. 1. https://lore.kernel.org/git/87wnmihswp.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-15commit-graph tests: fix error-hiding graph_git_two_modes() helperLibravatar Ævar Arnfjörð Bjarmason1-2/+2
The graph_git_two_modes() helper added in 177722b3442 (commit: integrate commit graph with commit parsing, 2018-04-10) didn't &&-chain its "git commit-graph" invocations, which as can be seen with SANITIZE=leak will happily mark tests as passing if both of these commands die, since test_cmp() will be comparing two empty files. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30commit-graph: show "unexpected subcommand" errorLibravatar Ævar Arnfjörð Bjarmason1-1/+15
Bring the "commit-graph" command in line with the error output and general pattern in cmd_multi_pack_index(). Let's test for that output, and also cover the same potential bug as was fixed in the multi-pack-index command in 88617d11f9d (multi-pack-index: fix potential segfault without sub-command, 2021-07-19). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30commit-graph: show usage on "commit-graph [write|verify] garbage"Libravatar Ævar Arnfjörð Bjarmason1-0/+5
Change the parse_options() invocation in the commit-graph code to error on unknown leftover argv elements, in addition to the existing and implicit erroring via parse_options() on unknown options. We'd already error in cmd_commit_graph() on e.g.: git commit-graph unknown verify git commit-graph --unknown verify But here we're calling parse_options() twice more for the "write" and "verify" subcommands. We did not do the same checking for leftover argv elements there. As a result we'd silently accept garbage in these subcommands, let's not do that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-22Merge branch 'ds/commit-graph-generation-config'Libravatar Junio C Hamano1-1/+1
A new configuration variable has been introduced to allow choosing which version of the generation number gets used in the commit-graph file. * ds/commit-graph-generation-config: commit-graph: use config to specify generation type commit-graph: create local repository pointer
2021-03-01Merge branch 'ds/chunked-file-api'Libravatar Junio C Hamano1-1/+1
The common code to deal with "chunked file format" that is shared by the multi-pack-index and commit-graph files have been factored out, to help codepaths for both filetypes to become more robust. * ds/chunked-file-api: commit-graph.c: display correct number of chunks when writing chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods midx: add num_large_offsets to write_midx_context midx: add pack_perm to write_midx_context midx: add entries to write_midx_context midx: use context in write_midx_pack_names() midx: rename pack_info to write_midx_context commit-graph: use chunk-format write API chunk-format: create chunk format write API commit-graph: anonymize data in chunk_write_fn
2021-02-25commit-graph: use config to specify generation typeLibravatar Derrick Stolee1-1/+1
We have two established generation number versions: 1: topological levels 2: corrected commit dates The corrected commit dates are enabled by default, but they also write extra data in the GDAT and GDOV chunks. Services that host Git data might want to have more control over when this feature rolls out than just updating the Git binaries. Add a new "commitGraph.generationVersion" config option that specifies the intended generation number version. If this value is less than 2, then the GDAT chunk is never written _or read_ from an existing file. This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT environment variable in the test suite. Remove it. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22Merge branch 'ab/test-lib'Libravatar Junio C Hamano1-1/+1
Test framework clean-up. * ab/test-lib: test-lib-functions: assert correct parameter count test-lib-functions: remove bug-inducing "diagnostics" helper param test libs: rename "diff-lib" to "lib-diff" t/.gitattributes: sort lines test-lib-functions: move function to lib-bitmap.sh test libs: rename gitweb-lib.sh to lib-gitweb.sh test libs: rename bundle helper to "lib-bundle.sh" test-lib-functions: remove generate_zero_bytes() wrapper test-lib-functions: move test_set_index_version() to its user test lib: change "error" to "BUG" as appropriate test-lib: remove check_var_migration
2021-02-18commit-graph: use chunk-format read APILibravatar Derrick Stolee1-1/+1
Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). While the current implementation loses the duplicate-chunk detection, that will be added in a future change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10test-lib-functions: remove generate_zero_bytes() wrapperLibravatar Ævar Arnfjörð Bjarmason1-1/+1
Since d5cfd142ec1 (tests: teach the test-tool to generate NUL bytes and use it, 2019-02-14) the generate_zero_bytes() functions has been a thin wrapper for "test-tool genzeros". Let's have its only user call that directly instead. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-01commit-graph: prepare commit graphLibravatar Derrick Stolee1-1/+1
Before checking if the repository has a commit-graph loaded, be sure to run prepare_commit_graph(). This is necessary because otherwise the topo_levels slab is not initialized. As we compute topo_levels for the new commits, we iterate further into the lower layers since the first visit to each commit looks as though the topo_level is not populated. By properly initializing the topo_slab, we fix the previously broken case of a split commit graph where a base layer has the generation_data_overflow chunk. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-01commit-graph: always parse before commit_graph_data_at()Libravatar Derrick Stolee1-0/+21
There is a subtle failure happening when computing corrected commit dates with --split enabled. It requires a base layer needing the generation_data_overflow chunk. Then, the next layer on top erroneously thinks it needs an overflow chunk due to a bug leading to recalculating all reachable generation numbers. The output of the failure is BUG: commit-graph.c:1912: expected to write 8 bytes to chunk 47444f56, but wrote 0 instead These "expected" 8 bytes are due to re-computing the corrected commit date for the lower layer but the new layer does not need any overflow. Add a test to t5318-commit-graph.sh that demonstrates this bug. However, it does not trigger consistently with the existing code. The generation number data is stored in a slab and accessed by commit_graph_data_at(). This data is initialized when parsing a commit, but is otherwise used assuming it has been populated. The loop in compute_generation_numbers() did not enforce that all reachable commits were parsed and had correct values. This could lead to some problems when writing a commit-graph with corrected commit dates based on a commit-graph without them. It has been difficult to identify the issue here because it was so hard to reproduce. It relies on this uninitialized data having a non-zero value, but also on specifically in a way that overwrites the existing data. This patch adds the extra parse to ensure the data is filled before we compute the generation number of a commit. This triggers the new test to fail because the generation number overflow count does not match between this computation and the write for that chunk. The actual fix will follow as the next few changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-18commit-graph: implement generation data chunkLibravatar Abhishek Kumar1-13/+66
As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17commit-graph: use the "hash version" byteLibravatar Derrick Stolee1-2/+36
The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since 092b677 (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11Merge branch 'bc/sha-256-part-3'Libravatar Junio C Hamano1-2/+1
The final leg of SHA-256 transition. * bc/sha-256-part-3: (39 commits) t: remove test_oid_init in tests docs: add documentation for extensions.objectFormat ci: run tests with SHA-256 t: make SHA1 prerequisite depend on default hash t: allow testing different hash algorithms via environment t: add test_oid option to select hash algorithm repository: enable SHA-256 support by default setup: add support for reading extensions.objectformat bundle: add new version for use with SHA-256 builtin/verify-pack: implement an --object-format option http-fetch: set up git directory before parsing pack hashes t0410: mark test with SHA1 prerequisite t5308: make test work with SHA-256 t9700: make hash size independent t9500: ensure that algorithm info is preserved in config t9350: make hash size independent t9301: make hash size independent t9300: use $ZERO_OID instead of hard-coded object ID t9300: abstract away SHA-1-specific constants t8011: make hash size independent ...
2020-07-30Merge branch 'ds/commit-graph-bloom-updates' into masterLibravatar Junio C Hamano1-1/+1
Updates to the changed-paths bloom filter. * ds/commit-graph-bloom-updates: commit-graph: check all leading directories in changed path Bloom filters revision: empty pathspecs should not use Bloom filters revision.c: fix whitespace commit-graph: check chunk sizes after writing commit-graph: simplify chunk writes into loop commit-graph: unify the signatures of all write_graph_chunk_*() functions commit-graph: persist existence of changed-paths bloom: fix logic in get_bloom_filter() commit-graph: change test to die on parse, not load commit-graph: place bloom_settings in context
2020-07-30Merge branch 'sg/commit-graph-cleanups' into masterLibravatar Junio C Hamano1-2/+3
The changed-path Bloom filter is improved using ideas from an independent implementation. * sg/commit-graph-cleanups: commit-graph: simplify write_commit_graph_file() #2 commit-graph: simplify write_commit_graph_file() #1 commit-graph: simplify parse_commit_graph() #2 commit-graph: simplify parse_commit_graph() #1 commit-graph: clean up #includes diff.h: drop diff_tree_oid() & friends' return value commit-slab: add a function to deep free entries on the slab commit-graph-format.txt: all multi-byte numbers are in network byte order commit-graph: fix parsing the Chunk Lookup table tree-walk.c: don't match submodule entries for 'submod/anything'
2020-07-30t: remove test_oid_init in testsLibravatar brian m. carlson1-2/+1
Now that we call test_oid_init in the setup for all test scripts, there's no point in calling it individually. Remove all of the places where we've done so to help keep tests tidy. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23commit-graph: change test to die on parse, not loadLibravatar Derrick Stolee1-1/+1
43d3561 (commit-graph write: don't die if the existing graph is corrupt, 2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment variable. This was created to verify that commit-graph was not loaded when writing a new non-incremental commit-graph. An upcoming change wants to load a commit-graph in some valuable cases, but we want to maintain that we don't trust the commit-graph data when writing our new file. Instead of dying on load, instead die if we ever try to parse a commit from the commit-graph. This functionally verifies the same intended behavior, but allows a more advanced feature in the next change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17Merge branch 'tb/t5318-cleanup'Libravatar Junio C Hamano1-4/+25
Code cleanup. * tb/t5318-cleanup: t5318: test that '--stdin-commits' respects '--[no-]progress' t5318: use 'test_must_be_empty'
2020-06-08Merge branch 'tb/commit-graph-no-check-oids'Libravatar Junio C Hamano1-9/+16
Clean-up the commit-graph codepath. * tb/commit-graph-no-check-oids: commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag t5318: reorder test below 'graph_read_expect' commit-graph.c: simplify 'fill_oids_from_commits' builtin/commit-graph.c: dereference tags in builtin builtin/commit-graph.c: extract 'read_one_commit()' commit-graph.c: peel refs in 'add_ref_to_set' commit-graph.c: show progress of finding reachable commits commit-graph.c: extract 'refs_cb_data'
2020-06-08commit-graph: simplify parse_commit_graph() #1Libravatar SZEDER Gábor1-1/+2
While we iterate over all entries of the Chunk Lookup table we make sure that we don't attempt to read past the end of the mmap-ed commit-graph file, and check in each iteration that the chunk ID and offset we are about to read is still within the mmap-ed memory region. However, these checks in each iteration are not really necessary, because the number of chunks in the commit-graph file is already known before this loop from the just parsed commit-graph header. So let's check that the commit-graph file is large enough for all entries in the Chunk Lookup table before we start iterating over those entries, and drop those per-iteration checks. While at it, take into account the size of everything that is necessary to have a valid commit-graph file, i.e. the size of the header, the size of the mandatory OID Fanout chunk, and the size of the signature in the trailer as well. Note that this necessitates the change of the error message as well, and, consequently, have to update the 'detect incorrect chunk count' test in 't5318-commit-graph.sh' as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08commit-graph: fix parsing the Chunk Lookup tableLibravatar SZEDER Gábor1-1/+1
The commit-graph file format specifies that the chunks may be in any order. However, if the OID Lookup chunk happens to be the last one in the file, then any command attempting to access the commit-graph data will fail with: fatal: invalid commit position. commit-graph is likely corrupt In this case the error is wrong, the commit-graph file does conform to the specification, but the parsing of the Chunk Lookup table is a bit buggy, and leaves the field holding the number of commits in the commit-graph zero-initialized. The number of commits in the commit-graph is determined while parsing the Chunk Lookup table, by dividing the size of the OID Lookup chunk with the hash size. However, the Chunk Lookup table doesn't actually store the size of the chunks, but it stores their starting offset. Consequently, the size of a chunk can only be calculated by subtracting the starting offsets of that chunk from the offset of the subsequent chunk, or in case of the last chunk from the offset recorded in the terminating label. This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the ID of each chunk and store its starting offset, then we check the ID of the last seen chunk and calculate its size using its previously saved offset if necessary (at the moment it's only necessary for the OID Lookup chunk). Alas, while parsing the Chunk Lookup table we only interate through the "real" chunks, but never look at the terminating label, thus don't even check whether it's necessary to calulate the size of the last chunk. Consequently, if the OID Lookup chunk is the last one, then we don't calculate its size and turn don't run the piece of code determining the number of commits in the commit graph, leaving the field holding that number unchanged (i.e. zero-initialized), eventually triggering the sanity check in load_oid_from_graph(). Fix this by iterating through all entries in the Chunk Lookup table, including the terminating label. Note that this is the minimal fix, suitable for the maintenance track. A better fix would be to simplify how the chunk sizes are calculated, but that is a more invasive change, less suitable for 'maint', so that will be done in later patches. This additional flexibility of scanning more chunks breaks a test for "git commit-graph verify" so alter that test to mutate the commit-graph to have an even lower chunk count. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04t5318: test that '--stdin-commits' respects '--[no-]progress'Libravatar Taylor Blau1-0/+21
The following lines were not covered in a recent line-coverage test against Git: builtin/commit-graph.c 5b6653e5 244) progress = start_delayed_progress( 5b6653e5 268) stop_progress(&progress); These statements are executed when both '--stdin-commits' and '--progress' are passed. Introduce a trio of tests that exercise various combinations of these options to ensure that these lines are covered. More importantly, this is exercising a (somewhat) previously-ignored feature of '--stdin-commits', which is that it respects '--progress'. Prior to 5b6653e523 (builtin/commit-graph.c: dereference tags in builtin, 2020-05-13), dereferencing input from '--stdin-commits' was done inside of commit-graph.c. Now that an additional progress meter may be generated from outside of commit-graph.c, add a corresponding test to make sure that it also respects '--[no]-progress'. The other location that generates progress meter output (from d335ce8f24 (commit-graph.c: show progress of finding reachable commits, 2020-05-13)) is already covered by any test that passes '--reachable'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04t5318: use 'test_must_be_empty'Libravatar Taylor Blau1-4/+4
A handful of tests in t5318 use 'test_line_count = 0 ...' to make sure that some command does not write any output. While correct, it is more idiomatic to use 'test_must_be_empty' instead. Switch the former invocations to use the latter instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flagLibravatar Taylor Blau1-4/+11
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' | awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' | git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18t5318: reorder test below 'graph_read_expect'Libravatar Taylor Blau1-9/+9
In the subsequent commit, we will introduce a dependency on 'graph_read_expect' from t5318.7. Preemptively move it below 'graph_read_expect()'s definition so that the test can call it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-05Merge branch 'tb/commit-graph-perm-bits'Libravatar Junio C Hamano1-1/+14
Some of the files commit-graph subsystem keeps on disk did not correctly honor the core.sharedRepository settings and some were left read-write. * tb/commit-graph-perm-bits: commit-graph.c: make 'commit-graph-chain's read-only commit-graph.c: ensure graph layers respect core.sharedRepository commit-graph.c: write non-split graphs as read-only lockfile.c: introduce 'hold_lock_file_for_update_mode' tempfile.c: introduce 'create_tempfile_mode'
2020-05-01Merge branch 'gs/commit-graph-path-filter'Libravatar Junio C Hamano1-0/+2
Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS
2020-05-01Merge branch 'tb/commit-graph-split-strategy'Libravatar Junio C Hamano1-1/+1
"git commit-graph write" learned different ways to write out split files. * tb/commit-graph-split-strategy: Revert "commit-graph.c: introduce '--[no-]check-oids'" commit-graph.c: introduce '--[no-]check-oids' commit-graph.h: replace 'commit_hex' with 'commits' oidset: introduce 'oidset_size' builtin/commit-graph.c: introduce split strategy 'replace' builtin/commit-graph.c: introduce split strategy 'no-merge' builtin/commit-graph.c: support for '--split[=<strategy>]' t/helper/test-read-graph.c: support commit-graph chains
2020-04-29Revert "commit-graph.c: introduce '--[no-]check-oids'"Libravatar Junio C Hamano1-28/+0
This reverts commit 7a9ce0269bc0f4ef230f930b3910b70ac3142552, which has not yet gained consensus.
2020-04-29commit-graph.c: write non-split graphs as read-onlyLibravatar Taylor Blau1-1/+14
In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which *does* rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15commit-graph.c: introduce '--[no-]check-oids'Libravatar Taylor Blau1-0/+28
When operating on a stream of commit OIDs on stdin, 'git commit-graph write' checks that each OID refers to an object that is indeed a commit. This is convenient to make sure that the given input is well-formed, but can sometimes be undesirable. For example, server operators may wish to feed the refnames that were updated during a push to 'git commit-graph write --input=stdin-commits', and silently discard refs that don't point at commits. This can be done by combing the output of 'git for-each-ref' with '--format %(*objecttype)', but this requires opening up a potentially large number of objects. Instead, it is more convenient to feed the updated refs to the commit-graph machinery, and let it throw out refs that don't point to commits. Introduce '--[no-]check-oids' to make such a behavior possible. With '--check-oids' (the default behavior to retain backwards compatibility), 'git commit-graph write' will barf on a non-commit line in its input. With 'no-check-oids', such lines will be silently ignored, making the above possible by specifying this option. No matter which is supplied, 'git commit-graph write' retains the behavior from the previous commit of rejecting non-OID inputs like "HEAD" and "refs/heads/foo" as before. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15commit-graph.h: replace 'commit_hex' with 'commits'Libravatar Taylor Blau1-1/+1
The 'write_commit_graph()' function takes in either a string list of pack indices, or a string list of hexadecimal commit OIDs. These correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from 'git commit-graph write'. Using a string_list of hexadecimal commit IDs is not the most efficient use of memory, since we can instead use the 'struct oidset', which is more well-suited for this case. This has another benefit which will become apparent in the following commit. This is that we are about to disambiguate the kinds of errors we produce with '--stdin-commits' into "non-hex input" and "hex-input, but referring to a non-commit object". By having 'write_commit_graph' take in a 'struct oidset *' of commits, we place the burden on the caller (in this case, the builtin) to handle the first case, and the commit-graph machinery can handle the second case. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flagLibravatar Garima Singh1-0/+2
Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite in order to toggle writing Bloom filters when running any of the git tests. If set to true, we will compute and write Bloom filters every time a test calls `git commit-graph write`, as if the `--changed-paths` option was passed in. The test suite passes when GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-17Merge branch 'bc/hash-independent-tests-part-8'Libravatar Junio C Hamano1-2/+2
Preparation for SHA-256 migration continues. * bc/hash-independent-tests-part-8: (21 commits) t6024: update for SHA-256 t6006: make hash size independent t6000: abstract away SHA-1-specific constants t5703: make test work with SHA-256 t5607: make hash size independent t5318: update for SHA-256 t5515: make test hash independent t5321: make test hash independent t5313: make test hash independent t5309: make test hash independent t5302: make hash size independent t4060: make test work with SHA-256 t4211: add test cases for SHA-256 t4211: move SHA-1-specific test cases into a directory t4013: make test hash independent t3311: make test work with SHA-256 t3310: make test work with SHA-256 t3309: make test work with SHA-256 t3308: make test work with SHA-256 t3206: make hash size independent ...
2020-02-14Merge branch 'tb/commit-graph-object-dir'Libravatar Junio C Hamano1-2/+2
The code to compute the commit-graph has been taught to use a more robust way to tell if two object directories refer to the same thing. * tb/commit-graph-object-dir: commit-graph.h: use odb in 'load_commit_graph_one_fd_st' commit-graph.c: remove path normalization, comparison commit-graph.h: store object directory in 'struct commit_graph' commit-graph.h: store an odb in 'struct write_commit_graph_context' t5318: don't pass non-object directory to '--object-dir'
2020-02-07t5318: update for SHA-256Libravatar brian m. carlson1-2/+2
Switch two tests to use $ZERO_OID to represent the all-zeros object ID. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05Merge branch 'bc/hash-independent-tests-part-7'Libravatar Junio C Hamano1-1/+1
Preparation of test scripts for the day when the object names will use SHA-256 continues. * bc/hash-independent-tests-part-7: t5604: make hash independent t5601: switch into repository to hash object t5562: use $ZERO_OID t5540: make hash size independent t5537: make hash size independent t5530: compute results based on object length t5512: abstract away SHA-1-specific constants t5510: make hash size independent t5504: make hash algorithm independent t5324: make hash size independent t5319: make test work with SHA-256 t5319: change invalid offset for SHA-256 compatibility t5318: update for SHA-256 t4300: abstract away SHA-1-specific constants t4204: make hash size independent t4202: abstract away SHA-1-specific constants t4200: make hash size independent t4134: compute appropriate length constant t4066: compute index line in diffs t4054: make hash-size independent
2020-01-31t5318: don't pass non-object directory to '--object-dir'Libravatar Taylor Blau1-2/+2
In f237c8b6fe (commit-graph: implement git-commit-graph write, 2018-04-02) the test t5318.3 was introduced to ensure that calling 'git commit-graph write' in a repository with no packfiles does not write any commit-graph file(s). To exercise more paths in 'builtin/commit-graph.c', this test passes '--object-dir' to 'git commit-graph write', but the given argument refers to the working copy, not the object directory. Since the commit-graph sub-commands currently swallow these errors, this does not result in a test failure. But, it is only lucky that the test ends with no commit-graphs, since there were none to begin with. In preparation for a future commit where an '--object-dir' argument that does not match a known object directory will print out a failure, let's fix the test to still use '--object-dir', but pass the correct location to the object store instead of '.'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-15t5318: update for SHA-256Libravatar brian m. carlson1-1/+1
When running with SHA-256 as the hash algorithm, the hash version octet is 2 instead of 1. Pick the right value depending on the hash algorithm and use it where we look for the existing value. To ensure the test checking for invalid data passes, use 3 as the test value for an invalid hash version. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-10Merge branch 'ds/commit-graph-delay-gen-progress'Libravatar Junio C Hamano1-2/+2
One kind of progress messages were always given during commit-graph generation, instead of following the "if it takes more than two seconds, show progress" pattern, which has been corrected. * ds/commit-graph-delay-gen-progress: commit-graph: use start_delayed_progress() progress: create GIT_PROGRESS_DELAY
2019-12-01Merge branch 'ds/test-read-graph'Libravatar Junio C Hamano1-1/+1
Dev support for commit-graph feature. * ds/test-read-graph: test-tool: use 'read-graph' helper