summaryrefslogtreecommitdiff
path: root/Documentation/git-commit-graph.txt
AgeCommit message (Collapse)AuthorFilesLines
2020-09-18commit-graph: introduce 'commitGraph.maxNewFilters'Libravatar Taylor Blau1-1/+2
Introduce a configuration variable to specify a default value for the recently-introduce '--max-new-filters' option of 'git commit-graph write'. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-18builtin/commit-graph.c: introduce '--max-new-filters=<n>'Libravatar Taylor Blau1-0/+6
Introduce a command-line flag to specify the maximum number of new Bloom filters that a 'git commit-graph write' is willing to compute from scratch. Prior to this patch, a commit-graph write with '--changed-paths' would compute Bloom filters for all selected commits which haven't already been computed (i.e., by a previous commit-graph write with '--split' such that a roll-up or replacement is performed). This behavior can cause prohibitively-long commit-graph writes for a variety of reasons: * There may be lots of filters whose diffs take a long time to generate (for example, they have close to the maximum number of changes, diffing itself takes a long time, etc). * Old-style commit-graphs (which encode filters with too many entries as not having been computed at all) cause us to waste time recomputing filters that appear to have not been computed only to discover that they are too-large. This can make the upper-bound of the time it takes for 'git commit-graph write --changed-paths' to be rather unpredictable. To make this command behave more predictably, introduce '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters from scratch. This lets "computing" already-known filters proceed quickly, while bounding the number of slow tasks that Git is willing to do. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30Merge branch 'ds/commit-graph-bloom-updates' into masterLibravatar Junio C Hamano1-1/+4
Updates to the changed-paths bloom filter. * ds/commit-graph-bloom-updates: commit-graph: check all leading directories in changed path Bloom filters revision: empty pathspecs should not use Bloom filters revision.c: fix whitespace commit-graph: check chunk sizes after writing commit-graph: simplify chunk writes into loop commit-graph: unify the signatures of all write_graph_chunk_*() functions commit-graph: persist existence of changed-paths bloom: fix logic in get_bloom_filter() commit-graph: change test to die on parse, not load commit-graph: place bloom_settings in context
2020-07-01commit-graph: persist existence of changed-pathsLibravatar Derrick Stolee1-1/+4
The changed-path Bloom filters were released in v2.27.0, but have a significant drawback. A user can opt-in to writing the changed-path filters using the "--changed-paths" option to "git commit-graph write" but the next write will drop the filters unless that option is specified. This becomes even more important when considering the interaction with gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of features.experimental). These config options trigger commit-graph writes that the user did not signal, and hence there is no --changed-paths option available. Allow a user that opts-in to the changed-path filters to persist the property of "my commit-graph has changed-path filters" automatically. A user can drop filters using the --no-changed-paths option. In the process, we need to be extremely careful to match the Bloom filter settings as specified by the commit-graph. This will allow future versions of Git to customize these settings, and the version with this change will persist those settings as commit-graphs are rewritten on top. Use the trace2 API to signal the settings used during the write, and check that output in a test after manually adjusting the correct bytes in the commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08Merge branch 'tb/commit-graph-no-check-oids'Libravatar Junio C Hamano1-2/+4
Clean-up the commit-graph codepath. * tb/commit-graph-no-check-oids: commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag t5318: reorder test below 'graph_read_expect' commit-graph.c: simplify 'fill_oids_from_commits' builtin/commit-graph.c: dereference tags in builtin builtin/commit-graph.c: extract 'read_one_commit()' commit-graph.c: peel refs in 'add_ref_to_set' commit-graph.c: show progress of finding reachable commits commit-graph.c: extract 'refs_cb_data'
2020-05-18git-commit-graph.txt: fix list renderingLibravatar Martin Ågren1-0/+1
The first list item follows immediately on the paragraph where we introduce the list. This makes the "*" render literally as part of one huge paragraph. (With AsciiDoc, everything is fine after that, but with Asciidoctor, we get some minor follow-on errors.) Add an empty line -- with a list continuation ("+") -- to make the first list item render ok. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18git-commit-graph.txt: fix grammoLibravatar Martin Ågren1-1/+1
It's easy to mix up the possessive "its" and "it's" ("it is"). Correct an instance of this. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flagLibravatar Taylor Blau1-2/+4
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' | awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' | git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-01Merge branch 'gs/commit-graph-path-filter'Libravatar Junio C Hamano1-0/+5
Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS
2020-04-29Revert "commit-graph.c: introduce '--[no-]check-oids'"Libravatar Junio C Hamano1-5/+0
This reverts commit 7a9ce0269bc0f4ef230f930b3910b70ac3142552, which has not yet gained consensus.
2020-04-15commit-graph.c: introduce '--[no-]check-oids'Libravatar Taylor Blau1-0/+5
When operating on a stream of commit OIDs on stdin, 'git commit-graph write' checks that each OID refers to an object that is indeed a commit. This is convenient to make sure that the given input is well-formed, but can sometimes be undesirable. For example, server operators may wish to feed the refnames that were updated during a push to 'git commit-graph write --input=stdin-commits', and silently discard refs that don't point at commits. This can be done by combing the output of 'git for-each-ref' with '--format %(*objecttype)', but this requires opening up a potentially large number of objects. Instead, it is more convenient to feed the updated refs to the commit-graph machinery, and let it throw out refs that don't point to commits. Introduce '--[no-]check-oids' to make such a behavior possible. With '--check-oids' (the default behavior to retain backwards compatibility), 'git commit-graph write' will barf on a non-commit line in its input. With 'no-check-oids', such lines will be silently ignored, making the above possible by specifying this option. No matter which is supplied, 'git commit-graph write' retains the behavior from the previous commit of rejecting non-OID inputs like "HEAD" and "refs/heads/foo" as before. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15builtin/commit-graph.c: introduce split strategy 'replace'Libravatar Taylor Blau1-3/+4
When using split commit-graphs, it is sometimes useful to completely replace the commit-graph chain with a new base. For example, consider a scenario in which a repository builds a new commit-graph incremental for each push. Occasionally (say, after some fixed number of pushes), they may wish to rebuild the commit-graph chain with all reachable commits. They can do so with $ git commit-graph write --reachable but this removes the chain entirely and replaces it with a single commit-graph in 'objects/info/commit-graph'. Unfortunately, this means that the next push will have to move this commit-graph into the first layer of a new chain, and then write its new commits on top. Avoid such copying entirely by allowing the caller to specify that they wish to replace the entirety of their commit-graph chain, while also specifying that the new commit-graph should become the basis of a fresh, length-one chain. This addresses the above situation by making it possible for the caller to instead write: $ git commit-graph write --reachable --split=replace which writes a new length-one chain to 'objects/info/commit-graphs', making the commit-graph incremental generated by the subsequent push relatively cheap by avoiding the aforementioned copy. In order to do this, remove an assumption in 'write_commit_graph_file' that chains are always at least two incrementals long. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15builtin/commit-graph.c: introduce split strategy 'no-merge'Libravatar Taylor Blau1-0/+5
In the previous commit, we laid the groundwork for supporting different splitting strategies. In this commit, we introduce the first splitting strategy: 'no-merge'. Passing '--split=no-merge' is useful for callers which wish to write a new incremental commit-graph, but do not want to spend effort condensing the incremental chain [1]. Previously, this was possible by passing '--size-multiple=0', but this no longer the case following 63020f175f (commit-graph: prefer default size_mult when given zero, 2020-01-02). When '--split=no-merge' is given, the commit-graph machinery will never condense an existing chain, and it will always write a new incremental. [1]: This might occur when, for example, a server administrator running some program after each push may want to ensure that each job runs proportional in time to the size of the push, and does not "jump" when the commit-graph machinery decides to trigger a merge. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15builtin/commit-graph.c: support for '--split[=<strategy>]'Libravatar Taylor Blau1-5/+6
With '--split', the commit-graph machinery writes new commits in another incremental commit-graph which is part of the existing chain, and optionally decides to condense the chain into a single commit-graph. This is done to ensure that the asymptotic behavior of looking up a commit in an incremental chain is not dominated by the number of incrementals in that chain. It can be controlled by the '--max-commits' and '--size-multiple' options. In the next two commits, we will introduce additional splitting strategies that can exert additional control over: - when a split commit-graph is and isn't written, and - when the existing commit-graph chain is discarded completely and replaced with another graph To prepare for this, make '--split' take an optional strategy (as in '--split[=<strategy>]'), and add a new enum to describe which strategy is being used. For now, no strategies are given, and the only enumerated value is 'COMMIT_GRAPH_SPLIT_UNSPECIFIED', indicating the absence of a strategy. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: add --changed-paths option to write subcommandLibravatar Garima Singh1-0/+5
Add --changed-paths option to git commit-graph write. This option will allow users to compute information about the paths that have changed between a commit and its first parent, and write it into the commit graph file. If the option is passed to the write subcommand we set the COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the commit-graph logic. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04commit-graph.h: store an odb in 'struct write_commit_graph_context'Libravatar Taylor Blau1-1/+4
There are lots of places in 'commit-graph.h' where a function either has (or almost has) a full 'struct object_directory *', accesses '->path', and then throws away the rest of the struct. This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged). These paths are normalized with 'normalize_path_copy()' which mitigates some comparison issues, but not all [1]. Replace usage of 'char *object_dir' with 'odb->path' by storing a 'struct object_directory *' in the 'write_commit_graph_context' structure. This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'. Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality. Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero. This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure. Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store. [1]: In my testing, for example, I can get one side of the commit-graph code to fill object_dir with "./objects" and the other with just "objects". Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13test-tool: use 'read-graph' helperLibravatar Derrick Stolee1-12/+0
The 'git commit-graph read' subcommand is used in test scripts to check that the commit-graph contents match the expected data. Mostly, this helps check the header information and the list of chunks. Users do not need this information, so move the functionality to a test helper. Reported-by: Bryan Turner <bturner@atlassian.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-18commit-graph: add --[no-]progress to write and verifyLibravatar Garima Singh1-2/+5
Add --[no-]progress to git commit-graph write and verify. The progress feature was introduced in 7b0f229 ("commit-graph write: add progress output", 2018-09-17) but the ability to opt-out was overlooked. Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-19commit-graph: verify chains with --shallow modeLibravatar Derrick Stolee1-1/+4
If we wrote a commit-graph chain, we only modified the tip file in the chain. It is valuable to verify what we wrote, but not waste time checking files we did not write. Add a '--shallow' option to the 'git commit-graph verify' subcommand and check that it does not read the base graph in a two-file chain. Making the verify subcommand read from a chain of commit-graphs takes some rearranging of the builtin code. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-19commit-graph: create options for split filesLibravatar Derrick Stolee1-1/+20
The split commit-graph feature is now fully implemented, but needs some more run-time configurability. Allow direct callers to 'git commit-graph write --split' to specify the values used in the merge strategy and the expire time. Update the documentation to specify these values. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27Doc: refer to the "commit-graph file" with dashLibravatar Martin Ågren1-6/+6
The file processed by `git commit-graph` is referred to as the "commit-graph file", also with a dash. We have a few references to the "commit graph file", though, without the dash. These occur in git-commit-graph.txt as well as in Doc/technical/commit-graph.txt. Fix them. Do not change the references to the "commit graph" (without "... file") as a data structure. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27git-commit-graph.txt: refer to "*commit*-graph file"Libravatar Martin Ågren1-6/+6
This document sometimes refers to the "commit-graph file" as just "the graph file". This saves a couple of words here and there at the risk of confusion. In particular, the documentation for `git commit-graph read` appears to suggest that there are indeed different types of graph files. Let's just write out the full name everywhere. The full name, by the way, is not the dash-less "commit graph file". Use the dashed form. (The next commit will fix the remaining few instances of the "commit graph file" in this document.) Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27git-commit-graph.txt: typeset more in monospaceLibravatar Martin Ågren1-6/+7
While we're here, fix an instance of "folder" to be "directory". Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27git-commit-graph.txt: fix bullet listsLibravatar Martin Ågren1-2/+2
We have a couple of bullet items which span multiple lines, and where we have prefixed each line with a `*`. (This might be the result of a text editor trying to help.) This results in each line being typeset as a separate bullet item. Drop the extra `*`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27commit-graph: add '--reachable' optionLibravatar Derrick Stolee1-2/+6
When writing commit-graph files, it can be convenient to ask for all reachable commits (starting at the ref set) in the resulting file. This is particularly helpful when writing to stdin is complicated, such as a future integration with 'git gc'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27commit-graph: add 'verify' subcommandLibravatar Derrick Stolee1-0/+6
If the commit-graph file becomes corrupt, we need a way to verify that its contents match the object database. In the manner of 'git fsck' we will implement a 'git commit-graph verify' subcommand to report all issues with the file. Add the 'verify' subcommand to the 'commit-graph' builtin and its documentation. The subcommand is currently a no-op except for loading the commit-graph into memory, which may trigger run-time errors that would be caught by normal use. Add a simple test that ensures the command returns a zero error code. If no commit-graph file exists, this is an acceptable state. Do not report any errors. Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: implement "--append" optionLibravatar Derrick Stolee1-0/+10
Teach git-commit-graph to add all commits from the existing commit-graph file to the file about to be written. This should be used when adding new commits without performing garbage collection. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: build graph from starting commitsLibravatar Derrick Stolee1-1/+13
Teach git-commit-graph to read commits from stdin when the --stdin-commits flag is specified. Commits reachable from these commits are added to the graph. This is a much faster way to construct the graph than inspecting all packed objects, but is restricted to known tips. For the Linux repository, 700,000+ commits were added to the graph file starting from 'master' in 7-9 seconds, depending on the number of packfiles in the repo (1, 24, or 120). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: read only from specific pack-indexesLibravatar Derrick Stolee1-1/+10
Teach git-commit-graph to inspect the objects only in a certain list of pack-indexes within the given pack directory. This allows updating the commit graph iteratively. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11commit-graph: implement git commit-graph readLibravatar Derrick Stolee1-0/+12
Teach git-commit-graph to read commit graph files and summarize their contents. Use the read subcommand to verify the contents of a commit graph file in the tests. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02commit-graph: implement git-commit-graph writeLibravatar Derrick Stolee1-0/+41
Teach git-commit-graph to write graph files. Create new test script to verify this command succeeds without failure. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02commit-graph: create git-commit-graph builtinLibravatar Derrick Stolee1-0/+10
Teach git the 'commit-graph' builtin that will be used for writing and reading packed graph files. The current implementation is mostly empty, except for an '--object-dir' option. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>