summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-11completion: offer '--(no-)patch' among 'git log' optionsLibravatar SZEDER Gábor1-0/+1
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11bloom: use num_changes not nr for limit detectionLibravatar Derrick Stolee2-2/+2
As diff_tree_oid() computes a diff, it will terminate early if the total number of changed paths is strictly larger than max_changes. This includes the directories that changed, not just the file paths. However, only the file paths are reflected in the resulting diff queue's "nr" value. Use the "num_changes" from diffopt to check if the diff terminated early. This is incredibly important, as it can result in incorrect filters! For example, the first commit in the Linux kernel repo reports only 471 changes, but since these are nested inside several directories they expand to 513 "real" changes, and in fact the total list of changes is not reported. Thus, the computed filter for this commit is incorrect. Demonstrate the subtle difference by using one fewer file change in the 'get bloom filter for commit with 513 changes' test. Before, this edited 513 files inside "bigDir" which hit this inequality. However, dropping the file count by one demonstrates how the previous inequality was incorrect but the new one is correct. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11bloom: de-duplicate directory entriesLibravatar Derrick Stolee2-11/+28
When computing a changed-path Bloom filter, we need to take the files that changed from the diff computation and extract the parent directories. That way, a directory pathspec such as "Documentation" could match commits that change "Documentation/git.txt". However, the current code does a poor job of this process. The paths are added to a hashmap, but we do not check if an entry already exists with that path. This can create many duplicate entries and cause the filter to have a much larger length than it should. This means that the filter is more sparse than intended, which helps the false positive rate, but wastes a lot of space. Properly use hashmap_get() before hashmap_add(). Also be sure to include a comparison function so these can be matched correctly. This has an effect on a test in t0095-bloom.sh. This makes sense, there are ten changes inside "smallDir" so the total number of paths in the filter should be 11. This would result in 11 * 10 bits required, and with 8 bits per byte, this results in 14 bytes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11Documentation: changed-path Bloom filters use byte wordsLibravatar Derrick Stolee1-4/+4
In Documentation/technical/commit-graph-format.txt, the definition of the BIDX chunk specifies the length is a number of 8-byte words. During development we discovered that using 8-byte words in the Murmur3 hash algorithm causes issues with big-endian versus little- endian machines. Thus, the hash algorithm was adapted to work on a byte-by-byte basis. However, this caused a change in the definition of a "word" in bloom.h. Now, a "word" is a single byte, which allows filters to be as small as two bytes. These length-two filters are demonstrated in t0095-bloom.sh, and a larger filter of length 25 is demonstrated as well. The original point of using 8-byte words was for alignment reasons. It also presented opportunities for extremely sparse Bloom filters when there were a small number of changes at a commit, creating a very low false-positive rate. However, modifying the format at this point is unlikely to be a valuable exercise. Also, this use of single-byte granularity does present opportunities to save space. It is unclear if 8-byte alignment of the filters would present any meaningful performance benefits. Modify the format document to reflect reality. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11bloom: parse commit before computing filtersLibravatar Derrick Stolee1-0/+3
When computing changed-path Bloom filters for a commit, we need to know if the commit has a parent or not. If the commit is not parsed, then its parent pointer will be NULL. As far as I can tell, the only opportunity to reach this code without parsing the commit is inside "test-tool bloom get_filter_for_commit" but it is best to be safe. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-01test-bloom: fix usage typoLibravatar Derrick Stolee1-1/+1
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-01bloom: fix whitespace around tab lengthLibravatar Derrick Stolee2-10/+10
Fix alignment issues that were likely introduced due to an editor using tab lengths of 4 instead of 8. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23test-bloom: check that we have expected argumentsLibravatar Jeff King1-4/+16
If "test-tool bloom" is not fed a command, or if arguments are missing for some commands, it will just segfault. Let's check argc and write a friendlier usage message. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23test-bloom: fix some whitespace issuesLibravatar Jeff King1-5/+5
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23blame: drop unused parameter from maybe_changed_pathLibravatar Jeff King1-3/+1
We don't use the "parent" parameter at all (probably because the bloom filter for a commit is always defined against a single parent anyway). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-16blame: use changed-path Bloom filtersLibravatar Derrick Stolee3-9/+146
The changed-path Bloom filters help reduce the amount of tree parsing required during history queries. Before calculating a diff, we can ask the filter if a path changed between a commit and its first parent. If the filter says "no" then we can move on without parsing trees. If the filter says "maybe" then we parse trees to discover if the answer is actually "yes" or "no". When computing a blame, there is a section in find_origin() that computes a diff between a commit and one of its parents. When this is the first parent, we can check the Bloom filters before calling diff_tree_oid(). In order to make this work with the blame machinery, we need to initialize a struct bloom_key with the initial path. But also, we need to add more keys to a list if a rename is detected. We then check to see if _any_ of these keys answer "maybe" in the diff. During development, I purposefully left out this "add a new key when a rename is detected" to see if the test suite would catch my error. That is how I discovered the issues with GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change. With that change, we can feel some confidence in the coverage of this change. If a user requests copy detection using "git blame -C", then there are more places where the set of "important" files can expand. I do not know enough about how this happens in the blame machinery. Thus, the Bloom filter integration is explicitly disabled in this mode. A later change could expand the bloom_key data with an appropriate call (or calls) to add_bloom_key(). If we did not disable this mode, then the following tests would fail: t8003-blame-corner-cases.sh t8011-blame-split-file.sh Generally, this is a performance enhancement and should not change the behavior of 'git blame' in any way. If a repo has a commit-graph file with computed changed-path Bloom filters, then they should notice improved performance for their 'git blame' commands. Here are some example timings that I found by blaming some paths in the Linux kernel repository: git blame arch/x86/kernel/topology.c >/dev/null Before: 0.83s After: 0.24s git blame kernel/time/time.c >/dev/null Before: 0.72s After: 0.24s git blame tools/perf/ui/stdio/hist.c >/dev/null Before: 0.27s After: 0.11s I specifically looked for "deep" paths that were also edited many times. As a counterpoint, the MAINTAINERS file was edited many times but is located in the root tree. This means that the cost of computing a diff relative to the pathspec is very small. Here are the timings for that command: git blame MAINTAINERS >/dev/null Before: 20.1s After: 18.0s These timings are the best of five. The worst-case runs were on the order of 2.5 minutes for both cases. Note that the MAINTAINERS file has 18,740 lines across 17,000+ commits. This happens to be one of the cases where this change provides the least improvement. The lack of improvement for the MAINTAINERS file and the relatively modest improvement for the other examples can be easily explained. The blame machinery needs to compute line-level diffs to determine which lines were changed by each commit. That makes up a large proportion of the computation time, and this change does not attempt to improve on that section of the algorithm. The MAINTAINERS file is large and changed often, so it takes time to determine which lines were updated by which commit. In contrast, the code files are much smaller, and it takes longer to comute the line-by-line diff for a single patch on the Linux mailing lists. Outside of the "-C" integration, I believe there is little more to gain from the changed-path Bloom filters for 'git blame' after this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-16tests: write commit-graph with Bloom filtersLibravatar Derrick Stolee4-5/+29
The GIT_TEST_COMMIT_GRAPH environment variable updates the commit- graph file whenever "git commit" is run, ensuring that we always have an updated commit-graph throughout the test suite. The GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was introduced to write the changed-path Bloom filters whenever "git commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH trick doesn't launch a separate process and instead writes it directly. To expand the number of tests that have commits in the commit-graph file, add a helper method that computes the commit-graph and place that helper inside "git commit" and "git merge". In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS to ensure we are writing changed-path Bloom filters whenever possible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-16revision: complicated pathspecs disable filtersLibravatar Derrick Stolee1-1/+18
The changed-path Bloom filters work only when we can compute an explicit Bloom filter key in advance. When a pathspec is given that allows case-insensitive checks or wildcard matching, we must disable the Bloom filter performance checks. By checking the pathspec in prepare_to_use_bloom_filters(), we avoid setting up the Bloom filter data and thus revert to the usual logic. Before this change, the following tests would fail*: t6004-rev-list-path-optim.sh (Tests 6-7) t6130-pathspec-noglob.sh (Tests 3-6) t6131-pathspec-icase.sh (Tests 3-5) *These tests would fail when using GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS except that the latter environment variable was not set up correctly to write the changed- path Bloom filters in the test suite. That will be fixed in the next change. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-09bloom: ignore renames when computing changed pathsLibravatar Derrick Stolee1-0/+1
The changed-path Bloom filters record an entry in the filter for every path that was changed. This includes every add and delete, regardless of whether a rename was detected. Detecting renames causes significant performance issues, but also will trigger downloading missing blobs in partial clone. The simple fix is to disable rename detection when computing a changed-path Bloom filter. This should already be disabled by default, but it is good to explicitly enforce the intended behavior. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flagLibravatar Garima Singh6-1/+12
Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite in order to toggle writing Bloom filters when running any of the git tests. If set to true, we will compute and write Bloom filters every time a test calls `git commit-graph write`, as if the `--changed-paths` option was passed in. The test suite passes when GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06t4216: add end to end tests for git log with Bloom filtersLibravatar Garima Singh2-0/+159
These tests exercises writing commit graph with Bloom filters and exercises 'git log -- path' with all the applicable options. They check that the output is the same with and without Bloom filters, confirm Bloom filters were used by checking if trace2 statistics were logged correctly. Also confirms cases where Bloom filters are not used: 1. Multiple path specs, 2. --walk-reflogs (see patch titled 'revision.c: use Bloom filters...' for details, 3. If the latest commit graph does not have Bloom filters Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06revision.c: add trace2 stats around Bloom filter usageLibravatar Garima Singh1-0/+41
Add trace2 statistics around Bloom filter usage and behavior for 'git log -- path' commands that are hoping to benefit from the presence of computed changed paths Bloom filters. These statistics are great for performance analysis work and for formal testing, which we will see in the commit following this one. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06revision.c: use Bloom filters to speed up path based revision walksLibravatar Garima Singh4-2/+118
Revision walk will now use Bloom filters for commits to speed up revision walks for a particular path (for computing history for that path), if they are present in the commit-graph file. We load the Bloom filters during the prepare_revision_walk step, currently only when dealing with a single pathspec. Extending it to work with multiple pathspecs can be explored and built on top of this series in the future. While comparing trees in rev_compare_trees(), if the Bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. The other response of the Bloom filter is '`:maybe', in which case we fall back to the full diff calculation to determine if the path was changed in the commit. We do not try to use Bloom filters when the '--walk-reflogs' option is specified. The '--walk-reflogs' option does not walk the commit ancestry chain like the rest of the options. Incorporating the performance gains when walking reflog entries would add more complexity, and can be explored in a later series. Performance Gains: We tested the performance of `git log -- <path>` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths. On the git and linux repos: - we observed a 2x to 5x speed up. On a large internal repo with files seated 6-10 levels deep in the tree: - we observed 10x to 20x speed ups, with some paths going up to 28 times faster. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: add --changed-paths option to write subcommandLibravatar Garima Singh2-2/+12
Add --changed-paths option to git commit-graph write. This option will allow users to compute information about the paths that have changed between a commit and its first parent, and write it into the commit graph file. If the option is passed to the write subcommand we set the COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the commit-graph logic. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: reuse existing Bloom filters during writeLibravatar Garima Singh4-6/+55
Add logic to a) parse Bloom filter information from the commit graph file and, b) re-use existing Bloom filters. See Documentation/technical/commit-graph-format for the format in which the Bloom filter information is written to the commit graph file. To read Bloom filter for a given commit with lexicographic position 'i' we need to: 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter of commit i+1. It is essentially the index past the end of the filter of commit i. It is called end_index in the code. 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for filter of commit i. It is called the start_index in the code. For the first commit, where i = 0, Bloom filter data starts at the beginning, just past the header in the BDAT chunk. Hence, start_index will be 0. 3. The length of the filter will be end_index - start_index, because BIDX[i] gives the cumulative 8-byte words including the ith commit's filter. We toggle whether Bloom filters should be recomputed based on the compute_if_not_present flag. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: write Bloom filters to commit graph fileLibravatar Garima Singh3-1/+147
Update the technical documentation for commit-graph-format with the formats for the Bloom filter index (BIDX) and Bloom filter data (BDAT) chunks. Write the computed Bloom filters information to the commit graph file using this format. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30commit-graph: examine commits by generation numberLibravatar Garima Singh1-3/+30
When running 'git commit-graph write --changed-paths', we sort the commits by pack-order to save time when computing the changed-paths bloom filters. This does not help when finding the commits via the '--reachable' flag. If not using pack-order, then sort by generation number before examining the diff. Commits with similar generation are more likely to have many trees in common, making the diff faster. On the Linux kernel repository, this change reduced the computation time for 'git commit-graph write --reachable --changed-paths' from 3m00s to 1m37s. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30commit-graph: examine changed-path objects in pack orderLibravatar Jeff King1-3/+35
Looking at the diff of commit objects in pack order is much faster than in sha1 order, as it gives locality to the access of tree deltas (whereas sha1 order is effectively random). Unfortunately the commit-graph code sorts the commits (several times, sometimes as an oid and sometimes a pointer-to-commit), and we ultimately traverse in sha1 order. Instead, let's remember the position at which we see each commit, and traverse in that order when looking at bloom filters. This drops my time for "git commit-graph write --changed-paths" in linux.git from ~4 minutes to ~1.5 minutes. Probably the "--reachable" code path would want something similar. Or alternatively, we could use a different data structure (either a hash, or maybe even just a bit in "struct commit") to keep track of which oids we've seen, etc instead of sorting. And then we could keep the original order. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30commit-graph: compute Bloom filters for changed pathsLibravatar Garima Singh2-2/+33
Add new COMMIT_GRAPH_WRITE_CHANGED_PATHS flag that makes Git compute Bloom filters for the paths that changed between a commit and it's first parent, for each commit in the commit-graph. This computation is done on a commit-by-commit basis. We will write these Bloom filters to the commit-graph file, to store this data on disk, in the next change in this series. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30diff: halt tree-diff early after max_changesLibravatar Derrick Stolee3-1/+14
When computing the changed-paths bloom filters for the commit-graph, we limit the size of the filter by restricting the number of paths in the diff. Instead of computing a large diff and then ignoring the result, it is better to halt the diff computation early. Create a new "max_changes" option in struct diff_options. If non-zero, then halt the diff computation after discovering strictly more changed paths. This includes paths corresponding to trees that change. Use this max_changes option in the bloom filter calculations. This reduces the time taken to compute the filters for the Linux kernel repo from 2m50s to 2m35s. On a large internal repository with ~500 commits that perform tree-wide changes, the time reduced from 6m15s to 3m48s. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30bloom.c: core Bloom filter implementation for changed paths.Libravatar Garima Singh4-0/+172
Add the core implementation for computing Bloom filters for the paths changed between a commit and it's first parent. We fill the Bloom filters as (const char *data, int len) pairs as `struct bloom_filters" within a commit slab. Filters for commits with no changes and more than 512 changes, is represented with a filter of length zero. There is no gain in distinguishing between a computed filter of length zero for a commit with no changes, and an uncomputed filter for new commits or for commits with more than 512 changes. The effect on `git log -- path` is the same in both cases. We will fall back to the normal diffing algorithm when we can't benefit from the existence of Bloom filters. Helped-by: Jeff King <peff@peff.net> Helped-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30bloom.c: introduce core Bloom filter constructsLibravatar Garima Singh4-1/+188
Introduce the constructs for Bloom filters, Bloom filter keys and Bloom filter settings. For details on what Bloom filters are and how they work, refer to Dr. Derrick Stolee's blog post [1]. It provides a concise explanation of the adoption of Bloom filters as described in [2] and [3]. Implementation specifics: 1. We currently use 7 and 10 for the number of hashes and the size of each entry respectively. They served as great starting values, the mathematical details behind this choice are described in [1] and [4]. The implementation, while not completely open to it at the moment, is flexible enough to allow for tweaking these settings in the future. Note: The performance gains we have observed with these values are significant enough that we did not need to tweak these settings. The performance numbers are included in the cover letter of this series and in the commit message of the subsequent commit where we use Bloom filters to speed up `git log -- path`. 2. As described in [1] and [3], we do not need 7 independent hashing functions. We use the Murmur3 hashing scheme, seed it twice and then combine those to procure an arbitrary number of hash values. 3. The filters will be sized according to the number of changes in each commit, in multiples of 8 bit words. [1] Derrick Stolee "Supercharging the Git Commit Graph IV: Bloom Filters" https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/ [2] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese "An Improved Construction for Counting Bloom Filters" http://theory.stanford.edu/~rinap/papers/esa2006b.pdf https://doi.org/10.1007/11841036_61 [3] Peter C. Dillinger and Panagiotis Manolios "Bloom Filters in Probabilistic Verification" http://www.ccs.neu.edu/home/pete/pub/Bloom-filters-verification.pdf https://doi.org/10.1007/978-3-540-30494-4_26 [4] Thomas Mueller Graf, Daniel Lemire "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters" https://arxiv.org/abs/1912.08258 Helped-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30bloom.c: add the murmur3 hash implementationLibravatar Garima Singh7-0/+133
In preparation for computing changed paths Bloom filters, implement the Murmur3 hash algorithm as described in [1]. It hashes the given data using the given seed and produces a uniformly distributed hash value. [1] https://en.wikipedia.org/wiki/MurmurHash#Algorithm Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30commit-graph: define and use MAX_NUM_CHUNKSLibravatar Garima Singh1-2/+3
This is a minor cleanup to make it easier to change the number of chunks being written to the commit graph. Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29Merge branch 'ds/default-pack-use-sparse-to-true'Libravatar Junio C Hamano7-16/+18
The 'pack.useSparse' configuration variable now defaults to 'true', enabling an optimization that has been experimental since Git 2.21. * ds/default-pack-use-sparse-to-true: pack-objects: flip the use of GIT_TEST_PACK_SPARSE config: set pack.useSparse=true by default
2020-03-26The second batch post 2.26 cycleLibravatar Junio C Hamano1-0/+53
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-26Merge branch 'ah/force-pull-rebase-configuration'Libravatar Junio C Hamano3-11/+65
"git pull" learned to warn when no pull.rebase configuration exists, and neither --[no-]rebase nor --ff-only is given (which would result a merge). * ah/force-pull-rebase-configuration: pull: warn if the user didn't say whether to rebase or to merge
2020-03-26Merge branch 'tg/retire-scripted-stash'Libravatar Junio C Hamano8-862/+31
"git stash" has kept an escape hatch to use the scripted version for a few releases, which got stale. It has been removed. * tg/retire-scripted-stash: stash: remove the stash.useBuiltin setting stash: get git_stash_config at the top level
2020-03-26Merge branch 'jc/describe-misnamed-annotated-tag'Libravatar Junio C Hamano2-7/+28
When "git describe C" finds an annotated tag with tagname A to be the best name to explain commit C, and the tag is stored in a "wrong" place in the refs/tags hierarchy, e.g. refs/tags/B, the command gave a warning message but used A (not B) to describe C. If C is exactly at the tag, the describe output would be "A", but "git rev-parse A^0" would not be equal as "git rev-parse C^0". The behavior of the command has been changed to use the "long" form i.e. A-0-gOBJECTNAME, which is correctly interpreted by rev-parse. * jc/describe-misnamed-annotated-tag: describe: force long format for a name based on a mislocated tag
2020-03-26Merge branch 'at/rebase-fork-point-regression-fix'Libravatar Junio C Hamano3-13/+34
The "--fork-point" mode of "git rebase" regressed when the command was rewritten in C back in 2.20 era, which has been corrected. * at/rebase-fork-point-regression-fix: rebase: --fork-point regression fix
2020-03-26Merge branch 'bc/filter-process'Libravatar Junio C Hamano20-75/+350
Provide more information (e.g. the object of the tree-ish in which the blob being converted appears, in addition to its path, which has already been given) to smudge/clean conversion filters. * bc/filter-process: t0021: test filter metadata for additional cases builtin/reset: compute checkout metadata for reset builtin/rebase: compute checkout metadata for rebases builtin/clone: compute checkout metadata for clones builtin/checkout: compute checkout metadata for checkouts convert: provide additional metadata to filters convert: permit passing additional metadata to filter processes builtin/checkout: pass branch info down to checkout_worktree
2020-03-26Merge branch 'hi/gpg-prefer-check-signature'Libravatar Junio C Hamano6-78/+201
The code to interface with GnuPG has been refactored. * hi/gpg-prefer-check-signature: gpg-interface: prefer check_signature() for GPG verification t: increase test coverage of signature verification output
2020-03-26Merge branch 'bc/sha-256-part-1-of-4'Libravatar Junio C Hamano28-141/+623
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-26Merge branch 'pb/recurse-submodules-fix'Libravatar Junio C Hamano3-25/+51
Fix "git checkout --recurse-submodules" of a nested submodule hierarchy. * pb/recurse-submodules-fix: t/lib-submodule-update: add test removing nested submodules unpack-trees: check for missing submodule directory in merged_entry unpack-trees: remove outdated description for verify_clean_submodule t/lib-submodule-update: move a test to the right section t/lib-submodule-update: remove outdated test description t7112: remove mention of KNOWN_FAILURE_SUBMODULE_RECURSIVE_NESTED
2020-03-25The first batch post 2.26 cycleLibravatar Junio C Hamano3-2/+43
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-25Merge branch 'ss/submodule-foreach-cb'Libravatar Junio C Hamano1-4/+4
Code clean-up. * ss/submodule-foreach-cb: submodule--helper.c: Rename 'cb_foreach' to 'foreach_cb'
2020-03-25Merge branch 'jc/config-tar'Libravatar Junio C Hamano3-7/+8
Improve the structure of the documentation source a bit. * jc/config-tar: separate tar.* config to its own source file
2020-03-25Merge branch 'en/oidset-uninclude-hashmap'Libravatar Junio C Hamano1-1/+0
Code clean-up. * en/oidset-uninclude-hashmap: oidset: remove unnecessary include
2020-03-25Merge branch 'ds/check-connected-reprepare-packed-git'Libravatar Junio C Hamano1-0/+4
Corner case "git fetch" fix. * ds/check-connected-reprepare-packed-git: connected.c: reprepare packs for corner cases
2020-03-25Merge branch 'rs/doc-passthru-fetch-options'Libravatar Junio C Hamano1-3/+7
Doc update. * rs/doc-passthru-fetch-options: pull: document more passthru options
2020-03-25Merge branch 'pw/advise-rebase-skip'Libravatar Junio C Hamano8-46/+233
The mechanism to prevent "git commit" from making an empty commit or amending during an interrupted cherry-pick was broken during the rewrite of "git rebase" in C, which has been corrected. * pw/advise-rebase-skip: commit: give correct advice for empty commit during a rebase commit: encapsulate determine_whence() for sequencer commit: use enum value for multiple cherry-picks sequencer: write CHERRY_PICK_HEAD for reword and edit cherry-pick: check commit error messages cherry-pick: add test for `--skip` advice in `git commit` t3404: use test_cmp_rev
2020-03-25Merge branch 'yz/p4-py3'Libravatar Junio C Hamano2-95/+146
Update "git p4" to work with Python 3. * yz/p4-py3: ci: use python3 in linux-gcc and osx-gcc and python2 elsewhere git-p4: use python3's input() everywhere git-p4: simplify regex pattern generation for parsing diff-tree git-p4: use dict.items() iteration for python3 compatibility git-p4: use functools.reduce instead of reduce git-p4: fix freezing while waiting for fast-import progress git-p4: use marshal format version 2 when sending to p4 git-p4: open .gitp4-usercache.txt in text mode git-p4: convert path to unicode before processing them git-p4: encode/decode communication with git for python3 git-p4: encode/decode communication with p4 for python3 git-p4: remove string type aliasing git-p4: change the expansion test from basestring to list git-p4: make python2.7 the oldest supported version
2020-03-25Merge branch 'am/real-path-fix'Libravatar Junio C Hamano16-75/+107
The real_path() convenience function can easily be misused; with a bit of code refactoring in the callers' side, its use has been eliminated. * am/real-path-fix: get_superproject_working_tree(): return strbuf real_path_if_valid(): remove unsafe API real_path: remove unsafe API set_git_dir: fix crash when used with real_path()
2020-03-25Merge branch 'sg/commit-slab-clarify-peek'Libravatar Junio C Hamano1-1/+6
In-code comment update. * sg/commit-slab-clarify-peek: commit-slab: clarify slabname##_peek()'s return value
2020-03-25Merge branch 'jc/maintain-doc'Libravatar Junio C Hamano1-13/+39
Doc update. * jc/maintain-doc: update how-to-maintain-git