summaryrefslogtreecommitdiff
path: root/t/helper
AgeCommit message (Collapse)AuthorFilesLines
2020-09-03Merge branch 'jk/slimmed-down'Libravatar Junio C Hamano3-135/+0
Trim an unused binary and turn a bunch of commands into built-in. * jk/slimmed-down: drop vcs-svn experiment make git-fast-import a builtin make git-bugreport a builtin make credential helpers builtins Makefile: drop builtins from MSVC pdb list
2020-08-27Merge branch 'jk/leakfix'Libravatar Junio C Hamano1-1/+1
Code clean-up. * jk/leakfix: submodule--helper: fix leak of core.worktree value config: fix leak in git_config_get_expiry_in_days() config: drop git_config_get_string_const() config: fix leaks from git_config_get_string_const() checkout: fix leak of non-existent branch names submodule--helper: use strbuf_release() to free strbufs clear_pattern_list(): clear embedded hashmaps
2020-08-17multi-pack-index: use hash version byteLibravatar Derrick Stolee1-2/+6
Similar to the commit-graph format, the multi-pack-index format has a byte in the header intended to track the hash version used to write the file. This allows one to interpret the hash length without having the context of the repository config specifying the hash length. This was not modified as part of the SHA-256 work because the hash length was automatically up-shifted due to that config. Since we have this byte available, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Add a new oid_version() method in midx.c similar to the one in commit-graph.c. This is specifically made separate from that implementation to avoid artificially linking the formats. The test impact requires a few more things than the corresponding change in the commit-graph format. Specifically, 'test-tool read-midx' was not writing anything about this header value to output. Since the value available in 'struct multi_pack_index' is hash_len instead of a version value, we output "20" or "32" instead of "1" or "2". Since we want a user to not have their Git commands fail if their multi-pack-index has the incorrect hash version compared to the repository's hash version, we relax the die() to an error() in load_multi_pack_index(). This has some effect on 'git multi-pack-index verify' as we need to check that a failed parse of a file that exists is actually a verify error. For that test that checks the hash version matches, we change the corrupted byte from "2" to "3" to ensure the test fails for both hash algorithms. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14config: fix leaks from git_config_get_string_const()Libravatar Jeff King1-1/+1
There are two functions to get a single config string: - git_config_get_string() - git_config_get_string_const() One might naively think that the first one allocates a new string and the second one just points us to the internal configset storage. But in fact they both allocate a new copy; the second one exists only to avoid having to cast when using it with a const global which we never intend to free. The documentation for the function explains that clearly, but it seems I'm not alone in being surprised by this. Of 17 calls to the function, 13 of them leak the resulting value. We could obviously fix these by adding the appropriate free(). But it would be simpler still if we actually had a non-allocating way to get the string. There's git_config_get_value() but that doesn't quite do what we want. If the config key is present but is a boolean with no value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the string versions will print an error and die). So let's introduce a new variant, git_config_get_string_tmp(), that behaves as these callers expect. We need a new name because we have new semantics but the same function signature (so even if we converted the four remaining callers, topics in flight might be surprised). The "tmp" is because this value should only be held onto for a short time. In practice it's rare for us to clear and refresh the configset, invalidating the pointer, but hopefully the "tmp" makes callers think about the lifetime. In each of the converted cases here the value only needs to last within the local function or its immediate caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13drop vcs-svn experimentLibravatar Jeff King3-135/+0
The code in vcs-svn was started in 2010 as an attempt to build a remote-helper for interacting with svn repositories (as opposed to git-svn). However, we never got as far as shipping a mature remote helper, and the last substantive commit was e99d012a6bc in 2012. We do have a git-remote-testsvn, and it is even installed as part of "make install". But given the name, it seems unlikely to be used by anybody (you'd have to explicitly "git clone testsvn::$url", and there have been zero mentions of that on the mailing list since 2013, and even that includes the phrase "you might need to hack a bit to get it working properly"[1]). We also ship contrib/svn-fe, which builds on the vcs-svn work. However, it does not seem to build out of the box for me, as the link step misses some required libraries for using libgit.a. Curiously, the original build breakage bisects for me to eff80a9fd9 (Allow custom "comment char", 2013-01-16), which seems unrelated. There was an attempt to fix it in da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), but on my system that only switches the error message. So it seems like the result is not really usable by anybody in practice. It would be wonderful if somebody wanted to pick up the topic again, and potentially it's worth carrying around for that reason. But the flip side is that people doing tree-wide operations have to deal with this code. And you can see the list with (replace "HEAD" with this commit as appropriate): { echo "--" git diff-tree --diff-filter=D -r --name-only HEAD^ HEAD } | git log --no-merges --oneline e99d012a6bc.. --stdin which shows 58 times somebody had to deal with the code, generally due to a compile or test failure, or a tree-wide style fix or API change. Let's drop it and let anybody who wants to pick it up do so by resurrecting it from the git history. As a bonus, this also reduces the size of a stripped installation of Git from 21MB to 19MB. [1] https://lore.kernel.org/git/CALkWK0mPHzKfzFKKpZkfAus3YVC9NFYDbFnt+5JQYVKipk3bQQ@mail.gmail.com/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11Merge branch 'bc/sha-256-part-3'Libravatar Junio C Hamano1-0/+2
The final leg of SHA-256 transition. * bc/sha-256-part-3: (39 commits) t: remove test_oid_init in tests docs: add documentation for extensions.objectFormat ci: run tests with SHA-256 t: make SHA1 prerequisite depend on default hash t: allow testing different hash algorithms via environment t: add test_oid option to select hash algorithm repository: enable SHA-256 support by default setup: add support for reading extensions.objectformat bundle: add new version for use with SHA-256 builtin/verify-pack: implement an --object-format option http-fetch: set up git directory before parsing pack hashes t0410: mark test with SHA1 prerequisite t5308: make test work with SHA-256 t9700: make hash size independent t9500: ensure that algorithm info is preserved in config t9350: make hash size independent t9301: make hash size independent t9300: use $ZERO_OID instead of hard-coded object ID t9300: abstract away SHA-1-specific constants t8011: make hash size independent ...
2020-07-30strvec: rename struct fieldsLibravatar Jeff King1-4/+4
The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30t: make test-bloom initialize repositoryLibravatar brian m. carlson1-0/+2
The bloom filter code relies on reading object IDs using parse_oid_hex. In order to make that work with an appropriate size, we need to have initialized the repository's hash algorithm. Since the values we're processing depend on the repository in use, let's set up the repository when we run the test helper. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: fix indentation in renamed callsLibravatar Jeff King1-3/+3
Code which split an argv_array call across multiple lines, like: argv_array_pushl(&args, "one argument", "another argument", "and more", NULL); was recently mechanically renamed to use strvec, which results in mis-matched indentation like: strvec_pushl(&args, "one argument", "another argument", "and more", NULL); Let's fix these up to align the arguments with the opening paren. I did this manually by sifting through the results of: git jump grep 'strvec_.*,$' and liberally applying my editor's auto-format. Most of the changes are of the form shown above, though I also normalized a few that had originally used a single-tab indentation (rather than our usual style of aligning with the open paren). I also rewrapped a couple of obvious cases (e.g., where previously too-long lines became short enough to fit on one), but I wasn't aggressive about it. In cases broken to three or more lines, the grouping of arguments is sometimes meaningful, and it wasn't worth my time or reviewer time to ponder each case individually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert remaining callers away from argv_array nameLibravatar Jeff King1-18/+18
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the remaining files, as the resulting diff is reasonably sized. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: rename files from argv-array to strvecLibravatar Jeff King2-2/+2
This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '*.c' '*.h' | xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-06Merge branch 'cb/is-descendant-of'Libravatar Junio C Hamano1-1/+1
Code clean-up. * cb/is-descendant-of: commit-reach: avoid is_descendant_of() shim
2020-07-06Merge branch 'es/get-worktrees-unsort'Libravatar Junio C Hamano1-1/+1
API cleanup for get_worktrees() * es/get-worktrees-unsort: worktree: drop get_worktrees() unused 'flags' argument worktree: drop get_worktrees() special-purpose sorting option
2020-07-06Merge branch 'ak/commit-graph-to-slab'Libravatar Junio C Hamano1-1/+1
A few fields in "struct commit" that do not have to always be present have been moved to commit slabs. * ak/commit-graph-to-slab: commit-graph: minimize commit_graph_data_slab access commit: move members graph_pos, generation to a slab commit-graph: introduce commit_graph_data_slab object: drop parsed_object_pool->commit_count
2020-07-06Merge branch 'bc/sha-256-part-2'Libravatar Junio C Hamano1-0/+3
SHA-256 migration work continues. * bc/sha-256-part-2: (44 commits) remote-testgit: adapt for object-format bundle: detect hash algorithm when reading refs t5300: pass --object-format to git index-pack t5704: send object-format capability with SHA-256 t5703: use object-format serve option t5702: offer an object-format capability in the test t/helper: initialize the repository for test-sha1-array remote-curl: avoid truncating refs with ls-remote t1050: pass algorithm to index-pack when outside repo builtin/index-pack: add option to specify hash algorithm remote-curl: detect algorithm for dumb HTTP by size builtin/ls-remote: initialize repository based on fetch t5500: make hash independent serve: advertise object-format capability for protocol v2 connect: parse v2 refs with correct hash algorithm connect: pass full packet reader when parsing v2 refs Documentation/technical: document object-format for protocol v2 t1302: expect repo format version 1 for SHA-256 builtin/show-index: provide options to determine hash algo t5302: modernize test formatting ...
2020-06-23commit-reach: avoid is_descendant_of() shimLibravatar Carlo Marcelo Arenas Belón1-1/+1
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17) adds a repository aware version of is_descendant_of() and a backward compatibility shim that is barely used. Update all callers to directly use the new repo_is_descendant_of() function instead; making the codebase simpler and pushing more the_repository references higher up the stack. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22worktree: drop get_worktrees() unused 'flags' argumentLibravatar Eric Sunshine1-1/+1
get_worktrees() accepts a 'flags' argument, however, there are no existing flags (the lone flag GWT_SORT_LINKED was recently retired) and no behavior which can be tweaked. Therefore, drop the 'flags' argument. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19t/helper: initialize the repository for test-sha1-arrayLibravatar brian m. carlson1-0/+3
test-sha1-array uses the_hash_algo under the hood. Since t0064 wants to use the value that is correct for the hash algorithm that we're testing, make sure the test helper initializes the repository to set the_hash_algo correctly. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17object: drop parsed_object_pool->commit_countLibravatar Abhishek Kumar1-1/+1
14ba97f8 (alloc: allow arbitrary repositories for alloc functions, 2018-05-15) introduced parsed_object_pool->commit_count to keep count of commits per repository and was used to assign commit->index. However, commit-slab code requires commit->index values to be unique and a global count would be correct, rather than a per-repo count. Let's introduce a static counter variable, `parsed_commits_count` to keep track of parsed commits so far. As commit_count has no use anymore, let's also drop it from the struct. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08Merge branch 'dl/remote-curl-deadlock-fix'Libravatar Junio C Hamano1-0/+4
On-the-wire protocol v2 easily falls into a deadlock between the remote-curl helper and the fetch-pack process when the server side prematurely throws an error and disconnects. The communication has been updated to make it more robust. * dl/remote-curl-deadlock-fix: stateless-connect: send response end packet pkt-line: define PACKET_READ_RESPONSE_END remote-curl: error on incomplete packet pkt-line: extern packet_length() transport: extract common fetch_pack() call remote-curl: remove label indentation remote-curl: fix typo
2020-06-08Merge branch 'cb/t4210-illseq-auto-detect'Libravatar Junio C Hamano1-28/+66
As FreeBSD is not the only platform whose regexp library reports a REG_ILLSEQ error when fed invalid UTF-8, add logic to detect that automatically and skip the affected tests. * cb/t4210-illseq-auto-detect: t4210: detect REG_ILLSEQ dynamically and skip affected tests t/helper: teach test-regex to report pattern errors (like REG_ILLSEQ)
2020-05-24pkt-line: define PACKET_READ_RESPONSE_ENDLibravatar Denton Liu1-0/+4
In a future commit, we will use PACKET_READ_RESPONSE_END to separate messages proxied by remote-curl. To prepare for this, add the PACKET_READ_RESPONSE_END enum value. In switch statements that need a case added, die() or BUG() when a PACKET_READ_RESPONSE_END is unexpected. Otherwise, mirror how PACKET_READ_DELIM is implemented (especially in cases where packets are being forwarded). Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18t/helper: teach test-regex to report pattern errors (like REG_ILLSEQ)Libravatar Carlo Marcelo Arenas Belón1-28/+66
7187c7bbb8 (t4210: skip i18n tests that don't work on FreeBSD, 2019-11-27) adds a REG_ILLSEQ prerequisite to avoid failures from the tests added in 4e2443b181 (log tests: test regex backends in "--encode=<enc>" tests, 2019-06-28), but hardcodes it to be only enabled in FreeBSD. Instead of hardcoding the affected platform, teach the test-regex helper, how to validate a pattern and report back, so it can be used to detect the same issue in other affected systems (like DragonFlyBSD or macOS). While at it, refactor the tool so it can report back the source of the errors it founds, and can be invoked also in a --silent mode, when needed, for backward compatibility. A missing flag has been added and the code reformatted, as well as updates to the way the parameters are handled, for consistency. To minimize changes, it is assumed the regcomp error is of the right type since we control the only caller, and is also assumed to affect both basic and extended syntax (only basic is tested, but both behave the same in all three affected platforms since they use the same function). Based-on-patch-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-14Merge branch 'ds/bloom-cleanup'Libravatar Junio C Hamano1-1/+1
Code cleanup and typofixes * ds/bloom-cleanup: completion: offer '--(no-)patch' among 'git log' options bloom: use num_changes not nr for limit detection bloom: de-duplicate directory entries Documentation: changed-path Bloom filters use byte words bloom: parse commit before computing filters test-bloom: fix usage typo bloom: fix whitespace around tab length
2020-05-07bloom: fix `make sparse` warningLibravatar Đoàn Trần Công Danh1-1/+1
* We need a `final_new_line` to make our source code as text file, per POSIX and C specification. * `bloom_filters` should be limited to interal linkage only Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-01Merge branch 'dd/sparse-fixes'Libravatar Junio C Hamano2-11/+4
Compilation fix. * dd/sparse-fixes: progress.c: silence cgcc suggestion about internal linkage graph.c: limit linkage of internal variable compat/regex: move stdlib.h up in inclusion chain test-parse-pathspec-file.c: s/0/NULL/ for pointer type
2020-05-01Merge branch 'ds/blame-on-bloom'Libravatar Junio C Hamano1-8/+20
"git blame" learns to take advantage of the "changed-paths" Bloom filter stored in the commit-graph file. * ds/blame-on-bloom: test-bloom: check that we have expected arguments test-bloom: fix some whitespace issues blame: drop unused parameter from maybe_changed_path blame: use changed-path Bloom filters tests: write commit-graph with Bloom filters revision: complicated pathspecs disable filters
2020-05-01Merge branch 'gs/commit-graph-path-filter'Libravatar Junio C Hamano4-0/+87
Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS
2020-05-01Merge branch 'tb/commit-graph-split-strategy'Libravatar Junio C Hamano1-12/+1
"git commit-graph write" learned different ways to write out split files. * tb/commit-graph-split-strategy: Revert "commit-graph.c: introduce '--[no-]check-oids'" commit-graph.c: introduce '--[no-]check-oids' commit-graph.h: replace 'commit_hex' with 'commits' oidset: introduce 'oidset_size' builtin/commit-graph.c: introduce split strategy 'replace' builtin/commit-graph.c: introduce split strategy 'no-merge' builtin/commit-graph.c: support for '--split[=<strategy>]' t/helper/test-read-graph.c: support commit-graph chains
2020-05-01test-bloom: fix usage typoLibravatar Derrick Stolee1-1/+1
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28Merge branch 'dd/test-with-busybox'Libravatar Junio C Hamano1-1/+1
Various tests have been updated to work around issues found with shell utilities that come with busybox etc. * dd/test-with-busybox: t5703: feed raw data into test-tool unpack-sideband t4124: tweak test so that non-compliant diff(1) can also be used t7063: drop non-POSIX argument "-ls" from find(1) t5616: use rev-parse instead to get HEAD's object_id t5003: skip conversion test if unzip -a is unavailable t5003: drop the subshell in test_lazy_prereq test-lib-functions: test_cmp: eval $GIT_TEST_CMP t4061: use POSIX compliant regex(7)
2020-04-27progress.c: silence cgcc suggestion about internal linkageLibravatar Đoàn Trần Công Danh1-8/+1
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-27test-parse-pathspec-file.c: s/0/NULL/ for pointer typeLibravatar Đoàn Trần Công Danh1-3/+3
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23test-bloom: check that we have expected argumentsLibravatar Jeff King1-4/+16
If "test-tool bloom" is not fed a command, or if arguments are missing for some commands, it will just segfault. Let's check argc and write a friendlier usage message. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23test-bloom: fix some whitespace issuesLibravatar Jeff King1-5/+5
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-22Merge branch 'jk/oid-array-cleanups'Libravatar Junio C Hamano3-6/+6
Code cleanup. * jk/oid-array-cleanups: oidset: stop referring to sha1-array ref-filter: stop referring to "sha1 array" bisect: stop referring to sha1_array test-tool: rename sha1-array to oid-array oid_array: rename source file from sha1-array oid_array: use size_t for iteration oid_array: use size_t for count and allocation
2020-04-22Merge branch 'js/trace2-env-vars'Libravatar Junio C Hamano1-0/+1
Trace2 enhancement to allow logging of the environment variables. * js/trace2-env-vars: trace2: teach Git to log environment variables
2020-04-15t/helper/test-read-graph.c: support commit-graph chainsLibravatar Taylor Blau1-12/+1
In 61df89c8e5 (commit-graph: don't early exit(1) on e.g. "git status", 2019-03-25), the former 'load_commit_graph_one' was refactored into 'open_commit_graph' and 'load_commit_graph_one_fd_st' as a means of avoiding an early-exit from non-library code. However, 'load_commit_graph_one' does not support commit-graph chains, and hence the 'read-graph' test tool does not work with them. Replace 'load_commit_graph_one' with 'read_commit_graph_one' in order to support commit-graph chains. In the spirit of 61df89c8e5, 'read_commit_graph_one' does not ever 'die()', making it a suitable replacement here. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06t4216: add end to end tests for git log with Bloom filtersLibravatar Garima Singh1-0/+4
These tests exercises writing commit graph with Bloom filters and exercises 'git log -- path' with all the applicable options. They check that the output is the same with and without Bloom filters, confirm Bloom filters were used by checking if trace2 statistics were logged correctly. Also confirms cases where Bloom filters are not used: 1. Multiple path specs, 2. --walk-reflogs (see patch titled 'revision.c: use Bloom filters...' for details, 3. If the latest commit graph does not have Bloom filters Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06commit-graph: reuse existing Bloom filters during writeLibravatar Garima Singh1-1/+1
Add logic to a) parse Bloom filter information from the commit graph file and, b) re-use existing Bloom filters. See Documentation/technical/commit-graph-format for the format in which the Bloom filter information is written to the commit graph file. To read Bloom filter for a given commit with lexicographic position 'i' we need to: 1. Read BIDX[i] which essentially gives us the starting index in BDAT for filter of commit i+1. It is essentially the index past the end of the filter of commit i. It is called end_index in the code. 2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT for filter of commit i. It is called the start_index in the code. For the first commit, where i = 0, Bloom filter data starts at the beginning, just past the header in the BDAT chunk. Hence, start_index will be 0. 3. The length of the filter will be end_index - start_index, because BIDX[i] gives the cumulative 8-byte words including the ith commit's filter. We toggle whether Bloom filters should be recomputed based on the compute_if_not_present flag. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30test-tool: rename sha1-array to oid-arrayLibravatar Jeff King3-5/+5
This matches the actual data structure name, as well as the source file that contains the code we're testing. The test scripts need updating to use the new name, as well. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30oid_array: rename source file from sha1-arrayLibravatar Jeff King1-1/+1
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30bloom.c: core Bloom filter implementation for changed paths.Libravatar Garima Singh1-0/+20
Add the core implementation for computing Bloom filters for the paths changed between a commit and it's first parent. We fill the Bloom filters as (const char *data, int len) pairs as `struct bloom_filters" within a commit slab. Filters for commits with no changes and more than 512 changes, is represented with a filter of length zero. There is no gain in distinguishing between a computed filter of length zero for a commit with no changes, and an uncomputed filter for new commits or for commits with more than 512 changes. The effect on `git log -- path` is the same in both cases. We will fall back to the normal diffing algorithm when we can't benefit from the existence of Bloom filters. Helped-by: Jeff King <peff@peff.net> Helped-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30bloom.c: introduce core Bloom filter constructsLibravatar Garima Singh1-0/+48
Introduce the constructs for Bloom filters, Bloom filter keys and Bloom filter settings. For details on what Bloom filters are and how they work, refer to Dr. Derrick Stolee's blog post [1]. It provides a concise explanation of the adoption of Bloom filters as described in [2] and [3]. Implementation specifics: 1. We currently use 7 and 10 for the number of hashes and the size of each entry respectively. They served as great starting values, the mathematical details behind this choice are described in [1] and [4]. The implementation, while not completely open to it at the moment, is flexible enough to allow for tweaking these settings in the future. Note: The performance gains we have observed with these values are significant enough that we did not need to tweak these settings. The performance numbers are included in the cover letter of this series and in the commit message of the subsequent commit where we use Bloom filters to speed up `git log -- path`. 2. As described in [1] and [3], we do not need 7 independent hashing functions. We use the Murmur3 hashing scheme, seed it twice and then combine those to procure an arbitrary number of hash values. 3. The filters will be sized according to the number of changes in each commit, in multiples of 8 bit words. [1] Derrick Stolee "Supercharging the Git Commit Graph IV: Bloom Filters" https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/ [2] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese "An Improved Construction for Counting Bloom Filters" http://theory.stanford.edu/~rinap/papers/esa2006b.pdf https://doi.org/10.1007/11841036_61 [3] Peter C. Dillinger and Panagiotis Manolios "Bloom Filters in Probabilistic Verification" http://www.ccs.neu.edu/home/pete/pub/Bloom-filters-verification.pdf https://doi.org/10.1007/978-3-540-30494-4_26 [4] Thomas Mueller Graf, Daniel Lemire "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters" https://arxiv.org/abs/1912.08258 Helped-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30bloom.c: add the murmur3 hash implementationLibravatar Garima Singh3-0/+15
In preparation for computing changed paths Bloom filters, implement the Murmur3 hash algorithm as described in [1]. It hashes the given data using the given seed and produces a uniformly distributed hash value. [1] https://en.wikipedia.org/wiki/MurmurHash#Algorithm Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-26t5703: feed raw data into test-tool unpack-sidebandLibravatar Đoàn Trần Công Danh1-1/+1
busybox's sed isn't binary clean. Thus, triggers false-negative on this test. We could replace sed with perl on this usecase. But, we could slightly modify the helper to discard unwanted data in the beginning. Fix the false negative by updating this helper. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-26Merge branch 'bc/sha-256-part-1-of-4'Libravatar Junio C Hamano2-6/+10
SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...
2020-03-25Merge branch 'am/real-path-fix'Libravatar Junio C Hamano1-1/+4
The real_path() convenience function can easily be misused; with a bit of code refactoring in the callers' side, its use has been eliminated. * am/real-path-fix: get_superproject_working_tree(): return strbuf real_path_if_valid(): remove unsafe API real_path: remove unsafe API set_git_dir: fix crash when used with real_path()
2020-03-25Merge branch 'hw/advise-ng'Libravatar Junio C Hamano3-0/+24
Revamping of the advise API to allow more systematic enumeration of advice knobs in the future. * hw/advise-ng: tag: use new advice API to check visibility advice: revamp advise API advice: change "setupStreamFailure" to "setUpstreamFailure" advice: extract vadvise() from advise()
2020-03-23trace2: teach Git to log environment variablesLibravatar Josh Steadmon1-0/+1
Via trace2, Git can already log interesting config parameters (see the trace2_cmd_list_config() function). However, this can grant an incomplete picture because many config parameters also allow overrides via environment variables. To allow for more complete logs, we add a new trace2_cmd_list_env_vars() function and supporting implementation, modeled after the pre-existing config param logging implementation. Signed-off-by: Josh Steadmon <steadmon@google.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>