summaryrefslogtreecommitdiff
path: root/t/helper
AgeCommit message (Collapse)AuthorFilesLines
2021-07-07oidtree: a crit-bit tree for odb_loose_cacheLibravatar Eric Wong3-0/+51
This saves 8K per `struct object_directory', meaning it saves around 800MB in my case involving 100K alternates (half or more of those alternates are unlikely to hold loose objects). This is implemented in two parts: a generic, allocation-free `cbtree' and the `oidtree' wrapper on top of it. The latter provides allocation using alloc_state as a memory pool to improve locality and reduce free(3) overhead. Unlike oid-array, the crit-bit tree does not require sorting. Performance is bound by the key length, for oidtree that is fixed at sizeof(struct object_id). There's no need to have 256 oidtrees to mitigate the O(n log n) overhead like we did with oid-array. Being a prefix trie, it is natively suited for expanding short object IDs via prefix-limited iteration in `find_short_object_filename'. On my busy workstation, p4205 performance seems to be roughly unchanged (+/-8%). Startup with 100K total alternates with no loose objects seems around 10-20% faster on a hot cache. (800MB in memory savings means more memory for the kernel FS cache). The generic cbtree implementation does impose some extra overhead for oidtree in that it uses memcmp(3) on "struct object_id" so it wastes cycles comparing 12 extra bytes on SHA-1 repositories. I've not yet explored reducing this overhead, but I expect there are many places in our code base where we'd want to investigate this. More information on crit-bit trees: https://cr.yp.to/critbit.html Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-14Merge branch 'en/ort-perf-batch-11'Libravatar Junio C Hamano1-20/+34
Optimize out repeated rename detection in a sequence of mergy operations. * en/ort-perf-batch-11: merge-ort, diffcore-rename: employ cached renames when possible merge-ort: handle interactions of caching and rename/rename(1to1) cases merge-ort: add helper functions for using cached renames merge-ort: preserve cached renames for the appropriate side merge-ort: avoid accidental API mis-use merge-ort: add code to check for whether cached renames can be reused merge-ort: populate caches of rename detection results merge-ort: add data structures for in-memory caching of rename detection t6429: testcases for remembering renames fast-rebase: write conflict state to working tree, index, and HEAD fast-rebase: change assert() to BUG() Documentation/technical: describe remembering renames optimization t6423: rename file within directory that other side renamed
2021-05-20fast-rebase: write conflict state to working tree, index, and HEADLibravatar Elijah Newren1-19/+32
Previously, when fast-rebase hit a conflict, it simply aborted and left HEAD, the index, and the working tree where they were before the operation started. While fast-rebase does not support restarting from a conflicted state, write the conflicted state out anyway as it gives us a way to see what the conflicts are and write tests that check for them. This will be important in the upcoming commits, because sequencer.c is only superficially integrated with merge-ort.c; in particular, it calls merge_switch_to_result() after EACH merge instead of only calling it at the end of all the sequence of merges (or when a conflict is hit). This not only causes needless updates to the working copy and index, but also causes all intermediate data to be freed and tossed, preventing caching information from one merge to the next. However, integrating sequencer.c more deeply with merge-ort.c is a big task, and making this small extension to fast-rebase.c provides us with a simple way to test the edge and corner cases that we want to make sure continue working. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20fast-rebase: change assert() to BUG()Libravatar Elijah Newren1-1/+2
assert() can succinctly document expectations for the code, and do so in a way that may be useful to future folks trying to refactor the code and change basic assumptions; it allows them to more quickly find some places where their violations of previous assumptions trips things up. Unfortunately, assert() can surround a function call with important side-effects, which is a huge mistake since some users will compile with assertions disabled. I've had to debug such mistakes before in other codebases, so I should know better. Luckily, this was only in test code, but it's still very embarrassing. Change an assert() to an if (...) BUG (...). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11Merge branch 'jk/symlinked-dotgitx-cleanup'Libravatar Junio C Hamano1-13/+33
Various test and documentation updates about .gitsomething paths that are symlinks. * jk/symlinked-dotgitx-cleanup: docs: document symlink restrictions for dot-files fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW t0060: test ntfs/hfs-obscured dotfiles t7450: test .gitmodules symlink matching against obscured names t7450: test verify_path() handling of gitmodules t7415: rename to expand scope fsck_tree(): wrap some long lines fsck_tree(): fix shadowed variable t7415: remove out-dated comment about translation
2021-05-10Merge branch 'bc/hash-transition-interop-part-1'Libravatar Junio C Hamano1-1/+1
SHA-256 transition. * bc/hash-transition-interop-part-1: hex: print objects using the hash algorithm member hex: default to the_hash_algo on zero algorithm value builtin/pack-objects: avoid using struct object_id for pack hash commit-graph: don't store file hashes as struct object_id builtin/show-index: set the algorithm for object IDs hash: provide per-algorithm null OIDs hash: set, copy, and use algo field in struct object_id builtin/pack-redundant: avoid casting buffers to struct object_id Use the final_oid_fn to finalize hashing of object IDs hash: add a function to finalize object IDs http-push: set algorithm when reading object ID Always use oidread to read into struct object_id hash: add an algo member to struct object_id
2021-05-04t0060: test ntfs/hfs-obscured dotfilesLibravatar Jeff King1-13/+33
We have tests that cover various filesystem-specific spellings of ".gitmodules", because we need to reliably identify that path for some security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules: add tests, 2018-05-12), with the actual code coming from e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02). Those latter two commits also added similar matching functions for .gitattributes and .gitignore. These ended up not being used in the final series, and are currently dead code. But in preparation for them being used in some fsck checks, let's make sure they actually work by throwing a few basic tests at them. Likewise, let's cover .mailmap (which does need matching code added). I didn't bother with the whole battery of tests that we cover for .gitmodules. These functions are all based on the same generic matcher, so it's sufficient to test most of the corner cases just once. Note that the ntfs magic prefix names in the tests come from the algorithm described in e7cb0b4455 (and are different for each file). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Libravatar Junio C Hamano1-10/+56
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-30Merge branch 'jk/promisor-optim'Libravatar Junio C Hamano1-3/+3
Handling of "promisor packs" that allows certain objects to be missing and lazily retrievable has been optimized (a bit). * jk/promisor-optim: revision: avoid parsing with --exclude-promisor-objects lookup_unknown_object(): take a repository argument is_promisor_object(): free tree buffer after parsing
2021-04-27hash: provide per-algorithm null OIDsLibravatar brian m. carlson1-1/+1
Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-20Merge branch 'ab/userdiff-tests'Libravatar Junio C Hamano3-0/+48
A bit of code clean-up and a lot of test clean-up around userdiff area. * ab/userdiff-tests: blame tests: simplify userdiff driver test blame tests: don't rely on t/t4018/ directory userdiff: remove support for "broken" tests userdiff tests: list builtin drivers via test-tool userdiff tests: explicitly test "default" pattern userdiff: add and use for_each_userdiff_driver() userdiff style: normalize pascal regex declaration userdiff style: declare patterns with consistent style userdiff style: re-order drivers in alphabetical order
2021-04-13Merge branch 'cc/test-helper-bloom-usage-fix'Libravatar Junio C Hamano1-1/+1
Usage message fix for a test helper. * cc/test-helper-bloom-usage-fix: test-bloom: fix missing 'bloom' from usage string
2021-04-13Merge branch 'tb/pack-preferred-tips-to-give-bitmap'Libravatar Junio C Hamano3-0/+26
A configuration variable has been added to force tips of certain refs to be given a reachability bitmap. * tb/pack-preferred-tips-to-give-bitmap: builtin/pack-objects.c: respect 'pack.preferBitmapTips' t/helper/test-bitmap.c: initial commit pack-bitmap: add 'test_bitmap_commits()' helper
2021-04-13lookup_unknown_object(): take a repository argumentLibravatar Jeff King1-3/+3
All of the other lookup_foo() functions take a repository argument, but lookup_unknown_object() was never converted, and it uses the_repository internally. Let's fix that. We could leave a wrapper that uses the_repository, but there aren't that many calls, so we'll just convert them all. I looked briefly at each site to see if we had a repository struct (besides the_repository) we could pass, but none of them do (so this conversion to pass the_repository is a pure noop in each case, though it does take us one step closer to eventually getting rid of the_repository). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08Merge branch 'tb/reverse-midx'Libravatar Junio C Hamano1-4/+20
An on-disk reverse-index to map the in-pack location of an object back to its object name across multiple packfiles is introduced. * tb/reverse-midx: midx.c: improve cache locality in midx_pack_order_cmp() pack-revindex: write multi-pack reverse indexes pack-write.c: extract 'write_rev_file_order' pack-revindex: read multi-pack reverse indexes Documentation/technical: describe multi-pack reverse indexes midx: make some functions non-static midx: keep track of the checksum midx: don't free midx_name early midx: allow marking a pack as preferred t/helper/test-read-midx.c: add '--show-objects' builtin/multi-pack-index.c: display usage on unrecognized command builtin/multi-pack-index.c: don't enter bogus cmd_mode builtin/multi-pack-index.c: split sub-commands builtin/multi-pack-index.c: define common usage with a macro builtin/multi-pack-index.c: don't handle 'progress' separately builtin/multi-pack-index.c: inline 'flags' with options
2021-04-08userdiff tests: list builtin drivers via test-toolLibravatar Ævar Arnfjörð Bjarmason3-0/+48
Change the userdiff test to list the builtin drivers via the test-tool, using the new for_each_userdiff_driver() API function. This gets rid of the need to modify this part of the test every time a new pattern is added, see 2ff6c34612 (userdiff: support Bash, 2020-10-22) and 09dad9256a (userdiff: support Markdown, 2020-05-02) for two recent examples. I only need the "list-builtin-drivers "argument here, but let's add "list-custom-drivers" and "list-drivers" too, just because it's easy. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-05test-bloom: fix missing 'bloom' from usage stringLibravatar Christian Couder1-1/+1
Like 'get_murmur3' and 'generate_filter', 'get_filter_for_commit' is a subcommand of `test-tool bloom` not of `test-tool` itself. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-02Merge branch 'jh/simple-ipc'Libravatar Junio C Hamano3-0/+789
A simple IPC interface gets introduced to build services like fsmonitor on top. * jh/simple-ipc: t0052: add simple-ipc tests and t/helper/test-simple-ipc tool simple-ipc: add Unix domain socket implementation unix-stream-server: create unix domain socket under lock unix-socket: disallow chdir() when creating unix domain sockets unix-socket: add backlog size option to unix_stream_listen() unix-socket: eliminate static unix_stream_socket() helper function simple-ipc: add win32 implementation simple-ipc: design documentation for new IPC mechanism pkt-line: add options argument to read_packetized_to_strbuf() pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option pkt-line: do not issue flush packets in write_packetized_*() pkt-line: eliminate the need for static buffer in packet_write_gently()
2021-03-31t/helper/test-bitmap.c: initial commitLibravatar Taylor Blau3-0/+26
Add a new 'bitmap' test-tool which can be used to list the commits that have received bitmaps. In theory, a determined tester could run 'git rev-list --test-bitmap <commit>' to check if '<commit>' received a bitmap or not, since '--test-bitmap' exits with a non-zero code when it can't find the requested commit. But this is a dubious behavior to rely on, since arguably 'git rev-list' could continue its object walk outside of which commits are covered by bitmaps. This will be used to test the behavior of 'pack.preferBitmapTips', which will be added in the following patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30test-tool: don't force full indexLibravatar Derrick Stolee1-1/+12
We will use 'test-tool read-cache --table' to check that a sparse index is written as part of init_repos. Since we will no longer always expand a sparse index into a full index, add an '--expand' parameter that adds a call to ensure_full_index() so we can compare a sparse index directly against a full index, or at least what the in-memory index looks like when expanded in this way. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30test-read-cache: print cache entries with --tableLibravatar Derrick Stolee1-10/+45
This table is helpful for discovering data in the index to ensure it is being written correctly, especially as we build and test the sparse-index. This table includes an output format similar to 'git ls-tree', but should not be compared to that directly. The biggest reasons are that 'git ls-tree' includes a tree entry for every subdirectory, even those that would not appear as a sparse directory in a sparse-index. Further, 'git ls-tree' does not use a trailing directory separator for its tree rows. This does not print the stat() information for the blobs. That will be added in a future change with another option. The tests that are added in the next few changes care only about the object types and IDs. However, this future need for full index information justifies the need for this test helper over extending a user-facing feature, such as 'git ls-files'. To make the option parsing slightly more robust, wrap the string comparisons in a loop adapted from test-dir-iterator.c. Care must be taken with the final check for the 'cnt' variable. We continue the expectation that the numerical value is the final argument. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30t/helper/test-read-midx.c: add '--show-objects'Libravatar Taylor Blau1-4/+20
The 'read-midx' helper is used in places like t5319 to display basic information about a multi-pack-index. In the next patch, the MIDX writing machinery will learn a new way to choose from which pack an object is selected when multiple copies of that object exist. To disambiguate which pack introduces an object so that this feature can be tested, add a '--show-objects' option which displays additional information about each object in the MIDX. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-24Merge branch 'nk/diff-index-fsmonitor'Libravatar Junio C Hamano1-2/+2
"git diff-index" codepath has been taught to trust fsmonitor status to reduce number of lstat() calls. * nk/diff-index-fsmonitor: fsmonitor: add perf test for git diff HEAD fsmonitor: add assertion that fsmonitor is valid to check_removed fsmonitor: skip lstat deletion check during git diff-index
2021-03-22t0052: add simple-ipc tests and t/helper/test-simple-ipc toolLibravatar Jeff Hostetler3-0/+789
Create t0052-simple-ipc.sh with unit tests for the "simple-ipc" mechanism. Create t/helper/test-simple-ipc test tool to exercise the "simple-ipc" functions. When the tool is invoked with "run-daemon", it runs a server to listen for "simple-ipc" connections on a test socket or named pipe and responds to a set of commands to exercise/stress the communication setup. When the tool is invoked with "start-daemon", it spawns a "run-daemon" command in the background and waits for the server to become ready before exiting. (This helps make unit tests in t0052 more predictable and avoids the need for arbitrary sleeps in the test script.) The tool also has a series of client "send" commands to send commands and data to a server instance. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-19Merge branch 'jc/calloc-fix'Libravatar Junio C Hamano1-1/+1
Code clean-up. * jc/calloc-fix: xcalloc: use CALLOC_ARRAY() when applicable
2021-03-18fsmonitor: add perf test for git diff HEADLibravatar Nipunn Koorapati1-2/+2
Update the xargs call so that if your large repo contains symlinks, test-tool chmtime failure does not end the script. On Linux Test this tree upstream/master --------------------------------------------------------------------------------------------------------- 7519.4: status (fsmonitor=fsmonitor-watchman) 0.52(0.43+0.10) 0.53(0.49+0.05) +1.9% 7519.5: status -uno (fsmonitor=fsmonitor-watchman) 0.21(0.15+0.07) 0.22(0.13+0.09) +4.8% 7519.6: status -uall (fsmonitor=fsmonitor-watchman) 1.65(0.93+0.71) 1.69(1.03+0.65) +2.4% 7519.7: status (dirty) (fsmonitor=fsmonitor-watchman) 11.99(11.34+1.58) 11.95(11.02+1.79) -0.3% 7519.8: diff (fsmonitor=fsmonitor-watchman) 0.25(0.17+0.26) 0.25(0.18+0.26) +0.0% 7519.9: diff HEAD (fsmonitor=fsmonitor-watchman) 0.39(0.25+0.34) 0.89(0.35+0.74) +128.2% 7519.10: diff -- 0_files (fsmonitor=fsmonitor-watchman) 0.16(0.13+0.04) 0.16(0.12+0.05) +0.0% 7519.11: diff -- 10_files (fsmonitor=fsmonitor-watchman) 0.16(0.12+0.05) 0.16(0.12+0.05) +0.0% 7519.12: diff -- 100_files (fsmonitor=fsmonitor-watchman) 0.16(0.12+0.05) 0.16(0.12+0.05) +0.0% 7519.13: diff -- 1000_files (fsmonitor=fsmonitor-watchman) 0.16(0.11+0.06) 0.16(0.12+0.05) +0.0% 7519.14: diff -- 10000_files (fsmonitor=fsmonitor-watchman) 0.18(0.13+0.06) 0.17(0.10+0.08) -5.6% 7519.15: add (fsmonitor=fsmonitor-watchman) 2.25(1.53+0.68) 2.25(1.47+0.74) +0.0% 7519.18: status (fsmonitor=disabled) 0.88(0.73+1.03) 0.89(0.67+1.08) +1.1% 7519.19: status -uno (fsmonitor=disabled) 0.45(0.43+0.89) 0.45(0.34+0.98) +0.0% 7519.20: status -uall (fsmonitor=disabled) 1.88(1.16+1.58) 1.88(1.22+1.51) +0.0% 7519.21: status (dirty) (fsmonitor=disabled) 7.53(7.05+2.11) 7.53(6.98+2.04) +0.0% 7519.22: diff (fsmonitor=disabled) 0.42(0.37+0.92) 0.42(0.38+0.91) +0.0% 7519.23: diff HEAD (fsmonitor=disabled) 0.44(0.41+0.90) 0.44(0.40+0.91) +0.0% 7519.24: diff -- 0_files (fsmonitor=disabled) 0.13(0.09+0.05) 0.13(0.09+0.05) +0.0% 7519.25: diff -- 10_files (fsmonitor=disabled) 0.13(0.10+0.04) 0.13(0.10+0.04) +0.0% 7519.26: diff -- 100_files (fsmonitor=disabled) 0.13(0.09+0.05) 0.13(0.10+0.04) +0.0% 7519.27: diff -- 1000_files (fsmonitor=disabled) 0.13(0.09+0.06) 0.13(0.09+0.05) +0.0% 7519.28: diff -- 10000_files (fsmonitor=disabled) 0.14(0.11+0.05) 0.14(0.10+0.05) +0.0% 7519.29: add (fsmonitor=disabled) 2.43(1.61+1.64) 2.43(1.69+1.57) +0.0% On linux (2.29.2 vs w/ this patch): nipunn@nipunn-dbx:~/src/server3$ strace -f -c git diff 2>&1 | grep lstat 0.04 0.000063 3 20 6 lstat nipunn@nipunn-dbx:~/src/server3$ strace -f -c git diff HEAD 2>&1 | grep lstat 94.98 5.242262 10 523783 13 lstat nipunn@nipunn-dbx:~/src/server3$ strace -f -c ../git/bin-wrappers/git diff 2>&1 | grep lstat 0.38 0.000032 5 7 3 lstat nipunn@nipunn-dbx:~/src/server3$ strace -f -c ../git/bin-wrappers/git diff HEAD 2>&1 | grep lstat 99.44 0.741892 9 81634 10 lstat On mac (2.29.2 vs w/ this patch): nipunn-mbp:server nipunn$ sudo dtruss -L -f -c git diff 2>&1 | grep "^lstat64 " lstat64 8 nipunn-mbp:server nipunn$ sudo dtruss -L -f -c git diff HEAD 2>&1 | grep "^lstat64 " lstat64 120242 nipunn-mbp:server nipunn$ sudo dtruss -L -f -c ../git/bin-wrappers/git diff 2>&1 | grep "^lstat64 " lstat64 4 nipunn-mbp:server nipunn$ sudo dtruss -L -f -c ../git/bin-wrappers/git diff HEAD 2>&1 | grep "^lstat64 " lstat64 4497 There are still a bunch of lstats - on directories, but not every file. Progress! Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-15xcalloc: use CALLOC_ARRAY() when applicableLibravatar Junio C Hamano1-1/+1
These are for codebase before Git 2.31 Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17Merge branch 'jt/trace2-BUG'Libravatar Junio C Hamano1-0/+9
Even though invocations of "die()" were logged to the trace2 system, "BUG()"s were not, which has been corrected. * jt/trace2-BUG: usage: trace2 BUG() invocations
2021-02-17Merge branch 'ak/corrected-commit-date'Libravatar Junio C Hamano1-0/+4
The commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal. * ak/corrected-commit-date: doc: add corrected commit date info commit-reach: use corrected commit dates in paint_down_to_common() commit-graph: use generation v2 only if entire chain does commit-graph: implement generation data chunk commit-graph: implement corrected commit date commit-graph: return 64-bit generation number commit-graph: add a slab to store topological levels t6600-test-reach: generalize *_three_modes commit-graph: consolidate fill_commit_graph_info revision: parse parent in indegree_walk_step() commit-graph: fix regression when computing Bloom filters
2021-02-10Merge branch 'ab/grep-pcre-invalid-utf8'Libravatar Junio C Hamano3-0/+14
Update support for invalid UTF-8 in PCRE2. * ab/grep-pcre-invalid-utf8: grep/pcre2: better support invalid UTF-8 haystacks grep/pcre2 tests: don't rely on invalid UTF-8 data test
2021-02-09usage: trace2 BUG() invocationsLibravatar Jonathan Tan1-0/+9
die() messages are traced in trace2, but BUG() messages are not. Anyone tracking die() messages would have even more reason to track BUG(). Therefore, write to trace2 when BUG() is invoked. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24grep/pcre2: better support invalid UTF-8 haystacksLibravatar Ævar Arnfjörð Bjarmason3-0/+14
Improve the support for invalid UTF-8 haystacks given a non-ASCII needle when using the PCREv2 backend. This is a more complete fix for a bug I started to fix in 870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching, 2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we can make use of it. This fixes the sort of case described in 8a5999838e (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.: - The subject string is non-ASCII (e.g. "ævar") - We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C" - We are using --ignore-case, or we're a non-fixed pattern If those conditions were satisfied and we matched found non-valid UTF-8 data PCREv2 might bark on it, in practice this only happened under the JIT backend (turned on by default on most platforms). Ultimately this fixes a "regression" in b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01), I'm putting that in scare-quotes because before then we wouldn't properly support these complex case-folding, locale etc. cases either, it just broke in different ways. There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed in PCREv2 10.36. It can be worked around by setting the PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add tests for the bug. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21refs: switch peel_ref() to peel_iterated_oid()Libravatar Jeff King1-13/+0
The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char *refname, const struct object_id *oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-18commit-graph: implement generation data chunkLibravatar Abhishek Kumar1-0/+4
As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-25Merge branch 'jx/t5411-flake-fix'Libravatar Junio C Hamano1-14/+40
The exchange between receive-pack and proc-receive hook did not carefully check for errors. * jx/t5411-flake-fix: receive-pack: use default version 0 for proc-receive receive-pack: gently write messages to proc-receive t5411: new helper filter_out_user_friendly_and_stable_output
2020-11-21Merge branch 'en/strmap'Libravatar Junio C Hamano1-5/+4
A specialization of hashmap that uses a string as key has been introduced. Hopefully it will see wider use over time. * en/strmap: shortlog: use strset from strmap.h Use new HASHMAP_INIT macro to simplify hashmap initialization strmap: take advantage of FLEXPTR_ALLOC_STR when relevant strmap: enable allocations to come from a mem_pool strmap: add a strset sub-type strmap: split create_entry() out of strmap_put() strmap: add functions facilitating use as a string->int map strmap: enable faster clearing and reusing of strmaps strmap: add more utility functions strmap: new utility functions hashmap: provide deallocation function names hashmap: introduce a new hashmap_partial_clear() hashmap: allow re-use after hashmap_free() hashmap: adjust spacing to fix argument alignment hashmap: add usage documentation explaining hashmap_free[_entries]()
2020-11-18Merge branch 'en/merge-ort-api-null-impl'Libravatar Junio C Hamano3-0/+213
Preparation for a new merge strategy. * en/merge-ort-api-null-impl: merge,rebase,revert: select ort or recursive by config or environment fast-rebase: demonstrate merge-ort's API via new test-tool command merge-ort-wrappers: new convience wrappers to mimic the old merge API merge-ort: barebones API of new merge strategy with empty implementation
2020-11-18Merge branch 'ds/maintenance-part-3'Libravatar Junio C Hamano3-0/+37
Parts of "git maintenance" to ease writing crontab entries (and other scheduling system configuration) for it. * ds/maintenance-part-3: maintenance: add troubleshooting guide to docs maintenance: use 'incremental' strategy by default maintenance: create maintenance.strategy config maintenance: add start/stop subcommands maintenance: add [un]register subcommands for-each-repo: run subcommands on configured repos maintenance: add --schedule option and config maintenance: optionally skip --auto process
2020-11-11Use new HASHMAP_INIT macro to simplify hashmap initializationLibravatar Elijah Newren1-2/+1
Now that hashamp has lazy initialization and a HASHMAP_INIT macro, hashmaps allocated on the stack can be initialized without a call to hashmap_init() and in some cases makes the code a bit shorter. Convert some callsites over to take advantage of this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11receive-pack: use default version 0 for proc-receiveLibravatar Jiang Xin1-6/+10
In the verison negotiation phase between "receive-pack" and "proc-receive", "proc-receive" can send an empty flush-pkt to end the negotiation and use default version 0. Capabilities (such as "push-options") are not supported in version 0. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11receive-pack: gently write messages to proc-receiveLibravatar Jiang Xin1-10/+32
Johannes found a flaky hang in `t5411/test-0013-bad-protocol.sh` in the osx-clang job of the CI/PR builds, and ran into an issue when using the `--stress` option with the following error messages: fatal: unable to write flush packet: Broken pipe send-pack: unexpected disconnect while reading sideband packet fatal: the remote end hung up unexpectedly In this test case, the "proc-receive" hook sends an error message and dies earlier. While "receive-pack" on the other side of the pipe should forward the error message of the "proc-receive" hook to the client side, but it fails to do so. This is because "receive-pack" uses `packet_write_fmt()` and `packet_flush()` to write pkt-line message to "proc-receive" hook, and these functions die immediately when pipe is broken. Using "gently" forms for these functions will get more predicable output. Add more "--die-*" options to test helper to test different stages of the protocol between "receive-pack" and "proc-receive" hook. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02Merge branch 'js/avoid-split-sideband-message'Libravatar Junio C Hamano1-0/+23
The side-band status report can be sent at the same time as the primary payload multiplexed, but the demultiplexer on the receiving end incorrectly split a single status report into two, which has been corrected. * js/avoid-split-sideband-message: test-pkt-line: drop colon from sideband identity sideband: report unhandled incomplete sideband messages as bugs sideband: avoid reporting incomplete sideband messages
2020-11-02hashmap: provide deallocation function namesLibravatar Elijah Newren1-3/+3
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29fast-rebase: demonstrate merge-ort's API via new test-tool commandLibravatar Elijah Newren3-0/+213
Add a new test-tool command named 'fast-rebase', which is a super-slimmed down and nowhere near as capable version of 'git rebase'. 'test-tool fast-rebase' is not currently planned for usage in the testsuite, but is here for two purposes: 1) Demonstrate the desired API of merge-ort. In particular, fast-rebase takes advantage of the separation of the merging operation from the updating of the index and working tree, to allow it to pick N commits, but only update the index and working tree once at the end. Look for the calls to merge_incore_nonrecursive() and merge_switch_to_result(). 2) Provide a convenient benchmark that isn't polluted by the heavy disk writing and forking of unnecessary processes that comes from sequencer.c and merge-recursive.c. fast-rebase is not meant to replace sequencer.c, just give ideas on how sequencer.c can be changed. Updating sequencer.c with these goals is probably a large amount of work; writing a simple targeted command with no documentation, less-than-useful help messages, numerous limitations in terms of flags it can accept and situations it can handle, and which is flagged off from users is a much easier interim step. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27test-pkt-line: drop colon from sideband identityLibravatar Jeff King1-1/+1
We pass "sideband: " as our identity for errors to recv_sideband(). But it already adds the trailing colon and space. This doesn't invalidate any tests, but it looks funny when you examine the test output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20sideband: avoid reporting incomplete sideband messagesLibravatar Johannes Schindelin1-0/+23
In 2b695ecd74d (t5500: count objects through stderr, not trace, 2020-05-06) we tried to ensure that the "Total 3" message could be grepped in Git's output, even if it sometimes got chopped up into multiple lines in the trace machinery. However, the first instance where this mattered now goes through the sideband machinery, where it is _still_ possible for messages to get chopped up: it *is* possible for the standard error stream to be sent byte-for-byte and hence it can be easily interrupted. Meaning: it is possible for the single line that we're looking for to be chopped up into multiple sideband packets, with a primary packet being delivered between them. This seems to happen occasionally in the `vs-test` part of our CI builds, i.e. with binaries built using Visual C, but not when building with GCC or clang; The symptom is that t5500.43 fails to find a line matching `remote: Total 3` in the `log` file, which ends in something along these lines: remote: Tota remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0 This should not happen, though: we have code in `demultiplex_sideband()` _specifically_ to stitch back together lines that were delivered in separate sideband packets. However, this stitching was broken in a subtle way in fbd76cd450 (sideband: reverse its dependency on pkt-line, 2019-01-16): before that change, incomplete sideband lines would not be flushed upon receiving a primary packet, but after that patch, they would be. The subtleness of this bug comes from the fact that it is easy to get confused by the ambiguous meaning of the `break` keyword: after writing the primary packet contents, the `break;` in the original version of `recv_sideband()` does _not_ break out of the `while` loop, but instead only ends the `switch` case: while (!retval) { [...] switch (band) { [...] case 1: /* Write the contents of the primary packet */ write_or_die(out, buf + 1, len); /* Here, we do *not* break out of the loop, `retval` is unchanged */ break; [...] } if (outbuf.len) { /* Write any remaining sideband messages lacking a trailing LF */ strbuf_addch(&outbuf, '\n'); xwrite(2, outbuf.buf, outbuf.len); } In contrast, after fbd76cd450 (sideband: reverse its dependency on pkt-line, 2019-01-16), the body of the `while` loop was extracted into `demultiplex_sideband()`, crucially _including_ the logic to write incomplete sideband messages: switch (band) { [...] case 1: *sideband_type = SIDEBAND_PRIMARY; /* This does not break out of the loop: the loop is in the caller */ break; [...] } cleanup: [...] /* This logic is now no longer _outside_ the loop but _inside_ */ if (scratch->len) { strbuf_addch(scratch, '\n'); xwrite(2, scratch->buf, scratch->len); } The correct way to fix this is to return from `demultiplex_sideband()` early. The caller will then write out the contents of the primary packet and continue looping. The `scratch` buffer for incomplete sideband messages is owned by that caller, and will continue to accumulate the remainder(s) of those messages. The loop will only end once `demultiplex_sideband()` returned non-zero _and_ did not indicate a primary packet, which is the case only when we hit the `cleanup:` path, in which we take care of flushing any unfinished sideband messages and release the `scratch` buffer. To ensure that this does not get broken again, we introduce a pair of subcommands of the `pkt-line` test helper that specifically chop up the sideband message and squeeze a primary packet into the middle. Final note: The other test case touched by 2b695ecd74d (t5500: count objects through stderr, not trace, 2020-05-06) is not affected by this issue because the sideband machinery is not involved there. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-05Merge branch 'jk/unused'Libravatar Junio C Hamano2-5/+5
Code cleanup. * jk/unused: dir.c: drop unused "untracked" from treat_path_fast() sequencer: handle ignore_footer when parsing trailers test-advise: check argument count with argc instead of argv sparse-checkout: fill in some options boilerplate sequencer: drop repository argument from run_git_commit() push: drop unused repo argument to do_push() assert PARSE_OPT_NONEG in parse-options callbacks env--helper: write to opt->value in parseopt helper drop unused argc parameters convert: drop unused crlf_action from check_global_conv_flags_eol()
2020-10-05Merge branch 'ds/in-merge-bases-many-optim-bug'Libravatar Junio C Hamano1-0/+2
in_merge_bases_many(), a way to see if a commit is reachable from any commit in a set of commits, was totally broken when the commit-graph feature was in use, which has been corrected. * ds/in-merge-bases-many-optim-bug: commit-reach: fix in_merge_bases_many bug
2020-10-02commit-reach: fix in_merge_bases_many bugLibravatar Derrick Stolee1-0/+2
Way back in f9b8908b (commit.c: use generation numbers for in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit the in_merge_bases() walk. This works just fine as long as the caller is checking only two commits, but when there are multiple, there is a possibility that this heuristic is _very wrong_. Some code moves since then has changed this method to repo_in_merge_bases_many() inside commit-reach.c. The heuristic computes the minimum generation number of the "reference" list, then compares this number to the generation number of the "commit". In a recent topic, a test was added that used in_merge_bases_many() to test if a commit was reachable from a number of commits pulled from a reflog. However, this highlighted the problem: if any of the reference commits have a smaller generation number than the given commit, then the walk is skipped _even if there exist some with higher generation number_. This heuristic is wrong! It must check the MAXIMUM generation number of the reference commits, not the MINIMUM. This highlights a testing gap. t6600-test-reach.sh covers many methods in commit-reach.c, including in_merge_bases() and get_merge_bases_many(), but since these methods either restrict to two input commits or actually look for the full list of merge bases, they don't check this heuristic! Add a possible input to "test-tool reach" that tests in_merge_bases_many() and add tests to t6600-test-reach.sh that cover this heuristic. This includes cases for the reference commits having generation above and below the generation of the input commit, but also having maximum generation below the generation of the input commit. The fix itself is to swap min_generation with a max_generation in repo_in_merge_bases_many(). Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30test-advise: check argument count with argc instead of argvLibravatar Jeff King1-2/+2
We complain if "test-tool advise" is not given an argument, but we quietly ignore any additional arguments it receives. Let's instead check that we got the expected number. As a bonus, this silences -Wunused-parameter, which notes that we don't ever look at argc. While we're here, we can also fix the indentation in the conditional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>