path: root/t
Age  Commit message  Author  Files  Lines
2020-12-08  pack-bitmap-write: build fewer intermediate bitmaps  Derrick Stolee  1  -3/+82
The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph.

We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process.

The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit.

The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C."

As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C."

MAXIMAL COMMITS
---------------

We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not.
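The commit_mask bookkeeping described above can be illustrated with a small C sketch. It is not Git's implementation. It assumes a toy graph whose commits are already stored in topological order (every child before its parents), and it uses a plain uint64_t instead of an ewah bitmap, so at most 64 selected commits are supported. The names `toy_commit`, `MAX_PARENTS` and `propagate_masks()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2 /* hypothetical limit for this toy graph */

/* Hypothetical stand-in for bb_commit: a fixed-width mask instead of an
 * ewah bitmap, plus the 'maximal' flag introduced in the text. */
struct toy_commit {
	size_t parents[MAX_PARENTS]; /* indices into the commit array */
	size_t nr_parents;
	uint64_t commit_mask;        /* bit i: "the ith selected commit can reach me" */
	unsigned maximal : 1;
};

/*
 * 'commits' must be in topological order (every child before its parents).
 * selected[i] is the index of the ith selected commit (at most 64 of them).
 */
void propagate_masks(struct toy_commit *commits, size_t nr,
		     const size_t *selected, size_t nr_selected)
{
	assert(nr_selected <= 64);

	/* The ith selected commit starts with only the ith bit set. */
	for (size_t i = 0; i < nr_selected; i++) {
		commits[selected[i]].commit_mask |= UINT64_C(1) << i;
		commits[selected[i]].maximal = 1;
	}

	/* Children are visited first, so a mask is final when its commit is
	 * visited; copy it into every parent. */
	for (size_t c = 0; c < nr; c++)
		for (size_t j = 0; j < commits[c].nr_parents; j++)
			commits[commits[c].parents[j]].commit_mask |=
				commits[c].commit_mask;
}
```

Take the example used below, where B can reach C but A cannot, and all three reach a common root D. The walk leaves C with mask 0x6 (the bits for B and C, written "011" in the left-to-right notation of the example) and D with 0x7.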
The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example.

Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks.

Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks:

 * 100: X - (Y union Z)
 * 010: Y - (X union Z)
 * 001: Z - (X union Y)
 * 110: (X intersect Y) - Z
 * 101: (X intersect Z) - Y
 * 011: (Y intersect Z) - X
 * 111: X intersect Y intersect Z

This can be visualized with the following Hasse diagram:

    100    010    001
     | \  /   \  / |
     |  \/     \/  |
     |  /\     /\  |
     | /  \   /  \ |
    110    101    011
      \___  |  ___/
          \ | /
           111

Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history.

With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking?

As we walk, we visit a commit C.
Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is maximal and is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this.

Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values:

 * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P.

 * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for C.

If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P.

We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits.

If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information."

In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C.

The final thing we need to do is to update the reverse edges for P. These reverse edges represent "which closest maximal commits contributed bits to my commit_mask?"
Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P.

If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list.

Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here.

After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph.

Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C.

COMPUTING REACHABILITY BITMAPS
------------------------------

Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference.

In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101).
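The per-parent rules above (c_not_p, p_not_c, and the reverse-edge bookkeeping) can be combined into one topo-order walk. The following C sketch is a hypothetical illustration, not the code in pack-bitmap-write.c. `walk_commit`, `add_edge()` and `select_maximal()` are invented names, and masks are limited to 64 bits. Commit masks are also kept after use, rather than cleared, so they can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2 /* hypothetical limits for this toy graph */
#define MAX_EDGES 8

struct walk_commit {
	size_t parents[MAX_PARENTS];
	size_t nr_parents;
	int selected;
	uint64_t commit_mask;
	int maximal;
	size_t reverse_edges[MAX_EDGES]; /* closest maximal commits reaching me */
	size_t nr_edges;
};

/* Append e to p's reverse edges, ignoring duplicates. */
static void add_edge(struct walk_commit *p, size_t e)
{
	for (size_t k = 0; k < p->nr_edges; k++)
		if (p->reverse_edges[k] == e)
			return;
	assert(p->nr_edges < MAX_EDGES);
	p->reverse_edges[p->nr_edges++] = e;
}

/* 'g' must be in topo-order (children first). Returns the bits used. */
size_t select_maximal(struct walk_commit *g, size_t nr)
{
	size_t num_maximal = 0;

	/* Selected commits take the first bits and start out maximal. */
	for (size_t i = 0; i < nr; i++) {
		if (g[i].selected) {
			assert(num_maximal < 64);
			g[i].commit_mask = UINT64_C(1) << num_maximal++;
			g[i].maximal = 1;
		}
	}

	for (size_t c = 0; c < nr; c++) {
		struct walk_commit *ce = &g[c];

		/* A maximal commit that was not selected gets a new bit. */
		if (ce->maximal && !ce->selected) {
			assert(num_maximal < 64);
			ce->commit_mask |= UINT64_C(1) << num_maximal++;
		}

		for (size_t j = 0; j < ce->nr_parents; j++) {
			struct walk_commit *pe = &g[ce->parents[j]];
			int c_not_p = (ce->commit_mask & ~pe->commit_mask) != 0;
			int p_not_c = (pe->commit_mask & ~ce->commit_mask) != 0;

			if (!c_not_p)
				continue; /* P already has all of C's bits */

			pe->commit_mask |= ce->commit_mask;
			if (p_not_c) {
				pe->maximal = 1;  /* maximal based on current info */
			} else {
				pe->maximal = 0;
				pe->nr_edges = 0; /* old edges no longer matter */
			}

			if (ce->maximal) {
				add_edge(pe, c);
			} else {
				for (size_t k = 0; k < ce->nr_edges; k++)
					add_edge(pe, ce->reverse_edges[k]);
			}
		}
	}
	return num_maximal;
}
```

Consider the butterfly graph from the MULTIPLE MAXIMAL COMMITS section: I and J are selected, both have parents M and N, and M and N both reach Q. On that graph, this sketch gives M the mask 0b0111 and N the mask 0b1011 (M:111 and N:1101 in the left-to-right notation). It leaves Q maximal with reverse edges { M, N }, and gives both M and N the reverse edges { I, J }. Because every maximal non-selected commit gets a bit when it is visited, Q also receives a fifth bit of its own in this sketch.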
The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits?

MULTIPLE MAXIMAL COMMITS
------------------------

Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order.

Here is an example of four commits forming a butterfly:

    I    J
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N
     \  /
      |/
      Q

Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11.

Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N.

While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps.

In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies.
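The double walk is easy to reproduce. This hypothetical C sketch runs one independent reachability walk per root, as would happen for M and N if Q were not treated as maximal, and counts how many times each commit is walked. `naive_commit` and `count_naive_walks()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2  /* hypothetical limits for this toy graph */
#define MAX_COMMITS 16

struct naive_commit {
	size_t parents[MAX_PARENTS];
	size_t nr_parents;
};

/* Depth-first walk from 'start'; 'seen' is private to this traversal,
 * so nothing is shared with walks from other roots. */
static void walk_from(const struct naive_commit *g, size_t start,
		      int *seen, unsigned *walk_count)
{
	if (seen[start])
		return;
	seen[start] = 1;
	walk_count[start]++;
	for (size_t j = 0; j < g[start].nr_parents; j++)
		walk_from(g, g[start].parents[j], seen, walk_count);
}

/* One independent traversal per root; walk_count[i] ends up holding how
 * many of those traversals visited commit i. */
void count_naive_walks(const struct naive_commit *g, size_t nr,
		       const size_t *roots, size_t nr_roots,
		       unsigned *walk_count)
{
	assert(nr <= MAX_COMMITS);
	for (size_t i = 0; i < nr; i++)
		walk_count[i] = 0;
	for (size_t r = 0; r < nr_roots; r++) {
		int seen[MAX_COMMITS] = {0};
		walk_from(g, roots[r], seen, walk_count);
	}
}
```

With M and N as roots, both sharing parent Q, which in turn has a parent R, Q and R are each walked twice. Every further ancestor of Q would be walked twice as well.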
The solution has already been mentioned: each of M and N is assigned its own bit in the bitmask, and hence each becomes uniquely maximal for its bitmask. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows:

    I:10     J:01
     |\       /|
     | \ ____/ |
     | /\____  |
     |/      \ |
    M:111    N:1101
        \    /
         Q:1111

Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }.

PERFORMANCE MEASUREMENTS
------------------------

Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort.

To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers.

Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits.

      Repository: git/git
           multi: 2.628 sec
         reverse: 2.344 sec
         maximal: 2.047 sec

      Repository: torvalds/linux
           multi: 64.7 sec
         reverse: 205.3 sec
         maximal: 44.7 sec

So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we also come out ahead of "multi".
Likewise, peak heap has gone back to something reasonable:

      Repository: torvalds/linux
           multi: 2.087 GB
         reverse: 3.141 GB
         maximal: 2.288 GB

While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08  t5310: add branch-based checks  Derrick Stolee  1  -27/+34
The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08  t5310: drop size of truncated ewah bitmap  Jeff King  1  -7/+8
We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file.

Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08  pack-bitmap: bounds-check size of cache extension  Jeff King  1  -2/+15
A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps.

We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
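The following C helper sketches the kind of check described above; it is not the actual pack-bitmap.c code. It refuses to reserve space for the cache when the remaining length is too small, instead of letting the unsigned subtraction wrap around. `reserve_hash_cache()` is a hypothetical name. The sketch assumes the cache size is simply the object count times sizeof(uint32_t), as the message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: reserve room for a trailing name-hash cache of
 * num_objects uint32_t values in a buffer with *remaining bytes left.
 * Returns 0 and shrinks *remaining on success, or -1 if the file is too
 * short (leaving *remaining untouched instead of wrapping around).
 */
int reserve_hash_cache(size_t *remaining, uint32_t num_objects)
{
	size_t cache_size;

	assert(remaining);

	/* Guard the multiplication as well, for platforms with a small size_t. */
	if (num_objects > SIZE_MAX / sizeof(uint32_t))
		return -1;
	cache_size = (size_t)num_objects * sizeof(uint32_t);

	if (*remaining < cache_size)
		return -1; /* truncated .bitmap file */

	*remaining -= cache_size;
	return 0;
}
```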
2020-11-11  Merge branch 'js/test-file-size'  Junio C Hamano  5  -42/+24
Test clean-up.

* js/test-file-size:
  tests: consolidate the `file_size` function into `test-lib-functions.sh`
2020-11-11  Merge branch 'js/test-whitespace-fixes'  Junio C Hamano  7  -112/+113
Test code clean-up.

* js/test-whitespace-fixes:
  t9603: use tabs for indentation
  t5570: remove trailing padding
  t5400,t5402: consistently indent with tabs, not with spaces
  t3427: adjust stale comment
  t3406: indent with tabs, not spaces
  t1004: insert missing "branch" in a message
2020-11-11  Merge branch 'rs/worktree-list-show-locked'  Junio C Hamano  1  -1/+1
Typofix.

* rs/worktree-list-show-locked:
  t2402: fix typo
2020-11-09  Merge branch 'js/default-branch-name-adjust-t5411'  Junio C Hamano  32  -723/+726
Prepare a test script for the transition of the default branch name to 'main'.

* js/default-branch-name-adjust-t5411:
  t5411: finish preparing for `main` being the default branch name
  t5411: adjust the remaining support files for init.defaultBranch=main
  t5411: start adjusting the support files for init.defaultBranch=main
  t5411: start using the default branch name "main"
2020-11-09  Merge branch 'fc/zsh-completion'  Junio C Hamano  1  -1/+1
Zsh autocompletion (in contrib/) update.

* fc/zsh-completion: (29 commits)
  zsh: update copyright notices
  completion: bash: remove old compat wrappers
  completion: bash: cleanup cygwin check
  completion: bash: trivial cleanup
  completion: zsh: add simple version check
  completion: zsh: trivial simplification
  completion: zsh: add alias descriptions
  completion: zsh: improve command tags
  completion: zsh: refactor command completion
  completion: zsh: shuffle functions around
  completion: zsh: simplify file_direct
  completion: zsh: simplify nl_append
  completion: zsh: trivial cleanup
  completion: zsh: simplify direct compadd
  completion: zsh: simplify compadd functions
  completion: zsh: fix splitting of words
  completion: zsh: add missing direct_append
  completion: fix conflict with bashcomp
  completion: zsh: fix completion for --no-.. options
  completion: bash: remove zsh wrapper
  ...
2020-11-09  Merge branch 'jk/sideband-more-error-checking'  Junio C Hamano  1  -0/+12
The code to detect premature EOF in the sideband demultiplexer has been cleaned up.

* jk/sideband-more-error-checking:
  sideband: diagnose more sideband anomalies
2020-11-09  Merge branch 'ab/git-remote-exit-code'  Junio C Hamano  1  -8/+8
Exit codes from "git remote add" etc. were not usable by scripted callers.

* ab/git-remote-exit-code:
  remote: add meaningful exit code on missing/existing
2020-11-09  Merge branch 'pb/ref-filter-with-crlf'  Junio C Hamano  1  -0/+126
A commit and tag object may have CR at the end of each and every line (you can create such an object with hash-object or using --cleanup=verbatim to decline the default clean-up action), but it would make it impossible to have a blank line to separate the title from the body of the message. Be lenient and accept a line with lone CR on it as a blank line, too.

* pb/ref-filter-with-crlf:
  log, show: add tests for messages containing CRLF
  ref-filter: handle CRLF at end-of-line more gracefully
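To illustrate the leniency described above, here is a hypothetical C helper (not ref-filter's `find_subpos`). It finds the blank line that ends the subject paragraph, accepting either `"\n"` or a lone `"\r\n"` as a blank line. `find_blank_line()` is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first blank line in 'msg',
 * treating both "\n" and a lone "\r\n" as blank, or NULL if there is none.
 * Everything before the returned pointer is the subject paragraph.
 */
const char *find_blank_line(const char *msg)
{
	const char *p = msg;

	assert(msg);
	while (*p) {
		/* An empty line, or a line holding nothing but a CR. */
		if (*p == '\n' || (p[0] == '\r' && p[1] == '\n'))
			return p;
		p = strchr(p, '\n'); /* skip to the end of this line */
		if (!p)
			return NULL;
		p++;                 /* first character of the next line */
	}
	return NULL;
}
```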
2020-11-09  Merge branch 'jk/checkout-index-errors'  Junio C Hamano  2  -1/+20
"git checkout-index" did not consistently signal an error with its exit status.

* jk/checkout-index-errors:
  checkout-index: propagate errors to exit code
  checkout-index: drop error message from empty --stage=all
2020-11-09  Merge branch 'jk/perl-warning'  Junio C Hamano  1  -0/+6
Dev support.

* jk/perl-warning:
  perl: check for perl warnings while running tests
2020-11-09  Merge branch 'nk/diff-files-vs-fsmonitor'  Junio C Hamano  5  -48/+68
"git diff" and other commands that share the same machinery to compare with working tree files have been taught to take advantage of the fsmonitor data when available.

* nk/diff-files-vs-fsmonitor:
  p7519-fsmonitor: add a git add benchmark
  p7519-fsmonitor: refactor to avoid code duplication
  perf lint: add make test-lint to perf tests
  t/perf: add fsmonitor perf test for git diff
  t/perf/p7519-fsmonitor.sh: warm cache on first git status
  t/perf/README: elaborate on output format
  fsmonitor: use fsmonitor data in `git diff`
2020-11-09  Merge branch 'as/tests-cleanup'  Junio C Hamano  2  -2/+4
Micro clean-up of a couple of test scripts.

* as/tests-cleanup:
  t2200,t9832: avoid using 'git' upstream in a pipe
2020-11-09  Merge branch 'en/dir-rename-tests'  Junio C Hamano  1  -47/+545
More preliminary tests have been added to document desired outcome of various "directory rename" situations.

* en/dir-rename-tests:
  t6423: more involved rules for renaming directories into each other
  t6423: update directory rename detection tests with new rule
  t6423: more involved directory rename test
  directory-rename-detection.txt: update references to regression tests
2020-11-09  t9603: use tabs for indentation  Johannes Schindelin  1  -12/+12
This patch will let the new `check-whitespace` GitHub workflow be happy with the upcoming patch series that wants to search-and-replace `master` with `main` in t9603 and some other test scripts.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09  t5570: remove trailing padding  Johannes Schindelin  1  -6/+6
Two blocks in t5570 want to align the closing double quotes, padding with spaces if needed. Since the maximum length of those lines is defined by the branch name `master`, the upcoming rename to `main` would unalign the quotes.

But then, it is unclear how those aligned closing quotes should help readability anyway, so let's just remove that padding altogether.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09  t5400,t5402: consistently indent with tabs, not with spaces  Johannes Schindelin  2  -84/+85
This patch actually prepares for the upcoming patches to replace `master` with `main` in these tests: we do not want those changes to be flagged by the new `check-whitespace` GitHub workflow (even if those changes do not introduce the whitespace issues, they touch lines affected by those issues without fixing them).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09  t3427: adjust stale comment  Johannes Schindelin  1  -1/+1
In b6211b89eb3 (tests: avoid variations of the `master` branch name, 2020-09-26), the `master[123]` branch names were renamed to `topic_[123]`. A non-literal mention of the corresponding files was missed in that commit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09  t3406: indent with tabs, not spaces  Johannes Schindelin  1  -8/+8
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09  t1004: insert missing "branch" in a message  Johannes Schindelin  1  -1/+1
The message in question reads awkwardly with the name "master", but will be even more confusing once that is renamed to "main". Let's adjust it in advance of said rename.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-06  tests: consolidate the `file_size` function into `test-lib-functions.sh`  Johannes Schindelin  5  -42/+24
In 8de7eeb54b6 (compression: unify pack.compression configuration parsing, 2016-11-15), we introduced identical copies of the `file_size` helper into three test scripts, with the plan to eventually consolidate them into a single copy.

Let's do that, and adjust the function name to adhere to the `test_*` naming convention.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-03  t2402: fix typo  Johannes Schindelin  1  -1/+1
In c57b3367bed (worktree: teach `list` to annotate locked worktree, 2020-10-11), we introduced a test case that wanted to talk about "worktrees" but talked about "worktress" instead. Let's fix that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02  Merge branch 'js/default-branch-name-part-4-minus-1'  Junio C Hamano  17  -106/+123
Adjust tests so that they won't scream when the default initial branch name is changed to 'main'.

* js/default-branch-name-part-4-minus-1:
  t1400: prepare for `main` being default branch name
  tests: prepare aligned mentions of the default branch name
  t9902: prepare a test for the upcoming default branch name
  t3200: prepare for `main` being shorter than `master`
  t5703: adjust a test case for the upcoming default branch name
  t6200: adjust suppression pattern to also match "main"
  tests: start moving to a different default main branch name
  t9801: use `--` in preparation for default branch rename
  fmt-merge-msg: also suppress "into main" by default
2020-11-02  Merge branch 've/userdiff-bash'  Junio C Hamano  16  -0/+67
The userdiff pattern learned to identify the function definition in POSIX shells and bash.

* ve/userdiff-bash:
  userdiff: support Bash
2020-11-02  Merge branch 'js/t7006-cleanup'  Junio C Hamano  1  -42/+42
Code clean-up.

* js/t7006-cleanup:
  t7006: Use test_path_is_* functions in test script
2020-11-02  Merge branch 'mk/diff-ignore-regex'  Junio C Hamano  2  -0/+140
"git diff" family of commands learned the "-I<regex>" option to ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I<regex> that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
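A minimal C sketch of the rule as stated: a hunk is ignored only when every changed line in it matches the pattern. It uses POSIX `regcomp()`/`regexec()` from `<regex.h>`. `hunk_is_ignorable()` is a hypothetical name, and this is not how xdiff actually implements `-I`.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if every changed line of a hunk matches
 * the compiled pattern 're' (so the hunk would be ignored), 0 otherwise.
 */
int hunk_is_ignorable(const regex_t *re, const char *const *changed_lines,
		      size_t nr_lines)
{
	assert(re);
	for (size_t i = 0; i < nr_lines; i++)
		if (regexec(re, changed_lines[i], 0, NULL, 0) != 0)
			return 0; /* one non-matching change keeps the hunk */
	return 1;
}
```

For example, with the pattern `^ *#`, a hunk that only touches comment lines is ignored, while one that also changes `code();` is kept.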
2020-11-02  Merge branch 'jt/apply-reverse-twice'  Junio C Hamano  2  -0/+16
"git apply -R" did not handle patches that touch the same path twice correctly, which has been corrected. This is most relevant in a patch that changes a path from a regular file to a symbolic link (and vice versa).

* jt/apply-reverse-twice:
  apply: when -R, also reverse list of sections
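Why the order of the sections matters can be shown with a toy C model. For illustration, it assumes that a regular-file-to-symlink change is expressed as two sections for the same path: delete the file, then create the symlink. Inverting each section but keeping their order would try to create the file while the symlink is still present. Inverting them and also reversing the list gives the correct sequence. `section`, `invert()` and `reverse_patch()` are hypothetical names, not git-apply's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of one section of a patch touching a path. */
enum op { CREATE_FILE, DELETE_FILE, CREATE_SYMLINK, DELETE_SYMLINK };

struct section {
	enum op op;
	struct section *next;
};

/* The inverse of a single section. */
static enum op invert(enum op op)
{
	switch (op) {
	case CREATE_FILE:    return DELETE_FILE;
	case DELETE_FILE:    return CREATE_FILE;
	case CREATE_SYMLINK: return DELETE_SYMLINK;
	case DELETE_SYMLINK: return CREATE_SYMLINK;
	}
	assert(0);
	return op;
}

/* Invert every section AND reverse their order; returns the new head. */
struct section *reverse_patch(struct section *head)
{
	struct section *prev = NULL;

	while (head) {
		struct section *next = head->next;

		head->op = invert(head->op);
		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Reversing [DELETE_FILE, CREATE_SYMLINK] this way yields [DELETE_SYMLINK, CREATE_FILE], so the symlink is gone before the regular file is recreated.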
2020-11-02  Merge branch 'sc/sequencer-gpg-octopus'  Junio C Hamano  1  -0/+56
"git rebase --rebase-merges" did not correctly pass --gpg-sign command line option to underlying "git merge" when replaying a merge using non-default merge strategy or when replaying an octopus merge (because replaying a two-head merge with the default strategy was done in a separate codepath, the problem did not trigger for most users), which has been corrected.

* sc/sequencer-gpg-octopus:
  t3435: add tests for rebase -r GPG signing
  sequencer: pass explicit --no-gpg-sign to merge
  sequencer: fix gpg option passed to merge subcommand
2020-11-02  Merge branch 'en/test-selector'  Junio C Hamano  5  -50/+76
Our test scripts can be told to run only individual pieces while skipping others with the "--run=..." option; they were taught to take a substring of test title, in addition to numbers, to name the test pieces to run.

* en/test-selector:
  test-lib: reduce verbosity of skipped tests
  t6006, t6012: adjust tests to use 'setup' instead of synonyms
  test-lib: allow selecting tests by substring/glob with --run
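A rough C sketch of such a selector check: a purely numeric selector names a test by number, and anything else is matched as a glob anywhere in the test title using POSIX `fnmatch()`. This is only an illustration built on assumptions. test-lib.sh is a shell script, and its real `--run` handling (including ranges and negation) is not reproduced here. `selector_matches()` is a hypothetical name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical selector check: a numeric selector matches the test with
 * that number; anything else is a glob matched anywhere in the title
 * (as if wrapped in "*...*").
 */
int selector_matches(const char *selector, int test_number, const char *title)
{
	char pattern[256];
	char *end;
	long n;

	assert(selector && title);

	n = strtol(selector, &end, 10);
	if (*selector && *end == '\0')
		return n == test_number; /* purely numeric selector */

	if (snprintf(pattern, sizeof(pattern), "*%s*", selector) >= (int)sizeof(pattern))
		return 0; /* selector too long for this sketch */
	return fnmatch(pattern, title, 0) == 0;
}
```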
2020-11-02  Merge branch 'tk/credential-config'  Junio C Hamano  1  -0/+26
"git credential" didn't honor the core.askPass configuration variable (among other things), which has been corrected.

* tk/credential-config:
  credential: load default config
2020-11-02  Merge branch 'dl/diff-merge-base'  Junio C Hamano  2  -91/+193
"git diff A...B" learned "git diff --merge-base A B", which is a longer short-hand to say the same thing.

* dl/diff-merge-base:
  contrib/completion: complete `git diff --merge-base`
  builtin/diff-tree: learn --merge-base
  builtin/diff-index: learn --merge-base
  t4068: add --merge-base tests
  diff-lib: define diff_get_merge_base()
  diff-lib: accept option flags in run_diff_index()
  contrib/completion: extract common diff/difftool options
  git-diff.txt: backtick quote command text
  git-diff-index.txt: make --cached description a proper sentence
  t4068: remove unnecessary >tmp
2020-11-02  Merge branch 'ds/maintenance-commit-graph-auto-fix'  Junio C Hamano  1  -0/+37
Test-coverage enhancement of running commit-graph task "git maintenance" as needed led to discovery and fix of a bug.

* ds/maintenance-commit-graph-auto-fix:
  maintenance: core.commitGraph=false prevents writes
  maintenance: test commit-graph auto condition
2020-11-02  Merge branch 'ds/commit-graph-merging-fix'  Junio C Hamano  1  -0/+13
When "git commit-graph" detects the same commit recorded more than once while it is merging the layers, it used to die. The code now ignores all but one of them and continues.

* ds/commit-graph-merging-fix:
  commit-graph: don't write commit-graph when disabled
  commit-graph: ignore duplicates when merging layers
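Ignoring duplicates while merging can be sketched by sorting the combined list of object IDs and keeping only the first copy of each. The C below is illustrative only and is not the commit-graph code. `struct oid`, `OID_LEN` (assumed here to be SHA-1's 20 bytes) and `sort_and_dedup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define OID_LEN 20 /* assumption: SHA-1 sized object IDs */

struct oid { unsigned char hash[OID_LEN]; };

static int oid_cmp(const void *a, const void *b)
{
	return memcmp(a, b, OID_LEN);
}

/* Sort the merged entries, then keep only the first copy of each object
 * ID instead of dying. Returns the new number of entries. */
size_t sort_and_dedup(struct oid *oids, size_t nr)
{
	size_t out = 0;

	if (!nr)
		return 0;
	assert(oids);
	qsort(oids, nr, sizeof(*oids), oid_cmp);
	for (size_t i = 0; i < nr; i++) {
		if (out && !memcmp(&oids[out - 1], &oids[i], OID_LEN))
			continue; /* duplicate: skip it */
		oids[out++] = oids[i];
	}
	return out;
}
```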
2020-11-02  Merge branch 'es/test-cmp-typocatcher'  Junio C Hamano  1  -14/+2
A test helper "test_cmp A B" was taught to diagnose missing files A or B as a bug in test, but some tests legitimately wanted to notice a failure to even create file B as an error, in addition to leaving the expected result in it, and were misdiagnosed as a bug. This has been corrected.

* es/test-cmp-typocatcher:
  Revert "test_cmp: diagnose incorrect arguments"
2020-11-02  Merge branch 'jk/fast-import-marks-alloc-fix'  Junio C Hamano  1  -0/+51
"git fast-import" wasted a lot of memory when many marks were in use.

* jk/fast-import-marks-alloc-fix:
  fast-import: fix over-allocation of marks storage
2020-11-02  Merge branch 'js/avoid-split-sideband-message'  Junio C Hamano  2  -0/+29
The side-band status report can be sent at the same time as the primary payload multiplexed, but the demultiplexer on the receiving end incorrectly split a single status report into two, which has been corrected.

* js/avoid-split-sideband-message:
  test-pkt-line: drop colon from sideband identity
  sideband: report unhandled incomplete sideband messages as bugs
  sideband: avoid reporting incomplete sideband messages
2020-10-31  t5411: finish preparing for `main` being the default branch name  Johannes Schindelin  2  -9/+4
In addition to the trivial search-and-replace performed over the course of the previous three commits, there is one test in t5411 that depends on the length of the default branch name. Adjust it and use `main` as the default branch name in this test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31  t5411: adjust the remaining support files for init.defaultBranch=main  Johannes Schindelin  18  -355/+355
This trick was performed via

    $ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
        -e 's/Master/Main/g' -- t/t5411/*

In the previous commit, we adjusted roughly half of the support files, to stay under the 100kB limit (mails larger than that are rejected by the Git mailing list).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31  t5411: start adjusting the support files for init.defaultBranch=main  Johannes Schindelin  13  -361/+361
This trick was performed via

    $ sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
        -e 's/Master/Main/g' -- t/t5411/test-00[3-5]*

We do not convert the files in `t/t5411/` in one go because the patch would be too big (mails larger than 100kB are rejected by the Git mailing list). Instead, we start with roughly half of the support files.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31  t5411: start using the default branch name "main"  Johannes Schindelin  1  -7/+15
This is a straight-forward search-and-replace in the test script; however, this is not yet complete because it requires many more replacements in `t/t5411/`, too many for a single patch (the Git mailing list rejects mails larger than 100kB).

For that reason, we disable this test script temporarily via the `PREPARE_FOR_MAIN_BRANCH` prereq.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-30  Merge branch 'cm/t7xxx-cleanup'  Junio C Hamano  3  -258/+217
Micro clean-up.

* cm/t7xxx-cleanup:
  t7102: prepare expected output inside test_expect_* block
  t7201: put each command on a separate line
  t7201: use 'git -C' to avoid subshell
  t7102,t7201: remove whitespace after redirect operator
  t7102,t7201: remove unnecessary blank spaces in test body
  t7101,t7102,t7201: modernize test formatting
2020-10-30  Merge branch 'ct/t0000-use-test-path-is-file'  Junio C Hamano  1  -1/+1
Micro clean-up of a test script.

* ct/t0000-use-test-path-is-file:
  t0000: use test_path_is_file instead of "test -f"
2020-10-30  Merge branch 'en/t7518-unflake'  Junio C Hamano  1  -1/+1
Work around flakiness in a test.

* en/t7518-unflake:
  t7518: fix flaky grep invocation
2020-10-29  Merge branch 'jk/committer-date-is-author-date-fix' into maint  Junio C Hamano  1  -2/+2
In 2.29, "--committer-date-is-author-date" option of "rebase" and "am" subcommands lost the e-mail address by mistake, which has been corrected.

* jk/committer-date-is-author-date-fix:
  rebase: fix broken email with --committer-date-is-author-date
  am: fix broken email with --committer-date-is-author-date
  t3436: check --committer-date-is-author-date result more carefully
2020-10-29log, show: add tests for messages containing CRLFLibravatar Philippe Blain1-0/+18
A previous commit adjusted the code in ref-filter.c so that messages containing CRLF are now correctly parsed and displayed. Add tests to also check that `git log` and `git show` correctly handle such messages, to prevent future regressions if these commands are refactored to use the ref-filter API. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29ref-filter: handle CRLF at end-of-line more gracefullyLibravatar Philippe Blain1-0/+108
The ref-filter code does not correctly handle commit or tag messages that use CRLF as the line terminator. Such messages can be created with the `--cleanup=verbatim` option of `git commit` and `git tag`, or by using `git commit-tree` directly. The function `find_subpos` in ref-filter.c looks for two consecutive LFs to find the end of the subject line, a sequence which is absent in messages using CRLF. This results in the whole message being parsed as the subject line (`%(contents:subject)`), and the body of the message (`%(contents:body)`) being empty. Moreover, in `copy_subject`, which wants to return the subject as a single line, '\n' is replaced by space, but '\r' is untouched. This impacts the output of `git branch`, `git tag` and `git for-each-ref`. This behaviour is a regression for `git branch --verbose`, which bisects down to 949af0684c (branch: use ref-filter printing APIs, 2017-01-10). Adjust the ref-filter code to be more lenient by hardening the logic in `copy_subject` and `find_subpos` to correctly parse messages containing CRLF. Add a new test script, 't3920-crlf-messages.sh', to test the behaviour of commands using either the ref-filter or the pretty APIs with messages using CRLF line endings. The function `test_crlf_subject_body_and_contents` can be used to test that the `--format` option of `branch`, `tag`, `for-each-ref`, `log` and `show` correctly displays the subject, body and raw content of commit and tag messages using CRLF. Test the output of `branch`, `tag` and `for-each-ref` with such commits. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
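To illustrate the CRLF problem described above, here is a minimal sketch of CRLF-tolerant subject/body splitting. It is not the actual ref-filter.c code. `find_body_start()` and `copy_subject_line()` are hypothetical names chosen for this example. The sketch assumes a NUL-terminated message.

It applies two changes:
- a line consisting only of "\r\n" is treated as blank, just like "\n", so it ends the subject;
- when the subject is folded into a single line, the '\r' of each CRLF pair is dropped instead of being left in the output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the line starting at p count as blank? ("\n" or "\r\n") */
static int is_blank_line(const char *p)
{
	return p[0] == '\n' || (p[0] == '\r' && p[1] == '\n');
}

/*
 * Hypothetical: return a pointer to the first byte of the body, i.e. the text
 * after the first blank line, or NULL if the message has no blank line.
 */
static const char *find_body_start(const char *msg)
{
	const char *line = msg;

	for (;;) {
		const char *eol = strchr(line, '\n');
		if (!eol)
			return NULL;            /* no blank line: no body */
		line = eol + 1;
		if (line[0] == '\n')
			return line + 1;        /* LF-only blank line */
		if (line[0] == '\r' && line[1] == '\n')
			return line + 2;        /* CRLF blank line */
	}
}

/*
 * Hypothetical: copy the subject paragraph into out (outsz must be >= 1),
 * folding its lines into one line separated by spaces and dropping the
 * '\r' of every CRLF pair.
 */
static void copy_subject_line(const char *msg, char *out, size_t outsz)
{
	size_t n = 0;

	for (const char *p = msg; *p && n + 1 < outsz; p++) {
		if (p[0] == '\r' && p[1] == '\n')
			continue;               /* skip CR; the LF is handled next */
		if (*p == '\n') {
			if (p[1] == '\0' || is_blank_line(p + 1))
				break;          /* end of text or blank line ends the subject */
			out[n++] = ' ';         /* join multi-line subjects */
			continue;
		}
		out[n++] = *p;
	}
	out[n] = '\0';
}
```

With "Subject line\r\n\r\nBody text\r\n", the subject comes out as "Subject line" and the body starts at "Body text\r\n". The pre-fix behaviour the commit describes would instead treat the whole message as the subject.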
2020-10-29sideband: diagnose more sideband anomaliesLibravatar Jeff King1-0/+12
In demultiplex_sideband(), there are two oddities when we check an incoming packet: - if it has zero length, then we assume it's a flush packet. This means we fail to notice the difference between a real flush and a true zero-length packet that's missing its sideband designator. It's not a huge problem in practice because we'd never send a zero-length data packet (even our keepalives are otherwise-empty sideband-1 packets). But it would be nice to detect and report the error, since it's likely to cause other confusion (we think the other side flushed, but they do not). - we try to detect packets missing their designator by checking for "if (len < 1)". But this will never trigger for "len == 0"; we've already detected that and left the function before then. It _could_ detect a negative "len" parameter. But in that case, the error message is wrong. The issue is not "no sideband" but rather "eof while reading the packet". However, this can't actually be triggered in practice, because neither of the two callers uses pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the remote end hung up unexpectedly" before we even get here. So this truly is dead code. We can improve these cases by passing in a pkt-line status to the demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. 
This gives us three improvements: - we can now reliably detect flush packets, and will report a normal packet missing its sideband designator as an error - we'll report an eof with a more detailed "protocol error: eof while reading sideband packet", rather than the generic "the remote end hung up unexpectedly" - when we see an eof, we'll flush the sideband scratch buffer, which may provide some hints from the remote about why they hung up (though note we already flush on newlines, so it's likely that most such messages already made it through) In some sense this patch goes against fbd76cd450 (sideband: reverse its dependency on pkt-line, 2019-01-16), which caused the sideband code not to depend on the pkt-line code. But that commit was really just trying to deal with the circular header dependency. The two modules are conceptually interlinked, and it was just trying to keep things compiling. And indeed, there's a sticking point in this patch: because pkt-line.h includes sideband.h, we can't add the reverse include we need for the sideband code to have an "enum packet_read_status" parameter. Nor can we forward declare it, because you can't forward declare an enum in C. However, C does guarantee that enums fit in an int, so we can just use that type. One alternative would be for the callers to check themselves that they got something sane from the pkt-line code. But besides duplicating logic, this gets quite tricky. Any error condition requires flushing the sideband #2 scratch buffer, which only demultiplex_sideband() knows how to do. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
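A rough sketch of the idea described above: the demultiplexer receives the pkt-line status as a plain `int`. It is not Git's demultiplex_sideband(). The enum below is a trimmed stand-in for `enum packet_read_status` from pkt-line.h; the real enum has more values.

Names in this sketch:
- `demux()` and `enum demux_result` are hypothetical.
- The error strings are illustrative, except the eof message, which is quoted from the commit text.
- Band 1 carries data, band 2 progress, and band 3 a remote error.

```c
#include <stddef.h>

/* Trimmed stand-in for pkt-line.h's enum packet_read_status (assumption). */
enum packet_read_status {
	PACKET_READ_EOF,
	PACKET_READ_NORMAL,
	PACKET_READ_FLUSH
};

/* Hypothetical outcomes, invented for this sketch. */
enum demux_result {
	DEMUX_DATA,         /* band 1: payload for the caller */
	DEMUX_PROGRESS,     /* band 2: progress/messages for stderr */
	DEMUX_REMOTE_ERROR, /* band 3: fatal error reported by the remote */
	DEMUX_FLUSH,        /* a genuine flush packet */
	DEMUX_ERROR         /* local protocol error; see *msg */
};

/*
 * Hypothetical demultiplexer. The status arrives as a plain int because
 * sideband.h cannot include pkt-line.h and an enum cannot be forward
 * declared in C; the enumeration constants themselves have type int.
 */
static enum demux_result demux(int status, const char *buf, int len,
			       const char **msg)
{
	*msg = NULL;

	switch (status) {
	case PACKET_READ_EOF:
		*msg = "protocol error: eof while reading sideband packet";
		return DEMUX_ERROR;
	case PACKET_READ_FLUSH:
		/* A real flush is no longer confused with a zero-length packet. */
		return DEMUX_FLUSH;
	case PACKET_READ_NORMAL:
		break;
	default:
		*msg = "protocol error: unknown packet status";
		return DEMUX_ERROR;
	}

	/* A normal packet must carry at least the one-byte band designator. */
	if (len < 1) {
		*msg = "protocol error: missing sideband designator";
		return DEMUX_ERROR;
	}

	switch (buf[0]) {
	case 1:
		return DEMUX_DATA;
	case 2:
		return DEMUX_PROGRESS;
	case 3:
		return DEMUX_REMOTE_ERROR;
	default:
		*msg = "protocol error: bad band";
		return DEMUX_ERROR;
	}
}
```

With the status passed in, the `len < 1` check becomes reachable: a normal packet of length zero is now an error, while a flush is recognised from its status alone.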