Age | Commit message | Author | Files | Lines
2017-06-30 | diff.c: color moved lines differently | Stefan Beller | 3 | -15/+602
When a patch consists mostly of moving blocks of code around, it can be quite tedious to ensure that the blocks are moved verbatim, and not undesirably modified in the move. To that end, color blocks that are moved within the same patch differently. For example (OM, del, add, and NM are different colors):

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OM]  -        if (!is_authorized_user())
    [OM]  -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OM]  -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NM]  +        sensitive_stuff(spanning,
    [NM]  +                        multiple,
    [NM]  +                        lines);
    [NM]  +}

However adjacent blocks may be problematic. For example, in this potentially malicious patch, the swapping of blocks can be spotted:

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OMA] -        if (!is_authorized_user())
    [OMA] -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OMA] -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NMA] +        sensitive_stuff(spanning,
    [NMA] +                        multiple,
    [NMA] +                        lines);
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NMA] +}

If the moved code is larger, it is easier to hide some permutation in the code, which is why some alternative coloring is needed.

This patch implements the first mode:

* basic alternating 'Zebra' mode

This conveys all information needed to the user. Defer customization to later patches.

First I implemented an alternative design, which would try to fingerprint a line by its neighbors to detect if we are in a block or at the boundary. This idea is error prone as it inspected each line and its neighboring lines to determine if the line was (a) moved and (b) deep inside a hunk by having matching neighboring lines. This is unreliable as we can construct hunks which have equal neighbors that just exceed the number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that is permuted to 'AXYZCXYZBXYZD..'). Instead this provides a dynamic programming greedy algorithm that finds the largest moved hunk and then has several modes on highlighting bounds.

A note on the options '--submodule=diff' and '--color-words/--word-diff': In the conversion to use emit_line in the prior patches both submodules as well as word diff output carefully chose to call emit_line with sign=0. All output with sign=0 is ignored for move detection purposes in this patch, such that no weird looking output will be generated for these cases. This leads to another thought: We could pass on '--color-moved' to submodules such that they color up moved lines for themselves. If we did so, only line moves within a repository boundary would be marked up.

It is useful to have moved lines colored, but there are annoying corner cases, such as a single moved line, which is very common. For example in a typical patch of C code, we have closing braces that end statement blocks or functions. While it is technically true that these lines are moved as they show up elsewhere, it is harmful for the review as the reviewer's attention is drawn to such a minor side annoyance.

For now let's have a simple solution of hardcoding the number of moved lines to be at least 3 before coloring them. Note that the length is applied across all blocks to find the 'lonely' blocks that pollute new code, but does not interfere with a permuted block where each permutation has fewer than 3 lines.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
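The minimum-length rule can be illustrated with a small sketch. This is not the algorithm in diff.c (which the message describes as a greedy dynamic programming search); it is a naive quadratic scan, with hypothetical names (`sketch_mark_moved`, `SKETCH_MIN_BLOCK`), that marks an added line as moved only when it is part of a run of at least three consecutive lines that also appear consecutively among the removed lines. It does not implement the Zebra alternation or the special handling of permuted blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed threshold, taken from the "at least 3" rule in the message. */
#define SKETCH_MIN_BLOCK 3

/*
 * Set moved[i] = 1 for every added line that belongs to a run of at
 * least SKETCH_MIN_BLOCK consecutive lines which also occurs, in the
 * same order, among the removed lines. moved[] must have nr_added slots.
 */
static void sketch_mark_moved(const char **removed, size_t nr_removed,
			      const char **added, size_t nr_added,
			      unsigned char *moved)
{
	size_t i, j, k;

	assert(removed && added && moved);
	memset(moved, 0, nr_added);
	for (i = 0; i < nr_added; i++) {
		for (j = 0; j < nr_removed; j++) {
			size_t run = 0;

			/* Length of the common run starting at added[i], removed[j]. */
			while (i + run < nr_added && j + run < nr_removed &&
			       !strcmp(added[i + run], removed[j + run]))
				run++;
			if (run >= SKETCH_MIN_BLOCK)
				for (k = 0; k < run; k++)
					moved[i + k] = 1;
		}
	}
}
```

With removed lines `a b c x }` and added lines `} q a b c`, only the three-line run `a b c` is marked; the lone closing brace is left uncolored even though it also appears on the removed side.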
2017-06-30 | diff.c: buffer all output if asked to | Stefan Beller | 2 | -2/+109
Introduce a new option 'emitted_symbols' in the struct diff_options which controls whether all output is buffered up until all output is available. It is set internally in diff.c when necessary.

We'll have a new struct 'emitted_string' in diff.c which will be used to buffer each line. The emitted_string will duplicate the memory of the line to buffer as that is easiest to reason about for now. In a future patch we may want to decrease the memory usage by not duplicating all output for buffering; instead we may want to store offsets into the file or, in case of hunk descriptions such as the similarity score, we could just store the relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple compared to the alternative with less memory footprint.

emit_diff_symbol factors out the emission part and, depending on diff_options->emitted_symbols, the emission will be performed directly when calling emit_diff_symbol or after the whole process is done, i.e. by buffering we add the possibility for a second pass over the whole output before doing the actual output.

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with word-diff) we introduced a duplicate diff options struct for word emissions as we may have different regex settings in there. When buffering the output, we need to operate on just one buffer, so we have to copy back the emissions of the word buffer into the main buffer.

Unconditionally enable output via buffer in this patch as it yields a great opportunity for testing, i.e. all the diff tests from the test suite pass without having reordering issues (i.e. only parts of the output got buffered, and we forgot to buffer other parts). The test suite passes, which gives confidence that we converted all functions to use emit_string for output.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
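The buffering idea above (duplicate each line, emit everything in a later pass) might look roughly like the following. This is a minimal sketch under assumptions: the struct and function names are hypothetical and do not reproduce the actual 'emitted_string' layout in diff.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the buffered line and its container. */
struct sketch_line {
	char *text;
	size_t len;
};

struct sketch_buffer {
	struct sketch_line *items;
	size_t nr, alloc;
};

/* Store a private copy of the line; the caller keeps ownership of 'line'. */
static void sketch_append(struct sketch_buffer *b, const char *line, size_t len)
{
	char *copy;

	assert(b && line);
	if (b->nr == b->alloc) {
		size_t alloc = b->alloc ? 2 * b->alloc : 16;
		struct sketch_line *items = realloc(b->items, alloc * sizeof(*items));

		if (!items)
			abort();
		b->items = items;
		b->alloc = alloc;
	}
	copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, line, len);
	copy[len] = '\0';
	b->items[b->nr].text = copy;
	b->items[b->nr].len = len;
	b->nr++;
}

/* Second pass: write every buffered line in order, then release the buffer. */
static void sketch_flush(struct sketch_buffer *b, FILE *out)
{
	size_t i;

	assert(b && out);
	for (i = 0; i < b->nr; i++) {
		fwrite(b->items[i].text, 1, b->items[i].len, out);
		free(b->items[i].text);
	}
	free(b->items);
	b->items = NULL;
	b->nr = b->alloc = 0;
}
```

Any analysis that needs the whole output, such as move detection, would run between the last `sketch_append` and `sketch_flush`.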
2017-06-30 | diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY | Stefan Beller | 1 | -30/+41
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP | Stefan Beller | 1 | -3/+7
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: convert word diffing to use emit_diff_symbol | Stefan Beller | 1 | -33/+46
The word diffing is not line oriented and would need some serious effort to be transformed into a line oriented approach, so just go with a symbol DIFF_SYMBOL_WORD_DIFF that is a partial line.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: convert show_stats to use emit_diff_symbol | Stefan Beller | 2 | -44/+74
We call print_stat_summary from builtin/apply, so we still need the version that takes a file pointer. Introduce print_stat_summary_0 that uses the emit_string machinery, and keep print_stat_summary around with the same arguments.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: convert emit_binary_diff_body to use emit_diff_symbol | Stefan Beller | 1 | -17/+46
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | submodule.c: migrate diff output to use emit_diff_symbol | Stefan Beller | 4 | -68/+121
As the submodule process is no longer attached to the same file pointer 'o->file' as the superproject's process, there is a different result in color.c::check_auto_color. That is why we need to pass coloring explicitly, such that the submodule coloring decision will be made by the child process processing the submodule. Only DIFF_SYMBOL_SUBMODULE_PIPETHROUGH contains color, the other symbols are for embedding the submodule output into the superproject's output.

Remove the colors from the function signatures, as all the coloring decisions will be made either inside the child process or the final emit_diff_symbol, but not in the functions driving the submodule diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF | Stefan Beller | 1 | -14/+21
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES | Stefan Beller | 1 | -5/+15
We could save a little bit of memory when buffering in a later mode by just passing the inner part ("%s and %s", file1, file2), but those are just a few bytes, so instead let's reuse the implementation from DIFF_SYMBOL_HEADER and keep the whole line around.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER | Stefan Beller | 1 | -8/+20
The header is constructed lazily including line breaks, so just emit the raw string as is.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS} | Stefan Beller | 1 | -21/+30
We have to use fprintf instead of emit_line, because we want to emit the tab after the color. This is important for ancient versions of GNU patch AFAICT, although we probably do not want to feed colored output to the patch utility, such that it would not matter if the trailing tab is colored. Keep the corner case as-is though.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE | Stefan Beller | 1 | -2/+4
The context marker uses the exact same output pattern, so reuse it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN] | Stefan Beller | 1 | -16/+26
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: migrate emit_line_checked to use emit_diff_symbol | Stefan Beller | 3 | -44/+80
Add a new flags field to emit_diff_symbol, that will be used by context lines for:

* white space rules that are applicable (the first 12 bits). Take a note in cache.h as well: when the ws rules are extended we have to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign of the line, but the code is easier to keep this way; bits 13, 14, 15)
* whether the line is a blank line at EOF (bit 16)

The check whether new lines need to be marked up as extra lines at the end of file is now done unconditionally. That should be ok, as 'new_blank_line_at_eof' has a quick early return.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
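The bit layout described above can be sketched with a few masks. The macro and function names here are hypothetical, and the bit positions follow only the description in this message (counting bits from 1); the actual constants in diff.c may differ.

```c
#include <assert.h>

/* Hypothetical names; layout taken from the commit message only. */
#define SKETCH_WS_RULE_MASK    0x0FFFu     /* bits 1-12: applicable ws rules */
#define SKETCH_EMIT_WS_CONTEXT (1u << 12)  /* bit 13: evaluate as context line */
#define SKETCH_EMIT_WS_OLD     (1u << 13)  /* bit 14: evaluate as removed line */
#define SKETCH_EMIT_WS_NEW     (1u << 14)  /* bit 15: evaluate as added line */
#define SKETCH_BLANK_AT_EOF    (1u << 15)  /* bit 16: blank line at EOF */

/* Combine the three pieces of information into one flags word. */
static unsigned sketch_pack_flags(unsigned ws_rule, unsigned how, int blank_at_eof)
{
	/* The rule bits must fit into the low 12 bits. */
	assert((ws_rule & ~SKETCH_WS_RULE_MASK) == 0);
	return ws_rule | how | (blank_at_eof ? SKETCH_BLANK_AT_EOF : 0u);
}

/* Recover the whitespace rule bits from a flags word. */
static unsigned sketch_ws_rule(unsigned flags)
{
	return flags & SKETCH_WS_RULE_MASK;
}

/* Non-zero when the line was flagged as a blank line at end of file. */
static int sketch_is_blank_at_eof(unsigned flags)
{
	return (flags & SKETCH_BLANK_AT_EOF) != 0;
}
```

The assertion in `sketch_pack_flags` mirrors the note above: if the whitespace rules ever grow past 12 bits, the other fields have to move.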
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF | Stefan Beller | 1 | -8/+11
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO | Stefan Beller | 1 | -2/+6
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_MARKER | Stefan Beller | 1 | -1/+9
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: introduce emit_diff_symbol | Stefan Beller | 1 | -3/+19
In a later patch we want to buffer all output before emitting it as a new feature ("markup moved lines") conceptually cannot be implemented in a single pass over the output.

There are different approaches to buffer all output such as:

* Buffering on the char level, i.e. we'd have a char[] which would grow at approximately 80 characters a line. This would keep the output completely unstructured, but might be very easy to implement, such as redirecting all output to a temporary file and working off that. The later passes over the buffer are quite complicated though, because we have to parse back any output and then decide if it should be modified.

* Buffer on a line level. As the output is mostly line oriented already, this would make sense, but it still is a bit awkward as we'd have to make sense of it again by looking at the first characters of a line to decide what part of a diff a line is.

* Buffer semantically. Imagine there is a formal grammar for the diff output and we'd keep the symbols of this grammar around. This keeps the highest level of structure in the buffered data, such that the actual memory requirements are less than say the first option. Instead of buffering the characters of the line, we'll buffer what we intend to do plus additional information for the specifics. An output of

    diff --git a/new.txt b/new.txt
    index fa69b07..412428c 100644
    Binary files a/new.txt and b/new.txt differ

  could be buffered as

    DIFF_SYMBOL_DIFF_START + new.txt
    DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
    DIFF_SYMBOL_BINARY_FILES + new.txt

This and the following patches introduce the third option of buffering by first moving any output to emit_diff_symbol, and then introducing the buffering in this function.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
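The third option, keeping a symbol plus its specifics and rendering the text only at emission time, can be sketched as follows. The enum values and the function are hypothetical stand-ins, not the actual DIFF_SYMBOL_* definitions; only two of the three symbols from the example are covered.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of the symbols named in the example above. */
enum sketch_symbol {
	SKETCH_SYMBOL_DIFF_START,
	SKETCH_SYMBOL_BINARY_FILES
};

/*
 * Render one buffered symbol into 'out'. Returns the number of
 * characters the full line needs (as snprintf does), or -1 for an
 * unknown symbol.
 */
static int sketch_emit_symbol(char *out, size_t outlen,
			      enum sketch_symbol s, const char *path)
{
	assert(out && path);
	switch (s) {
	case SKETCH_SYMBOL_DIFF_START:
		return snprintf(out, outlen, "diff --git a/%s b/%s\n", path, path);
	case SKETCH_SYMBOL_BINARY_FILES:
		return snprintf(out, outlen,
				"Binary files a/%s and b/%s differ\n", path, path);
	}
	return -1;
}
```

Because only the symbol and the path are stored, a later pass can inspect or reclassify an entry without parsing text back out of a character buffer.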
2017-06-30 | diff.c: factor out diff_flush_patch_all_file_pairs | Stefan Beller | 1 | -5/+12
In a later patch we want to do more things before and after all filepairs are flushed. So factor the flushing of all file pairs out into its own function, so that the new code can be plugged in easily.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | diff.c: move line ending check into emit_hunk_header | Stefan Beller | 1 | -2/+2
The emit_hunk_header() function is responsible for assembling a hunk header and calling emit_line() to send the hunk header to the output file. Its only caller fn_out_consume() needs to prepare for a case where the function emits an incomplete line and add the terminating LF.

Instead, make sure emit_hunk_header() always sends a completed line to emit_line().

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
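The idea of completing the line at the producer rather than in the caller can be shown with a small helper. This is only an illustration with a hypothetical function on a plain char buffer; emit_hunk_header() itself works on git's own string types.

```c
#include <assert.h>
#include <string.h>

/*
 * Make sure the NUL-terminated string in 'buf' ends with a line feed.
 * 'cap' is the total capacity of 'buf', including the terminating NUL.
 * Returns 0 on success and -1 if there is no room for the extra LF.
 */
static int sketch_complete_line(char *buf, size_t cap)
{
	size_t len;

	assert(buf);
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		return 0;	/* already a complete line */
	if (len + 2 > cap)
		return -1;	/* no space for '\n' plus NUL */
	buf[len] = '\n';
	buf[len + 1] = '\0';
	return 0;
}
```

With the check in one place, every consumer of the header can rely on receiving a complete line and no longer needs its own fix-up.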
2017-06-30 | diff.c: readability fix | Stefan Beller | 1 | -2/+2
We already have dereferenced 'p->two' into a local variable 'two'. Use that.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | Merge branch 'sb/hashmap-customize-comparison' into sb/diff-color-move | Junio C Hamano | 441 | -8419/+13147
* sb/hashmap-customize-comparison: (566 commits)
  hashmap: migrate documentation from Documentation/technical into header
  patch-ids.c: use hashmap correctly
  hashmap.h: compare function has access to a data field
  Twelfth batch for 2.14
  Git 2.13.2
  Eleventh batch for 2.14
  Revert "split-index: add and use unshare_split_index()"
  Tenth batch for 2.14
  add--interactive: quote commentChar regex
  add--interactive: handle EOF in prompt_yesno
  auto-correct: tweak phrasing
  docs: update 64-bit core.packedGitLimit default
  t7508: fix a broken indentation
  grep: fix erroneously copy/pasted variable in check/assert pattern
  Ninth batch for 2.14
  glossary: define 'stash entry'
  status: add optional stash count information
  stash: update documentation to use 'stash entry'
  for_each_bisect_ref(): don't trim refnames
  mergetools/meld: improve compatibiilty with Meld on macOS X
  ...
2017-06-30 | hashmap: migrate documentation from Documentation/technical into header | Stefan Beller | 2 | -341/+316
While at it, clarify the use of `key`, `keydata`, `entry_or_key` and document the new data pointer for the compare function.

Rework the example.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | patch-ids.c: use hashmap correctly | Stefan Beller | 1 | -4/+4
As alluded to in the previous patch, the code in patch-ids.c is using the hashmap API incorrectly. Luckily we do not have a bug, as all hashmap functionality that we use here (hashmap_get) passes through the keydata. If hashmap_get_next were to be used, a bug would occur as that passes NULL for the key_data.

So instead use the hashmap API correctly and provide the caller-required data in the compare function via the first argument that always gets passed and was set up via the hashmap_init function.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 | hashmap.h: compare function has access to a data field | Stefan Beller | 19 | -66/+113
When using the hashmap a common need is to have access to caller-provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata').

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
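A compare callback with an extra pass-through pointer might look like the sketch below. The typedef, struct, and function names are hypothetical and are not copied from hashmap.h; the only things taken from the text are that the caller's data arrives as the first argument and that unused parameters carry an 'unused_' prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback type: caller data first, then the usual arguments. */
typedef int (*sketch_cmp_fn)(const void *cmp_data, const void *entry,
			     const void *entry_or_key, const void *keydata);

/* Caller-provided settings, registered once when the map is initialized. */
struct sketch_cmp_opts {
	int ignore_case;
};

/* Compare two NUL-terminated strings, honoring the caller's settings. */
static int sketch_str_cmp(const void *cmp_data, const void *entry,
			  const void *entry_or_key, const void *unused_keydata)
{
	const struct sketch_cmp_opts *opts = cmp_data;
	const unsigned char *a = entry, *b = entry_or_key;

	(void)unused_keydata;
	assert(opts && a && b);
	for (;; a++, b++) {
		int ca = opts->ignore_case ? tolower(*a) : *a;
		int cb = opts->ignore_case ? tolower(*b) : *b;

		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (!ca)
			return 0;
	}
}
```

Since the settings no longer travel through keydata, a lookup path that passes NULL for keydata still compares correctly.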
2017-06-26 | Twelfth batch for 2.14 | Junio C Hamano | 1 | -1/+11
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 | Merge branch 'mb/reword-autocomplete-message' | Junio C Hamano | 1 | -6/+12
Message update.

* mb/reword-autocomplete-message:
  auto-correct: tweak phrasing
2017-06-26 | Merge branch 'ks/t7508-indent-fix' | Junio C Hamano | 1 | -1/+1
Cosmetic update to a test.

* ks/t7508-indent-fix:
  t7508: fix a broken indentation
2017-06-26 | Merge branch 'jk/add-p-commentchar-fix' | Junio C Hamano | 2 | -1/+10
"git add -p" was updated in the 2.12 timeframe to cope with custom core.commentchar, but the implementation was buggy and metacharacters like $ and * did not work.

* jk/add-p-commentchar-fix:
  add--interactive: quote commentChar regex
  add--interactive: handle EOF in prompt_yesno
2017-06-26 | Merge branch 'dt/raise-core-packed-git-limit' | Junio C Hamano | 1 | -1/+2
Doc update for a topic already in 'master'.

* dt/raise-core-packed-git-limit:
  docs: update 64-bit core.packedGitLimit default
2017-06-26 | Merge branch 'mh/packed-ref-store-prep' | Junio C Hamano | 5 | -11/+54
Bugfix for a topic that is (only) in 'master'.

* mh/packed-ref-store-prep:
  for_each_bisect_ref(): don't trim refnames
  lock_packed_refs(): fix cache validity check
2017-06-26 | Merge branch 'lb/status-stash-count' | Junio C Hamano | 12 | -38/+115
"git status" learned to optionally give how many stash entries the user has in its output.

* lb/status-stash-count:
  glossary: define 'stash entry'
  status: add optional stash count information
  stash: update documentation to use 'stash entry'
2017-06-24 | Sync with 2.13.2 | Junio C Hamano | 2 | -12/+17
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24 | Git 2.13.2 | Junio C Hamano | 2 | -1/+18
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24 | Merge branch 'sn/reset-doc-typofix' into maint | Junio C Hamano | 1 | -1/+1
Doc update.

* sn/reset-doc-typofix:
  doc: git-reset: fix a trivial typo
2017-06-24 | Merge branch 'sg/doc-pretty-formats' into maint | Junio C Hamano | 1 | -2/+2
Doc update.

* sg/doc-pretty-formats:
  docs/pretty-formats: stress that %- removes all preceding line-feeds
2017-06-24 | Merge branch 'sd/t3200-branch-m-test' into maint | Junio C Hamano | 1 | -0/+17
New test.

* sd/t3200-branch-m-test:
  t3200: add test for single parameter passed to -m option
2017-06-24 | Merge branch 'sg/revision-parser-skip-prefix' into maint | Junio C Hamano | 2 | -48/+44
Code clean-up.

* sg/revision-parser-skip-prefix:
  revision.c: use skip_prefix() in handle_revision_pseudo_opt()
  revision.c: use skip_prefix() in handle_revision_opt()
  revision.c: stricter parsing of '--early-output'
  revision.c: stricter parsing of '--no-{min,max}-parents'
  revision.h: turn rev_info.early_output back into an unsigned int
2017-06-24 | Merge branch 'km/test-mailinfo-b-failure' into maint | Junio C Hamano | 1 | -0/+42
New tests.

* km/test-mailinfo-b-failure:
  t5100: add some more mailinfo tests
2017-06-24 | Merge branch 'sb/submodule-rm-absorb' into maint | Junio C Hamano | 1 | -4/+5
Doc update to a recently graduated topic.

* sb/submodule-rm-absorb:
  Documentation/git-rm: correct submodule description
2017-06-24 | Merge branch 'jc/diff-tree-stale-comment' into maint | Junio C Hamano | 1 | -3/+5
Comment fix.

* jc/diff-tree-stale-comment:
  diff-tree: update stale in-code comments
2017-06-24 | Merge branch 'ps/stash-push-pathspec-fix' into maint | Junio C Hamano | 2 | -0/+19
"git stash push <pathspec>" did not work from a subdirectory at all. Bugfix for a topic in v2.13.

* ps/stash-push-pathspec-fix:
  git-stash: fix pushing stash with pathspec from subdir
2017-06-24 | Merge branch 'ls/github' into maint | Junio C Hamano | 2 | -0/+26
Help contributors that visit us at GitHub.

* ls/github:
  Configure Git contribution guidelines for github.com
2017-06-24 | Merge branch 'jk/pack-idx-corruption-safety' into maint | Junio C Hamano | 1 | -1/+7
A flaky test has been corrected.

* jk/pack-idx-corruption-safety:
  t5313: make extended-table test more deterministic
2017-06-24 | Merge branch 'jk/diff-blob' into maint | Junio C Hamano | 10 | -147/+301
The result from "git diff" that compares two blobs, e.g. "git diff $commit1:$path $commit2:$path", used to be shown with the full object name as given on the command line, but it is more natural to use the $path in the output and use it to look up .gitattributes.

* jk/diff-blob:
  diff: use blob path for blob/file diffs
  diff: use pending "path" if it is available
  diff: use the word "path" instead of "name" for blobs
  diff: pass whole pending entry in blobinfo
  handle_revision_arg: record paths for pending objects
  handle_revision_arg: record modes for "a..b" endpoints
  t4063: add tests of direct blob diffs
  get_sha1_with_context: dynamically allocate oc->path
  get_sha1_with_context: always initialize oc->symlink_path
  sha1_name: consistently refer to object_context as "oc"
  handle_revision_arg: add handle_dotdot() helper
  handle_revision_arg: hoist ".." check out of range parsing
  handle_revision_arg: stop using "dotdot" as a generic pointer
  handle_revision_arg: simplify commit reference lookups
  handle_revision_arg: reset "dotdot" consistently
2017-06-24 | Merge branch 'jc/name-rev-lw-tag' into maint | Junio C Hamano | 2 | -8/+53
"git describe --contains" penalized light-weight tags so much that they were almost never considered. Instead, give them about the same chance to be considered as an annotated tag that is the same age as the underlying commit would.

* jc/name-rev-lw-tag:
  name-rev: favor describing with tags and use committer date to tiebreak
  name-rev: refactor logic to see if a new candidate is a better name
2017-06-24 | Eleventh batch for 2.14 | Junio C Hamano | 1 | -15/+46
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24 | Merge branch 'ab/free-and-null' | Junio C Hamano | 49 | -195/+117
A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
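The pattern being replaced is `free(ptr); ptr = NULL`, so a wrapper along the following lines would capture it. This is a sketch of such a macro, not a quotation of git-compat-util.h; the struct and the release function are hypothetical and only show a typical call site.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the memory and clear the pointer in one statement. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical owner of a heap buffer. */
struct sketch_ctx {
	char *buf;
};

/* Safe to call repeatedly: free(NULL) is a no-op. */
static void sketch_release(struct sketch_ctx *c)
{
	assert(c);
	FREE_AND_NULL(c->buf);
}
```

The `do { ... } while (0)` wrapper keeps the macro usable as a single statement, for example in an unbraced `if` branch.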
2017-06-24 | Merge branch 'jk/warn-add-gitlink' | Junio C Hamano | 10 | -12/+113
Using "git add d/i/r" when d/i/r is the top of the working tree of a separate repository would create a gitlink in the index, which would appear as a not-quite-initialized submodule to others. We learned to give warnings when this happens.

* jk/warn-add-gitlink:
  t: move "git add submodule" into test blocks
  add: warn when adding an embedded repository