summaryrefslogtreecommitdiff
path: root/Documentation/gitdiffcore.txt
AgeCommit message (Collapse)AuthorFilesLines
2021-03-22Merge branch 'en/ort-perf-batch-8'Libravatar Junio C Hamano1-1/+1
Rename detection rework continues. * en/ort-perf-batch-8: diffcore-rename: compute dir_rename_guess from dir_rename_counts diffcore-rename: limit dir_rename_counts computation to relevant dirs diffcore-rename: compute dir_rename_counts in stages diffcore-rename: extend cleanup_dir_rename_info() diffcore-rename: move dir_rename_counts into dir_rename_info struct diffcore-rename: add function for clearing dir_rename_count Move computation of dir_rename_count from merge-ort to diffcore-rename diffcore-rename: add a mapping of destination names to their indices diffcore-rename: provide basic implementation of idx_possible_rename() diffcore-rename: use directory rename guided basename comparisons
2021-03-01Merge branch 'en/diffcore-rename'Libravatar Junio C Hamano1-0/+20
Performance optimization work on the rename detection continues. * en/diffcore-rename: merge-ort: call diffcore_rename() directly gitdiffcore doc: mention new preliminary step for rename detection diffcore-rename: guide inexact rename detection based on basenames diffcore-rename: complete find_basename_matches() diffcore-rename: compute basenames of source and dest candidates t4001: add a test comparing basename similarity and content similarity diffcore-rename: filter rename_src list when possible diffcore-rename: no point trying to find a match better than exact
2021-02-26diffcore-rename: use directory rename guided basename comparisonsLibravatar Elijah Newren1-1/+1
A previous commit noted that it is very common for people to move files across directories while keeping their filename the same. The last few commits took advantage of this and showed that we can accelerate rename detection significantly using basenames; since files with the same basename serve as likely rename candidates, we can check those first and remove them from the rename candidate pool if they are sufficiently similar. Unfortunately, the previous optimization was limited by the fact that the remaining basenames after exact rename detection are not always unique. Many repositories have hundreds of build files with the same name (e.g. Makefile, .gitignore, build.gradle, etc.), and may even have hundreds of source files with the same name. (For example, the linux kernel has 100 setup.c, 87 irq.c, and 112 core.c files. A repository at $DAYJOB has a lot of ObjectFactory.java and Plugin.java files). For these files with non-unique basenames, we are faced with the task of attempting to determine or guess which directory they may have been relocated to. Such a task is precisely the job of directory rename detection. However, there are two catches: (1) the directory rename detection code has traditionally been part of the merge machinery rather than diffcore-rename.c, and (2) directory rename detection currently runs after regular rename detection is complete. The 1st catch is just an implementation issue that can be overcome by some code shuffling. The 2nd requires us to add a further approximation: we only have access to exact renames at this point, so we need to do directory rename detection based on just exact renames. In some cases we won't have exact renames, in which case this extra optimization won't apply. We also choose to not apply the optimization unless we know that the underlying directory was removed, which will require extra data to be passed in to diffcore_rename_extended(). Also, even if we get a prediction about which directory a file may have relocated to, we will still need to check to see if there is a file in the predicted directory, and then compare the two files to see if they meet the higher min_basename_score threshold required for marking the two files as renames. This commit introduces an idx_possible_rename() function which will do this directory rename detection for us and give us the index within rename_dst of the resulting filename. For now, this function is hardcoded to return -1 (not found) and just hooks up how its results would be used once we have a more complete implementation in place. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16diff: --{rotate,skip}-to=<path>Libravatar Junio C Hamano1-0/+21
In the implementation of "git difftool", there is a case where the user wants to start viewing the diffs at a specific path and continue on to the rest, optionally wrapping around to the beginning. Since it is somewhat cumbersome to implement such a feature as a post-processing step of "git diff" output, let's support it internally with two new options. - "git diff --rotate-to=C", when the resulting patch would show paths A B C D E without the option, would "rotate" the paths to shows patch to C D E A B instead. It is an error when there is no patch for C is shown. - "git diff --skip-to=C" would instead "skip" the paths before C, and shows patch to C D E. Again, it is an error when there is no patch for C is shown. - "git log [-p]" also accepts these two options, but it is not an error if there is no change to the specified path. Instead, the set of output paths are rotated or skipped to the specified path or the first path that sorts after the specified path. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15gitdiffcore doc: mention new preliminary step for rename detectionLibravatar Elijah Newren1-0/+20
The last few patches have introduced a new preliminary step when rename detection is on but both break detection and copy detection are off. Document this new step. While we're at it, add a testcase that checks the new behavior as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-26log -G: ignore binary filesLibravatar Thomas Braun1-1/+2
The -G<regex> option of log looks for the differences whose patch text contains added/removed lines that match regex. Currently -G looks also into patches of binary files (which according to [1]) is binary as well. This has a couple of issues: - It makes the pickaxe search slow. In a proprietary repository of the author with only ~5500 commits and a total .git size of ~300MB searching takes ~13 seconds $time git log -Gwave > /dev/null real 0m13,241s user 0m12,596s sys 0m0,644s whereas when we ignore binary files with this patch it takes ~4s $time ~/devel/git/git log -Gwave > /dev/null real 0m3,713s user 0m3,608s sys 0m0,105s which is a speedup of more than fourfold. - The internally used algorithm for generating patch text is based on xdiff and its states in [1] > The output format of the binary patch file is proprietary > (and binary) and it is basically a collection of copy and insert > commands [..] which means that the current format could change once the internal algorithm is changed as the format is not standardized. In addition the git binary patch format used for preparing patches for git apply is *different* from the xdiff format as can be seen by comparing git log -p -a commit 6e95bf4bafccf14650d02ab57f3affe669be10cf Author: A U Thor <author@example.com> Date: Thu Apr 7 15:14:13 2005 -0700 modify binary file diff --git a/data.bin b/data.bin index f414c84..edfeb6f 100644 --- a/data.bin +++ b/data.bin @@ -1,2 +1,4 @@ a a^@a +a +a^@a with git log --binary commit 6e95bf4bafccf14650d02ab57f3affe669be10cf Author: A U Thor <author@example.com> Date: Thu Apr 7 15:14:13 2005 -0700 modify binary file diff --git a/data.bin b/data.bin index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..] GIT binary patch literal 12 QcmYe~N@Pgn0zx1O01)N^ZvX%Q literal 6 NcmYe~N@Pgn0ssWg0XP5v which seems unexpected. To resolve these issues this patch makes -G<regex> ignore binary files by default. Textconv filters are supported and also -a/--text for getting the old and broken behaviour back. The -S<block of text> option of log looks for differences that changes the number of occurrences of the specified block of text (i.e. addition/deletion) in a file. As we want to keep the current behaviour, add a test to ensure it stays that way. [1]: http://www.xmailserver.org/xdiff.html Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-28docs/diffcore: unquote "Complete Rewrites" in headersLibravatar Patrick Steinhardt1-4/+4
The gitdiffcore documentation quotes the term "Complete Rewrites" in headers for no real gain. This would make sense if the term could be easily confused if not properly grouped together. But actually, the term is quite obvious and thus does not really need any quoting, especially regarding that it is not used anywhere else. But more importanly, this brings up a bug when rendering man pages: when trying to render quotes inside of a section header, we end up with quotes which have been misaligned to the end of line. E.g. diffcore-break: For Splitting Up Complete Rewrites -------------------------------------------------- renders as DIFFCORE-BREAK: FOR SPLITTING UP COMPLETE REWRITES"" , which is obviously wrong. While this is fixable for the man pages by using double-quotes (e.g. ""COMPLETE REWRITES""), this again breaks it for our generated HTML pages. So fix the issue by simply dropping quotes inside of section headers, which is currently only done for the term "Complete Rewrites". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-28docs/diffcore: fix grammar in diffcore-rename headerLibravatar Patrick Steinhardt1-1/+1
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-15Merge branch 'sb/doc-unify-bottom'Libravatar Junio C Hamano1-1/+1
Doc clean-up. * sb/doc-unify-bottom: Documentation: unify bottom "part of git suite" lines
2017-02-09Documentation: unify bottom "part of git suite" linesLibravatar Stefan Beller1-1/+1
We currently have 168 man pages that mention they are part of Git, you can check yourself easily via: $ git grep "Part of the linkgit:git\[1\] suite" |wc -l 168 However some have a trailing period, i.e. $ git grep "Part of the linkgit:git\[1\] suite." |wc -l 8 Unify the bottom line in all man pages to not end with a period. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28doc: typeset long command-line options as literalLibravatar Matthieu Moy1-2/+2
Similarly to the previous commit, use backquotes instead of forward-quotes, for long options. This was obtained with: perl -pi -e "s/'(--[a-z][a-z=<>-]*)'/\`\$1\`/g" *.txt and manual tweak to remove false positive in ascii-art (o'--o'--o' to describe rewritten history). Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-12doc: convert \--option to --optionLibravatar Jeff King1-3/+3
Older versions of AsciiDoc would convert the "--" in "--option" into an emdash. According to 565e135 (Documentation: quote double-dash for AsciiDoc, 2011-06-29), this is fixed in AsciiDoc 8.3.0. According to bf17126, we don't support anything older than 8.4.1 anyway, so we no longer need to worry about quoting. Even though this does not change the output at all, there are a few good reasons to drop the quoting: 1. It makes the source prettier to read. 2. We don't quote consistently, which may be confusing when reading the source. 3. Asciidoctor does not like the quoting, and renders a literal backslash. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03diffcore-pickaxe doc: document -S and -G properlyLibravatar Ramkumar Ramachandra1-18/+27
The documentation of -S and -G is very sketchy. Completely rewrite the sections in Documentation/diff-options.txt and Documentation/gitdiffcore.txt. References: 52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".) f506b8e (git log/diff: add -G<regexp> that greps in the patch text) Inputs-from: Phil Hord <phil.hord@gmail.com> Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15The name of the hash function is "SHA-1", not "SHA1"Libravatar Thomas Ackermann1-1/+1
Use "SHA-1" instead of "SHA1" whenever we talk about the hash function. When used as a programming symbol, we keep "SHA1". Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-01Documentation: the name of the system is 'Git', not 'git'Libravatar Thomas Ackermann1-1/+1
Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-26docs: stop using asciidoc no-inline-literalLibravatar Jeff King1-5/+5
In asciidoc 7, backticks like `foo` produced a typographic effect, but did not otherwise affect the syntax. In asciidoc 8, backticks introduce an "inline literal" inside which markup is not interpreted. To keep compatibility with existing documents, asciidoc 8 has a "no-inline-literal" attribute to keep the old behavior. We enabled this so that the documentation could be built on either version. It has been several years now, and asciidoc 7 is no longer in wide use. We can now decide whether or not we want inline literals on their own merits, which are: 1. The source is much easier to read when the literal contains punctuation. You can use `master~1` instead of `master{tilde}1`. 2. They are less error-prone. Because of point (1), we tend to make mistakes and forget the extra layer of quoting. This patch removes the no-inline-literal attribute from the Makefile and converts every use of backticks in the documentation to an inline literal (they must be cleaned up, or the example above would literally show "{tilde}" in the output). Problematic sites were found by grepping for '`.*[{\\]' and examined and fixed manually. The results were then verified by comparing the output of "html2text" on the set of generated html pages. Doing so revealed that in addition to making the source more readable, this patch fixes several formatting bugs: - HTML rendering used the ellipsis character instead of literal "..." in code examples (like "git log A...B") - some code examples used the right-arrow character instead of '->' because they failed to quote - api-config.txt did not quote tilde, and the resulting HTML contained a bogus snippet like: <tt><sub></tt> foo <tt></sub>bar</tt> which caused some parsers to choke and omit whole sections of the page. - git-commit.txt confused ``foo`` (backticks inside a literal) with ``foo'' (matched double-quotes) - mentions of `A U Thor <author@example.com>` used to erroneously auto-generate a mailto footnote for author@example.com - the description of --word-diff=plain incorrectly showed the output as "[-removed-] and {added}", not "{+added+}". - using "prime" notation like: commit `C` and its replacement `C'` confused asciidoc into thinking that everything between the first backtick and the final apostrophe were meant to be inside matched quotes - asciidoc got confused by the escaping of some of our asterisks. In particular, `credential.\*` and `credential.<url>.\*` properly escaped the asterisk in the first case, but literally passed through the backslash in the second case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-06Documentation: use [verse] for SYNOPSIS sectionsLibravatar Martin von Zweigbergk1-0/+1
The SYNOPSIS sections of most commands that span several lines already use [verse] to retain line breaks. Most commands that don't span several lines seem not to use [verse]. In the HTML output, [verse] does not only preserve line breaks, but also makes the section indented, which causes a slight inconsistency between commands that use [verse] and those that don't. Use [verse] in all SYNOPSIS sections for consistency. Also remove the blank lines from git-fetch.txt and git-rebase.txt to align with the other man pages. In the case of git-rebase.txt, which already uses [verse], the blank line makes the [verse] not apply to the last line, so removing the blank line also makes the formatting within the document more consistent. While at it, add single quotes to 'git cvsimport' for consistency with other commands. Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31gitdiffcore doc: update pickaxe descriptionLibravatar Junio C Hamano1-3/+3
The old text described the original design (one side does not have it at all while the other side has it); this was later amended to check if the number of occurrences changed, which is what we currently do with -S. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-18Documentation/gitdiffcore: fix order in pickaxe descriptionLibravatar Michael J Gruber1-2/+2
Reverse the order of "origin" and "result" so that the sentence really describes an addition rather than a removal. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-21Merge branch 'maint-1.6.6' into maintLibravatar Junio C Hamano1-1/+1
* maint-1.6.6: Documentation/git-clone: Transform description list into item list Documentation/urls: Remove spurious example markers Documentation/gitdiffcore: Remove misleading date in heading Documentation/git-reflog: Fix formatting of command lists
2010-03-21Documentation/gitdiffcore: Remove misleading date in headingLibravatar Michael J Gruber1-1/+1
Ever since the automatic conversion into man form, the heading contained a misidentified subheading reading "June 2005". Remove this since the documentation is more recent, and the correct date is in the footer. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-10Documentation: spell 'git cmd' without dash throughoutLibravatar Thomas Rast1-13/+13
The documentation was quite inconsistent when spelling 'git cmd' if it only refers to the program, not to some specific invocation syntax: both 'git-cmd' and 'git cmd' spellings exist. The current trend goes towards dashless forms, and there is precedent in 647ac70 (git-svn.txt: stop using dash-form of commands., 2009-07-07) to actively eliminate the dashed variants. Replace 'git-cmd' with 'git cmd' throughout, except where git-shell, git-cvsserver, git-upload-pack, git-receive-pack, and git-upload-archive are concerned, because those really live in the $PATH.
2008-09-19Bust the ghost of long-defunct diffcore-pathspec.Libravatar Yann Dirson1-33/+22
This concept was retired by 77882f6 (Retire diffcore-pathspec., 2006-04-10), more than 2 years ago. Signed-off-by: Yann Dirson <ydirson@altern.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05manpages: italicize nongit command names (if they are in teletype font)Libravatar Jonathan Nieder1-1/+1
Some manual pages use teletype font to set command names. We change them to use italics, instead. This creates a visual distinction between names of commands and command lines that can be typed at the command line. It is also more consistent with other man pages outside Git. In this patch, the commands named are non-git commands like bash. Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05manpages: italicize git command names (which were in teletype font)Libravatar Jonathan Nieder1-15/+15
The names of git commands are not meant to be entered at the commandline; they are just names. So we render them in italics, as is usual for command names in manpages. Using doit () { perl -e 'for (<>) { s/\`(git-[^\`.]*)\`/'\''\1'\''/g; print }' } for i in git*.txt config.txt diff*.txt blame*.txt fetch*.txt i18n.txt \ merge*.txt pretty*.txt pull*.txt rev*.txt urls*.txt do doit <"$i" >"$i+" && mv "$i+" "$i" done git diff . Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05manpages: italicize command namesLibravatar Jonathan Nieder1-2/+2
This includes nongit commands like RCS 'merge'. This patch only italicizes names of commands if they had no formatting before. Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05gitdiffcore(7): fix awkward wordingLibravatar Jonathan Nieder1-2/+2
The phrase "diff outputs" sounds awkward to my ear (I think "output" is meant to be used as a substantive noun.) Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-01Documentation formatting and cleanupLibravatar Jonathan Nieder1-17/+17
Following what appears to be the predominant style, format names of commands and commandlines both as `teletype text`. While we're at it, add articles ("a" and "the") in some places, italicize the name of the command in the manual page synopsis line, and add a comma or two where it seems appropriate. Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-01Documentation: be consistent about "git-" versus "git "Libravatar Jonathan Nieder1-1/+1
Since the git-* commands are not installed in $(bindir), using "git-command <parameters>" in examples in the documentation is not a good idea. On the other hand, it is nice to be able to refer to each command using one hyphenated word. (There is no escaping it, anyway: man page names cannot have spaces in them.) This patch retains the dash in naming an operation, command, program, process, or action. Complete command lines that can be entered at a shell (i.e., without options omitted) are made to use the dashless form. The changes consist only of replacing some spaces with hyphens and vice versa. After a "s/ /-/g", the unpatched and patched versions are identical. Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-06documentation: move git(7) to git(1)Libravatar Christian Couder1-1/+1
As the "git" man page describes the "git" command at the end-user level, it seems better to move it to man section 1. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-06documentation: convert "diffcore" and "repository-layout" to man pagesLibravatar Christian Couder1-0/+292
This patch renames the following documents and at the same time converts them to the man format: diffcore.txt -> gitdiffcore.txt (man section 7) repository-layout.txt -> gitrepository-layout.txt (man section 5) Other documents that reference the above ones are changed accordingly. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>