Age | Commit message (Collapse) | Author | Files | Lines |
|
Output from "git diff" can be made easier to read by selecting
which lines are common and which lines are added/deleted
intelligently when the lines before and after the changed section
are the same. A command line option is added to help with the
experiment to find a good heuristics.
* mh/diff-indent-heuristic:
blame: honor the diff heuristic options and config
parse-options: add parse_opt_unknown_cb()
diff: improve positioning of add/delete blocks in diffs
xdl_change_compact(): introduce the concept of a change group
recs_match(): take two xrecord_t pointers as arguments
is_blank_line(): take a single xrecord_t as argument
xdl_change_compact(): only use heuristic if group can't be matched
xdl_change_compact(): fix compaction heuristic to adjust ixo
|
|
The "unsigned char sha1[20]" to "struct object_id" conversion
continues. Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.
It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2"). Extra sets of eyes double-checking for
mismerges are highly appreciated.
* bc/object-id:
builtin/reset: convert to use struct object_id
builtin/commit-tree: convert to struct object_id
builtin/am: convert to struct object_id
refs: add an update_ref_oid function.
sha1_name: convert get_sha1_mb to struct object_id
builtin/update-index: convert file to struct object_id
notes: convert init_notes to use struct object_id
builtin/rm: convert to use struct object_id
builtin/blame: convert file to use struct object_id
Convert read_mmblob to take struct object_id.
notes-merge: convert struct notes_merge_pair to struct object_id
builtin/checkout: convert some static functions to struct object_id
streaming: make stream_blob_to_fd take struct object_id
builtin: convert textconv_object to use struct object_id
builtin/cat-file: convert some static functions to struct object_id
builtin/cat-file: convert struct expand_data to use struct object_id
builtin/log: convert some static functions to use struct object_id
builtin/blame: convert struct origin to use struct object_id
builtin/apply: convert static functions to struct object_id
cache: convert struct cache_entry to use struct object_id
|
|
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".
Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.
It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
}
if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;
The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
$initial_reply_to =~ s/(^\s+|\s+$)//g;
}
+if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.
The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.
Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.
This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.
Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:
1. It measures the following characteristics of a proposed splitting
position in a `struct split_measurement`:
* the number of blank lines above the proposed split
* whether the line directly after the split is blank
* the number of blank lines following that line
* the indentation of the nearest non-blank line above the split
* the indentation of the line directly below the split
* the indentation of the nearest non-blank line after that line
2. It combines the measured attributes using a bunch of
empirically-optimized weighting factors to derive a `struct
split_score` that measures the "badness" of splitting the text at
that position.
3. It combines the `split_score` for the top and the bottom of the
slider at each of its possible positions, and selects the position
that has the best `split_score`.
I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:
| repository | count | Git 2.9.0 | compaction | compaction-fixed | indent-1 | indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking | 109 | 89 (81.7%) | 37 (33.9%) | 37 (33.9%) | 2 (1.8%) | 2 (1.8%) |
| alamofire | 30 | 18 (60.0%) | 14 (46.7%) | 15 (50.0%) | 0 (0.0%) | 0 (0.0%) |
| angular | 184 | 127 (69.0%) | 39 (21.2%) | 23 (12.5%) | 5 (2.7%) | 5 (2.7%) |
| animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) |
| ant | 380 | 356 (93.7%) | 152 (40.0%) | 148 (38.9%) | 15 (3.9%) | 15 (3.9%) | *
| bugzilla | 306 | 263 (85.9%) | 109 (35.6%) | 99 (32.4%) | 14 (4.6%) | 15 (4.9%) | *
| corefx | 126 | 91 (72.2%) | 22 (17.5%) | 21 (16.7%) | 6 (4.8%) | 6 (4.8%) |
| couchdb | 78 | 44 (56.4%) | 26 (33.3%) | 28 (35.9%) | 6 (7.7%) | 6 (7.7%) | *
| cpython | 937 | 158 (16.9%) | 50 (5.3%) | 49 (5.2%) | 5 (0.5%) | 5 (0.5%) | *
| discourse | 160 | 95 (59.4%) | 42 (26.2%) | 36 (22.5%) | 18 (11.2%) | 13 (8.1%) |
| docker | 307 | 194 (63.2%) | 198 (64.5%) | 253 (82.4%) | 8 (2.6%) | 8 (2.6%) | *
| electron | 163 | 132 (81.0%) | 38 (23.3%) | 39 (23.9%) | 6 (3.7%) | 6 (3.7%) |
| git | 536 | 470 (87.7%) | 73 (13.6%) | 78 (14.6%) | 16 (3.0%) | 16 (3.0%) | *
| gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| ionic | 133 | 89 (66.9%) | 29 (21.8%) | 38 (28.6%) | 1 (0.8%) | 1 (0.8%) |
| ipython | 482 | 362 (75.1%) | 167 (34.6%) | 169 (35.1%) | 11 (2.3%) | 11 (2.3%) | *
| junit | 161 | 147 (91.3%) | 67 (41.6%) | 66 (41.0%) | 1 (0.6%) | 1 (0.6%) | *
| lighttable | 15 | 5 (33.3%) | 0 (0.0%) | 2 (13.3%) | 0 (0.0%) | 0 (0.0%) |
| magit | 88 | 75 (85.2%) | 11 (12.5%) | 9 (10.2%) | 1 (1.1%) | 0 (0.0%) |
| neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| nodejs | 781 | 649 (83.1%) | 118 (15.1%) | 111 (14.2%) | 4 (0.5%) | 5 (0.6%) | *
| phpmyadmin | 491 | 481 (98.0%) | 75 (15.3%) | 48 (9.8%) | 2 (0.4%) | 2 (0.4%) | *
| react-native | 168 | 130 (77.4%) | 79 (47.0%) | 81 (48.2%) | 0 (0.0%) | 0 (0.0%) |
| rust | 171 | 128 (74.9%) | 30 (17.5%) | 27 (15.8%) | 16 (9.4%) | 14 (8.2%) |
| spark | 186 | 149 (80.1%) | 52 (28.0%) | 52 (28.0%) | 2 (1.1%) | 2 (1.1%) |
| tensorflow | 115 | 66 (57.4%) | 48 (41.7%) | 48 (41.7%) | 5 (4.3%) | 5 (4.3%) |
| test-more | 19 | 15 (78.9%) | 2 (10.5%) | 2 (10.5%) | 1 (5.3%) | 1 (5.3%) | *
| test-unit | 51 | 34 (66.7%) | 14 (27.5%) | 8 (15.7%) | 2 (3.9%) | 2 (3.9%) | *
| xmonad | 23 | 22 (95.7%) | 2 (8.7%) | 2 (8.7%) | 1 (4.3%) | 1 (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals | 6668 | 4391 (65.9%) | 1496 (22.4%) | 1491 (22.4%) | 150 (2.2%) | 144 (2.2%) |
| totals (training set) | 4552 | 3195 (70.2%) | 1053 (23.1%) | 1061 (23.3%) | 86 (1.9%) | 88 (1.9%) |
| totals (test set) | 2116 | 1196 (56.5%) | 443 (20.9%) | 430 (20.3%) | 64 (3.0%) | 56 (2.6%) |
In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are
* "repository" - the name of the repository used. I used the diffs
between successive non-merge commits on the HEAD branch of the
corresponding repository.
* "count" - the number of sliders that were human-rated. I chose most,
but not all, sliders to rate from those among which the various
algorithms gave different answers.
* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
* "compaction" - the heuristic used by `git diff --compaction-heuristic`
in Git 2.9.0.
* "compaction-fixed" - the heuristic used by `git diff
--compaction-heuristic` after the fixes from earlier in this patch
series. Note that the results are not dramatically different than
those for "compaction". Both produce non-ideal diffs only about 1/3 as
often as the default `git diff`.
* "indent-1" - the new `--indent-heuristic` algorithm, using the first
set of weighting factors, determined as described above.
* "indent-2" - the new `--indent-heuristic` algorithm, using the final
set of weighting factors, determined as described below.
* `*` - indicates that repo was part of training set used to determine
the first set of weighting factors.
The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.
The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)
The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].
This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.
[1] https://github.com/mhagger/diff-slider-tools
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code cleanup.
* sb/diff-cleanup:
diff: remove dead code
diff: omit found pointer from emit_callback
diff.c: use diff_options directly
|
|
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.
* jk/diff-submodule-diff-inline:
diff: teach diff to display submodule difference with an inline diff
submodule: refactor show_submodule_summary with helper function
submodule: convert show_submodule_summary to use struct object_id *
allow do_submodule_path to work even if submodule isn't checked out
diff: prepare for additional submodule formats
graph: add support for --line-prefix on all graph-aware output
diff.c: remove output_prefix_length field
cache: add empty_tree_oid object and helper function
|
|
When `len < 1`, len has to be 0 or negative, emit_line will then remove the
first character and by then `len` would be negative. As this doesn't
happen, it is safe to assume it is dead code.
This continues to simplify the code, which was started in b8d9c1a66b
(2009-09-03, diff.c: the builtin_diff() deals with only two-file
comparison).
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We keep the actual data in the diff options, which are just as accessible.
Remove the pointer stored in struct emit_callback for readability.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The value of `ecbdata->opt` is accessible via the short variable `o`
already, so let's use that instead.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.
Add tests for the new format and option.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since we're going to be changing this function in a future patch, lets
go ahead and convert this to use object_id now.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A future patch will add a new format for displaying the difference of
a submodule. Make it easier by changing how we store the current
selected format. Replace the DIFF_OPT flag with an enumeration, as each
format will be mutually exclusive.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.
To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.
This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.
This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.
Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"diff/log --stat" has a logic that determines the display columns
available for the diffstat part of the output and apportions it for
pathnames and diffstat graph automatically.
5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
added the output_prefix_length field to diff_options structure to
allow this logic to subtract the display columns used for the
history graph part from the total "terminal width"; this matters
when the "git log --graph -p" option is in use.
The field must be set to the number of display columns needed to
show the output from the output_prefix() callback, which is error
prone. As there is only one user of the field, and the user has the
actual value of the prefix string, let's get rid of the field and
have the user count the display width itself.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.
* kw/patch-ids-optim:
rebase: avoid computing unnecessary patch IDs
patch-ids: add flag to create the diff patch id using header only data
patch-ids: replace the seen indicator with a commit pointer
patch-ids: stop using a hand-rolled hashmap implementation
|
|
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
|
|
This will allow a diff patch id to be created using only the header data
so that the contents of the file will not have to be loaded.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When accessing a blob for a diff, we may try to reuse file
contents in the working tree, under the theory that it is
faster to mmap those file contents than it would be to
extract the content from the object database.
When we have to filter those contents, though, that
assumption does not hold. Even for our internal conversions
like CRLF, we have to allocate and fill a new buffer anyway.
But much worse, for external clean filters we have to exec
an arbitrary script, and we have no idea how expensive it
may be to run.
So let's skip this optimization when conversion into git's
"clean" form is required. This applies whenever the
"want_file" flag is false. When it's true, the caller
actually wants the smudged worktree contents, which the
reused file by definition already has (in fact, this is a
key optimization going the other direction, since reusing
the worktree file there lets us skip smudge filters).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from unsigned char sha1[20] to struct object_id
continues.
* bc/cocci:
diff: convert prep_temp_blob() to struct object_id
merge-recursive: convert merge_recursive_generic() to object_id
merge-recursive: convert leaf functions to use struct object_id
merge-recursive: convert struct merge_file_info to object_id
merge-recursive: convert struct stage_data to use object_id
diff: rename struct diff_filespec's sha1_valid member
diff: convert struct diff_filespec to struct object_id
coccinelle: apply object_id Coccinelle transformations
coccinelle: convert hashcpy() with null_sha1 to hashclr()
contrib/coccinelle: add basic Coccinelle transforms
hex: add oid_to_hex_r()
|
|
All of the callers of this function use struct object_id, so convert it
to use struct object_id in its arguments and internally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that this struct's sha1 member is called "oid", update the comment
and the sha1_valid member to be called "oid_valid" instead. The
following Coccinelle semantic patch was used to implement this, followed
by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1_valid
+ o.oid_valid
@@
struct diff_filespec *p;
@@
- p->sha1_valid
+ p->oid_valid
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert struct diff_filespec's sha1 member to use a struct object_id
called "oid" instead. The following Coccinelle semantic patch was used
to implement this, followed by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1
+ o.oid.hash
@@
struct diff_filespec *p;
@@
- p->sha1
+ p->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
hashcpy with null_sha1 as the source is equivalent to hashclr. In
addition to being simpler, using hashclr may give the compiler a chance
to optimize better. Convert instances of hashcpy with the source
argument of null_sha1 to hashclr.
This transformation was implemented using the following semantic patch:
@@
expression E1;
@@
-hashcpy(E1, null_sha1);
+hashclr(E1);
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git diff --output=<file> --color=auto" used to show the ANSI color
sequence in the resulting file when the standard output is connected
to a terminal, because --color=auto check always checks the standard
output, not the actual file that receives the output.
We could correct this by using freopen(3) to redirect the standard
output to the specified file, which is in like with how format-patch
used to match the world order, but following the same reasoning as
the earlier "format-patch: explicitly switch off color when writing
to files", let's be more strict by bypassing the "auto" check when
the --output=<file> option is in use.
Strictly speaking, this is a backwards-incompatible change, but
it is highly unlikely that any user would want to see ANSI color
sequences in a file.
The reason this was not caught earlier is most likely that either
--output=<file> is not used, or only when stdout is redirected
anyway.
Users can still give --color=always if they want a colored diff in
the resulting file.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It turns out that the earlier effort to update the heuristics may
want to use a bit more time to mature. Turn it off by default.
* jk/diff-compact-heuristic:
diff: disable compaction heuristic for now
|
|
http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net
reports that a change to add a new "function" with common ending
with the existing one at the end of the file is shown like this:
def foo
do_foo_stuff()
+ common_ending()
+end
+
+def bar
+ do_bar_stuff()
+
common_ending()
end
when the new heuristic is in use. In reality, the change is to add
the blank line before "def bar" and everything below, which is what
the code without the new heuristic shows.
Disable the heuristics by default, and resurrect the documentation
for the option and the configuration variables, while clearly
marking the feature as still experimental.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Patch output from "git diff" and friends has been tweaked to be
more readable by using a blank line as a strong hint that the
contents before and after it belong to a logically separate unit.
* jk/diff-compact-heuristic:
diff: undocument the compaction heuristic knobs for experimentation
xdiff: implement empty line chunk heuristic
xdiff: add recs_match helper function
|
|
In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:
...
/*
+ *
+ *
+ */
+
+/*
...
instead of the more readable equivalent of
...
+/*
+ *
+ *
+ */
+
/*
...
Implement the following heuristic to (optionally) produce the desired
output.
If there are diff chunks which can be shifted around, shift each hunk
such that the last common empty line is below the chunk with the rest
of the context above.
This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The end-user facing Porcelain level commands like "diff" and "log"
now enables the rename detection by default.
* mm/diff-renames-default:
diff: activate diff.renames by default
log: introduce init_log_defaults()
t: add tests for diff.renames (true/false/unset)
t4001-diff-rename: wrap file creations in a test
Documentation/diff-config: fix description of diff.renames
|
|
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
|
|
The memory ownership rule of fill_textconv() API, which was a bit
tricky, has been documented a bit better.
* jk/more-comments-on-textconv:
diff: clarify textconv interface
|
|
Rename detection is a very convenient feature, and new users shouldn't
have to dig in the documentation to benefit from it.
Potential objections to activating rename detection are that it
sometimes fail, and it is sometimes slow. But rename detection is
already activated by default in several cases like "git status" and "git
merge", so activating diff.renames does not fundamentally change the
situation. When the rename detection fails, it now fails consistently
between "git diff" and "git status".
This setting does not affect plumbing commands, hence well-written
scripts will not be affected.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We allocate 100 bytes to hold the "Submodule commit ..."
text. This is enough, but it's not immediately obvious that
this is the case, and we have to repeat the magic 100 twice.
We could get away with xstrfmt here, but we want to know the
size, as well, so let's use a real strbuf. And while we're
here, we can clean up the logic around size_only. It
currently sets and clears the "data" field pointlessly, and
leaves the "should_free" flag on even after we have cleared
the data.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.
Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
|
|
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove no-longer used #include.
* tk/sigchain-unnecessary-post-tempfile:
shallow: remove unused #include "sigchain.h"
read-cache: remove unused #include "sigchain.h"
diff: remove unused #include "sigchain.h"
credential-cache--daemon: remove unused #include "sigchain.h"
|
|
After switching to use the tempfile module in commit 284098f1
(diff: use tempfile module), no declarations from sigchain.h are used in
diff.c anymore. Thus, remove the #include.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
|
|
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:
strcpy(buf, sha1_to_hex(sha1));
This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.
Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later. This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).
In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper). But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.
Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.
We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).
There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.
However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The gitmodules API accessed from the C code learned to cache stuff
lazily.
* hv/submodule-config:
submodule: allow erroneous values for the fetchRecurseSubmodules option
submodule: use new config API for worktree configurations
submodule: extract functions for config set and lookup
submodule: implement a config API for lookup of .gitmodules values
|
|
The "lockfile" API has been rebuilt on top of a new "tempfile" API.
* mh/tempfile:
credential-cache--daemon: use tempfile module
credential-cache--daemon: delete socket from main()
gc: use tempfile module to handle gc.pid file
lock_repo_for_gc(): compute the path to "gc.pid" only once
diff: use tempfile module
setup_temporary_shallow(): use tempfile module
write_shared_index(): use tempfile module
register_tempfile(): new function to handle an existing temporary file
tempfile: add several functions for creating temporary files
prepare_tempfile_object(): new function, extracted from create_tempfile()
tempfile: a new module for handling temporary files
commit_lock_file(): use get_locked_file_path()
lockfile: add accessor get_lock_file_path()
lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
create_bundle(): duplicate file descriptor to avoid closing it twice
lockfile: move documentation to lockfile.h and lockfile.c
|