xdiff: reduce indent heuristic overhead

Skip searching for better indentation heuristics if we'd slide a hunk more
than its size. This is the easiest fix proposed in the analysis[1] in
response to a patch that mercurial took for xdiff to limit searching
by a constant. Using a performance test as:

    #!python
    open('a', 'w').write(" \n" * 1000000)
    open('b', 'w').write(" \n" * 1000001)

This patch reduces the execution of "git diff --no-index a b" from
0.70s to 0.31s. However limiting the sliding to the size of the diff hunk,
which was proposed as a solution (that I found easiest to implement for
now) is not optimal for cases like

    open('a', 'w').write(" \n" * 1000000)
    open('b', 'w').write(" \n" * 2000000)

as then we'd still slide 1000000 times.

In addition to limiting the sliding to size of the hunk, also limit by a
constant. Choose 100 lines as the constant as that fits more than a screen,
which really means that the diff sliding is probably not providing a lot
of benefit anyway.

[1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@alum.mit.edu/

Reported-by: Jun Wu <quark@fb.com>
Analysis-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Stefan Beller <sbeller@google.com> 2018-07-27 15:23:56 -0700
committer: Junio C Hamano <gitster@pobox.com> 2018-08-01 13:36:22 -0700
commit: 301ef8540155316cb87c896866dd1cab3108807b (patch)
tree: 487204f9308324c012996d70c546a304ecf5545d /xdiff
parent: Git 2.18 (diff)
download: tgif-301ef8540155316cb87c896866dd1cab3108807b.tar.xz
1 files changed, 11 insertions, 1 deletions
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463b..91e98ee986 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -592,6 +592,11 @@ static void measure_split(const xdfile_t *xdf, long split,
 #define INDENT_WEIGHT 60
 
 /*
+ * How far do we slide a hunk at most?
+ */
+#define INDENT_HEURISTIC_MAX_SLIDING 100
+
+/*
  * Compute a badness score for the hypothetical split whose measurements are
  * stored in m. The weight factors were determined empirically using the tools and
  * corpus described in
@@ -903,7 +908,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			long shift, best_shift = -1;
 			struct split_score best_score;
 
-			for (shift = earliest_end; shift <= g.end; shift++) {
+			shift = earliest_end;
+			if (g.end - groupsize - 1 > shift)
+				shift = g.end - groupsize - 1;
+			if (g.end - INDENT_HEURISTIC_MAX_SLIDING > shift)
+				shift = g.end - INDENT_HEURISTIC_MAX_SLIDING;
+			for (; shift <= g.end; shift++) {
 				struct split_measurement m;
 				struct split_score score = {0, 0};
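
The effect of the two clamps in the second hunk can be checked in isolation. The following is a minimal sketch, not code from xdiffi.c: the helper names `first_shift` and `num_candidates` are hypothetical, and `group_end` and `groupsize` stand in for `g.end` and the hunk size used in the patch. It only reproduces the arithmetic that picks the first shift the loop will try.

```c
#define INDENT_HEURISTIC_MAX_SLIDING 100

/*
 * Hypothetical helper: return the first shift the scoring loop would
 * examine, given the earliest possible end of the group, its current
 * end, and its size. Mirrors the two "if" clamps added by the patch.
 */
static long first_shift(long earliest_end, long group_end, long groupsize)
{
	long shift = earliest_end;

	/* Do not slide the hunk by more than its own size (plus one). */
	if (group_end - groupsize - 1 > shift)
		shift = group_end - groupsize - 1;

	/* Never slide by more than a fixed number of lines. */
	if (group_end - INDENT_HEURISTIC_MAX_SLIDING > shift)
		shift = group_end - INDENT_HEURISTIC_MAX_SLIDING;

	return shift;
}

/* Hypothetical helper: how many candidate positions get scored. */
static long num_candidates(long earliest_end, long group_end, long groupsize)
{
	return group_end - first_shift(earliest_end, group_end, groupsize) + 1;
}
```

Under these assumptions, a one-line hunk at the end of a million identical lines is scored at 3 positions instead of about a million, and a million-line hunk, which the size clamp alone would still slide a million times, is capped at 101 positions by the constant.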