summaryrefslogtreecommitdiff
path: root/xdiff/xprepare.c
AgeCommit message (Collapse)AuthorFilesLines
2021-11-18xdiff: avoid unnecessary memory allocationsLibravatar Phillip Wood1-4/+7
rindex and ha are only used by xdl_cleanup_records() which is not called by the histogram or patience algorithms. The perf test results show a small reduction in run time but that is probably within the noise. Test HEAD^ HEAD ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.19(0.17+0.02) 0.19(0.12+0.07) +0.0% 4000.2: log --raw -3000 (tree-only) 0.98(0.78+0.20) 0.98(0.81+0.16) +0.0% 4000.3: log -p -3000 (Myers) 4.81(4.15+0.64) 4.81(4.23+0.56) +0.0% 4000.4: log -p -3000 --histogram 5.87(5.19+0.66) 5.83(5.11+0.70) -0.7% 4000.5: log -p -3000 --patience 5.35(4.60+0.73) 5.31(4.61+0.69) -0.7% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18diff histogram: intern stringsLibravatar Phillip Wood1-16/+8
Histogram is the only diff algorithm not to call xdl_classify_record(). xdl_classify_record() ensures that the hash values of two strings that are not equal differ which means that it is not necessary to use xdl_recmatch() when comparing lines, all that is necessary is to compare the hash values. This gives a 7% reduction in the runtime of "git log --patch" when using the histogram diff algorithm. Test HEAD^ HEAD ----------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.18(0.14+0.04) 0.19(0.17+0.02) +5.6% 4000.2: log --raw -3000 (tree-only) 0.99(0.77+0.21) 0.98(0.78+0.20) -1.0% 4000.3: log -p -3000 (Myers) 4.84(4.31+0.51) 4.81(4.15+0.64) -0.6% 4000.4: log -p -3000 --histogram 6.34(5.86+0.46) 5.87(5.19+0.66) -7.4% 4000.5: log -p -3000 --patience 5.39(4.60+0.76) 5.35(4.60+0.73) -0.7% Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-09Replace Free Software Foundation address in license noticesLibravatar Todd Zullinger1-2/+2
The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to gnu.org, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-04xdiff/xprepare: fix a memory leakLibravatar Ramsay Jones1-0/+1
The xdl_prepare_env() function may initialise an xdlclassifier_t data structure via xdl_init_classifier(), which allocates memory to several fields, for example 'rchash', 'rcrecs' and 'ncha'. If this function later exits due to the failure of xdl_optimize_ctxs(), then this xdlclassifier_t structure, and the memory allocated to it, is not cleaned up. In order to fix the memory leak, insert a call to xdl_free_classifier() before returning. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-04xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bitsLibravatar Ramsay Jones1-1/+1
Commit 307ab20b3 ("xdiff: PATIENCE/HISTOGRAM are not independent option bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the flag bits used to represent the diff algorithm requested. In addition, code which had used explicit manipulation of the flag bits was changed to use the macros. However, one example of direct manipulation remains. Update this code to use the XDF_DIFF_ALG() macro. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-19xdiff: PATIENCE/HISTOGRAM are not independent option bitsLibravatar Junio C Hamano1-11/+10
Because the default Myers, patience and histogram algorithms cannot be in effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not independent bits. Instead of wasting one bit per algorithm, define a few macros to access the few bits they occupy and update the code that access them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-13Merge branch 'rs/diff-cleanup-records-fix'Libravatar Junio C Hamano1-3/+7
* rs/diff-cleanup-records-fix: diff: resurrect XDF_NEED_MINIMAL with --minimal Revert removal of multi-match discard heuristic in 27af01
2011-09-26Revert removal of multi-match discard heuristic in 27af01Libravatar René Scharfe1-3/+7
27af01d (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(), 2011-08-17) was supposed to be a performance boost only. However, it unexpectedly changed the behaviour of diff. Revert a part of 27af01d that removes logic that mark lines as "multi-match" (ie. dis[i] == 2). This was preventing the multi-match discard heuristic (performed in xdl_cleanup_records() and xdl_clean_mmatch()) from executing. Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de> Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-06Merge branch 'rc/histogram-diff'Libravatar Junio C Hamano1-0/+2
* rc/histogram-diff: xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()
2011-08-31xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()Libravatar Tay Ray Chuan1-0/+2
Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe even if xdl_init_classifier() isn't called. This may occur in the case where diff is run with --histogram and a call to, say, xdl_prepare_ctx() fails. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-17Merge branch 'rc/histogram-diff' into HEADLibravatar Junio C Hamano1-80/+63
* rc/histogram-diff: xdiff/xhistogram: drop need for additional variable xdiff/xhistogram: rely on xdl_trim_ends() xdiff/xhistogram: rework handling of recursed results xdiff: do away with xdl_mmfile_next() Make test number unique xdiff/xprepare: use a smaller sample size for histogram diff xdiff/xprepare: skip classification teach --histogram to diff t4033-diff-patience: factor out tests xdiff/xpatience: factor out fall-back-diff function xdiff/xprepare: refactor abort cleanups xdiff/xprepare: use memset() Conflicts: xdiff/xprepare.c
2011-08-17xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records()Libravatar Tay Ray Chuan1-36/+50
In xdl_cleanup_records(), we see O(n*m) performance, where n is the number of records from xdf->dstart to xdf->dend, and m is the size of a bucket in xdf->rhash (<= by mlim). Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2)) in xdl_classify_record(). Reported-by: Marat Radchenko <marat@slonopotamus.org> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-03xdiff: do away with xdl_mmfile_next()Libravatar Tay Ray Chuan1-6/+1
Given our simple mmfile structure, xdl_mmfile_next() calls are redundant. Do away with calls to them. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12xdiff/xprepare: use a smaller sample size for histogram diffLibravatar Tay Ray Chuan1-3/+14
For histogram diff, we can afford a smaller sample size and thus a poorer estimate of the number of lines, as the hash table (rhash) won't be filled up/grown. This is safe as the final count of lines (xdf.nrecs) will be updated correctly anyway by xdl_prepare_ctx(). This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-12xdiff/xprepare: skip classificationLibravatar Tay Ray Chuan1-8/+16
xdiff performs "classification" of records (xdl_classify_record()), replacing hashes (xrecord_t.ha) with a unique identifier of the record/line and building a hash table (xrecord_t.rhash) of records. This is then used to "cleanup" records (xdl_cleanup_records()). We don't need any of that in histogram diff, so we omit calls to these functions. We also skip allocating memory to the hash table, rhash, as it is no longer used. This gives us a small boost in performance. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-07xdiff/xprepare: refactor abort cleanupsLibravatar Tay Ray Chuan1-59/+32
Group free()'s that are called when a malloc() fails in xdl_prepare_ctx(), making for more readable code. Also add a free() on ha, in case future git hackers add allocs after the ha malloc. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-07xdiff/xprepare: use memset()Libravatar Tay Ray Chuan1-7/+3
Use memset() instead of a for loop to initialize. This could give a performance advantage. Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-07Implement the patience diff algorithmLibravatar Johannes Schindelin1-1/+2
The patience diff algorithm produces slightly more intuitive output than the classic Myers algorithm, as it does not try to minimize the number of +/- lines first, but tries to preserve the lines that are unique. To this end, it first determines lines that are unique in both files, then the maximal sequence which preserves the order (relative to both files) is extracted. Starting from this initial set of common lines, the rest of the lines is handled recursively, with Myers' algorithm as a fallback when the patience algorithm fails (due to no common unique lines). This patch includes memory leak fixes by Pierre Habouzit. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-08xdiff: give up scanning similar lines earlyLibravatar Davide Libenzi1-2/+13
In a corner case of large files whose lines do not match uniquely, the loop to eliminate a line that matches multiple locations adjacent to a run of lines that do not uniquely match wasted too much cycles. Fix this by giving up early after scanning 100 lines in both direction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07War on whitespaceLibravatar Junio C Hamano1-1/+0
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2006-06-23Teach diff about -b and -w flagsLibravatar Johannes Schindelin1-6/+10
This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to diff. The main part of the patch is teaching libxdiff about it. [jc: renamed xdl_line_match() to xdl_recmatch() since the former is used for different purposes in xpatchi.c which is in the parts of the upstream source we do not use.] Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04Clean-up trivially redundant diff.Libravatar Davide Libenzi1-17/+46
Also corrects the line numbers in unified output when using zero lines context.
2006-03-25Use a *real* built-in diff generatorLibravatar Linus Torvalds1-0/+436
This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>