From 3443546f6ef57fe28ea5cca232df8e400bfc3883 Mon Sep 17 00:00:00 2001
From: Linus Torvalds
Date: Fri, 24 Mar 2006 20:13:22 -0800
Subject: Use a *real* built-in diff generator

This uses a simplified libxdiff setup to generate unified diffs _without_
doing fork/execve of GNU "diff".

This has several huge advantages, for example:

Before:

    [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

    real    0m24.818s
    user    0m13.332s
    sys     0m8.664s

After:

    [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null

    real    0m4.563s
    user    0m2.944s
    sys     0m1.580s

and the fact that this should be a lot more portable (ie we can ignore all
the issues with doing fork/execve under Windows).

Perhaps even more importantly, this allows us to do diffs without actually
ever writing out the git file contents to a temporary file (and without
any of the shell quoting issues on filenames etc etc).

NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
current "diff-core" code actually will always write the temp-files,
because it used to be something that you simply had to do. So this current
one actually writes a temp-file like before, and then reads it into memory
again just to do the diff. Stupid.

But if this basic infrastructure is accepted, we can start switching over
diff-core to not write temp-files, which should speed things up even
further, especially when doing big tree-to-tree diffs.

Now, in the interest of full disclosure, I should also point out a few
downsides:

 - the libxdiff algorithm is different, and I bet GNU diff has gotten a
   lot more testing. And the thing is, generating a diff is not an exact
   science - you can get two different diffs (and you will), and they can
   both be perfectly valid. So it's not possible to "validate" the
   libxdiff output by just comparing it against GNU diff.

 - GNU diff does some nice eye-candy, like trying to figure out what the
   last function was, and adding that information to the "@@ .." line.
   libxdiff doesn't do that.

 - The libxdiff thing has some known deficiencies. In particular, it gets
   the "\No newline at end of file" case wrong. So this is currently for
   the experimental branch only. I hope Davide will help fix it.

That said, I think the huge performance advantage, and the fact that it
integrates better is definitely worth it. But it should go into a
development branch at least due to the missing newline issue.

Technical note: this is based on libxdiff-0.17, but I did some surgery to
get rid of the extraneous fat - stuff that git doesn't need, and seriously
cutting down on mmfile_t, which had much more capabilities than the diff
algorithm either needed or used. In this version, "mmfile_t" is just a
trivial tuple.

That said, I tried to keep the differences to simple removals, so that you
can do a diff between this and the libxdiff origin, and you'll basically
see just things getting deleted. Even the mmfile_t simplifications are
left in a state where the diffs should be readable.

Apologies to Davide, whom I'd love to get feedback on this all from (I
wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
complex format had a helper function for that, but I did my surgery with
the goal in mind that eventually we _should_ just do

    mmfile_t mf;

    buf = read_sha1_file(sha1, type, &size);
    mf->ptr = buf;
    mf->size = size;
    .. use "mf" directly ..

which was really a nightmare with the old "helpful" mmfile_t, and really
is that easy with the new cut-down interfaces).

[ Btw, as any hawk-eye can see from the diff, this was actually generated
  with itself, so it is "self-hosting".
  That's about all the testing it has gotten, along with the above kernel
  diff, which eye-balls correctly, but shows the newline issue when you
  double-check it with "git-apply" ]

Signed-off-by: Linus Torvalds
Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 xdiff/xutils.h

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
new file mode 100644
index 0000000000..428a4bb1ef
--- /dev/null
+++ b/xdiff/xutils.h
@@ -0,0 +1,44 @@
+/*
+ * LibXDiff by Davide Libenzi ( File Differential Library )
+ * Copyright (C) 2003 Davide Libenzi
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Davide Libenzi
+ *
+ */
+
+#if !defined(XUTILS_H)
+#define XUTILS_H
+
+
+int xdl_emit_diffrec(char const *rec, long size, char const *pre, long psize,
+                     xdemitcb_t *ecb);
+int xdl_cha_init(chastore_t *cha, long isize, long icount);
+void xdl_cha_free(chastore_t *cha);
+void *xdl_cha_alloc(chastore_t *cha);
+void *xdl_cha_first(chastore_t *cha);
+void *xdl_cha_next(chastore_t *cha);
+long xdl_guess_lines(mmfile_t *mf);
+unsigned long xdl_hash_record(char const **data, char const *top);
+unsigned int xdl_hashbits(unsigned int size);
+int xdl_num_out(char *out, long val);
+long xdl_atol(char const *str, char const **next);
+int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2, xdemitcb_t *ecb);
+
+
+
+#endif /* #if !defined(XUTILS_H) */
+
-- 
cgit v1.2.3

From acb725772964ee11656543a28c303e9aa6d092c5 Mon Sep 17 00:00:00 2001
From: Mark Wooding
Date: Tue, 28 Mar 2006 03:23:31 +0100
Subject: xdiff: Show function names in hunk headers.

The speed of the built-in diff generator is nice; but the function names
shown by `diff -p' are /really/ nice. And I hate having to choose. So,
we hack xdiff to find the function names and print them.

xdiff has grown a flag to say whether to dig up the function names. The
builtin_diff function passes this flag unconditionally. I suppose it
could parse GIT_DIFF_OPTS, but it doesn't at the moment. I've also
reintroduced the `function name' into the test suite, from which it was
removed in commit 3ce8f089.

The function names are parsed by a particularly stupid algorithm at the
moment: it just tries to find a line in the `old' file, from before the
start of the hunk, whose first character looks plausible. Still, it's
most definitely a start.

Signed-off-by: Mark Wooding
Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 428a4bb1ef..55b0d39f49 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -36,7 +36,8 @@ unsigned long xdl_hash_record(char const **data, char const *top);
 unsigned int xdl_hashbits(unsigned int size);
 int xdl_num_out(char *out, long val);
 long xdl_atol(char const *str, char const **next);
-int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2, xdemitcb_t *ecb);
+int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
+                      const char *func, long funclen, xdemitcb_t *ecb);
 
 
 
-- 
cgit v1.2.3

From ca557afff9f7dad7a8739cd193ac0730d872e282 Mon Sep 17 00:00:00 2001
From: Davide Libenzi
Date: Mon, 3 Apr 2006 18:47:55 -0700
Subject: Clean-up trivially redundant diff.

Also corrects the line numbers in unified output when using zero lines
context.
---
 xdiff/xutils.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 55b0d39f49..ea38ee903f 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -24,6 +24,7 @@
 #define XUTILS_H
 
 
+long xdl_bogosqrt(long n);
 int xdl_emit_diffrec(char const *rec, long size, char const *pre, long psize,
                      xdemitcb_t *ecb);
 int xdl_cha_init(chastore_t *cha, long isize, long icount);
-- 
cgit v1.2.3

From d281786fcd6d0df47dd46e415f1a804b2e81ed9b Mon Sep 17 00:00:00 2001
From: Junio C Hamano
Date: Mon, 19 Jun 2006 17:01:35 -0700
Subject: xdiff: minor changes to match libxdiff-0.21

This reformats the change 621c53cc082299eaf69e9f2dc0274547c7d87fb0
introduced to match what upstream author implemented in libxdiff-0.21
without changing any logic (hopefully ;-). This is to help keep us in
sync with the upstream.

Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index ea38ee903f..08691a2447 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -24,6 +24,7 @@
 #define XUTILS_H
 
 
+
 long xdl_bogosqrt(long n);
 int xdl_emit_diffrec(char const *rec, long size, char const *pre, long psize,
                      xdemitcb_t *ecb);
-- 
cgit v1.2.3

From 0d21efa51cc7de5250d5da46bceacda78ba35373 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin
Date: Wed, 14 Jun 2006 17:40:23 +0200
Subject: Teach diff about -b and -w flags

This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to
diff. The main part of the patch is teaching libxdiff about it.

[jc: renamed xdl_line_match() to xdl_recmatch() since the former is used
for different purposes in xpatchi.c which is in the parts of the upstream
source we do not use.]

Signed-off-by: Johannes Schindelin
Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 08691a2447..70d8b9838a 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -34,7 +34,8 @@ void *xdl_cha_alloc(chastore_t *cha);
 void *xdl_cha_first(chastore_t *cha);
 void *xdl_cha_next(chastore_t *cha);
 long xdl_guess_lines(mmfile_t *mf);
-unsigned long xdl_hash_record(char const **data, char const *top);
+int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags);
+unsigned long xdl_hash_record(char const **data, char const *top, long flags);
 unsigned int xdl_hashbits(unsigned int size);
 int xdl_num_out(char *out, long val);
 long xdl_atol(char const *str, char const **next);
-- 
cgit v1.2.3

From a6080a0a44d5ead84db3dabbbc80e82df838533d Mon Sep 17 00:00:00 2001
From: Junio C Hamano
Date: Thu, 7 Jun 2007 00:04:01 -0700
Subject: War on whitespace

This uses "git-apply --whitespace=strip" to fix whitespace errors that
have crept in to our source files
over time. There are a few files that need to have trailing whitespaces
(most notably, test vectors). The results still passes the test, and
build result in Documentation/ area is unchanged.

Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 70d8b9838a..d5de8292e0 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -45,4 +45,3 @@ int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
 
 
 #endif /* #if !defined(XUTILS_H) */
-
-- 
cgit v1.2.3

From 1d26b252f1128f7b31885811d7f481b6b7612bd7 Mon Sep 17 00:00:00 2001
From: Tay Ray Chuan
Date: Thu, 7 Jul 2011 12:23:57 +0800
Subject: xdiff/xpatience: factor out fall-back-diff function

This is in preparation for the histogram diff algorithm, which will also
re-use much of the code to call the default Meyers diff algorithm.

Signed-off-by: Tay Ray Chuan
Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index d5de8292e0..674a657b08 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -41,6 +41,8 @@ int xdl_num_out(char *out, long val);
 long xdl_atol(char const *str, char const **next);
 int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
                       const char *func, long funclen, xdemitcb_t *ecb);
+int xdl_fall_back_diff(xdfenv_t *diff_env, xpparam_t const *xpp,
+                       int line1, int count1, int line2, int count2);
 
 
 
-- 
cgit v1.2.3

From 86abba801575892a8a2b161aa29518c1ed05e1f1 Mon Sep 17 00:00:00 2001
From: Tay Ray Chuan
Date: Tue, 12 Jul 2011 14:10:27 +0800
Subject: xdiff/xprepare: use a smaller sample size for histogram diff

For histogram diff, we can afford a smaller sample size and thus a poorer
estimate of the number of lines, as the hash table (rhash) won't be
filled up/grown. This is safe as the final count of lines (xdf.nrecs)
will be updated correctly anyway by xdl_prepare_ctx().

This gives us a small boost in performance.

Signed-off-by: Tay Ray Chuan
Signed-off-by: Junio C Hamano
---
 xdiff/xutils.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'xdiff/xutils.h')

diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index 674a657b08..714719a89c 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -33,7 +33,7 @@ void xdl_cha_free(chastore_t *cha);
 void *xdl_cha_alloc(chastore_t *cha);
 void *xdl_cha_first(chastore_t *cha);
 void *xdl_cha_next(chastore_t *cha);
-long xdl_guess_lines(mmfile_t *mf);
+long xdl_guess_lines(mmfile_t *mf, long sample);
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags);
 unsigned long xdl_hash_record(char const **data, char const *top, long flags);
 unsigned int xdl_hashbits(unsigned int size);
-- 
cgit v1.2.3
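
To make the trade-off in the last patch concrete, here is a minimal sketch of a sampled line-count estimate. It is an assumption about the general technique, not the body of xdl_guess_lines(): the name `guess_lines_sketch()` is hypothetical, and a flat buffer plus length stands in for mmfile_t. The idea is to read at most `sample` lines from the start of the buffer, take their average length, and extrapolate to the whole buffer.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a sampled estimate (not the xdiff code):
 * scan at most `sample` lines from the start of buf, then divide the
 * total size by the average sampled line length.
 */
static long guess_lines_sketch(const char *buf, long size, long sample)
{
	const char *cur = buf, *top = buf + size;
	long nl = 0, sampled_bytes;

	while (nl < sample && cur < top) {
		const char *eol = memchr(cur, '\n', (size_t)(top - cur));

		nl++;
		/* step past the newline, or stop at an unterminated tail */
		cur = eol ? eol + 1 : top;
	}
	sampled_bytes = (long)(cur - buf);

	/* every sampled line is at least 1 byte, so the divisor is >= 1 */
	if (nl && sampled_bytes)
		nl = size / (sampled_bytes / nl);

	return nl + 1;
}
```

With a buffer whose first line is much shorter than the rest, a sample of one line overestimates the count more than a larger sample does. That is the "poorer estimate" the commit message accepts, since the real count is fixed up later.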