summaryrefslogtreecommitdiff
path: root/Documentation/technical/index-format.txt
diff options
context:
space:
mode:
authorLibravatar Kirill Smelkov <kirr@mns.spb.ru>2014-01-20 20:20:40 +0400
committerLibravatar Junio C Hamano <gitster@pobox.com>2014-02-24 14:44:57 -0800
commit8518ff8fabc43aa96f1d8c8cd3de7f399d51d11e (patch)
tree8ff31f6ad2db2f36f8c6648765c8504512f98758 /Documentation/technical/index-format.txt
parentdiff test: add tests for combine-diff with orderfile (diff)
downloadtgif-8518ff8fabc43aa96f1d8c8cd3de7f399d51d11e.tar.xz
combine-diff: optimize combine_diff_path sets intersection
When generating combined diff, for each commit, we intersect diff paths from diff(parent_0,commit) to diff(parent_i,commit) comparing all paths pairs, i.e. doing it the quadratic way. That is correct, but could be optimized. Paths come from trees in sorted (= tree) order, and so does diff_tree() emits resulting paths in that order too. Now if we look at diffcore transformations, all of them, except diffcore_order, preserve resulting path ordering: - skip_stat_unmatch, grep, pickaxe, filter -- just skip elements -> order stays preserved - break -- just breaks diff for a path, adding path dup after the path -> order stays preserved - detect rename/copy -- resulting paths are emitted sorted (verified empirically) So only diffcore_order changes diff paths ordering. But diffcore_order meaning affects only presentation - i.e. only how to show the diff, so we could do all the internal computations without paths reordering, and order only resultant paths set. This is faster, since, if we know two paths sets are all ordered, their intersection could be done in linear time. This patch does just that. Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log") and with `-c` ("git log -c") before and after the patch are as follows: linux.git v3.10..v3.11 log log -c before 1.9s 20.4s after 1.9s 16.6s navy.git (private repo) log log -c before 0.83s 15.6s after 0.83s 2.1s P.S. I think linux.git case is sped up not so much as the second one, since in navy.git, there are more exotic (subtree, etc) merges. P.P.S. My tracing showed that the rest of the time (16.6s vs 1.9s) is usually spent in computing huge diffs from commit to second parent. Will try to deal with it, if I'll have time. P.P.P.S. For combine_diff_path, ->len is not needed anymore - will remove it in the next noisy cleanup path, to maintain good signal/noise ratio here. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/technical/index-format.txt')
0 files changed, 0 insertions, 0 deletions