path: root/tree-walk.c
Age | Commit message | Author | Files | Lines
2010-01-03 | traverse_trees(): handle D/F conflict case sanely | Junio C Hamano | 1 | -43/+234
traverse_trees() is supposed to call its callback with all the matching entries from the given trees. The current algorithm keeps a pointer to each of the trees being traversed, and feeds the entry with the earliest name to the callback.

This breaks down if the trees being traversed look like this:

    A       B
    t-1     t
    t-2     u
    t/a     v

When we are currently looking at an entry "t-1" in tree A, and tree B has returned "t", feeding "t" from B and not feeding anything from A, only because "t-1" sorts later than "t", will miss an entry for a subtree "t" behind the current entry in tree A.

This introduces the extended_entry_extract() helper function that gives what name is expected from the tree, and implements a mechanism to look ahead in the tree object using it, to make sure such a case is handled sanely. Traversal in tree A in the above example will first return "t" to match that of B, and then the next request for an entry to A returns "t-1".

This roughly corresponds to what Linus's "prepare for one-entry lookahead" wanted to do, but because this does implement look-ahead, t6035 and one more test in t1012 reveal that the approach would not work without adjusting the side that walks the index in unpack_trees() as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
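The ordering problem described above comes from how tree entries sort: a directory compares as if its name ended in "/". The following is a minimal sketch of that rule, not code from this commit; tree_name_cmp() is a hypothetical helper introduced only for illustration.

```c
#include <string.h>

/*
 * Hypothetical helper (not from the commit): compare two tree entry
 * names the way tree objects are sorted, where a directory name
 * behaves as if it had a trailing '/'.
 */
static int tree_name_cmp(const char *a, int a_is_dir,
                         const char *b, int b_is_dir)
{
	size_t la = strlen(a), lb = strlen(b);
	size_t len = la < lb ? la : lb;
	int cmp = memcmp(a, b, len);
	unsigned char ca, cb;

	if (cmp)
		return cmp;
	/* One name is a prefix of the other: compare the next byte. */
	ca = (unsigned char)a[len];
	cb = (unsigned char)b[len];
	if (!ca && a_is_dir)
		ca = '/';
	if (!cb && b_is_dir)
		cb = '/';
	return (ca > cb) - (ca < cb);
}
```

Under this ordering the blob "t" in tree B sorts before "t-1", while the directory "t" in tree A sorts after "t-1" and "t-2" (because '-' is smaller than '/'). That is why a walker that only looks at the current entry of each tree cannot pair the two "t" entries without looking ahead.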
2008-03-09 | Fix tree-walking compare_entry() in the presense of --prefix | Linus Torvalds | 1 | -0/+3
When we make the "root" tree-walk info entry have a pathname in it, we need to have a ->prev pointer so that compare_entry will actually notice and traverse into the root. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 | Make 'traverse_trees()' traverse conflicting DF entries in parallel | Linus Torvalds | 1 | -2/+6
This makes the traverse_trees() entry comparator routine use the more relaxed form of name comparison that considers files and directories with the same name identical. We pass in a separate mask for just the directory entries, so that the callback routine can decide (if it wants to) to only handle one or the other type, but generally most (all?) users are expected to really want to see the case of a name 'foo' showing up in one tree as a file and in another as a directory at the same time. In particular, moving 'unpack_trees()' over to use this tree traversal mechanism requires this. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
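As a rough illustration of the two masks mentioned above, the sketch below sets one bit per tree whose current entry has the chosen name, and a second set of bits for just the directories. The struct entry_sketch type and compute_masks() function are assumptions made for this example; they are not the structures used in tree-walk.c.

```c
#include <string.h>

/* Hypothetical stand-in for the current entry of one tree. */
struct entry_sketch {
	const char *path;   /* NULL when this tree has no more entries */
	int is_dir;
};

/*
 * Hypothetical sketch: for n trees (n <= 32) and the name chosen for
 * this round, report which trees have an entry with that name (mask)
 * and which of those entries are directories (dirmask).  Names are
 * compared without regard to type, so a file "foo" and a directory
 * "foo" are reported together.
 */
static void compute_masks(const struct entry_sketch *e, int n,
                          const char *name,
                          unsigned long *mask, unsigned long *dirmask)
{
	int i;

	*mask = *dirmask = 0;
	for (i = 0; i < n; i++) {
		if (!e[i].path || strcmp(e[i].path, name))
			continue;
		*mask |= 1ul << i;
		if (e[i].is_dir)
			*dirmask |= 1ul << i;
	}
}
```

A callback that only cares about files can then look at the bits set in mask but not in dirmask, and one that only cares about directories can look at dirmask alone.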
2008-03-09 | Add return value to 'traverse_tree()' callback | Linus Torvalds | 1 | -7/+15
This allows the callback to return an error value, but it can also specify which of the tree entries it actually used up by returning a positive mask value. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@pobox.com>
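One way to read this convention is sketched below. The walk_fn type, dispatch() and the two sample callbacks are hypothetical names, and treating a return value of 0 as "everything offered was used" is an assumption of this sketch rather than something the message states.

```c
#include <stddef.h>

/* Hypothetical callback type: <0 is an error, >0 is a mask of used entries. */
typedef int (*walk_fn)(int n, unsigned long mask, void *data);

/*
 * Hypothetical dispatcher: call the callback with the set of trees that
 * have a matching entry, and work out which trees should advance.
 */
static int dispatch(walk_fn fn, int n, unsigned long mask, void *data,
                    unsigned long *advance)
{
	int ret = fn(n, mask, data);

	if (ret < 0)
		return ret;              /* propagate the error to the caller */
	if (ret > 0)
		mask &= (unsigned long)ret;  /* only the entries actually used */
	*advance = mask;                 /* assumption: 0 means "used them all" */
	return 0;
}

/* Sample callback: claims to have used only the entry from tree 0. */
static int use_first_only(int n, unsigned long mask, void *data)
{
	(void)n; (void)mask; (void)data;
	return 1;
}

/* Sample callback: always reports an error. */
static int always_fail(int n, unsigned long mask, void *data)
{
	(void)n; (void)mask; (void)data;
	return -1;
}
```

Trees whose bit is cleared in the resulting advance mask keep their current entry, so the same entry is offered again in the next round.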
2008-03-09 | Make 'traverse_tree()' use linked structure rather than 'const char *base' | Linus Torvalds | 1 | -2/+33
This makes the calling convention a bit less obvious, but a lot more flexible. Instead of allocating and extending a new 'base' string, we just link the top-most name into a linked list of the 'info' structure when traversing a subdirectory, and we can generate the basename by following the list. Perhaps even more importantly, the linked list of info structures also gives us a place to naturally save off other information than just the directory name. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
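The idea of generating the base name by following the list can be sketched as follows. This is an illustration under assumed names (struct info_sketch, make_path()), not the structure or helper the commit adds.

```c
#include <string.h>

/* Hypothetical per-directory record, linked to its parent directory. */
struct info_sketch {
	const struct info_sketch *prev; /* enclosing directory, NULL at the top */
	const char *name;               /* this directory's own name */
};

/*
 * Hypothetical sketch: build "dir/subdir/leaf" in buf by walking the
 * list from the innermost directory outwards and filling the buffer
 * from the end.  Returns the path length, or -1 if buf is too small.
 */
static int make_path(char *buf, size_t size,
                     const struct info_sketch *info, const char *leaf)
{
	const struct info_sketch *p;
	size_t leaflen = strlen(leaf);
	size_t len = leaflen, pos;

	for (p = info; p; p = p->prev)
		len += strlen(p->name) + 1;      /* name plus '/' */
	if (len + 1 > size)
		return -1;

	pos = len - leaflen;
	memcpy(buf + pos, leaf, leaflen + 1);    /* copies the NUL as well */
	for (p = info; p; p = p->prev) {
		size_t n = strlen(p->name);
		buf[--pos] = '/';
		pos -= n;
		memcpy(buf + pos, p->name, n);
	}
	return (int)len;
}
```

No string is allocated or extended while descending; each level only needs a small record on the stack that points at its parent, and the full path is assembled on demand.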
2008-01-06 | tree-walk: don't parse incorrect entries | Martin Koegler | 1 | -2/+8
The current code can access memory outside of the tree buffer in the case of malformed tree entries. This patch prevents this as follows:

* The rest of the buffer must be at least 24 bytes (at least 1 byte mode, 1 blank, at least one byte path name, 1 NUL, 20 bytes sha1).
* Check that the last NUL (21 bytes before the end) is present. This ensures that strlen() and get_mode() calls stay within the buffer.
* The mode may not be empty. We only have to reject a blank at the beginning, as the rest is handled by if (c < '0' || c > '7').
* The blank is ensured by get_mode().
* The path must contain at least one character.

Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
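The checks listed above can be sketched as a single predicate. entry_looks_sane() is a hypothetical function written for this illustration; the actual patch spreads the checks across the existing decoding code.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: return 1 if the bytes at buf could hold a tree
 * entry of the form "<octal mode> <path>\0<20-byte sha1>", 0 otherwise.
 */
static int entry_looks_sane(const unsigned char *buf, size_t size)
{
	size_t i = 0;

	if (size < 24)                  /* mode, ' ', path, NUL, 20-byte SHA-1 */
		return 0;
	if (buf[size - 21] != '\0')     /* a NUL must precede the last SHA-1 */
		return 0;
	if (buf[0] == ' ')              /* the mode may not be empty */
		return 0;
	/*
	 * The NUL checked above stops this loop before the end of the
	 * buffer, because NUL is not an octal digit.
	 */
	while (buf[i] != ' ') {
		if (buf[i] < '0' || buf[i] > '7')
			return 0;
		i++;
	}
	return buf[i + 1] != '\0';      /* the path must not be empty */
}
```

The order matters: once the trailing NUL is known to be present, both the mode scan and any later strlen() on the path are guaranteed to terminate inside the buffer.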
2007-06-07 | War on whitespace | Junio C Hamano | 1 | -1/+0
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept into our source files over time. There are a few files that need to have trailing whitespace (most notably, test vectors). The result still passes the tests, and the build result in the Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-03-21 | Switch over tree descriptors to contain a pre-parsed entry | Linus Torvalds | 1 | -57/+44
This makes the tree descriptor contain a "struct name_entry" as part of it, and it gets filled in so that it always contains a valid entry. On some benchmarks, it improves performance by up to 15%. That makes tree entry "extract" trivial, and means that we only actually need to decode each tree entry just once: we decode the first one when we initialize the tree descriptor, and each subsequent one when doing "update_tree_entry()". In particular, this means that we don't need to do strlen() both at extract time _and_ at update time. Finally, it also allows more sharing of code (entry_extract(), that wanted a "struct name_entry", just got totally trivial, along with the "tree_entry()" function). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 | Initialize tree descriptors with a helper function rather than by hand. | Linus Torvalds | 1 | -9/+15
This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 | Remove "pathlen" from "struct name_entry" | Linus Torvalds | 1 | -4/+2
Since we have the "tree_entry_len()" helper function these days, and don't need to do a full strlen(), there's no point in saving the path length - it's just redundant information. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-18 | Avoid unnecessary strlen() calls | Linus Torvalds | 1 | -2/+2
This is a micro-optimization that grew out of the mailing list discussion about "strlen()" showing up in profiles. We used to pass regular C strings around to the low-level tree walking routines, and while this worked fine, it meant that we needed to call strlen() on strings that the caller always actually knew the size of anyway. So pass the length of the string down with the string, and avoid unnecessary calls to strlen(). Also, when extracting a pathname from a tree entry, use "tree_entry_len()" instead of strlen(), since the length of the pathname is directly calculable from the decoded tree entry itself without having to actually do another strlen(). This shaves off another ~5-10% from some loads that are very tree intensive (notably doing commit filtering by a pathspec). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-09 | get_tree_entry: map blank requested entry to tree root | Jeff King | 1 | -1/+8
This means that

    git show HEAD:

will now return HEAD^{tree}, which is logically consistent with

    git show HEAD:Documentation

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-04 | Remove shadowing variable from traverse_trees() | René Scharfe | 1 | -1/+0
The variable named entry is allocated using malloc() and then forgotten, it being shadowed by an automatic variable of the same name. Fixing the array size at 3 worked so far because the only caller of traverse_trees() needed only that many entries. Simply remove the shadowing variable and we're able to traverse more than three trees and save stack space at the same time! Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 | Convert memcpy(a,b,20) to hashcpy(a,b). | Shawn Pearce | 1 | -2/+2
This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparison. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void*. This is a reasonable tradeoff as most call sites already use unsigned char* and the existing hashcmp is also declared to be unsigned char*. [jc: Split the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
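A wrapper of the kind described might look like the sketch below. The names hashcpy_sketch() and HASH_SIZE_SKETCH are made up for this example so as not to imply they match the definitions in the project's headers.

```c
#include <string.h>

/* Assumed hash size for this sketch: a 20-byte SHA-1. */
#define HASH_SIZE_SKETCH 20

/*
 * Hypothetical sketch of a hashcpy-style helper: callers copy "a hash"
 * and no longer spell out its size at every call site.
 */
static inline void hashcpy_sketch(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, HASH_SIZE_SKETCH);
}
```

Declaring the parameters as unsigned char * rather than void * means a call site holding a plain char * needs an explicit cast, which is the tradeoff the message describes.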
2006-06-20 | Remove all void-pointer arithmetic. | Florian Forster | 1 | -5/+6
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
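One common way to avoid void-pointer arithmetic is to cast to a character pointer before adding an offset. The helper below is a hypothetical illustration of that strategy, not necessarily the change made at any particular call site in this patch.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: advance a generic buffer pointer by n bytes.
 * Writing "buf + n" directly on a void pointer relies on a compiler
 * extension, so the arithmetic is done on a char pointer instead.
 */
static const void *advance_bytes(const void *buf, size_t n)
{
	return (const char *)buf + n;
}
```

The other usual option is to change the type of the variable itself to a character pointer, which avoids repeating the cast when the pointer is advanced in many places.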
2006-05-30 | tree_entry(): new tree-walking helper function | Linus Torvalds | 1 | -2/+31
This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

    struct tree_desc desc;
    struct name_entry entry;

    desc.buf = tree->buffer;
    desc.size = tree->size;
    while (tree_entry(&desc, &entry)) {
        ... use "entry.{path, sha1, mode, pathlen}" ...
    }

which is not only shorter than writing it out in full, it's hopefully less error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-28 | Don't use "sscanf()" for tree mode scanning | Linus Torvalds | 1 | -3/+18
Doing an oprofile run on the result of my git rev-list memory leak fixes and tree parsing cleanups, I was surprised by the third-highest entry being

    samples  %        image name   app name     symbol name
    179751   2.7163   libc-2.4.so  libc-2.4.so  _IO_vfscanf@@GLIBC_2.4

where that 2.7% is actually more than 5% of one CPU, because this was run on a dual CPU setup with the other CPU just being idle.

That seems to all be from the use of 'sscanf(tree, "%o", &mode)' for the tree buffer parsing.

So do the trivial octal parsing by hand, which also gives us where the first space in the string is (and thus where the pathname starts) so we can get rid of the "strchr(tree, ' ')" call too.

This brings the "git rev-list --all --objects" time down from 63 seconds to 55 seconds on the historical kernel archive for me, so it's quite noticeable - tree parsing is a lot of what we end up doing when following all the objects.

[ I also see a 5% speedup on a full "git fsck-objects" on the current kernel archive, so that sscanf() really does seem to have hurt our performance by a surprising amount ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
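Parsing the octal mode by hand, while also locating the start of the pathname, can be sketched like this. parse_mode() is a hypothetical name and the code is an illustration of the technique, not the function from the patch; the check for an empty mode is an extra safeguard added in this sketch.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: parse "<octal digits> " at the start of str.
 * On success, store the mode in *modep and return a pointer to the
 * byte after the space (where the pathname begins).  Return NULL if
 * the mode is empty or contains a non-octal character.
 */
static const char *parse_mode(const char *str, unsigned int *modep)
{
	unsigned int mode = 0;
	unsigned char c;

	if (*str == ' ')
		return NULL;                    /* empty mode */
	while ((c = (unsigned char)*str++) != ' ') {
		if (c < '0' || c > '7')
			return NULL;            /* also stops at a NUL */
		mode = (mode << 3) + (c - '0');
	}
	*modep = mode;
	return str;
}
```

A single pass over the digits replaces both the sscanf() call and the separate search for the first space, since the loop ends exactly at that space.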
2006-04-19 | get_tree_entry(): make it available from tree-walk | Junio C Hamano | 1 | -0/+50
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 | Use blob_, commit_, tag_, and tree_type throughout. | Peter Eriksen | 1 | -1/+2
This replaces occurrences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29 | tree/diff header cleanup. | Junio C Hamano | 1 | -0/+116
Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>