author | Linus Torvalds <torvalds@linux-foundation.org> | 2008-05-09 09:21:07 -0700 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2008-05-10 18:16:31 -0700 |
commit | c40641b77b0274186fd1b327d5dc3246f814aaaf (patch) | |
tree | 9d455fe976c1d08a3ea0cab763bb45c312e0aaec /diff-lib.c | |
parent | Avoid some unnecessary lstat() calls (diff) | |
download | tgif-c40641b77b0274186fd1b327d5dc3246f814aaaf.tar.xz |
Optimize symlink/directory detection
This is the base for making symlink detection in the middle of a pathname
saner and (much) more efficient.
Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.
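
The false positive described above can be reproduced outside of git. What follows is a minimal sketch, not code from this commit: the function name, the temporary directory under /tmp, and the entry names "real", "link" and "file" are all made up for the illustration. It shows that 'lstat()' only refuses to follow a symlink in the final path component.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of git).
 * Builds   <tmp>/real/file   and   <tmp>/link -> real
 * and then runs lstat() on <tmp>/link/file.
 * Returns 1 if lstat() succeeds and reports a regular file,
 * 0 if it does not, and -1 if the setup itself failed.
 */
static int lstat_sees_through_leading_symlink(void)
{
	char base[] = "/tmp/symlink-demo-XXXXXX";
	char dir[128], file[128], link_path[128], via_link[128];
	struct stat st;
	int fd, result = -1;

	if (!mkdtemp(base))
		return -1;
	snprintf(dir, sizeof(dir), "%s/real", base);
	snprintf(file, sizeof(file), "%s/real/file", base);
	snprintf(link_path, sizeof(link_path), "%s/link", base);
	snprintf(via_link, sizeof(via_link), "%s/link/file", base);

	if (mkdir(dir, 0700) == 0 &&
	    (fd = open(file, O_CREAT | O_WRONLY, 0600)) >= 0) {
		close(fd);
		if (symlink("real", link_path) == 0) {
			/* Only the last component is exempt from link following. */
			result = lstat(via_link, &st) == 0 && S_ISREG(st.st_mode);
			unlink(link_path);
		}
		unlink(file);
	}
	/* Best-effort cleanup of the temporary tree. */
	rmdir(dir);
	rmdir(base);
	return result;
}
```

On a POSIX filesystem that supports symlinks the helper is expected to return 1: the plain 'lstat()' looks fine even though "link" is not a directory, so each leading component has to be checked separately.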
The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.
This caches the last detected full directory and symlink entries, and
speeds up deep directory structures in particular, because we no longer
lstat() every directory leading up to each entry in the index.
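
The caching idea in the paragraph above can be sketched roughly as follows. This is an illustration under assumptions, not the code this commit adds (the diff shown on this page is limited to diff-lib.c): the names 'prefix_cache', 'prefix_match', 'prefix_store', 'leading_symlink' and the 'lstat_calls' counter are hypothetical, and the sketch does no invalidation at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical one-entry cache holding a single path prefix. */
struct prefix_cache {
	size_t len;
	char path[PATH_MAX];
};

/* Counts lstat() calls so the saving can be observed (illustration only). */
static unsigned long lstat_calls;

/* Return the cached prefix length if 'name' lies strictly below it, else 0. */
static size_t prefix_match(const struct prefix_cache *c, const char *name, size_t len)
{
	if (c->len && len > c->len && name[c->len] == '/' &&
	    !memcmp(name, c->path, c->len))
		return c->len;
	return 0;
}

static void prefix_store(struct prefix_cache *c, const char *name, size_t len)
{
	assert(len < PATH_MAX);
	memcpy(c->path, name, len);
	c->path[len] = '\0';
	c->len = len;
}

/* Return 1 if a leading component of 'name' is a symlink, else 0. */
static int leading_symlink(const char *name)
{
	/* Last prefix seen to be a symlink, and last one seen to be a directory. */
	static struct prefix_cache last_link, last_dir;
	size_t len = strlen(name);
	size_t known;
	const char *sp;
	char buf[PATH_MAX];
	struct stat st;

	if (len == 0 || len >= PATH_MAX)
		return 0;	/* nothing this sketch can examine */
	if (prefix_match(&last_link, name, len))
		return 1;	/* positive result served from the cache */

	/* Skip the components already known to be real directories. */
	known = prefix_match(&last_dir, name, len);
	while ((sp = strchr(name + known + 1, '/')) != NULL) {
		size_t this_len = (size_t)(sp - name);

		memcpy(buf, name, this_len);
		buf[this_len] = '\0';
		lstat_calls++;
		if (lstat(buf, &st) < 0)
			break;
		if (S_ISLNK(st.st_mode)) {
			prefix_store(&last_link, buf, this_len);
			return 1;
		}
		if (!S_ISDIR(st.st_mode))
			break;
		/* Remember the _lack_ of a symlink as well. */
		prefix_store(&last_dir, buf, this_len);
		known = this_len;
	}
	return 0;
}
```

Assuming "a", "a/b" and "a/b/c" are real directories, checking "a/b/c/1.txt" costs three 'lstat()' calls in this sketch, and a following check of "a/b/c/2.txt" costs none, because the directory cache already covers the whole leading path.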
[ This can - and should - probably be extended upon so that we eventually
never do a bare 'lstat()' on any path entries at *all* when checking the
index, but always check the full path carefully. Right now we do not
generally check the whole path for all our normal quick index
revalidation.
We should also make sure that we're careful about all the invalidation,
ie when we remove a link and replace it by a directory we should
invalidate the symlink cache if it matches (and vice versa for the
directory cache).
But regardless, the basic function needs to be sane to do that. The old
'has_symlink_leading_path()' was not capable enough - or indeed the code
readable enough - to really do that sanely. So I'm pushing this as not
just an optimization, but as a base for further work. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'diff-lib.c')
-rw-r--r-- | diff-lib.c | 10 |
1 files changed, 5 insertions, 5 deletions
diff --git a/diff-lib.c b/diff-lib.c
index c5894c7c5a..fe2ccec7e6 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -347,14 +347,14 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
  * exists for ce that is a submodule -- it is a submodule that is not
  * checked out). Return negative for an error.
  */
-static int check_removed(const struct cache_entry *ce, struct stat *st, char *symcache)
+static int check_removed(const struct cache_entry *ce, struct stat *st)
 {
 	if (lstat(ce->name, st) < 0) {
 		if (errno != ENOENT && errno != ENOTDIR)
 			return -1;
 		return 1;
 	}
-	if (has_symlink_leading_path(ce->name, symcache))
+	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
 		return 1;
 	if (S_ISDIR(st->st_mode)) {
 		unsigned char sub[20];
@@ -421,7 +421,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			memset(&(dpath->parent[0]), 0,
 			       sizeof(struct combine_diff_parent)*5);
 
-			changed = check_removed(ce, &st, symcache);
+			changed = check_removed(ce, &st);
 			if (!changed)
 				dpath->mode = ce_mode_from_stat(ce, st.st_mode);
 			else {
@@ -485,7 +485,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		if (ce_uptodate(ce))
 			continue;
 
-		changed = check_removed(ce, &st, symcache);
+		changed = check_removed(ce, &st);
 		if (changed) {
 			if (changed < 0) {
 				perror(ce->name);
@@ -546,7 +546,7 @@ static int get_stat_data(struct cache_entry *ce,
 	if (!cached) {
 		int changed;
 		struct stat st;
-		changed = check_removed(ce, &st, cbdata->symcache);
+		changed = check_removed(ce, &st);
 		if (changed < 0)
 			return -1;
 		else if (changed) {