untracked cache: record/validate dir mtime and reuse cached output

The main readdir loop in read_directory_recursive() is replaced with a new one that checks whether the cached results for a directory are still valid.

If a file is added to or removed from the index, the containing directory is invalidated (but not its subdirs). If a directory's mtime is changed, the same happens. If a .gitignore is updated, the containing directory and all subdirs are invalidated recursively. If dir_struct#flags or other conditions change, the cache is ignored.

If a directory is invalidated, we opendir/readdir/closedir and run the exclude machinery on that directory listing as usual. If untracked cache is also enabled, we'll update the cache along the way. If a directory is validated, we simply pull the untracked listing out from the cache. The cache also records the list of direct subdirs that we have to recurse in. Fully excluded directories are seen as "untracked files".

In the best case when no dirs are invalidated, read_directory() becomes a series of stat(dir), open(.gitignore), fstat(), read(), close() and optionally hash_sha1_file().

For comparison, standard read_directory() is a sequence of opendir(), readdir(), open(.gitignore), fstat(), read(), close(), the expensive last_exclude_matching() and closedir().

We already try not to open(.gitignore) if we know it does not exist, so the open/fstat/read/close sequence does not apply to every directory. The sequence could be reduced further, as noted in prep_exclude() in another patch. So in theory, the entire best-case read_directory sequence could be reduced to a series of stat() and nothing else.

This is not a silver bullet approach. When you compile a C file, for example, the old .o file is removed and a new one with the same name is created, effectively invalidating the containing directory's cache (but not its subdirectories). If your build process touches every directory, this cache adds extra overhead for nothing, so it's a good idea to separate generated files from tracked files. Editors may use the same strategy for saving files. And of course you're out of luck running your repo on an unsupported filesystem and/or operating system.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> 2015-03-08 17:12:29 +0700
committer: Junio C Hamano <gitster@pobox.com> 2015-03-12 13:45:15 -0700
commit: 91a2288b5f63fba82e912dca475154d5b9dd233a (patch)
tree: 6526f8ed8ff6554d01cd1b2e54c99f2b5456eea8 /dir.h
parent: untracked cache: make a wrapper around {open,read,close}dir() (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/dir.h b/dir.h
index 1d7a9585fe..ff3d99bcb0 100644
--- a/dir.h
+++ b/dir.h
@@ -135,6 +135,8 @@ struct untracked_cache {
 	/* Statistics */
 	int dir_created;
 	int gitignore_invalidated;
+	int dir_invalidated;
+	int dir_opened;
 };
 
 struct dir_struct {
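
The mtime check described in the commit message, together with the two counters added above, can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code from this series: struct toy_cached_dir, struct toy_stats and toy_read_dir() are hypothetical names, the cached "listing" is reduced to an entry count, only st_mtime is compared, and index and .gitignore invalidation are left out entirely.

```c
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical, simplified stand-in for one cached directory. */
struct toy_cached_dir {
	int valid;       /* non-zero once a listing has been recorded */
	time_t mtime;    /* directory mtime when the listing was recorded */
	int nr_entries;  /* stand-in for the cached untracked listing */
};

/* Hypothetical counters named after the two statistics fields in the diff. */
struct toy_stats {
	int dir_invalidated;
	int dir_opened;
};

/*
 * Return the number of entries in 'path' (not counting "." and ".."),
 * or -1 on error. The cached count is reused when the directory's mtime
 * has not changed; otherwise the directory is read again and the cache
 * is refreshed.
 */
static int toy_read_dir(const char *path, struct toy_cached_dir *c,
			struct toy_stats *st)
{
	struct stat sb;
	struct dirent *e;
	DIR *d;
	int n = 0;

	if (stat(path, &sb) < 0)
		return -1;

	/* Best case: one stat() and the cached result is still good. */
	if (c->valid && c->mtime == sb.st_mtime)
		return c->nr_entries;

	/* A previously recorded listing is stale: count the invalidation. */
	if (c->valid) {
		c->valid = 0;
		st->dir_invalidated++;
	}

	/* Slow path: opendir/readdir/closedir as usual. */
	d = opendir(path);
	if (!d)
		return -1;
	st->dir_opened++;
	while ((e = readdir(d)) != NULL) {
		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;
		n++;
	}
	closedir(d);

	/* Record the fresh result along the way. */
	c->mtime = sb.st_mtime;
	c->nr_entries = n;
	c->valid = 1;
	return n;
}
```

Calling toy_read_dir() twice on an unchanged directory with zero-initialized structs opens the directory only once, so dir_opened stays at 1 and dir_invalidated at 0. If the directory's mtime then differs from the recorded one, the next call bumps both counters. Keeping the two apart in this sketch makes it possible to distinguish a first-time read from a re-read forced by a stale cache.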