summaryrefslogtreecommitdiff
path: root/dir.h
AgeCommit message (Collapse)AuthorFilesLines
2016-03-18Revert "Merge branch 'nd/exclusion-regression-fix'"Libravatar Junio C Hamano1-3/+0
This reverts commit 5e57f9c3dfe7dd44a1b56bb5b3327d7a1356ec7c, reversing changes made to e79112d21024beb997951381db21a70b087d459d. We will be postponing nd/exclusion-regression-fix topic to later cycle.
2016-02-15dir.c: support marking some patterns already matchedLibravatar Nguyễn Thái Ngọc Duy1-0/+3
Given path "a" and the pattern "a", it's matched. But if we throw path "a/b" to pattern "a", the code fails to realize that if "a" matches "a" then "a/b" should also be matched. When the pattern is matched the first time, we can mark it "sticky", so that all files and dirs inside the matched path also matches. This is a simpler solution than modify all match scenarios to fix that. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-25dir: simplify untracked cache "ident" fieldLibravatar Christian Couder1-1/+0
It is not a good idea to compare kernel versions and disable the untracked cache if it changes, as people may upgrade and still want the untracked cache to work. So let's just compare work tree locations and kernel name to decide if we should disable it. Also storing many locations in the ident field and comparing to any of them can be dangerous if GIT_WORK_TREE is used with different values. So let's just store one location, the location of the current work tree. The downside is that untracked cache can only be used by one type of OS for now. Exporting a git repo to different clients via a network to e.g. Linux and Windows means that only one can use the untracked cache. If the location changed in the ident field and we still want an untracked cache, let's delete the cache and recreate it. Note that if an untracked cache has been created by a previous Git version, then the kernel version is stored in the ident field. As we now compare with just the kernel name the comparison will fail and the untracked cache will be disabled until it's recreated. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-25dir: add remove_untracked_cache()Libravatar Christian Couder1-0/+1
Factor out code into remove_untracked_cache(), which will be used in a later commit. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-25dir: add {new,add}_untracked_cache()Libravatar Christian Couder1-0/+1
Factor out code into new_untracked_cache() and add_untracked_cache(), which will be used in later commits. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-26Merge branch 'nd/untracked-cache'Libravatar Junio C Hamano1-0/+82
Teach the index to optionally remember already seen untracked files to speed up "git status" in a working tree with tons of cruft. * nd/untracked-cache: (24 commits) git-status.txt: advertisement for untracked cache untracked cache: guard and disable on system changes mingw32: add uname() t7063: tests for untracked cache update-index: test the system before enabling untracked cache update-index: manually enable or disable untracked cache status: enable untracked cache untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE untracked cache: mark index dirty if untracked cache is updated untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS untracked cache: avoid racy timestamps read-cache.c: split racy stat test to a separate function untracked cache: invalidate at index addition or removal untracked cache: load from UNTR index extension untracked cache: save to an index extension ewah: add convenient wrapper ewah_serialize_strbuf() untracked cache: don't open non-existent .gitignore untracked cache: mark what dirs should be recursed/saved untracked cache: record/validate dir mtime and reuse cached output untracked cache: make a wrapper around {open,read,close}dir() ...
2015-03-26Merge branch 'jc/report-path-error-to-dir'Libravatar Junio C Hamano1-0/+1
Code clean-up. * jc/report-path-error-to-dir: report_path_error(): move to dir.c
2015-03-24report_path_error(): move to dir.cLibravatar Junio C Hamano1-0/+1
The expected call sequence is for the caller to use match_pathspec() repeatedly on a set of pathspecs, accumulating the "hits" in a separate array, and then call this function to diagnose a pathspec that never matched anything, as that can indicate a typo from the command line, e.g. "git commit Maekfile". Many builtin commands use this function from builtin/ls-files.c, which is not a very healthy arrangement. ls-files might have been the first command to feel the need for such a helper, but the need is shared by everybody who uses the "match and then report" pattern. Move it to dir.c where match_pathspec() is defined. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: guard and disable on system changesLibravatar Nguyễn Thái Ngọc Duy1-0/+2
If the user enables untracked cache, then - move worktree to an unsupported filesystem - or simply upgrade OS - or move the whole (portable) disk from one machine to another - or access a shared fs from another machine there's no guarantee that untracked cache can still function properly. Record the worktree location and OS footprint in the cache. If it changes, err on the safe side and disable the cache. The user can 'update-index --untracked-cache' again to make sure all conditions are met. This adds a new requirement that setup_git_directory* must be called before read_cache() because we need worktree location by then, or the cache is dropped. This change does not cover all bases, you can fool it if you try hard. The point is to stop accidents. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: invalidate at index addition or removalLibravatar Nguyễn Thái Ngọc Duy1-0/+4
Ideally we should implement untracked_cache_remove_from_index() and untracked_cache_add_to_index() so that they update untracked cache right away instead of invalidating it and wait for read_directory() next time to deal with it. But that may need some more work in unpack-trees.c. So stay simple as the first step. The new call in add_index_entry_with_check() may look strange because new calls usually stay close to cache_tree_invalidate_path(). We do it a bit later than c_t_i_p() in this function because if it's about replacing the entry with the same name, we don't care (but cache-tree does). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: load from UNTR index extensionLibravatar Nguyễn Thái Ngọc Duy1-0/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: save to an index extensionLibravatar Nguyễn Thái Ngọc Duy1-0/+1
Helped-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: mark what dirs should be recursed/savedLibravatar Nguyễn Thái Ngọc Duy1-1/+2
If we redo this thing in a functional style, we would have one struct untracked_dir as input tree and another as output. The input is used for verification. The output is a brand new tree, reflecting current worktree. But that means recreate a lot of dir nodes even if a lot could be shared between input and output trees in good cases. So we go with the messy but efficient way, combining both input and output trees into one. We need a way to know which node in this combined tree belongs to the output. This is the purpose of this "recurse" flag. "valid" bit can't be used for this because it's about data of the node except the subdirs. When we invalidate a directory, we want to keep cached data of the subdirs intact even though we don't really know what subdir still exists (yet). Then we check worktree to see what actual subdir remains on disk. Those will have 'recurse' bit set again. If cached data for those are still valid, we may be able to avoid computing exclude files for them. Those subdirs that are deleted will have 'recurse' remained clear and their 'valid' bits do not matter. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: record/validate dir mtime and reuse cached outputLibravatar Nguyễn Thái Ngọc Duy1-0/+2
The main readdir loop in read_directory_recursive() is replaced with a new one that checks if cached results of a directory is still valid. If a file is added or removed from the index, the containing directory is invalidated (but not its subdirs). If directory's mtime is changed, the same happens. If a .gitignore is updated, the containing directory and all subdirs are invalidated recursively. If dir_struct#flags or other conditions change, the cache is ignored. If a directory is invalidated, we opendir/readdir/closedir and run the exclude machinery on that directory listing as usual. If untracked cache is also enabled, we'll update the cache along the way. If a directory is validated, we simply pull the untracked listing out from the cache. The cache also records the list of direct subdirs that we have to recurse in. Fully excluded directories are seen as "untracked files". In the best case when no dirs are invalidated, read_directory() becomes a series of stat(dir), open(.gitignore), fstat(), read(), close() and optionally hash_sha1_file() For comparison, standard read_directory() is a sequence of opendir(), readdir(), open(.gitignore), fstat(), read(), close(), the expensive last_exclude_matching() and closedir(). We already try not to open(.gitignore) if we know it does not exist, so open/fstat/read/close sequence does not apply to every directory. The sequence could be reduced further, as noted in prep_exclude() in another patch. So in theory, the entire best-case read_directory sequence could be reduced to a series of stat() and nothing else. This is not a silver bullet approach. When you compile a C file, for example, the old .o file is removed and a new one with the same name created, effectively invalidating the containing directory's cache (but not its subdirectories). If your build process touches every directory, this cache adds extra overhead for nothing, so it's a good idea to separate generated files from tracked files.. Editors may use the same strategy for saving files. And of course you're out of luck running your repo on an unsupported filesystem and/or operating system. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: initial untracked cache validationLibravatar Nguyễn Thái Ngọc Duy1-0/+4
Make sure the starting conditions and all global exclude files are good to go. If not, either disable untracked cache completely, or wipe out the cache and start fresh. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12untracked cache: record .gitignore information and dir hierarchyLibravatar Nguyễn Thái Ngọc Duy1-0/+60
The idea is if we can capture all input and (non-rescursive) output of read_directory_recursive(), and can verify later that all the input is the same, then the second r_d_r() should produce the same output as in the first run. The requirement for this to work is stat info of a directory MUST change if an entry is added to or removed from that directory (and should not change often otherwise). If your OS and filesystem do not meet this requirement, untracked cache is not for you. Most file systems on *nix should be fine. On Windows, NTFS is fine while FAT may not be [1] even though FAT on Linux seems to be fine. The list of input of r_d_r() is in the big comment block in dir.h. In short, the output of a directory (not counting subdirs) mainly depends on stat info of the directory in question, all .gitignore leading to it and the check_only flag when r_d_r() is called recursively. This patch records all this info (and the output) as r_d_r() runs. Two hash_sha1_file() are required for $GIT_DIR/info/exclude and core.excludesfile unless their stat data matches. hash_sha1_file() is only needed when .gitignore files in the worktree are modified, otherwise their SHA-1 in index is used (see the previous patch). We could store stat data for .gitignore files so we don't have to rehash them if their content is different from index, but I think .gitignore files are rarely modified, so not worth extra cache data (and hashing penalty read-cache.c:verify_hdr(), as we will be storing this as an index extension). The implication is, if you change .gitignore, you better add it to the index soon or you lose all the benefit of untracked cache because a modified .gitignore invalidates all subdirs recursively. This is especially bad for .gitignore at root. This cached output is about untracked files only, not ignored files because the number of tracked files is usually small, so small cache overhead, while the number of ignored files could go really high (e.g. *.o files mixing with source code). [1] "Description of NTFS date and time stamps for files and folders" http://support.microsoft.com/kb/299648 Helped-by: Torsten Bögershausen <tboegi@web.de> Helped-by: David Turner <dturner@twopensource.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12dir.c: optionally compute sha-1 of a .gitignore fileLibravatar Nguyễn Thái Ngọc Duy1-0/+6
This is not used anywhere yet. But the goal is to compare quickly if a .gitignore file has changed when we have the SHA-1 of both old (cached somewhere) and new (from index or a tree) versions. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-14prep_exclude: remove the artificial PATH_MAX limitLibravatar Nguyễn Thái Ngọc Duy1-1/+1
This fixes a segfault in git-status with long paths on Windows, where PATH_MAX is only 260. This also fixes the problem of silently ignoring .gitignore if the full path exceeds PATH_MAX. Now add_excludes_from_file() will report if it gets ENAMETOOLONG. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-14dir.h: move struct exclude declaration to top levelLibravatar Nguyễn Thái Ngọc Duy1-20/+22
There is no actual nested struct here. Move it out for clarity. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24pathspec: pass directory indicator to match_pathspec_item()Libravatar Nguyễn Thái Ngọc Duy1-3/+7
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce the same output. Previously only the version without trailing slash returns the difference (if any). That's the effect of new ce_path_match(). dir_path_match() is not executed by the new tests. And it should not introduce regressions. Previously if path "dir/" is passed in with pathspec "dir/", they obviously match. With new dir_path_match(), the path becomes _directory_ "dir" vs pathspec "dir/", which is not executed by the old code path in m_p_i(). The new code path is executed and produces the same result. The other case is pathspec "dir" and path "dir/" is now turned to "dir" (with DO_MATCH_DIRECTORY). Still the same result before or after the patch. So why change? Because of the next patch about clean.c. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24pathspec: rename match_pathspec_depth() to match_pathspec()Libravatar Nguyễn Thái Ngọc Duy1-5/+5
A long time ago, for some reason I was not happy with match_pathspec(). I created a better version, match_pathspec_depth() that was suppose to replace match_pathspec() eventually. match_pathspec() has finally been gone since 6 months ago. Use the shorter name for match_pathspec_depth(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24pathspec: convert some match_pathspec_depth() to dir_path_match()Libravatar Nguyễn Thái Ngọc Duy1-0/+7
This helps reduce the number of match_pathspec_depth() call sites and show how m_p_d() is used. And it usage is: - match against an index entry (ce_path_match or match_pathspec_depth in ls-files) - match against a dir_entry from read_directory (dir_path_match and match_pathspec_depth in clean.c, which will be converted later) - resolve-undo (rerere.c and ls-files.c) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24pathspec: convert some match_pathspec_depth() to ce_path_match()Libravatar Nguyễn Thái Ngọc Duy1-0/+7
This helps reduce the number of match_pathspec_depth() call sites and show how match_pathspec_depth() is used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-11Merge branch 'jc/ls-files-killed-optim'Libravatar Junio C Hamano1-1/+2
"git ls-files -k" needs to crawl only the part of the working tree that may overlap the paths in the index to find killed files, but shared code with the logic to find all the untracked files, which made it unnecessarily inefficient. * jc/ls-files-killed-optim: dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage t3010: update to demonstrate "ls-files -k" optimization pitfalls ls-files -k: a directory only can be killed if the index has a non-directory dir.c: use the cache_* macro to access the current index
2013-08-15ls-files -k: a directory only can be killed if the index has a non-directoryLibravatar Junio C Hamano1-1/+2
"ls-files -o" and "ls-files -k" both traverse the working tree down to find either all untracked paths or those that will be "killed" (removed from the working tree to make room) when the paths recorded in the index are checked out. It is necessary to traverse the working tree fully when enumerating all the "other" paths, but when we are only interested in "killed" paths, we can take advantage of the fact that paths that do not overlap with entries in the index can never be killed. The treat_one_path() helper function, which is called during the recursive traversal, is the ideal place to implement an optimization. When we are looking at a directory P in the working tree, there are three cases: (1) P exists in the index. Everything inside the directory P in the working tree needs to go when P is checked out from the index. (2) P does not exist in the index, but there is P/Q in the index. We know P will stay a directory when we check out the contents of the index, but we do not know yet if there is a directory P/Q in the working tree to be killed, so we need to recurse. (3) P does not exist in the index, and there is no P/Q in the index to require P to be a directory, either. Only in this case, we know that everything inside P will not be killed without recursing. Note that this helper is called by treat_leading_path() that decides if we need to traverse only subdirectories of a single common leading directory, which is essential for this optimization to be correct. This caller checks each level of the leading path component from shallower directory to deeper ones, and that is what allows us to only check if the path appears in the index. If the call to treat_one_path() weren't there, given a path P/Q/R, the real traversal may start from directory P/Q/R, even when the index records P as a regular file, and we would end up having to check if any leading subpath in P/Q/R, e.g. P, appears in the index. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15pathspec: support :(glob) syntaxLibravatar Nguyễn Thái Ngọc Duy1-5/+4
:(glob)path differs from plain pathspec that it uses wildmatch with WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The difference lies in how '*' (and '**') is processed. With the introduction of :(glob) and :(literal) and their global options --[no]glob-pathspecs, the user can: - make everything literal by default via --noglob-pathspecs --literal-pathspecs cannot be used for this purpose as it disables _all_ pathspec magic. - individually turn on globbing with :(glob) - make everything globbing by default via --glob-pathspecs - individually turn off globbing with :(literal) The implication behind this is, there is no way to gain the default matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get new globbing or literal. The old fnmatch behavior is considered deprecated and discouraged to use. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15remove match_pathspec() in favor of match_pathspec_depth()Libravatar Nguyễn Thái Ngọc Duy1-1/+0
match_pathspec_depth was created to replace match_pathspec (see 61cf282 (pathspec: add match_pathspec_depth() - 2010-12-15). It took more than two years, but the replacement finally happens :-) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15convert common_prefix() to use struct pathspecLibravatar Nguyễn Thái Ngọc Duy1-1/+1
The code now takes advantage of nowildcard_len field. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15convert {read,fill}_directory to take struct pathspecLibravatar Nguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15add parse_pathspec() that converts cmdline args to struct pathspecLibravatar Nguyễn Thái Ngọc Duy1-0/+2
Currently to fill a struct pathspec, we do: const char **paths; paths = get_pathspec(prefix, argv); ... init_pathspec(&pathspec, paths); "paths" can only carry bare strings, which loses information from command line arguments such as pathspec magic or the prefix part's length for each argument. parse_pathspec() is introduced to combine the two calls into one. The plan is gradually replace all get_pathspec() and init_pathspec() with parse_pathspec(). get_pathspec() now becomes a thin wrapper of parse_pathspec(). parse_pathspec() allows the caller to reject the pathspec magics that it does not support. When a new pathspec magic is introduced, we can enable it per command after making sure that all underlying code has no problem with the new magic. "flags" parameter is currently unused. But it would allow callers to pass certain instructions to parse_pathspec, for example forcing literal pathspec when no magic is used. With the introduction of parse_pathspec, there are now two functions that can initialize struct pathspec: init_pathspec and parse_pathspec. Any semantic changes in struct pathspec must be reflected in both functions. init_pathspec() will be phased out in favor of parse_pathspec(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15dir.c: git-status --ignored: don't scan the work tree twiceLibravatar Karsten Blees1-1/+2
'git-status --ignored' still scans the work tree twice to collect untracked and ignored files, respectively. fill_directory / read_directory already supports collecting untracked and ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED flag to enable this has some git-add specific side-effects (e.g. it doesn't recurse into ignored directories, so listing ignored files with --untracked=all doesn't work). The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored files in dir_struct.entries[] (instead of dir_struct.ignored[] as DIR_COLLECT_IGNORED). DIR_SHOW_IGNORED is used all throughout git. We don't want to break the existing API, so lets introduce a new flag DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar to DIR_COLLECT_FILES, but will recurse into sub-directories based on the other flags as DIR_SHOW_IGNORED does. In dir.c::read_directory_recursive, add ignored files to either dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move the DIR_COLLECT_IGNORED case here so that filling result lists is in a common place. In wt-status.c::wt_status_collect_untracked, use the new flag and read results from dir_struct.ignored[]. Remove the extra fill_directory call. builtin/check-ignore.c doesn't call fill_directory, setting the git-add specific DIR_COLLECT_IGNORED flag has no effect here. Remove for clarity. Update API documentation to reflect the changes. Performance: with this patch, 'git-status --ignored' is typically as fast as 'git-status'. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15dir.c: replace is_path_excluded with now equivalent is_excluded APILibravatar Karsten Blees1-13/+3
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15dir.c: unify is_excluded and is_path_excluded APIsLibravatar Karsten Blees1-3/+3
The is_excluded and is_path_excluded APIs are very similar, except for a few noteworthy differences: is_excluded doesn't handle ignored directories, results for paths within ignored directories are incorrect. This is probably based on the premise that recursive directory scans should stop at ignored directories, which is no longer true (in certain cases, read_directory_recursive currently calls is_excluded *and* is_path_excluded to get correct ignored state). is_excluded caches parsed .gitignore files of the last directory in struct dir_struct. If the directory changes, it finds a common parent directory and is very careful to drop only as much state as necessary. On the other hand, is_excluded will also read and parse .gitignore files in already ignored directories, which are completely irrelevant. is_path_excluded correctly handles ignored directories by checking if any component in the path is excluded. As it uses is_excluded internally, this unfortunately forces is_excluded to drop and re-read all .gitignore files, as there is no common parent directory for the root dir. is_path_excluded tracks state in a separate struct path_exclude_check, which is essentially a wrapper of dir_struct with two more fields. However, as is_path_excluded also modifies dir_struct, it is not possible to e.g. use multiple path_exclude_check structures with the same dir_struct in parallel. The additional structure just unnecessarily complicates the API. Teach is_excluded / prep_exclude about ignored directories: whenever entering a new directory, first check if the entire directory is excluded. Remember the excluded state in dir_struct. Don't traverse into already ignored directories (i.e. don't read irrelevant .gitignore files). Directories could also be excluded by exclude patterns specified on the command line or .git/info/exclude, so we cannot simply skip prep_exclude entirely if there's no .gitignore file name (dir_struct.exclude_per_dir). Move this check to just before actually reading the file. is_path_excluded is now equivalent to is_excluded, so we can simply redirect to it (the public API is cleaned up in the next patch). The performance impact of the additional ignored check per directory is hardly noticeable when reading directories recursively (e.g. 'git status'). However, performance of git commands using the is_path_excluded API (e.g. 'git ls-files --cached --ignored --exclude-standard') is greatly improved as this no longer re-reads .gitignore files on each call. Here's some performance data from the linux and WebKit repos (best of 10 runs on a Debian Linux on SSD, core.preloadIndex=true): | ls-files -ci | status | status --ignored | linux | WebKit | linux | WebKit | linux | WebKit -------+-------+--------+-------+--------+-------+--------- before | 0.506 | 6.539 | 0.212 | 1.555 | 0.323 | 2.541 after | 0.080 | 1.191 | 0.218 | 1.583 | 0.321 | 2.579 gain | 6.325 | 5.490 | 0.972 | 0.982 | 1.006 | 0.985 Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23Merge branch 'as/check-ignore'Libravatar Junio C Hamano1-11/+51
Add a new command "git check-ignore" for debugging .gitignore files. The variable names may want to get cleaned up but that can be done in-tree. * as/check-ignore: clean.c, ls-files.c: respect encapsulation of exclude_list_groups t0008: avoid brace expansion add git-check-ignore sub-command setup.c: document get_pathspec() add.c: extract new die_if_path_beyond_symlink() for reuse add.c: extract check_path_for_gitlink() from treat_gitlinks() for reuse pathspec.c: rename newly public functions for clarity add.c: move pathspec matchers into new pathspec.c for reuse add.c: remove unused argument from validate_pathspec() dir.c: improve docs for match_pathspec() and match_pathspec_depth() dir.c: provide clear_directory() for reclaiming dir_struct memory dir.c: keep track of where patterns came from dir.c: use a single struct exclude_list per source of excludes Conflicts: builtin/ls-files.c dir.c
2013-01-10Merge branch 'as/dir-c-cleanup'Libravatar Junio C Hamano1-10/+35
Refactor and generally clean up the directory traversal API implementation. * as/dir-c-cleanup: dir.c: rename free_excludes() to clear_exclude_list() dir.c: refactor is_path_excluded() dir.c: refactor is_excluded() dir.c: refactor is_excluded_from_list() dir.c: rename excluded() to is_excluded() dir.c: rename excluded_from_list() to is_excluded_from_list() dir.c: rename path_excluded() to is_path_excluded() dir.c: rename cryptic 'which' variable to more consistent name Improve documentation and comments regarding directory traversal API api-directory-listing.txt: update to match code
2013-01-06dir.c: improve docs for match_pathspec() and match_pathspec_depth()Libravatar Adam Spiers1-0/+6
Fix a grammatical issue in the description of these functions, and make it more obvious how and why seen[] can be reused across multiple invocations. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06dir.c: provide clear_directory() for reclaiming dir_struct memoryLibravatar Adam Spiers1-0/+1
By the end of a directory traversal, a dir_struct instance will typically contains pointers to various data structures on the heap. clear_directory() provides a convenient way to reclaim that memory. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06dir.c: keep track of where patterns came fromLibravatar Adam Spiers1-2/+19
For exclude patterns read in from files, the filename is stored in the exclude list, and the originating line number is stored in the individual exclude (counting starting at 1). For exclude patterns provided on the command line, a string describing the source of the patterns is stored in the exclude list, and the sequence number assigned to each exclude pattern is negative, with counting starting at -1. So for example the 2nd pattern provided via --exclude would be numbered -2. This allows any future consumers of that data to easily distinguish between exclude patterns from files vs. from the CLI. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-06dir.c: use a single struct exclude_list per source of excludesLibravatar Adam Spiers1-10/+26
Previously each exclude_list could potentially contain patterns from multiple sources. For example dir->exclude_list[EXC_FILE] would typically contain patterns from .git/info/exclude and core.excludesfile, and dir->exclude_list[EXC_DIRS] could contain patterns from multiple per-directory .gitignore files during directory traversal (i.e. when dir->exclude_stack was more than one item deep). We split these composite exclude_lists up into three groups of exclude_lists (EXC_CMDL / EXC_DIRS / EXC_FILE as before), so that each exclude_list now contains patterns from a single source. This will allow us to cleanly track the origin of each pattern simply by adding a src field to struct exclude_list, rather than to struct exclude, which would make memory management of the source string tricky in the EXC_DIRS case where its contents are dynamically generated. Similarly, by moving the filebuf member from struct exclude_stack to struct exclude_list, it allows us to track and subsequently free memory buffers allocated during the parsing of all exclude files, rather than only tracking buffers allocated for files in the EXC_DIRS group. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28dir.c: rename free_excludes() to clear_exclude_list()Libravatar Adam Spiers1-1/+1
It is clearer to use a 'clear_' prefix for functions which empty and deallocate the contents of a data structure without freeing the structure itself, and a 'free_' prefix for functions which also free the structure itself. http://article.gmane.org/gmane.comp.version-control.git/206128 Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28dir.c: refactor is_path_excluded()Libravatar Adam Spiers1-0/+3
In a similar way to the previous commit, this extracts a new helper function last_exclude_matching_path() which return the last exclude_list element which matched, or NULL if no match was found. is_path_excluded() becomes a wrapper around this, and just returns 0 or 1 depending on whether any matching exclude_list element was found. This allows callers to find out _why_ a given path was excluded, rather than just whether it was or not, paving the way for a new git sub-command which allows users to test their exclude lists from the command line. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28dir.c: rename excluded() to is_excluded()Libravatar Adam Spiers1-2/+2
Continue adopting clearer names for exclude functions. This is_* naming pattern for functions returning booleans was discussed here: http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924 Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28dir.c: rename excluded_from_list() to is_excluded_from_list()Libravatar Adam Spiers1-2/+2
Continue adopting clearer names for exclude functions. This 'is_*' naming pattern for functions returning booleans was discussed here: http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924 Also adjust their callers as necessary. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28dir.c: rename path_excluded() to is_path_excluded()Libravatar Adam Spiers1-1/+1
Start adopting clearer names for exclude functions. This 'is_*' naming pattern for functions returning booleans was agreed here: http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924 Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28dir.c: rename cryptic 'which' variable to more consistent nameLibravatar Adam Spiers1-2/+2
'el' is only *slightly* less cryptic, but is already used as the variable name for a struct exclude_list pointer in numerous other places, so this reduces the number of cryptic variable names in use by one :-) Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-28Improve documentation and comments regarding directory traversal APILibravatar Adam Spiers1-2/+24
traversal API has a few potentially confusing properties. These comments clarify a few key aspects and will hopefully make it easier to understand for other newcomers in the future. Signed-off-by: Adam Spiers <git@adamspiers.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-26pathspec: apply "*.c" optimization from excludeLibravatar Nguyễn Thái Ngọc Duy1-0/+1
When a pattern contains only a single asterisk as wildcard, e.g. "foo*bar", after literally comparing the leading part "foo" with the string, we can compare the tail of the string and make sure it matches "bar", instead of running fnmatch() on "*bar" against the remainder of the string. -O2 build on linux-2.6, without the patch: $ time git rev-list --quiet HEAD -- '*.c' real 0m40.770s user 0m40.290s sys 0m0.256s With the patch $ time ~/w/git/git rev-list --quiet HEAD -- '*.c' real 0m34.288s user 0m33.997s sys 0m0.205s The above command is not supposed to be widely popular. It's chosen because it exercises pathspec matching a lot. The point is it cuts down matching time for popular patterns like *.c, which could be used as pathspec in other places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-26pathspec: do exact comparison on the leading non-wildcard partLibravatar Nguyễn Thái Ngọc Duy1-0/+8
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-15attr: more matching optimizations from .gitignoreLibravatar Nguyễn Thái Ngọc Duy1-0/+11
.gitattributes and .gitignore share the same pattern syntax but has separate matching implementation. Over the years, ignore's implementation accumulates more optimizations while attr's stays the same. This patch reuses the core matching functions that are also used by excluded_from_list. excluded_from_list and path_matches can't be merged due to differences in exclude and attr, for example: * "!pattern" syntax is forbidden in .gitattributes. As an attribute can be unset (i.e. set to a special value "false") or made back to unspecified (i.e. not even set to "false"), "!pattern attr" is unclear which one it means. * we support attaching attributes to directories, but git-core internally does not currently make use of attributes on directories. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-15gitignore: make pattern parsing code a separate functionLibravatar Nguyễn Thái Ngọc Duy1-1/+1
This function can later be reused by attr.c. Also turn to_exclude field into a flag. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>