From 22705334b91275563c48f38a52d1643a14e5507b Mon Sep 17 00:00:00 2001 From: Elijah Newren Date: Thu, 16 Jan 2020 20:21:54 +0000 Subject: dir: treat_leading_path() and read_directory_recursive(), round 2 I was going to title this "dir: more synchronizing of treat_leading_path() and read_directory_recursive()", a nod to commit 777b42034764 ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), but the title was too long. Anyway, first the backstory... fill_directory() has always had a slightly error-prone interface: it returns a subset of paths which *might* match the specified pathspec; it was intended to prune away some paths which didn't match the specified pathspec and keep at least all the ones that did match it. Given this interface, callers were responsible to post-process the results and check whether each actually matched the pathspec. builtin/clean.c did this. It would first prune out duplicates (e.g. if "dir" was returned as well as all files under "dir/", then it would simplify this to just "dir"), and after pruning duplicates it would compare the remaining paths to the specified pathspec(s). This post-processing itself could run into problems, though, as noted in commit 404ebceda01c ("dir: also check directories for matching pathspecs", 2019-09-17): For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) That commit fixed the issue -- when multiple pathspecs were specified -- by making sure fill_directory() wouldn't return both "dir" and "dir/file" outside the common_prefix_len optimization path. This is where it starts to get fun. In commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19), we noticed that the common_prefix_len wasn't doing appropriate checks and letting all kinds of stuff through, resulting in recursing into .git/ directories and other craziness. So it started locking down and doing checks on pathnames within that code path. That continued with commit 777b42034764 ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), which noted the following: Our optimization to avoid calling into read_directory_recursive() when all pathspecs have a common leading directory mean that we need to match the logic that read_directory_recursive() would use if we had just called it from the root. Since it does more than call treat_path() we need to copy that same logic. ...and then it more forcefully addressed the issue with this wonderfully ironic statement: Needing to duplicate logic like this means it is guaranteed someone will eventually need to make further changes and forget to update both locations. It is tempting to just nuke the leading_directory special casing to avoid such bugs and simplify the code, but unpack_trees' verify_clean_subdirectory() also calls read_directory() and does so with a non-empty leading path, so I'm hesitant to try to restructure further. Add obnoxious warnings to treat_leading_path() and read_directory_recursive() to try to warn people of such problems. You would think that with such a strongly worded description, that its author would have actually ensured that the logic in treat_leading_path() and read_directory_recursive() did actually match and that *everything* that was needed had at least been copied over at the time that this paragraph was written. But you'd be wrong, I messed it up by missing part of the logic. Copy the missing bits to fix the new final test in t7300. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- dir.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'dir.c') diff --git a/dir.c b/dir.c index 7d255227b1..5d4c92d3aa 100644 --- a/dir.c +++ b/dir.c @@ -2383,6 +2383,10 @@ static int treat_leading_path(struct dir_struct *dir, (dir->flags & DIR_SHOW_IGNORED_TOO || do_match_pathspec(istate, pathspec, sb.buf, sb.len, baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { + if (!match_pathspec(istate, pathspec, sb.buf, sb.len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; add_path_to_appropriate_result_list(dir, NULL, &cdir, istate, &sb, baselen, -- cgit v1.2.3 From ad6f2157f951dc2ffeb7a01f090686f0bbb06f7a Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 16 Jan 2020 20:21:55 +0000 Subject: dir: restructure in a way to avoid passing around a struct dirent Restructure the code slightly to avoid passing around a struct dirent anywhere, which also enables us to avoid trying to manufacture one. Signed-off-by: Jeff King Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano --- dir.c | 73 +++++++++++++++++++++++++++++-------------------------------------- 1 file changed, 31 insertions(+), 42 deletions(-) (limited to 'dir.c') diff --git a/dir.c b/dir.c index 5d4c92d3aa..8437030006 100644 --- a/dir.c +++ b/dir.c @@ -41,7 +41,8 @@ struct cached_dir { int nr_files; int nr_dirs; - struct dirent *de; + const char *d_name; + int d_type; const char *file; struct untracked_cache_dir *ucd; }; @@ -50,8 +51,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *path, int len, struct untracked_cache_dir *untracked, int check_only, int stop_at_first_file, const struct pathspec *pathspec); -static int get_dtype(struct dirent *de, struct index_state *istate, - const char *path, int len); +static int resolve_dtype(int dtype, struct index_state *istate, + const char *path, int len); int count_slashes(const char *s) { @@ -1215,8 +1216,7 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname int prefix = pattern->nowildcardlen; if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) { - if (*dtype == DT_UNKNOWN) - *dtype = get_dtype(NULL, istate, pathname, pathlen); + *dtype = resolve_dtype(*dtype, istate, pathname, pathlen); if (*dtype != DT_DIR) continue; } @@ -1842,10 +1842,9 @@ static int get_index_dtype(struct index_state *istate, return DT_UNKNOWN; } -static int get_dtype(struct dirent *de, struct index_state *istate, - const char *path, int len) +static int resolve_dtype(int dtype, struct index_state *istate, + const char *path, int len) { - int dtype = de ? DTYPE(de) : DT_UNKNOWN; struct stat st; if (dtype != DT_UNKNOWN) @@ -1870,14 +1869,13 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, struct strbuf *path, int baselen, const struct pathspec *pathspec, - int dtype, struct dirent *de) + int dtype) { int exclude; int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); enum path_treatment path_treatment; - if (dtype == DT_UNKNOWN) - dtype = get_dtype(de, istate, path->buf, path->len); + dtype = resolve_dtype(dtype, istate, path->buf, path->len); /* Always exclude indexed files */ if (dtype != DT_DIR && has_path_in_index) @@ -1985,21 +1983,18 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int dtype; - struct dirent *de = cdir->de; - - if (!de) + if (!cdir->d_name) return treat_path_fast(dir, untracked, cdir, istate, path, baselen, pathspec); - if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git")) + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) return path_none; strbuf_setlen(path, baselen); - strbuf_addstr(path, de->d_name); + strbuf_addstr(path, cdir->d_name); if (simplify_away(path->buf, path->len, pathspec)) return path_none; - dtype = DTYPE(de); - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, dtype, de); + return treat_one_path(dir, untracked, istate, path, baselen, pathspec, + cdir->d_type); } static void add_untracked(struct untracked_cache_dir *dir, const char *name) @@ -2087,10 +2082,17 @@ static int open_cached_dir(struct cached_dir *cdir, static int read_cached_dir(struct cached_dir *cdir) { + struct dirent *de; + if (cdir->fdir) { - cdir->de = readdir(cdir->fdir); - if (!cdir->de) + de = readdir(cdir->fdir); + if (!de) { + cdir->d_name = NULL; + cdir->d_type = DT_UNKNOWN; return -1; + } + cdir->d_name = de->d_name; + cdir->d_type = DTYPE(de); return 0; } while (cdir->nr_dirs < cdir->untracked->dirs_nr) { @@ -2216,7 +2218,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, /* recurse into subdir if instructed by treat_path */ if ((state == path_recurse) || ((state == path_untracked) && - (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) && + (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && ((dir->flags & DIR_SHOW_IGNORED_TOO) || (pathspec && do_match_pathspec(istate, pathspec, path.buf, path.len, @@ -2314,10 +2316,10 @@ static int treat_leading_path(struct dir_struct *dir, */ struct strbuf sb = STRBUF_INIT; + struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; const char *cp; struct cached_dir cdir; - struct dirent *de; enum path_treatment state = path_none; /* @@ -2342,22 +2344,8 @@ static int treat_leading_path(struct dir_struct *dir, if (!len) return 1; - /* - * We need a manufactured dirent with sufficient space to store a - * leading directory component of path in its d_name. Here, we - * assume that the dirent's d_name is either declared as - * char d_name[BIG_ENOUGH] - * or that it is declared at the end of the struct as - * char d_name[] - * For either case, padding with len+1 bytes at the end will ensure - * sufficient storage space. - */ - de = xcalloc(1, st_add3(sizeof(struct dirent), len, 1)); memset(&cdir, 0, sizeof(cdir)); - cdir.de = de; -#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT) - de->d_type = DT_DIR; -#endif + cdir.d_type = DT_DIR; baselen = 0; prevlen = 0; while (1) { @@ -2374,12 +2362,13 @@ static int treat_leading_path(struct dir_struct *dir, break; strbuf_reset(&sb); strbuf_add(&sb, path, prevlen); - memcpy(de->d_name, path+prevlen, baselen-prevlen); - de->d_name[baselen-prevlen] = '\0'; + strbuf_reset(&subdir); + strbuf_add(&subdir, path+prevlen, baselen-prevlen); + cdir.d_name = subdir.buf; state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state == path_untracked && - get_dtype(cdir.de, istate, sb.buf, sb.len) == DT_DIR && + resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && (dir->flags & DIR_SHOW_IGNORED_TOO || do_match_pathspec(istate, pathspec, sb.buf, sb.len, baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { @@ -2403,7 +2392,7 @@ static int treat_leading_path(struct dir_struct *dir, &sb, baselen, pathspec, state); - free(de); + strbuf_release(&subdir); strbuf_release(&sb); return state == path_recurse; } -- cgit v1.2.3 From 0cbb60574e741e8255ba457606c4c90898cfc755 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Thu, 16 Jan 2020 20:21:56 +0000 Subject: dir: point treat_leading_path() warning to the right place Commit 777b420347 (dir: synchronize treat_leading_path() and read_directory_recursive(), 2019-12-19) tried to add two warning comments in those functions, pointing at each other. But the one in treat_leading_path() just points at itself. Let's fix that. Since the comment also redirects the reader for more details to "the commit that added this warning", and since we're now modifying the warning (creating a new commit without those details), let's mention the actual commit id. Signed-off-by: Jeff King Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- dir.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'dir.c') diff --git a/dir.c b/dir.c index 8437030006..b460211e61 100644 --- a/dir.c +++ b/dir.c @@ -2310,9 +2310,9 @@ static int treat_leading_path(struct dir_struct *dir, * WARNING WARNING WARNING: * * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * updates in read_directory_recursive(). See 777b420347 (dir: + * synchronize treat_leading_path() and read_directory_recursive(), + * 2019-12-19) and its parent commit for details. */ struct strbuf sb = STRBUF_INIT; -- cgit v1.2.3