author: Erik Elfström <erik.elfstrom@gmail.com> 2015-06-15 21:39:55 +0200
committer: Junio C Hamano <gitster@pobox.com> 2015-06-15 13:14:24 -0700
commit: 0179ca7a626e0a6c7bf5eaccf88dead307306dee
tree: a55aac80b058f21be596eb6a5e24517294fbbc3a /argv-array.h
parent: p7300: add performance tests for clean
clean: improve performance when removing lots of directories
"git clean" uses resolve_gitlink_ref() to check for the presence of nested git repositories, but it has the drawback of creating a ref_cache entry for every directory that should potentially be cleaned. The linear search through the ref_cache list causes a massive performance hit for large number of directories. Modify clean.c:remove_dirs to use setup.c:is_git_directory and setup.c:read_gitfile_gently instead. Both these functions will open files and parse contents when they find something that looks like a git repository. This is ok from a performance standpoint since finding repository candidates should be comparatively rare. Using is_git_directory and read_gitfile_gently should give a more standardized check for what is and what isn't a git repository but also gives three behavioral changes. The first change is that we will now detect and avoid cleaning empty nested git repositories (only init run). This is desirable. Second, we will no longer die when cleaning a file named ".git" with garbage content (it will be cleaned instead). This is also desirable. The last change is that we will detect and avoid cleaning empty bare repositories that have been placed in a directory named ".git". This is not desirable but should have no real user impact since we already fail to clean non-empty bare repositories in the same scenario. This is thus deemed acceptable. On top of this we add some extra precautions. If read_gitfile_gently fails to open the git file, read the git file or verify the path in the git file we assume that the path with the git file is a valid repository and avoid cleaning. Update t7300 to reflect these changes in behavior. The time to clean an untracked directory containing 100000 sub directories went from 61s to 1.7s after this change. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'argv-array.h')
0 files changed, 0 insertions, 0 deletions