summaryrefslogtreecommitdiff
path: root/preload-index.c
AgeCommit message (Collapse)AuthorFilesLines
2018-10-30speed up refresh_index() by utilizing preload_index()Libravatar Ben Peart1-4/+4
Speed up refresh_index() by utilizing preload_index() to do most of the work spread across multiple threads. This works because most cache entries will get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when called from within refresh_index(). On a Windows repo with ~200K files, this drops refresh times from 6.64 seconds to 2.87 seconds for a savings of 57%. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'nd/status-refresh-progress'Libravatar Junio C Hamano1-5/+45
"git status" learns to show progress bar when refreshing the index takes a long time. * nd/status-refresh-progress: status: show progress bar if refreshing the index takes too long
2018-10-19Merge branch 'bp/rename-test-env-var'Libravatar Junio C Hamano1-1/+2
Some environment variables that control the runtime options of Git used during tests are getting renamed for consistency. * bp/rename-test-env-var: t0000: do not get self-test disrupted by environment warnings preload-index: update GIT_FORCE_PRELOAD_TEST support read-cache: update TEST_GIT_INDEX_VERSION support fsmonitor: update GIT_TEST_FSMONITOR support preload-index: use git_env_bool() not getenv() for customization t/README: correct spelling of "uncommon"
2018-09-28preload-index: update GIT_FORCE_PRELOAD_TEST supportLibravatar Ben Peart1-1/+1
Rename GIT_FORCE_PRELOAD_TEST to GIT_TEST_PRELOAD_INDEX for consistency with the other GIT_TEST_ special setups and properly document its use. Add logic in t/test-lib.sh to give a warning when the old variable is set to let people know they need to update their environment to use the new variable. Signed-off-by: Ben Peart <Ben.Peart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-20preload-index: use git_env_bool() not getenv() for customizationLibravatar Ben Peart1-1/+2
GIT_FORCE_PRELOAD_TEST is only checked for presence by using getenv(). Use git_env_bool() instead so that GIT_FORCE_PRELOAD_TEST=false can work as expected. Signed-off-by: Ben Peart <Ben.Peart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'nd/unpack-trees-with-cache-tree'Libravatar Junio C Hamano1-2/+2
The unpack_trees() API used in checking out a branch and merging walks one or more trees along with the index. When the cache-tree in the index tells us that we are walking a tree whose flattened contents is known (i.e. matches a span in the index), as linearly scanning a span in the index is much more efficient than having to open tree objects recursively and listing their entries, the walk can be optimized, which is done in this topic. * nd/unpack-trees-with-cache-tree: Document update for nd/unpack-trees-with-cache-tree cache-tree: verify valid cache-tree in the test suite unpack-trees: add missing cache invalidation unpack-trees: reuse (still valid) cache-tree from src_index unpack-trees: reduce malloc in cache-tree walk unpack-trees: optimize walking same trees with cache-tree unpack-trees: add performance tracing trace.h: support nested performance tracing
2018-09-17status: show progress bar if refreshing the index takes too longLibravatar Nguyễn Thái Ngọc Duy1-5/+45
Refreshing the index is usually very fast, but it can still take a long time sometimes. Cold cache is one. Or copying a repo to a new place (*). It's good to show something to let the user know "git status" is not hanging, it's just busy doing something. (*) In this case, all stat info in the index becomes invalid and git falls back to rehashing all file content to see if there's any difference between updating stat info in the index. This is quite expensive. Even with a repo as small as git.git, it takes 3 seconds. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18trace.h: support nested performance tracingLibravatar Nguyễn Thái Ngọc Duy1-2/+2
Performance measurements are listed right now as a flat list, which is fine when we measure big blocks. But when we start adding more and more measurements, some of them could be just part of a bigger measurement and a flat list gives a wrong impression that they are executed at the same level instead of nested. Add trace_performance_enter() and trace_performance_leave() to allow indent these nested measurements. For now it does not help much because the only nested thing is (lazy) name hash initialization (e.g. called in diff-index from "git status"). This will help more because I'm going to add some more tracing that's actually nested. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13preload-index.c: use the right index instead of the_indexLibravatar Nguyễn Thái Ngọc Duy1-1/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13dir.c: remove an implicit dependency on the_index in pathspec codeLibravatar Nguyễn Thái Ngọc Duy1-1/+1
Make the match_patchspec API and friends take an index_state instead of assuming the_index in dir.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02trace: measure where the time is spent in the index-heavy operationsLibravatar Nguyễn Thái Ngọc Duy1-0/+2
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01fsmonitor: teach git to optionally utilize a file system monitor to speed up ↵Libravatar Ben Peart1-1/+5
detecting new or changed files. When the index is read from disk, the fsmonitor index extension is used to flag the last known potentially dirty index entries. The registered core.fsmonitor command is called with the time the index was last updated and returns the list of files changed since that time. This list is used to flag any additional dirty cache entries and untracked cache directories. We can then use this valid state to speed up preload_index(), ie_match_stat(), and refresh_cache_ent() as they do not need to lstat() files to detect potential changes for those entries marked CE_FSMONITOR_VALID. In addition, if the untracked cache is turned on valid_cached_dir() can skip checking directories for new or changed files as fsmonitor will invalidate the cache only for those directories that have been identified as having potential changes. To keep the CE_FSMONITOR_VALID state accurate during git operations; when git updates a cache entry to match the current state on disk, it will now set the CE_FSMONITOR_VALID bit. Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit is cleared and the corresponding untracked cache directory is marked invalid. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-24preload-index: add override to enable testing preload-indexLibravatar Ben Peart1-0/+2
By default, the preload index code path doesn't run unless there is a minimum of 1000 files. To enable running the test suite and having it execute the preload-index path, add an environment variable (GIT_FORCE_PRELOAD_TEST) which will override that minimum and set it to 2. This enables you run existing tests and have the core.preloadindex code path execute as long as the test has at least 2 files by setting GIT_FORCE_PRELOAD_TEXT=1 before running the test. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-10preload-index: avoid lstat for skip-worktree itemsLibravatar Jeff Hostetler1-0/+2
Teach preload-index to avoid lstat() calls for index-entries with skip-worktree bit set. This is a performance optimization. During a sparse-checkout, the skip-worktree bit is set on items that were not populated and therefore are not present in the worktree. The per-thread preload-index loop performs a series of tests on each index-entry as it attempts to compare the worktree version with the index and mark them up-to-date. This patch short-cuts that work. On a Windows 10 system with a very large repo (450MB index) and various levels of sparseness, performance was improved in the {preloadindex=true, fscache=false} case by 80% and in the {preloadindex=true, fscache=true} case by 20% for various commands. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13cache.h: rename cache_def_free to cache_def_clearLibravatar Karsten Blees1-1/+1
Rename cache_def_free to cache_def_clear as it doesn't free the struct cache_def, but just clears its content. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07symlinks: remove PATH_MAX limitationLibravatar Karsten Blees1-2/+2
'git checkout' fails if a directory is longer than PATH_MAX, because the lstat_cache in symlinks.c checks if the leading directory exists using PATH_MAX-bounded string operations. Remove the limitation by using strbuf instead. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24pathspec: convert some match_pathspec_depth() to ce_path_match()Libravatar Nguyễn Thái Ngọc Duy1-1/+2
This helps reduce the number of match_pathspec_depth() call sites and show how match_pathspec_depth() is used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15convert read_cache_preload() to take struct pathspecLibravatar Nguyễn Thái Ngọc Duy1-9/+11
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15move struct pathspec and related functions to pathspec.[ch]Libravatar Nguyễn Thái Ngọc Duy1-0/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03Convert ce_path_match() to use struct pathspecLibravatar Nguyễn Thái Ngọc Duy1-1/+4
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-24Make ce_uptodate() trustworthy againLibravatar Junio C Hamano1-0/+2
The rule has always been that a cache entry that is ce_uptodate(ce) means that we already have checked the work tree entity and we know there is no change in the work tree compared to the index, and nobody should have to double check. Note that false ce_uptodate(ce) does not mean it is known to be dirty---it only means we don't know if it is clean. There are a few codepaths (refresh-index and preload-index are among them) that mark a cache entry as up-to-date based solely on the return value from ie_match_stat(); this function uses lstat() to see if the work tree entity has been touched, and for a submodule entry, if its HEAD points at the same commit as the commit recorded in the index of the superproject (a submodule that is not even cloned is considered clean). A submodule is no longer considered unmodified merely because its HEAD matches the index of the superproject these days, in order to prevent people from forgetting to commit in the submodule and updating the superproject index with the new submodule commit, before commiting the state in the superproject. However, the patch to do so didn't update the codepath that marks cache entries up-to-date based on the updated definition and instead worked it around by saying "we don't trust the return value of ce_uptodate() for submodules." This makes ce_uptodate() trustworthy again by not marking submodule entries up-to-date. The next step _could_ be to introduce a few "in-core" flag bits to cache_entry structure to record "this entry is _known_ to be dirty", call is_submodule_modified() from ie_match_stat(), and use these new bits to avoid running this rather expensive check more than once, but that can be a separate patch. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-22janitor: add DIV_ROUND_UP and use it.Libravatar Pierre Habouzit1-1/+1
Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-09Make index preloading check the whole path to the fileLibravatar Linus Torvalds1-0/+4
This uses the new thread-safe 'threaded_has_symlink_leading_path()' function to efficiently verify that the whole path leading up to the filename is a proper path, and does not contain symlinks. This makes 'ce_uptodate()' a much stronger guarantee: it no longer just guarantees that the 'lstat()' of the path would match, it also means that we know that people haven't played games with moving directories around and covered it up with symlinks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-02Makefile: introduce NO_PTHREADSLibravatar Junio C Hamano1-0/+9
This introduces make variable NO_PTHREADS for platforms that lack the support for pthreads library or people who do not want to use it for whatever reason. When defined, it makes the multi-threaded index preloading into a no-op, and also disables threaded delta searching by pack-objects. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Mike Ralphson <mike@abacus.co.uk> Tested-by: Johannes Sixt <j6t@kdbg.org> (AIX 4.3.x) Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-17Fix index preloading for racy dirty caseLibravatar Linus Torvalds1-1/+1
In the threaded index preloading case, we must be sure to always use the CE_MATCH_RACY_IS_DIRTY flag when calling ie_match_stat(), in order to make sure that we only ever look at the stat() data, and don't try to do anything fancy. Because most of git internals are not thread-safe, and must not be called in parallel. Otherwise, what happens is that if the timestamps indicate that an entry _might_ be dirty, we might start actually comparing filesystem data with the object database. And we mustn't do that, because that would involve looking up and creating the object structure, and that whole code sequence with read_sha1_file() where we look up and add objects to the hashes is definitely not thread-safe. Nor do we want to add locking, because the whole point of the preload was to be simple and not affect anything else. With CE_MATCH_RACY_IS_DIRTY, we get what we wanted, and we'll just leave the hard cases well alone, to be done later in the much simpler serial case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-14Add cache preload facilityLibravatar Linus Torvalds1-0/+91
This can do the lstat() storm in parallel, giving potentially much improved performance for cold-cache cases or things like NFS that have weak metadata caching. Just use "read_cache_preload()" instead of "read_cache()" to force an optimistic preload of the index stat data. The function takes a pathspec as its argument, allowing us to preload only the relevant portion of the index. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>