summaryrefslogtreecommitdiff
path: root/read-cache.c
AgeCommit message (Collapse)AuthorFilesLines
2017-09-28Merge branch 'jk/fallthrough'Libravatar Junio C Hamano1-0/+1
Many codepaths have been updated to squelch -Wimplicit-fallthrough warnings from Gcc 7 (which is a good code hygiene). * jk/fallthrough: consistently use "fallthrough" comments in switches curl_trace(): eliminate switch fallthrough test-line-buffer: simplify command parsing
2017-09-25Merge branch 'jk/write-in-full-fix'Libravatar Junio C Hamano1-3/+3
Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. * jk/write-in-full-fix: read_pack_header: handle signed/unsigned comparison in read result config: flip return value of store_write_*() notes-merge: use ssize_t for write_in_full() return value pkt-line: check write_in_full() errors against "< 0" convert less-trivial versions of "write_in_full() != len" avoid "write_in_full(fd, buf, len) != len" pattern get-tar-commit-id: check write_in_full() return against 0 config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25Merge branch 'kw/write-index-reduce-alloc'Libravatar Junio C Hamano1-1/+3
A hotfix to a topic already in 'master'. * kw/write-index-reduce-alloc: read-cache: fix index corruption with index v4 Add t/helper/test-write-cache to .gitignore
2017-09-22consistently use "fallthrough" comments in switchesLibravatar Jeff King1-0/+1
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14avoid "write_in_full(fd, buf, len) != len" patternLibravatar Jeff King1-3/+3
The return value of write_in_full() is either "-1", or the requested number of bytes[1]. If we make a partial write before seeing an error, we still return -1, not a partial value. This goes back to f6aa66cb95 (write_in_full: really write in full or return error on disk full., 2007-01-11). So checking anything except "was the return value negative" is pointless. And there are a couple of reasons not to do so: 1. It can do a funny signed/unsigned comparison. If your "len" is signed (e.g., a size_t) then the compiler will promote the "-1" to its unsigned variant. This works out for "!= len" (unless you really were trying to write the maximum size_t bytes), but is a bug if you check "< len" (an example of which was fixed recently in config.c). We should avoid promoting the mental model that you need to check the length at all, so that new sites are not tempted to copy us. 2. Checking for a negative value is shorter to type, especially when the length is an expression. 3. Linus says so. In d34cf19b89 (Clean up write_in_full() users, 2007-01-11), right after the write_in_full() semantics were changed, he wrote: I really wish every "write_in_full()" user would just check against "<0" now, but this fixes the nasty and stupid ones. Appeals to authority aside, this makes it clear that writing it this way does not have an intentional benefit. It's a historical curiosity that we never bothered to clean up (and which was undoubtedly cargo-culted into new sites). So let's convert these obviously-correct cases (this includes write_str_in_full(), which is just a wrapper for write_in_full()). [1] A careful reader may notice there is one way that write_in_full() can return a different value. If we ask write() to write N bytes and get a return value that is _larger_ than N, we could return a larger total. But besides the fact that this would imply a totally broken version of write(), it would already invoke undefined behavior. Our internal remaining counter is an unsigned size_t, which means that subtracting too many byte will wrap it around to a very large number. So we'll instantly begin reading off the end of the buffer, trying to write gigabytes (or petabytes) of data. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-08read-cache: fix index corruption with index v4Libravatar Thomas Gummerer1-1/+3
ce012deb98 ("read-cache: avoid allocating every ondisk entry when writing", 2017-08-21) changed the way cache entries are written to the index file. While previously it wrote the name to an struct that was allocated using xcalloc(), it now uses ce_write() directly. Previously ce_namelen - common bytes were written to the cache entry, which would automatically make it nul terminated, as it was allocated using calloc. Now we are writing ce_namelen - common + 1 bytes directly from the ce->name to the index. If CE_STRIP_NAME however gets set in the split index case ce->ce_namelen is set to 0 without changing the actual ce->name buffer. When index-v4, this results in the first character of ce->name being written out instead of just a terminating nul charcter. As index-v4 requires the terminating nul character as terminator of the name when reading it back, this results in a corrupted index. Fix that by only writing ce_namelen - common bytes directly from ce->name to the index, and adding the nul terminator in an extra call to ce_write. This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in config.mak and running the test suite (t1700 specifically broke). Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06tempfile: auto-allocate tempfiles on heapLibravatar Jeff King1-13/+12
The previous commit taught the tempfile code to give up ownership over tempfiles that have been renamed or deleted. That makes it possible to use a stack variable like this: struct tempfile t; create_tempfile(&t, ...); ... if (!err) rename_tempfile(&t, ...); else delete_tempfile(&t); But doing it this way has a high potential for creating memory errors. The tempfile we pass to create_tempfile() ends up on a global linked list, and it's not safe for it to go out of scope until we've called one of those two deactivation functions. Imagine that we add an early return from the function that forgets to call delete_tempfile(). With a static or heap tempfile variable, the worst case is that the tempfile hangs around until the program exits (and some functions like setup_shallow_temporary rely on this intentionally, creating a tempfile and then leaving it for later cleanup). But with a stack variable as above, this is a serious memory error: the variable goes out of scope and may be filled with garbage by the time the tempfile code looks at it. Let's see if we can make it harder to get this wrong. Since many callers need to allocate arbitrary numbers of tempfiles, we can't rely on static storage as a general solution. So we need to turn to the heap. We could just ask all callers to pass us a heap variable, but that puts the burden on them to call free() at the right time. Instead, let's have the tempfile code handle the heap allocation _and_ the deallocation (when the tempfile is deactivated and removed from the list). This changes the return value of all of the creation functions. For the cleanup functions (delete and rename), we'll add one extra bit of safety: instead of taking a tempfile pointer, we'll take a pointer-to-pointer and set it to NULL after freeing the object. This makes it safe to double-call functions like delete_tempfile(), as the second call treats the NULL input as a noop. Several callsites follow this pattern. The resulting patch does have a fair bit of noise, as each caller needs to be converted to handle: 1. Storing a pointer instead of the struct itself. 2. Passing the pointer instead of taking the struct address. 3. Handling a "struct tempfile *" return instead of a file descriptor. We could play games to make this less noisy. For example, by defining the tempfile like this: struct tempfile { struct heap_allocated_part_of_tempfile { int fd; ...etc } *actual_data; } Callers would continue to have a "struct tempfile", and it would be "active" only when the inner pointer was non-NULL. But that just makes things more awkward in the long run. There aren't that many callers, so we can simply bite the bullet and adjust all of them. And the compiler makes it easy for us to find them all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06lockfile: do not rollback lock on failed closeLibravatar Jeff King1-1/+1
Since the lockfile code is based on the tempfile code, it has some of the same problems, including that close_lock_file() erases the tempfile's filename buf, making it hard for the caller to write a good error message. In practice this comes up less for lockfiles than for straight tempfiles, since we usually just report the refname. But there is at least one buggy case in write_ref_to_lockfile(). Besides, given the coupling between the lockfile and tempfile modules, it's less confusing if their close() functions have the same semantics. Just as the previous commit did for close_tempfile(), let's teach close_lock_file() and its wrapper close_ref() not to rollback on error. And just as before, we'll give them new "gently" names to catch any new callers that are added. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06tempfile: do not delete tempfile on failed closeLibravatar Jeff King1-2/+5
When close_tempfile() fails, we delete the tempfile and reset the fields of the tempfile struct. This makes it easier for callers to return without cleaning up, but it also makes this common pattern: if (close_tempfile(tempfile)) return error_errno("error closing %s", tempfile->filename.buf); wrong, because the "filename" field has been reset after the failed close. And it's not easy to fix, as in many cases we don't have another copy of the filename (e.g., if it was created via one of the mks_tempfile functions, and we just have the original template string). Let's drop the feature that a failed close automatically deletes the file. This puts the burden on the caller to do the deletion themselves, but this isn't that big a deal. Callers which do: if (write(...) || close_tempfile(...)) { delete_tempfile(...); return -1; } already had to call delete when the write() failed, and so aren't affected. Likewise, any caller which just calls die() in the error path is OK; we'll delete the tempfile during the atexit handler. Because this patch changes the semantics of close_tempfile() without changing its signature, all callers need to be manually checked and converted to the new scheme. This patch covers all in-tree callers, but there may be others for not-yet-merged topics. To catch these, we rename the function to close_tempfile_gently(), which will attract compile-time attention to new callers. (Technically the original could be considered "gentle" already in that it didn't die() on errors, but this one is even more so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-26Merge branch 'kw/write-index-reduce-alloc'Libravatar Junio C Hamano1-28/+34
We used to spend more than necessary cycles allocating and freeing piece of memory while writing each index entry out. This has been optimized. * kw/write-index-reduce-alloc: read-cache: avoid allocating every ondisk entry when writing read-cache: fix memory leak in do_write_index perf: add test for writing the index
2017-08-21read-cache: avoid allocating every ondisk entry when writingLibravatar Kevin Willford1-25/+25
When writing the index for each entry an ondisk struct will be allocated and freed in ce_write_entry. We can do better by using a ondisk struct on the stack for each entry. This is accomplished by using a stack ondisk_cache_entry_extended outside looping through the entries in do_write_index. Only the fixed fields of this struct are used when writing and depending on whether it is extended or not the flags2 field will be written. The name field is not used and instead the cache_entry name field is used directly when writing out the name. Because ce_write is using a buffer and memcpy to fill the buffer before flushing to disk, we don't have to worry about doing multiple ce_write calls. Running the p0007-write-cache.sh tests would save anywhere between 3-7% when the index had over a million entries with no performance degradation on small repos. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21read-cache: fix memory leak in do_write_indexLibravatar Kevin Willford1-3/+9
The previous_name_buf was never getting released when there was an error in ce_write_entry or allow was false and execution was returned to the caller. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20sha1_file: convert index_fd to struct object_idLibravatar Patryk Obara1-1/+1
Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20sha1_file: convert index_path to struct object_idLibravatar Patryk Obara1-1/+1
Convert all remaining callers as well. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20read-cache: convert to struct object_idLibravatar Patryk Obara1-3/+3
Replace hashcmp with oidcmp. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17use MOVE_ARRAYLibravatar René Scharfe1-3/+2
Simplify the code for moving members inside of an array and make it more robust by using the helper macro MOVE_ARRAY. It calculates the size based on the specified number of elements for us and supports NULL pointers when that number is zero. Raw memmove(3) calls with NULL can cause the compiler to (over-eagerly) optimize out later NULL checks. This patch was generated with contrib/coccinelle/array.cocci and spatch (Coccinelle). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-05Merge branch 'cc/shared-index-permfix'Libravatar Junio C Hamano1-0/+8
The split index code did not honor core.sharedrepository setting correctly. * cc/shared-index-permfix: t1700: make sure split-index respects core.sharedrepository t1301: move modebits() to test-lib-functions.sh read-cache: use shared perms when writing shared index
2017-06-25read-cache: use shared perms when writing shared indexLibravatar Christian Couder1-0/+8
Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10) write_shared_index() has been using mks_tempfile() to create the temporary file that will become the shared index. But even before that, it looks like the functions used to create this file didn't call adjust_shared_perm(), which means that the shared index file has always been created with 600 permissions regardless of the shared permission settings. Because of that, on repositories created with `git init --shared=all` and using the split index feature, one gets an error like: fatal: .git/sharedindex.a52f910b489bc462f187ab572ba0086f7b5157de: index file open failed: Permission denied when another user performs any operation that reads the shared index. Call adjust_shared_perm() on the temporary file created by mks_tempfile() ourselves to adjust the permission bits. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24Merge branch 'ab/free-and-null'Libravatar Junio C Hamano1-4/+2
A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro. * ab/free-and-null: *.[ch] refactoring: make use of the FREE_AND_NULL() macro coccinelle: make use of the "expression" FREE_AND_NULL() rule coccinelle: add a rule to make "expression" code use FREE_AND_NULL() coccinelle: make use of the "type" FREE_AND_NULL() rule coccinelle: add a rule to make "type" code use FREE_AND_NULL() git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24Merge branch 'bw/config-h'Libravatar Junio C Hamano1-0/+1
Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h
2017-06-24Merge branch 'nd/split-index-unshare'Libravatar Junio C Hamano1-2/+8
* nd/split-index-unshare: Revert "split-index: add and use unshare_split_index()"
2017-06-24Revert "split-index: add and use unshare_split_index()"Libravatar Junio C Hamano1-2/+8
This reverts commit f9d7abec2ad2f9eb3d8873169cc28c34273df082; see public-inbox.org/git/CAP8UFD0bOfzY-_hBDKddOcJdPUpP2KEVaX_SrCgvAMYAHtseiQ@mail.gmail.com
2017-06-16coccinelle: make use of the "type" FREE_AND_NULL() ruleLibravatar Ævar Arnfjörð Bjarmason1-4/+2
Apply the result of the just-added coccinelle rule. This manually excludes a few occurrences, mostly things that resulted in many FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15config: don't include config.h by defaultLibravatar Brandon Williams1-0/+1
Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13Merge branch 'jh/close-index-before-stat' into maintLibravatar Junio C Hamano1-4/+9
The timestamp of the index file is now taken after the file is closed, to help Windows, on which a stale timestamp is reported by fstat() on a file that is opened for writing and data was written but not yet closed. * jh/close-index-before-stat: read-cache: close index.lock in do_write_index
2017-06-05Merge branch 'jh/close-index-before-stat'Libravatar Junio C Hamano1-4/+9
The timestamp of the index file is now taken after the file is closed, to help Windows, on which a stale timestamp is reported by fstat() on a file that is opened for writing and data was written but not yet closed. * jh/close-index-before-stat: read-cache: close index.lock in do_write_index
2017-05-30Merge branch 'dt/unpack-save-untracked-cache-extension'Libravatar Junio C Hamano1-0/+6
When "git checkout", "git merge", etc. manipulates the in-core index, various pieces of information in the index extensions are discarded from the original state, as it is usually not the case that they are kept up-to-date and in-sync with the operation on the main index. The untracked cache extension is copied across these operations now, which would speed up "git status" (as long as the cache is properly invalidated). * dt/unpack-save-untracked-cache-extension: unpack-trees: preserve index extensions
2017-05-29Merge branch 'nd/split-index-unshare'Libravatar Junio C Hamano1-8/+2
Plug some leaks and updates internal API used to implement the split index feature to make it easier to avoid such a leak in the future. * nd/split-index-unshare: p3400: add perf tests for rebasing many changes split-index: add and use unshare_split_index()
2017-05-20unpack-trees: preserve index extensionsLibravatar David Turner1-0/+6
Make git checkout (and other unpack_tree operations) preserve the untracked cache. This is valuable for two reasons: 1. Often, an unpack_tree operation will not touch large parts of the working tree, and thus most of the untracked cache will continue to be valid. 2. Even if the untracked cache were entirely invalidated by such an operation, the user has signaled their intention to have such a cache, and we don't want to throw it away. [jes: backed out the watchman-specific parts] Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16Merge branch 'jk/no-null-sha1-in-cache-tree'Libravatar Junio C Hamano1-1/+4
Code to update the cache-tree has been tightened so that we won't accidentally write out any 0{40} entry in the tree object. * jk/no-null-sha1-in-cache-tree: cache-tree: reject entries with null sha1
2017-05-08split-index: add and use unshare_split_index()Libravatar Nguyễn Thái Ngọc Duy1-8/+2
When split-index is being used, we have two cache_entry arrays in index_state->cache[] and index_state->split_index->base->cache[]. index_state->cache[] may share the same entries with base->cache[] so we can quickly determine what entries are shared. This makes memory management tricky, we can't free base->cache[] until we know index_state->cache[] does not point to any of those entries. unshare_split_index() is added for this purpose, to find shared entries and either duplicate them in index_state->cache[], or discard them. Either way it should be safe to free base->cache[] after unshare_split_index(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-01i18n: read-cache: typofixLibravatar Peter Krefting1-1/+1
Signed-off-by: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Jean-Noel Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-28read-cache: close index.lock in do_write_indexLibravatar Jeff Hostetler1-4/+9
Teach do_write_index() to close the index.lock file before getting the mtime and updating the istate.timestamp fields. On Windows, a file's mtime is not updated until the file is closed. On Linux, the mtime is set after the last flush. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-26Merge branch 'cc/split-index-config'Libravatar Junio C Hamano1-1/+2
The split-index code configuration code used an unsafe git_path() function without copying its result out. * cc/split-index-config: read-cache: avoid using git_path() in freshen_shared_index()
2017-04-26Merge branch 'jh/add-index-entry-optim'Libravatar Junio C Hamano1-2/+137
"git checkout" that handles a lot of paths has been optimized by reducing the number of unnecessary checks of paths in the has_dir_name() function. * jh/add-index-entry-optim: read-cache: speed up has_dir_name (part 2) read-cache: speed up has_dir_name (part 1) read-cache: speed up add_index_entry during checkout p0006-read-tree-checkout: perf test to time read-tree read-cache: add strcmp_offset function
2017-04-23cache-tree: reject entries with null sha1Libravatar Jeff King1-1/+4
We generally disallow null sha1s from entering the index, due to 4337b5856 (do not write null sha1s to on-disk index, 2012-07-28). However, we loosened that in 83bd7437c (write_index: optionally allow broken null sha1s, 2013-08-27) so that tools like filter-branch could be used to repair broken history. However, we should make sure that these broken entries do not get propagated into new trees. For most entries, we'd catch them with the missing-object check (since presumably the null sha1 does not exist in our object database). But gitlink entries do not need reachability, so we may blindly copy the entry into a bogus tree. This patch rejects all null sha1s (with the same "invalid entry" message that missing objects get) when building trees from the index. It does so even for non-gitlinks, and even when "write-tree" is given the --missing-ok flag. The null sha1 is a special sentinel value that is already rejected in trees by fsck; whether the object exists or not, it is an error to put it in a tree. Note that for this to work, we must also avoid reusing an existing cache-tree that contains the null sha1. This patch does so by just refusing to write out any cache tree when the index contains a null sha1. This is blunter than we need to be; we could just reject the subtree that contains the offending entry. But it's not worth the complexity. The behavior is unchanged unless you have a broken index entry, and even then we'd refuse the whole index write unless the emergency GIT_ALLOW_NULL_SHA1 is in use. And even then the end result is only a performance drop (any write-tree will have to generate the whole cache-tree from scratch). The tests bear some explanation. The existing test in t7009 doesn't catch this problem, because our index-filter runs "git rm --cached", which will try to rewrite the updated index and barf on the bogus entry. So we never even make it to write-tree. The new test there adds a noop index-filter, which does show the problem. The new tests in t1601 are slightly redundant with what filter-branch is doing under the hood in t7009. But as they're much more direct, they're easier to reason about. And should filter-branch ever change or go away, we'd want to make sure that these plumbing commands behave sanely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-20read-cache: avoid using git_path() in freshen_shared_index()Libravatar Christian Couder1-1/+2
When performing an interactive rebase in split-index mode, the commit message that one should rework when squashing commits can contain some garbage instead of the usual concatenation of both of the commit messages. The code uses git_path() to compute the shared index filename, and passes it to check_and_freshen_file() as its argument; there is no guarantee that the rotating pathname buffer passed as argument will stay valid during the life of this call. Make our own copy before calling the function and pass the copy as its argument to avoid this risky pattern. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19read-cache: speed up has_dir_name (part 2)Libravatar Jeff Hostetler1-1/+62
Teach has_dir_name() to see if the path of the new item is greater than the last path in the index array before attempting to search for it. has_dir_name() is looking for file/directory collisions in the index and has to consider each sub-directory prefix in turn. This can cause multiple binary searches for each path. During operations like checkout, merge_working_tree() populates the new index in sorted order, so we expect to be able to append in many cases. This commit is part 2 of 2. This commit handles the additional possible short-cuts as we look at each sub-directory prefix. The net-net gains for add_index_entry_with_check() and both had_dir_name() commits are best seen for very large repos. Here are results for an INFLATED version of linux.git with 1M files. $ GIT_PERF_REPO=/mnt/test/linux_inflated.git/ ./run upstream/base HEAD ./p0006-read-tree-checkout.sh Test upstream/base HEAD 0006.2: read-tree br_base br_ballast (1043893) 3.79(3.63+0.15) 2.68(2.52+0.15) -29.3% 0006.3: switch between br_base br_ballast (1043893) 7.55(6.58+0.44) 6.03(4.60+0.43) -20.1% 0006.4: switch between br_ballast br_ballast_plus_1 (1043893) 10.84(9.26+0.59) 8.44(7.06+0.65) -22.1% 0006.5: switch between aliases (1043893) 10.93(9.39+0.58) 10.24(7.04+0.63) -6.3% Here are results for a synthetic repo with 4.2M files. $ GIT_PERF_REPO=~/work/gfw/t/perf/repos/gen-many-files-10.4.3.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh Test HEAD~3 HEAD 0006.2: read-tree br_base br_ballast (4194305) 29.96(19.26+10.50) 23.76(13.42+10.12) -20.7% 0006.3: switch between br_base br_ballast (4194305) 56.95(36.08+16.83) 45.54(25.94+15.68) -20.0% 0006.4: switch between br_ballast br_ballast_plus_1 (4194305) 90.94(51.50+31.52) 78.22(39.39+30.70) -14.0% 0006.5: switch between aliases (4194305) 93.72(51.63+34.09) 77.94(39.00+30.88) -16.8% Results for medium repos (like linux.git) are mixed and have more variance (probably do to disk IO unrelated to this test. $ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh Test HEAD~3 HEAD 0006.2: read-tree br_base br_ballast (57994) 0.25(0.21+0.03) 0.20(0.17+0.02) -20.0% 0006.3: switch between br_base br_ballast (57994) 10.67(6.06+2.92) 10.51(5.94+2.91) -1.5% 0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.59(0.47+0.16) 0.52(0.40+0.13) -11.9% 0006.5: switch between aliases (57994) 0.59(0.44+0.17) 0.51(0.38+0.14) -13.6% $ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh Test HEAD~3 HEAD 0006.2: read-tree br_base br_ballast (57994) 0.24(0.21+0.02) 0.21(0.18+0.02) -12.5% 0006.3: switch between br_base br_ballast (57994) 10.42(5.98+2.91) 10.66(5.86+3.09) +2.3% 0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.59(0.49+0.13) 0.53(0.37+0.16) -10.2% 0006.5: switch between aliases (57994) 0.59(0.43+0.17) 0.50(0.37+0.14) -15.3% Results for smaller repos (like git.git) are not significant. $ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh Test HEAD~3 HEAD 0006.2: read-tree br_base br_ballast (3043) 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% 0006.3: switch between br_base br_ballast (3043) 0.31(0.17+0.11) 0.29(0.19+0.08) -6.5% 0006.4: switch between br_ballast br_ballast_plus_1 (3043) 0.03(0.02+0.00) 0.03(0.02+0.00) +0.0% 0006.5: switch between aliases (3043) 0.03(0.02+0.00) 0.03(0.02+0.00) +0.0% Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19read-cache: speed up has_dir_name (part 1)Libravatar Jeff Hostetler1-0/+45
Teach has_dir_name() to see if the path of the new item is greater than the last path in the index array before attempting to search for it. has_dir_name() is looking for file/directory collisions in the index and has to consider each sub-directory prefix in turn. This can cause multiple binary searches for each path. During operations like checkout, merge_working_tree() populates the new index in sorted order, so we expect to be able to append in many cases. This commit is part 1 of 2. This commit handles the top of has_dir_name() and the trivial optimization. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-19read-cache: speed up add_index_entry during checkoutLibravatar Jeff Hostetler1-1/+10
Teach add_index_entry_with_check() to see if the path of the new item is greater than the last path in the index array before attempting to search for it. During checkout, merge_working_tree() populates the new index in sorted order, so this change will save a binary lookups per file. This preserves the original behavior but simply checks the last element before starting the search. This helps performance on very large repositories. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15read-cache: add strcmp_offset functionLibravatar Jeff Hostetler1-0/+20
Add strcmp_offset() function to also return the offset of the first change. Add unit test and helper to verify. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-15read-cache: force_verify_index_checksumLibravatar Jeff Hostetler1-0/+7
Teach git to skip verification of the SHA1-1 checksum at the end of the index file in verify_hdr() which is called from read_index() unless the "force_verify_index_checksum" global variable is set. Teach fsck to force this verification. The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command. These effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5% Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17Merge branch 'cc/split-index-config'Libravatar Junio C Hamano1-10/+147
The experimental "split index" feature has gained a few configuration variables to make it easier to use. * cc/split-index-config: (22 commits) Documentation/git-update-index: explain splitIndex.* Documentation/config: add splitIndex.sharedIndexExpire read-cache: use freshen_shared_index() in read_index_from() read-cache: refactor read_index_from() t1700: test shared index file expiration read-cache: unlink old sharedindex files config: add git_config_get_expiry() from gc.c read-cache: touch shared index files when used sha1_file: make check_and_freshen_file() non static Documentation/config: add splitIndex.maxPercentChange t1700: add tests for splitIndex.maxPercentChange read-cache: regenerate shared index if necessary config: add git_config_get_max_percent_split_change() Documentation/git-update-index: talk about core.splitIndex config var Documentation/config: add information for core.splitIndex t1700: add tests for core.splitIndex update-index: warn in case of split-index incoherency read-cache: add and then use tweak_split_index() split-index: add {add,remove}_split_index() functions config: add git_config_get_split_index() ...
2017-03-06read-cache: use freshen_shared_index() in read_index_from()Libravatar Christian Couder1-0/+1
This way a share index file will not be garbage collected if we still read from an index it is based from. As we need to read the current index before creating a new one, the tests have to be adjusted, so that we don't expect an old shared index file to be deleted right away when we create a new one. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06read-cache: refactor read_index_from()Libravatar Christian Couder1-6/+8
It looks better and is simpler to review when we don't compute the same things many times in the function. It will also help make the following commit simpler. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06read-cache: unlink old sharedindex filesLibravatar Christian Couder1-1/+63
Everytime split index is turned on, it creates a "sharedindex.XXXX" file in the git directory. This change makes sure that shared index files that haven't been used for a long time are removed when a new shared index file is created. The new "splitIndex.sharedIndexExpire" config variable is created to tell the delay after which an unused shared index file can be deleted. It defaults to "2.weeks.ago". A previous commit made sure that each time a split index file is created the mtime of the shared index file it references is updated. This makes sure that recently used shared index file will not be deleted. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01read-cache: touch shared index files when usedLibravatar Christian Couder1-3/+26
When a split-index file is created, let's update the mtime of the shared index file that the split-index file is referencing. In a following commit we will make shared index file expire depending on their mtime, so updating the mtime makes sure that the shared index file will not be deleted soon. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01read-cache: regenerate shared index if necessaryLibravatar Christian Couder1-0/+32
When writing a new split-index and there is a big number of cache entries in the split-index compared to the shared index, it is a good idea to regenerate the shared index. By default when the ratio reaches 20%, we will push back all the entries from the split-index into a new shared index file instead of just creating a new split-index file. The threshold can be configured using the "splitIndex.maxPercentChange" config variable. We need to adjust the existing tests in t1700 by setting "splitIndex.maxPercentChange" to 100 at the beginning of t1700, as the existing tests are assuming that the shared index is regenerated only when `git update-index --split-index` is used. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01read-cache: add and then use tweak_split_index()Libravatar Christian Couder1-0/+17
This will make us use the split-index feature or not depending on the value of the "core.splitIndex" config variable. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-31Merge branch 'sb/in-core-index-doc'Libravatar Junio C Hamano1-1/+0
Documentation and in-code comments updates. * sb/in-core-index-doc: documentation: retire unfinished documentation cache.h: document add_[file_]to_index cache.h: document remove_index_entry_at cache.h: document index_name_pos