tgif.git - Terin's Improved Git Fork

Age	Commit message (Collapse)	Author	Files	Lines
2016-02-03	Merge branch 'jk/ref-cache-non-repository-optim'	Junio C Hamano	1	-14/+32
	The underlying machinery used by "ls-files -o" and other commands have been taught not to create empty submodule ref cache for a directory that is not a submodule. This removes a ton of wasted CPU cycles. * jk/ref-cache-non-repository-optim: resolve_gitlink_ref: ignore non-repository paths clean: make is_git_repository a public function
2016-01-25	resolve_gitlink_ref: ignore non-repository paths	Jeff King	1	-14/+32
	When we want to look up a submodule ref, we use get_ref_cache(path) to find or auto-create its ref cache. But if we feed a path that isn't actually a git repository, we blindly create the ref cache, and then may die deeper in the code when we try to access it. This is a problem because many callers speculatively feed us a path that looks vaguely like a repository, and expect us to tell them when it is not. This patch teaches resolve_gitlink_ref to reject non-repository paths without creating a ref_cache. This avoids the die(), and also performs better if you have a large number of these faux-submodule directories (because the ref_cache lookup is linear, under the assumption that there won't be a large number of submodules). To accomplish this, we also break get_ref_cache into two pieces: the lookup and auto-creation (the latter is lumped into create_ref_cache). This lets us first cheaply ask our cache "is it a submodule we know about?" If so, we can avoid repeating our filesystem lookup. So lookups of real submodules are not penalized; they examine the submodule's .git directory only once. The test in t3000 demonstrates a case where this improves correctness (we used to just die). The new perf case in p7300 shows off the speed improvement in an admittedly pathological repository: Test HEAD^ HEAD ---------------------------------------------------------------- 7300.4: ls-files -o 66.97(66.15+0.87) 0.33(0.08+0.24) -99.5% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-13	lock_ref_sha1_basic: handle REF_NODEREF with invalid refs	Jeff King	1	-9/+10
	We sometimes call lock_ref_sha1_basic with REF_NODEREF to operate directly on a symbolic ref. This is used, for example, to move to a detached HEAD, or when updating the contents of HEAD via checkout or symbolic-ref. However, the first step of the function is to resolve the refname to get the "old" sha1, and we do so without telling resolve_ref_unsafe() that we are only interested in the symref. As a result, we may detect a problem there not with the symref itself, but with something it points to. The real-world example I found (and what is used in the test suite) is a HEAD pointing to a ref that cannot exist, because it would cause a directory/file conflict with other existing refs. This situation is somewhat broken, of course, as trying to _commit_ on that HEAD would fail. But it's not explicitly forbidden, and we should be able to move away from it. However, neither "git checkout" nor "git symbolic-ref" can do so. We try to take the lock on HEAD, which is pointing to a non-existent ref. We bail from resolve_ref_unsafe() with errno set to EISDIR, and the lock code thinks we are attempting to create a d/f conflict. Of course we're not. The problem is that the lock code has no idea what level we were at when we got EISDIR, so trying to diagnose or remove empty directories for HEAD is not useful. To make things even more complicated, we only get EISDIR in the loose-ref case. If the refs are packed, the resolution may "succeed", giving us the pointed-to ref in "refname", but a null oid. Later, we say "ah, the null oid means we are creating; let's make sure there is room for it", but mistakenly check against the _resolved_ refname, not the original. We can fix this by making two tweaks: 1. Call resolve_ref_unsafe() with RESOLVE_REF_NO_RECURSE when REF_NODEREF is set. This means any errors we get will be from the orig_refname, and we can act accordingly. We already do this in the REF_DELETING case, but we should do it for update, too. 2. If we do get a "refname" return from resolve_ref_unsafe(), even with RESOLVE_REF_NO_RECURSE it may be the name of the ref pointed-to by a symref. We already normalize this back to orig_refname before taking the lockfile, but we need to do so before the null_oid check. While we're rearranging the REF_NODEREF handling, we can also bump the initialization of lflags to the top of the function, where we are setting up other flags. This saves us from having yet another conditional block on REF_NODEREF just to set it later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-13	lock_ref_sha1_basic: always fill old_oid while holding lock	Jeff King	1	-6/+11
	Our basic strategy for taking a ref lock is: 1. Create $ref.lock to take the lock 2. Read the ref again while holding the lock (during which time we know that nobody else can be updating it). 3. Compare the value we read to the expected "old_sha1" The value we read in step (2) is returned to the caller via the lock->old_oid field, who may use it for other purposes (such as writing a reflog). If we have no "old_sha1" (i.e., we are unconditionally taking the lock), then we obviously must omit step 3. But we _also_ omit step 2. This seems like a nice optimization, but it means that the caller sees only whatever was left in lock->old_oid from previous calls to resolve_ref_unsafe(), which happened outside of the lock. We can demonstrate this race pretty easily. Imagine you have three commits, $one, $two, and $three. One script just flips between $one and $two, without providing an old-sha1: while true; do git update-ref -m one refs/heads/foo $one git update-ref -m two refs/heads/foo $two done Meanwhile, another script tries to set the value to $three, also not using an old-sha1: while true; do git update-ref -m three refs/heads/foo $three done If these run simultaneously, we'll see a lot of lock contention, but each of the writes will succeed some of the time. The reflog may record movements between any of the three refs, but we would expect it to provide a consistent log: the "from" field of each log entry should be the same as the "to" field of the previous one. But if we check this: perl -alne ' print "mismatch on line $." if defined $last && $F[0] ne $last; $last = $F[1]; ' .git/logs/refs/heads/foo we'll see many mismatches. Why? Because sometimes, in the time between lock_ref_sha1_basic filling lock->old_oid via resolve_ref_unsafe() and it taking the lock, there may be a complete write by another process. And the "from" field in our reflog entry will be wrong, and will refer to an older value. This is probably quite rare in practice. It requires writers which do not provide an old-sha1 value, and it is a very quick race. However, it is easy to fix: we simply perform step (2), the read-under-lock, whether we have an old-sha1 or not. Then the value we hand back to the caller is always atomic. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29	create_symref: write reflog while holding lock	Jeff King	1	-1/+2
	We generally hold a lock on the matching ref while writing to its reflog; this prevents two simultaneous writers from clobbering each other's reflog lines (it does not even have to be two symref updates; because we don't hold the lock, we could race with somebody writing to the pointed-to ref via HEAD, for example). We can fix this by writing the reflog before we commit the lockfile. This runs the risk of writing the reflog but failing the final rename(), but at least we now err on the same side as the rest of the ref code. Noticed-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29	create_symref: use existing ref-lock code	Jeff King	1	-55/+54
	The create_symref() function predates the existence of "struct lock_file", let alone the more recent "struct ref_lock". Instead, it just does its own manual dot-locking. Besides being more code, this has a few downsides: - if git is interrupted while holding the lock, we don't clean up the lockfile - we don't do the usual directory/filename conflict check. So you can sometimes create a symref "refs/heads/foo/bar", even if "refs/heads/foo" exists (namely, if the refs are packed and we do not hit the d/f conflict in the filesystem). This patch refactors create_symref() to use the "struct ref_lock" interface, which handles both of these things. There are a few bonus cleanups that come along with it: - we leaked ref_path in some error cases - the symref contents were stored in a fixed-size buffer, putting an artificial (albeit large) limitation on the length of the refname. We now write through fprintf, and handle refnames of any size. - we called adjust_shared_perm only after the file was renamed into place, creating a potential race with readers in a shared repository. The lockfile code now handles this when creating the lockfile, making it atomic. - the legacy prefer_symlink_refs path did not do any locking at all. Admittedly, it is not atomic from a reader's perspective (as it unlinks and re-creates the symlink to overwrite), but at least it cannot conflict with other writers now. - the result of this patch is hopefully more readable. It eliminates three goto labels. Two were for error checking that is now simplified, and the third was to reach shared code that has been pulled into its own function. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29	create_symref: modernize variable names	Jeff King	1	-21/+20
	Once upon a time, create_symref() was used only to point HEAD at a branch name, and the variable names reflect that (e.g., calling the path git_HEAD). However, it is much more generic these days (and has been for some time). Let's update the variable names to make it easier to follow: - `ref_target` is now just `refname`. This is closer to the `ref` that is already in `cache.h`, but with the extra twist that "name" makes it clear this is the name and not a ref struct. Dropping "target" hopefully makes it clear that we are talking about the symref itself, not what it points to. - `git_HEAD` is now `ref_path`; the on-disk path corresponding to `ref`. - `refs_heads_master` is now just `target`; i.e., what the symref points at. This term also matches what is in the symlink(2) manpage (at least on Linux). - the buffer to hold the symref file's contents was simply called `ref`. It's now `buf` (admittedly also generic, but at least not actively introducing confusion with the other variable holding the refname). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-11	Merge branch 'sg/lock-file-commit-error'	Junio C Hamano	1	-1/+1
	Cosmetic improvement to lock-file error messages. * sg/lock-file-commit-error: Make error message after failing commit_lock_file() less confusing
2015-11-20	refs: break out ref conflict checks	David Turner	2	-42/+23
	Create new function find_descendant_ref, to hold one of the ref conflict checks used in verify_refname_available. Multiple backends will need this function, so move it to the common code. Also move rename_ref_available to the common code, because alternate backends might need it and it has no files-backend-specific code. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20	files_log_ref_write: new function	David Turner	2	-0/+12
	Because HEAD and stash are per-worktree, every refs backend needs to go through the files backend to write these refs. So create a new function, files_log_ref_write, and add it to refs/refs-internal.h. Later, we will use this to handle reflog updates for per-worktree symbolic refs (HEAD). Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20	refs: split filesystem-based refs code into a new file	Michael Haggerty	1	-0/+3539
	As another step in the move to pluggable reference backends, move the code that is specific to the filesystem-based reference backend (i.e., the current system of storing references as loose and packed files) into a separate file, refs/files-backend.c. Aside from a tiny bit of file header boilerplate, this commit only moves a subset of the code verbatim from refs.c to the new file, as can easily be verified using patience diff: git diff --patience $commit^:refs.c $commit:refs.c git diff --patience $commit^:refs.c $commit:refs/files-backend.c Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2015-11-20	refs/refs-internal.h: new header file	Michael Haggerty	1	-0/+180
	There are a number of constants, structs, and static functions defined in refs.c and treated as private to the references module. But we want to support multiple reference backends within the reference module, and those backends will need access to some heretofore private declarations. We don't want those declarations to be visible to non-refs code, so we don't want to move them to refs.h. Instead, add a new header file, refs/refs-internal.h, that is intended to be included only from within the refs module. Make some functions non-static and move some declarations (and their corresponding docstrings) from refs.c to this file. In a moment we will add more content to the "refs" subdirectory. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>