summaryrefslogtreecommitdiff
path: root/builtin
AgeCommit message (Collapse)AuthorFilesLines
2015-04-29merge: extract prepare_merge_message() logic outLibravatar Junio C Hamano1-11/+15
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: narrow scope of merge_namesLibravatar Junio C Hamano1-4/+7
In order to pass the list of parents to fmt_merge_msg(), cmd_merge() uses this strbuf to create something that look like FETCH_HEAD that describes commits that are being merged. This is necessary only when we are creating the merge commit message ourselves, but was done unconditionally. Move the variable and the logic to populate it to confine them in a block that needs them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: split reduce_parents() out of collect_parents()Libravatar Junio C Hamano1-16/+25
The latter does two separate things: - Parse the list of commits on the command line, and formulate the list of commits to be merged (including the current HEAD); - Compute the list of parents to be recorded in the resulting merge commit. Split the latter into a separate helper function, so that we can later supply the list commits to be merged from a different source (namely, FETCH_HEAD). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: clarify collect_parents() logicLibravatar Junio C Hamano1-2/+11
Clarify this small function in three ways. - The function initially collects all commits to be merged into a commit_list "remoteheads"; the "remotes" pointer always points at the tail of this list (either the remoteheads variable itself, or the ->next slot of the element at the end of the list) to help elongate the list by repeated calls to commit_list_insert(). Because the new element appended by commit_list_insert() will always have its ->next slot NULLed out, there is no need for us to assign NULL to *remotes to terminate the list at the end. - The variable "head_subsumed" always confused me every time I read this code. What is happening here is that we inspect what the caller told us to merge (including the current HEAD) and come up with the list of parents to be recorded for the resulting merge commit, omitting commits that are ancestor of other commits. This filtering may remove the current HEAD from the resulting parent list---and we signal that fact with this variable, so that we can later record it as the first parent when "--no-ff" is in effect. - The "parents" list is created for this function by reduce_heads() and was not deallocated after its use, even though the loop control was written in such a way to allow us to do so by taking the "next" element in a separate variable so that it can be used in the next-step part of the loop control. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: small leakfix and code simplificationLibravatar Junio C Hamano1-2/+2
When parsing a merged object name like "foo~20" to formulate a merge summary "Merge branch foo (early part)", a temporary strbuf is used, but we forgot to deallocate it when we failed to find the named branch. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: do not check argc to determine number of remote headsLibravatar Junio C Hamano1-3/+2
To reject merging multiple commits into an unborn branch, we check argc, thinking that collect_parents() that reads the remaining command line arguments from <argc, argv> will give us the same number of commits as its input, i.e. argc. Because what we really care about is the number of commits, let the function run and then make sure it returns only one commit instead. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: clarify "pulling into void" special caseLibravatar Junio C Hamano1-17/+18
Instead of having it as one of the three if/elseif/.. case arms, test the condition and handle this special case upfront. This makes it easier to follow the flow of logic. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29merge: simplify code flowLibravatar Junio C Hamano1-8/+8
One of the first things cmd_merge() does is to see if the "--abort" option is given and run "reset --merge" and exit. When the control reaches this point, we know "--abort" was not given. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-12Merge branch 'mg/add-ignore-errors' into maintLibravatar Junio C Hamano1-1/+1
* mg/add-ignore-errors: add: ignore only ignored files
2014-12-22Merge branch 'jk/push-simple' into maintLibravatar Junio C Hamano1-4/+4
Git 2.0 was supposed to make the "simple" mode for the default of "git push", but it didn't. * jk/push-simple: push: truly use "simple" as default, not "upstream"
2014-12-22Merge branch 'mh/config-flip-xbit-back-after-checking' into maintLibravatar Junio C Hamano1-1/+2
"git init" (hence "git clone") initialized the per-repository configuration file .git/config with x-bit by mistake. * mh/config-flip-xbit-back-after-checking: create_default_files(): don't set u+x bit on $GIT_DIR/config
2014-12-22Merge branch 'rs/receive-pack-use-labs' into maintLibravatar Junio C Hamano1-1/+1
* rs/receive-pack-use-labs: use labs() for variables of type long instead of abs()
2014-12-22Merge branch 'jk/colors-fix' into maintLibravatar Junio C Hamano1-14/+13
"git config --get-color" did not parse its command line arguments carefully. * jk/colors-fix: t4026: test "normal" color config: fix parsing of "git config --get-color some.key -1" docs: describe ANSI 256-color mode
2014-12-22Merge branch 'jk/checkout-from-tree' into maintLibravatar Junio C Hamano1-0/+18
"git checkout $treeish $path", when $path in the index and the working tree already matched what is in $treeish at the $path, still overwrote the $path unnecessarily. * jk/checkout-from-tree: checkout $tree: do not throw away unchanged index entries
2014-12-22clean: typofixLibravatar Alexander Kuleshov1-1/+1
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-30push: truly use "simple" as default, not "upstream"Libravatar Jeff King1-4/+4
The plan for the push.default transition had all along been to use the "simple" method rather than "upstream" as a default if the user did not specify their own push.default value. Commit 11037ee (push: switch default from "matching" to "simple", 2013-01-04) tried to implement that by moving PUSH_DEFAULT_UNSPECIFIED in our switch statement to fall-through to the PUSH_DEFAULT_SIMPLE case. When the commit that became 11037ee was originally written, that would have been enough. We would fall through to calling setup_push_upstream() with the "simple" parameter set to 1. However, it was delayed for a while until we were ready to make the transition in Git 2.0. And in the meantime, commit ed2b182 (push: change `simple` to accommodate triangular workflows, 2013-06-19) threw a monkey wrench into the works. That commit drops the "simple" parameter to setup_push_upstream, and instead checks whether the global "push_default" is PUSH_DEFAULT_SIMPLE. This is right when the user has explicitly configured push.default to simple, but wrong when we are a fall-through for the "unspecified" case. We never noticed because our push.default tests do not cover the case of the variable being totally unset; they only check the "simple" behavior itself. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-21add: ignore only ignored filesLibravatar Michael J Gruber1-1/+1
"git add foo bar" adds neither foo nor bar when bar is ignored, but dies to let the user recheck their command invocation. This becomes less helpful when "git add foo.*" is subject to shell expansion and some of the expanded files are ignored. "git add --ignore-errors" is supposed to ignore errors when indexing some files and adds the others. It does ignore errors from actual indexing attempts, but does not ignore the error "file is ignored" as outlined above. This is unexpected. Change "git add foo bar" to add foo when bar is ignored, but issue a warning and return a failure code as before the change. That is, in the case of trying to add ignored files we now act the same way (with or without "--ignore-errors") in which we act for more severe indexing errors when "--ignore-errors" is specified. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-20config: fix parsing of "git config --get-color some.key -1"Libravatar Jeff King1-14/+13
Most of git-config's command line options use OPT_BIT to choose an action, and then parse the non-option arguments in a context-dependent way. However, --get-color and --get-colorbool are unlike the rest of the options, in that they are OPT_STRING, taking the option name as a parameter. This generally works, because we then use the presence of those strings to set an action bit anyway. But it does mean that the option-parser will continue looking for options even after the key (because it is not a non-option; it is an argument to an option). And running: git config --get-color some.key -1 (to use "-1" as the default color spec) will barf, claiming that "-1" is not an option. Instead, we should treat --get-color and --get-colorbool as action bits, just like --add, --get, and all the other actions, and then check that the non-option arguments we got are sane. This fixes the weirdness above, and makes those two options like all the others. This "fixes" a test in t4026, which checked that feeding "-2" as a color should fail (it does fail, but prior to this patch, because parseopt barfed, not because we actually ever tried to parse the color). This also catches other errors, like: git config --get-color some.key black blue which previously silently ignored "blue" (and now will complain that you gave too many arguments). There are some possible regressions, though. We now disallow these, which currently do what you would expect: # specifying other options after the action git config --get-color some.key --file whatever # using long-arg syntax git config --get-color=some.key However, we have never advertised these in the documentation, and in fact they did not work in some older versions of git. The behavior was apparently switched as an accidental side effect of d64ec16 (git config: reorganize to use parseopt, 2009-02-21). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-18create_default_files(): don't set u+x bit on $GIT_DIR/configLibravatar Michael Haggerty1-1/+2
Since time immemorial, the test of whether to set "core.filemode" has been done by trying to toggle the u+x bit on $GIT_DIR/config, which we know always exists, and then testing whether the change "took". I find it somewhat odd to use the config file for this test, but whatever. The test code didn't set the u+x bit back to its original state itself, instead relying on the subsequent call to git_config_set() to re-write the config file with correct permissions. But ever since daa22c6f8d config: preserve config file permissions on edits (2014-05-06) git_config_set() copies the permissions from the old config file to the new one. This is a good change in and of itself, but it invalidates the create_default_files()'s assumption, causing "git init" to leave the executable bit set on $GIT_DIR/config. Reset the permissions on $GIT_DIR/config when we are done with the test in create_default_files(). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-17use labs() for variables of type long instead of abs()Libravatar René Scharfe1-1/+1
Using abs() on long values can cause truncation, so use labs() instead. Reported by Clang 3.5 (-Wabsolute-value, enabled by -Wall). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-13checkout $tree: do not throw away unchanged index entriesLibravatar Jeff King1-0/+18
When we "git checkout $tree", we pull paths from $tree into the index, and then check the resulting entries out to the worktree. Our method for the first step is rather heavy-handed, though; it clobbers the entire existing index entry, even if the content is the same. This means we lose our stat information, leading checkout_entry to later rewrite the entire file with identical content. Instead, let's see if we have the identical entry already in the index, in which case we leave it in place. That lets checkout_entry do the right thing. Our tests cover two interesting cases: 1. We make sure that a file which has no changes is not rewritten. 2. We make sure that we do update a file that is unchanged in the index (versus $tree), but has working tree changes. We keep the old index entry, and checkout_entry is able to realize that our stat information is out of date. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-11Merge branch 'rs/clean-menu-item-defn' into maintLibravatar Junio C Hamano1-1/+1
* rs/clean-menu-item-defn: clean: use f(void) instead of f() to declare a pointer to a function without arguments
2014-11-06Merge branch 'jk/fetch-reflog-df-conflict'Libravatar Junio C Hamano1-1/+1
Corner-case bugfixes for "git fetch" around reflog handling. * jk/fetch-reflog-df-conflict: ignore stale directories when checking reflog existence fetch: load all default config at startup
2014-11-04fetch: load all default config at startupLibravatar Jeff King1-1/+1
When we start the git-fetch program, we call git_config to load all config, but our callback only processes the fetch.prune option; we do not chain to git_default_config at all. This means that we may not load some core configuration which will have an effect. For instance, we do not load core.logAllRefUpdates, which impacts whether or not we create reflogs in a bare repository. Note that I said "may" above. It gets even more exciting. If we have to transfer actual objects as part of the fetch, then we call fetch_pack as part of the same process. That function loads its own config, which does chain to git_default_config, impacting global variables which are used by the rest of fetch. But if the fetch is a pure ref update (e.g., a new ref which is a copy of an old one), we skip fetch_pack entirely. So we get inconsistent results depending on whether or not we have actual objects to transfer or not! Let's just load the core config at the start of fetch, so we know we have it (we may also load it again as part of fetch_pack, but that's OK; it's designed to be idempotent). Our tests check both cases (with and without a pack). We also check similar behavior for push for good measure, but it already works as expected. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-31Merge branch 'jc/push-cert'Libravatar Junio C Hamano1-2/+2
* jc/push-cert: receive-pack: avoid minor leak in case start_async() fails
2014-10-29Merge branch 'jk/pack-objects-no-bitmap-when-splitting' into maintLibravatar Junio C Hamano1-0/+1
* jk/pack-objects-no-bitmap-when-splitting: pack-objects: turn off bitmaps when we split packs
2014-10-29Merge branch 'jk/prune-mtime'Libravatar Junio C Hamano7-199/+157
Tighten the logic to decide that an unreachable cruft is sufficiently old by covering corner cases such as an ancient object becoming reachable and then going unreachable again, in which case its retention period should be prolonged. * jk/prune-mtime: (28 commits) drop add_object_array_with_mode revision: remove definition of unused 'add_object' function pack-objects: double-check options before discarding objects repack: pack objects mentioned by the index pack-objects: use argv_array reachable: use revision machinery's --indexed-objects code rev-list: add --indexed-objects option rev-list: document --reflog option t5516: test pushing a tag of an otherwise unreferenced blob traverse_commit_list: support pending blobs/trees with paths make add_object_array_with_context interface more sane write_sha1_file: freshen existing objects pack-objects: match prune logic for discarding objects pack-objects: refactor unpack-unreachable expiration check prune: keep objects reachable from recent objects sha1_file: add for_each iterators for loose and packed objects count-objects: use for_each_loose_file_in_objdir count-objects: do not use xsize_t when counting object size prune-packed: use for_each_loose_file_in_objdir reachable: mark index blobs as SEEN ...
2014-10-28receive-pack: avoid minor leak in case start_async() failsLibravatar René Scharfe1-2/+2
If the asynchronous start of copy_to_sideband() fails, then any env_array entries added to struct child_process proc by prepare_push_cert_sha1() are leaked. Call the latter function only after start_async() succeeded so that the allocated entries are cleaned up automatically by start_command() or finish_command(). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-24Merge branch 'jc/push-cert'Libravatar Junio C Hamano1-1/+12
* jc/push-cert: push: heed user.signingkey for signed pushes
2014-10-24Merge branch 'eb/no-pthreads'Libravatar Junio C Hamano3-7/+7
Allow us build with NO_PTHREADS=NoThanks compilation option. * eb/no-pthreads: Handle atexit list internaly for unthreaded builds pack-objects: set number of threads before checking and warning index-pack: fix compilation with NO_PTHREADS
2014-10-24Merge branch 'rs/run-command-env-array'Libravatar Junio C Hamano1-9/+14
Add managed "env" array to child_process to clarify the lifetime rules. * rs/run-command-env-array: use env_array member of struct child_process run-command: add env_array, an optional argv_array for env
2014-10-24Merge branch 'jk/pack-objects-no-bitmap-when-splitting'Libravatar Junio C Hamano1-0/+1
Splitting pack-objects output into multiple packs is incompatible with the use of reachability bitmap. * jk/pack-objects-no-bitmap-when-splitting: pack-objects: turn off bitmaps when we split packs
2014-10-24push: heed user.signingkey for signed pushesLibravatar Michael J Gruber1-1/+12
push --signed promises to take user.signingkey as the signing key but fails to read the config. Make it do so. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-21Merge branch 'rs/ref-transaction'Libravatar Junio C Hamano19-57/+97
The API to update refs have been restructured to allow introducing a true transactional updates later. We would even allow storing refs in backends other than the traditional filesystem-based one. * rs/ref-transaction: (25 commits) ref_transaction_commit: bail out on failure to remove a ref lockfile: remove unable_to_lock_error refs.c: do not permit err == NULL remote rm/prune: print a message when writing packed-refs fails for-each-ref: skip and warn about broken ref names refs.c: allow listing and deleting badly named refs test: put tests for handling of bad ref names in one place packed-ref cache: forbid dot-components in refnames branch -d: simplify by using RESOLVE_REF_READING branch -d: avoid repeated symref resolution reflog test: test interaction with detached HEAD refs.c: change resolve_ref_unsafe reading argument to be a flags field refs.c: make write_ref_sha1 static fetch.c: change s_update_ref to use a ref transaction refs.c: ref_transaction_commit: distinguish name conflicts from other errors refs.c: pass a list of names to skip to is_refname_available refs.c: call lock_ref_sha1_basic directly from commit refs.c: refuse to lock badly named refs in lock_ref_sha1_basic rename_ref: don't ask read_ref_full where the ref came from refs.c: pass the ref log message to _create/delete/update instead of _commit ...
2014-10-20Merge branch 'cc/interpret-trailers'Libravatar Junio C Hamano1-0/+44
A new filter to programatically edit the tail end of the commit log messages. * cc/interpret-trailers: Documentation: add documentation for 'git interpret-trailers' trailer: add tests for commands in config file trailer: execute command from 'trailer.<name>.command' trailer: add tests for "git interpret-trailers" trailer: add interpret-trailers command trailer: put all the processing together and print trailer: parse trailers from file or stdin trailer: process command line trailer arguments trailer: read and process config information trailer: process trailers from input message and arguments trailer: add data structures and basic functions
2014-10-20Merge branch 'jn/parse-config-slot'Libravatar Junio C Hamano6-30/+31
Code cleanup. * jn/parse-config-slot: color_parse: do not mention variable name in error message pass config slots as pointers instead of offsets
2014-10-20Merge branch 'rs/receive-pack-argv-leak-fix'Libravatar Junio C Hamano1-10/+8
* rs/receive-pack-argv-leak-fix: receive-pack: plug minor memory leak in unpack()
2014-10-19Handle atexit list internaly for unthreaded buildsLibravatar Etienne Buira1-5/+0
Wrap atexit()s calls on unthreaded builds to handle callback list internally. This is needed because on unthreaded builds, asyncs inherits parent's atexit() list, that gets run as soon as the async exit()s (and again at the end of async's parent process). That led to remove temporary files too early. Also remove a by-atexit-callback guard against this kind of issue in clone.c, as this patch makes it redundant. Fixes test 5537 (temporary shallow file vanished before unpack-objects could open it) BTW remove an unused variable in shallow.c. Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Andreas Schwab <schwab@linux-m68k.org> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Etienne Buira <etienne.buira@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19use env_array member of struct child_processLibravatar René Scharfe1-9/+14
Convert users of struct child_process to using the managed env_array for specifying environment variables instead of supplying an array on the stack or bringing their own argv_array. This shortens and simplifies the code and ensures automatically that the allocated memory is freed after use. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19pack-objects: turn off bitmaps when we split packsLibravatar Jeff King1-0/+1
If a pack.packSizeLimit is set, we may split the pack data across multiple packfiles. This means we cannot generate .bitmap files, as they require that all of the reachable objects are in the same pack. We check that condition when we are generating the list of objects to pack (and disable bitmaps if we are not packing everything), but we forgot to update it when we notice that we needed to split (which doesn't happen until the actual write phase). The resulting bitmaps are quite bogus (they mention entries that do not exist in the pack!) and can cause a fetch or push to send insufficient objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19pack-objects: double-check options before discarding objectsLibravatar Jeff King1-0/+2
When we are given an expiration time like --unpack-unreachable=2.weeks.ago, we avoid writing out old, unreachable loose objects entirely, under the assumption that running "prune" would simply delete them immediately anyway. However, this is only valid if we computed the same set of reachable objects as prune would. In practice, this is the case, because only git-repack uses the --unpack-unreachable option with an expiration, and it always feeds as many objects into the pack as possible. But we can double-check at runtime just to be sure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19repack: pack objects mentioned by the indexLibravatar Jeff King2-0/+9
When we pack all objects, we use only the objects reachable from references and reflogs. This misses any objects which are reachable from the index, but not yet referenced. By itself this isn't a big deal; the objects can remain loose until they are actually used in a commit. However, it does create a problem when we drop packed but unreachable objects. We try to optimize out the writing of objects that we will immediately prune, which means we must follow the same rules as prune in determining what is reachable. And prune uses the index for this purpose. This is rather uncommon in practice, as objects in the index would not usually have been packed in the first place. But it could happen in a sequence like: 1. You make a commit on a branch that references blob X. 2. You repack, moving X into the pack. 3. You delete the branch (and its reflog), so that X is unreferenced. 4. You "git add" blob X so that it is now referenced only by the index. 5. You repack again with git-gc. The pack-objects we invoke will see that X is neither referenced nor recent and not bother loosening it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19pack-objects: use argv_arrayLibravatar Jeff King1-10/+10
This saves us from having to bump the rp_av count when we add new traversal options. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16Merge branch 'po/everyday-doc'Libravatar Junio C Hamano1-0/+1
"git help everyday" to show the Everyday Git document. * po/everyday-doc: doc: add 'everyday' to 'git help' doc: Makefile regularise OBSOLETE_HTML list building doc: modernise everyday.txt wording and format in man page style
2014-10-16make add_object_array_with_context interface more saneLibravatar Jeff King1-4/+4
When you resolve a sha1, you can optionally keep any context found during the resolution, including the path and mode of a tree entry (e.g., when looking up "HEAD:subdir/file.c"). The add_object_array_with_context function lets you then attach that context to an entry in a list. Unfortunately, the interface for doing so is horrible. The object_context structure is large and most object_array users do not use it. Therefore we keep a pointer to the structure to avoid burdening other users too much. But that means when we do use it that we must allocate the struct ourselves. And the struct contains a fixed PATH_MAX-sized buffer, which makes this wholly unsuitable for any large arrays. We can observe that there is only a single user of the "with_context" variant: builtin/grep.c. And in that use case, the only element we care about is the path. We can therefore store only the path as a pointer (the context's mode field was redundant with the object_array_entry itself, and nobody actually cared about the surrounding tree). This still requires a strdup of the pathname, but at least we are only consuming the minimum amount of memory for each string. We can also handle the copying ourselves in add_object_array_*, and free it as appropriate in object_array_release_entry. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16pack-objects: match prune logic for discarding objectsLibravatar Jeff King1-0/+39
A recent commit taught git-prune to keep non-recent objects that are reachable from recent ones. However, pack-objects, when loosening unreachable objects, tries to optimize out the write in the case that the object will be immediately pruned. It now gets this wrong, since its rule does not reflect the new prune code (and this can be seen by running t6501 with a strategically placed repack). Let's teach pack-objects similar logic. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16pack-objects: refactor unpack-unreachable expiration checkLibravatar Jeff King1-5/+12
When we are loosening unreachable packed objects, we do not bother to process objects that would simply be pruned immediately anyway. The "would be pruned" check is a simple comparison, but is about to get more complicated. Let's pull it out into a separate function. Note that this is slightly less efficient than the original, which avoided even opening old packs, since no object in them could pass the current check, which cares only about the pack mtime. But the new rules will depend on the exact object, so we need to perform the check even for old packs. Note also that we fix a minor buglet when the pack mtime is exactly the same as the expiration time. The prune code considers that worth pruning, whereas our check here considered it worth keeping. This wasn't a big deal. Besides being unlikely to happen, the result was simply that the object was loosened and then pruned, missing the optimization. Still, we can easily fix it while we are here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16prune: keep objects reachable from recent objectsLibravatar Jeff King2-2/+2
Our current strategy with prune is that an object falls into one of three categories: 1. Reachable (from ref tips, reflogs, index, etc). 2. Not reachable, but recent (based on the --expire time). 3. Not reachable and not recent. We keep objects from (1) and (2), but prune objects in (3). The point of (2) is that these objects may be part of an in-progress operation that has not yet updated any refs. However, it is not always the case that objects for an in-progress operation will have a recent mtime. For example, the object database may have an old copy of a blob (from an abandoned operation, a branch that was deleted, etc). If we create a new tree that points to it, a simultaneous prune will leave our tree, but delete the blob. Referencing that tree with a commit will then work (we check that the tree is in the object database, but not that all of its referred objects are), as will mentioning the commit in a ref. But the resulting repo is corrupt; we are missing the blob reachable from a ref. One way to solve this is to be more thorough when referencing a sha1: make sure that not only do we have that sha1, but that we have objects it refers to, and so forth recursively. The problem is that this is very expensive. Creating a parent link would require traversing the entire object graph! Instead, this patch pushes the extra work onto prune, which runs less frequently (and has to look at the whole object graph anyway). It creates a new category of objects: objects which are not recent, but which are reachable from a recent object. We do not prune these objects, just like the reachable and recent ones. This lets us avoid the recursive check above, because if we have an object, even if it is unreachable, we should have its referent. We can make a simple inductive argument that with this patch, this property holds (that there are no objects with missing referents in the repository): 0. When we have no objects, we have nothing to refer or be referred to, so the property holds. 1. If we add objects to the repository, their direct referents must generally exist (e.g., if you create a tree, the blobs it references must exist; if you create a commit to point at the tree, the tree must exist). This is already the case before this patch. And it is not 100% foolproof (you can make bogus objects using `git hash-object`, for example), but it should be the case for normal usage. Therefore for any sequence of object additions, the property will continue to hold. 2. If we remove objects from the repository, then we will not remove a child object (like a blob) if an object that refers to it is being kept. That is the part implemented by this patch. Note, however, that our reachability check and the actual pruning are not atomic. So it _is_ still possible to violate the property (e.g., an object becomes referenced just as we are deleting it). This patch is shooting for eliminating problems where the mtimes of dependent objects differ by hours or days, and one is dropped without the other. It does nothing to help with short races. Naively, the simplest way to implement this would be to add all recent objects as tips to the reachability traversal. However, this does not perform well. In a recently-packed repository, all reachable objects will also be recent, and therefore we have to look at each object twice. This patch instead performs the reachability traversal, then follows up with a second traversal for recent objects, skipping any that have already been marked. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16count-objects: use for_each_loose_file_in_objdirLibravatar Jeff King1-71/+30
This drops our line count considerably, and should make things more readable by keeping the counting logic separate from the traversal. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16count-objects: do not use xsize_t when counting object sizeLibravatar Jeff King1-1/+1
The point of xsize_t is to safely cast an off_t into a size_t (because we are about to mmap). But in count-objects, we are summing the sizes in an off_t. Using xsize_t means that count-objects could fail on a 32-bit system with a 4G object (not likely, as other parts of git would fail, but we should at least be correct here). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>