summaryrefslogtreecommitdiff
path: root/sha1_file.c
AgeCommit message (Collapse)AuthorFilesLines
2016-09-26unpack_sha1_header(): detect malformed object headerLibravatar Junio C Hamano1-2/+24
When opening a loose object file, we often do this sequence: - prepare a short buffer for the object header (on stack) - call unpack_sha1_header() and have early part of the object data inflated, enough to fill the buffer - parse that data in the short buffer, assuming that the first part of the object is <typename> SP <length> NUL Because the parsing function parse_sha1_header_extended() is not given the number of bytes inflated into the header buffer, it you craft a file whose early part inflates a garbage sequence without SP or NUL, and replace a loose object with it, it will end up reading past the end of the inflated data. To correct this, do the following four things: - rename unpack_sha1_header() to unpack_sha1_short_header() and have unpack_sha1_header_to_strbuf() keep calling that as its helper function. This will detect and report zlib errors, but is not aware of the format of a loose object (as before). - introduce unpack_sha1_header() that calls the same helper function, and when zlib reports it inflated OK into the buffer, check if the inflated data has NUL. This would ensure that parsing function will terminate within the buffer that holds the inflated header. - update unpack_sha1_header_to_strbuf() to check if the resulting buffer has NUL for the same effect. - update parse_sha1_header_extended() to make sure that its loop to find the SP that terminates the <typename> stops at NUL. Essentially, this makes unpack_*() functions that are asked to unpack a loose object header to be a bit more strict and detect an input that cannot possibly be a valid object header, even before the parsing function kicks in. Reported-by: Gustavo Grieco <gustavo.grieco@imag.fr> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04Sync with 2.4.9Libravatar Junio C Hamano1-6/+3
2015-09-04Sync with 2.3.9Libravatar Junio C Hamano1-6/+3
2015-09-04Sync with 2.2.3Libravatar Junio C Hamano1-6/+3
2015-09-04read_info_alternates: handle paths larger than PATH_MAXLibravatar Jeff King1-6/+3
This function assumes that the relative_base path passed into it is no larger than PATH_MAX, and writes into a fixed-size buffer. However, this path may not have actually come from the filesystem; for example, add_submodule_odb generates a path using a strbuf and passes it in. This is hard to trigger in practice, though, because the long submodule directory would have to exist on disk before we would try to open its info/alternates file. We can easily avoid the bug, though, by simply creating the filename on the heap. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-03Merge branch 'cb/open-noatime-clear-errno' into maintLibravatar Junio C Hamano1-1/+4
When trying to see that an object does not exist, a state errno leaked from our "first try to open a packfile with O_NOATIME and then if it fails retry without it" logic on a system that refuses O_NOATIME. This confused us and caused us to die, saying that the packfile is unreadable, when we should have just reported that the object does not exist in that packfile to the caller. * cb/open-noatime-clear-errno: git_open_noatime: return with errno=0 on success
2015-08-12git_open_noatime: return with errno=0 on successLibravatar Clemens Buchacher1-1/+4
In read_sha1_file_extended we die if read_object fails with a fatal error. We detect a fatal error if errno is non-zero and is not ENOENT. If the object could not be read because it does not exist, this is not considered a fatal error and we want to return NULL. Somewhere down the line, read_object calls git_open_noatime to open a pack index file, for example. We first try open with O_NOATIME. If O_NOATIME fails with EPERM, we retry without O_NOATIME. When the second open succeeds, errno is however still set to EPERM from the first attempt. When we finally determine that the object does not exist, read_object returns NULL and read_sha1_file_extended dies with a fatal error: fatal: failed to read object <sha1>: Operation not permitted Fix this by resetting errno to zero before we call open again. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Clemens Buchacher <clemens.buchacher@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-27Merge branch 'jk/fix-refresh-utime' into maintLibravatar Junio C Hamano1-1/+9
Fix a small bug in our use of umask() return value. * jk/fix-refresh-utime: check_and_freshen_file: fix reversed success-check
2015-07-27Merge branch 'jk/index-pack-reduce-recheck' into maintLibravatar Junio C Hamano1-1/+3
Disable "have we lost a race with competing repack?" check while receiving a huge object transfer that runs index-pack. * jk/index-pack-reduce-recheck: index-pack: avoid excessive re-reading of pack directory
2015-07-10Merge branch 'jk/fix-refresh-utime'Libravatar Junio C Hamano1-1/+9
Fix a small bug in our use of umask() return value. * jk/fix-refresh-utime: check_and_freshen_file: fix reversed success-check
2015-07-09Merge branch 'jk/maint-for-each-packed-object'Libravatar Junio C Hamano1-1/+6
The for_each_packed_object() API function did not iterate over objects in a packfile that hasn't been used yet. * jk/maint-for-each-packed-object: for_each_packed_object: automatically open pack index
2015-07-08check_and_freshen_file: fix reversed success-checkLibravatar Jeff King1-1/+9
When we want to write out a loose object file, we have always first made sure we don't already have the object somewhere. Since 33d4221 (write_sha1_file: freshen existing objects, 2014-10-15), we also update the timestamp on the file, so that a simultaneous prune knows somebody is likely to reference it soon. If our utime() call fails, we treat this the same as not having the object in the first place; the safe thing to do is write out another copy. However, the loose-object check accidentally inverts the utime() check; it returns failure _only_ when the utime() call actually succeeded. Thus it was failing to protect us there, and in the normal case where utime() succeeds, it caused us to pointlessly write out and link the object. This passed our freshening tests, because writing out the new object is certainly _one_ way of updating its utime. So the normal case was inefficient, but not wrong. While we're here, let's also drop a comment in front of the check_and_freshen functions, making a note of their return type (since it is not our usual "0 for success, -1 for error"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25Merge branch 'jk/diagnose-config-mmap-failure' into maintLibravatar Junio C Hamano1-4/+11
The configuration reader/writer uses mmap(2) interface to access the files; when we find a directory, it barfed with "Out of memory?". * jk/diagnose-config-mmap-failure: xmmap(): drop "Out of memory?" config.c: rewrite ENODEV into EISDIR when mmap fails config.c: avoid xmmap error messages config.c: fix mmap leak when writing config read-cache.c: drop PROT_WRITE from mmap of index
2015-06-24Merge branch 'jk/index-pack-reduce-recheck'Libravatar Junio C Hamano1-1/+3
Disable "have we lost a race with competing repack?" check while receiving a huge object transfer that runs index-pack. * jk/index-pack-reduce-recheck: index-pack: avoid excessive re-reading of pack directory
2015-06-22for_each_packed_object: automatically open pack indexLibravatar Jeff King1-1/+6
When for_each_packed_object is called, we call prepare_packed_git() to make sure we have the actual list of packs. But the latter does not actually open the pack indices, meaning that pack->nr_objects may simply be 0 if the pack has not otherwise been used since the program started. In practice, this didn't come up for the current callers, because they iterate the packed objects only after iterating all reachable objects (so for it to matter you would have to have a pack consisting only of unreachable objects). But it is a dangerous and confusing interface that should be fixed for future callers. Note that we do not end the iteration when a pack cannot be opened, but we do return an error. That lets you complete the iteration even in actively-repacked repository where an .idx file may racily go away, but it also lets callers know that they may not have gotten the complete list (which the current reachability-check caller does care about). We have to tweak one of the prune tests due to the changed return value; an earlier test creates bogus .idx files and does not clean them up. Having to make this tweak is a good thing; it means we will not prune in a broken repository, and the test confirms that we do not negatively impact a more lenient caller, count-objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-16Merge branch 'jh/filter-empty-contents' into maintLibravatar Junio C Hamano1-1/+1
The clean/smudge interface did not work well when filtering an empty contents (failed and then passed the empty input through). It can be argued that a filter that produces anything but empty for an empty input is nonsense, but if the user wants to do strange things, then why not? * jh/filter-empty-contents: sha1_file: pass empty buffer to index empty file
2015-06-11Merge branch 'jk/diagnose-config-mmap-failure'Libravatar Junio C Hamano1-4/+11
The configuration reader/writer uses mmap(2) interface to access the files; when we find a directory, it barfed with "Out of memory?". * jk/diagnose-config-mmap-failure: xmmap(): drop "Out of memory?" config.c: rewrite ENODEV into EISDIR when mmap fails config.c: avoid xmmap error messages config.c: fix mmap leak when writing config read-cache.c: drop PROT_WRITE from mmap of index
2015-06-09index-pack: avoid excessive re-reading of pack directoryLibravatar Jeff King1-1/+3
Since 45e8a74 (has_sha1_file: re-check pack directory before giving up, 2013-08-30), we spend extra effort for has_sha1_file to give the right answer when somebody else is repacking. Usually this effort does not matter, because after finding that the object does not exist, the next step is usually to die(). However, some code paths make a large number of has_sha1_file checks which are _not_ expected to return 1. The collision test in index-pack.c is such a case. On a local system, this can cause a performance slowdown of around 5%. But on a system with high-latency system calls (like NFS), it can be much worse. This patch introduces a "quick" flag to has_sha1_file which callers can use when they would prefer high performance at the cost of false negatives during repacks. There may be other code paths that can use this, but the index-pack one is the most obviously critical, so we'll start with switching that one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-05Merge branch 'jk/sha1-file-reduce-useless-warnings' into maintLibravatar Junio C Hamano1-3/+1
* jk/sha1-file-reduce-useless-warnings: sha1_file: squelch "packfile cannot be accessed" warnings
2015-06-01Merge branch 'jh/filter-empty-contents'Libravatar Junio C Hamano1-1/+1
The clean/smudge interface did not work well when filtering an empty contents (failed and then passed the empty input through). It can be argued that a filter that produces anything but empty for an empty input is nonsense, but if the user wants to do strange things, then why not? * jh/filter-empty-contents: sha1_file: pass empty buffer to index empty file
2015-05-28xmmap(): drop "Out of memory?"Libravatar Junio C Hamano1-1/+1
We show that message with die_errno(), but the OS is ought to know why mmap(2) failed much better than we do. There is no reason for us to say "Out of memory?" here. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-28config.c: avoid xmmap error messagesLibravatar Jeff King1-4/+11
The config-writing code uses xmmap to map the existing config file, which will die if the map fails. This has two downsides: 1. The error message is not very helpful, as it lacks any context about the file we are mapping: $ mkdir foo $ git config --file=foo some.key value fatal: Out of memory? mmap failed: No such device 2. We normally do not die in this code path; instead, we'd rather report the error and return an appropriate exit status (which is part of the public interface documented in git-config.1). This patch introduces a "gentle" form of xmmap which lets us produce our own error message. We do not want to use mmap directly, because we would like to use the other compatibility elements of xmmap (e.g., handling 0-length maps portably). The end result is: $ git.compile config --file=foo some.key value error: unable to mmap 'foo': No such device $ echo $? 3 Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-26Merge branch 'jc/hash-object' into maintLibravatar Junio C Hamano1-4/+22
"hash-object --literally" introduced in v2.2 was not prepared to take a really long object type name. * jc/hash-object: write_sha1_file(): do not use a separate sha1[] array t1007: add hash-object --literally tests hash-object --literally: fix buffer overrun with extra-long object type git-hash-object.txt: document --literally option
2015-05-19Merge branch 'kn/cat-file-literally'Libravatar Junio C Hamano1-22/+106
Add the "--allow-unknown-type" option to "cat-file" to allow inspecting loose objects of an experimental or a broken type. * kn/cat-file-literally: t1006: add tests for git cat-file --allow-unknown-type cat-file: teach cat-file a '--allow-unknown-type' option cat-file: make the options mutually exclusive sha1_file: support reading from a loose object of unknown type
2015-05-18sha1_file: pass empty buffer to index empty fileLibravatar Jim Hill1-1/+1
`git add` of an empty file with a filter pops complaints from `copy_fd` about a bad file descriptor. This traces back to these lines in sha1_file.c:index_core: if (!size) { ret = index_mem(sha1, NULL, size, type, path, flags); The problem here is that content to be added to the index can be supplied from an fd, or from a memory buffer, or from a pathname. This call is supplying a NULL buffer pointer and a zero size. Downstream logic takes the complete absence of a buffer to mean the data is to be found elsewhere -- for instance, these, from convert.c: if (params->src) { write_err = (write_in_full(child_process.in, params->src, params->size) < 0); } else { write_err = copy_fd(params->fd, child_process.in); } ~If there's a buffer, write from that, otherwise the data must be coming from an open fd.~ Perfectly reasonable logic in a routine that's going to write from either a buffer or an fd. So change `index_core` to supply an empty buffer when indexing an empty file. There's a patch out there that instead changes the logic quoted above to take a `-1` fd to mean "use the buffer", but it seems to me that the distinction between a missing buffer and an empty one carries intrinsic semantics, where the logic change is adapting the code to handle incorrect arguments. Signed-off-by: Jim Hill <gjthill@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-13Merge branch 'jk/prune-mtime' into maintLibravatar Junio C Hamano1-4/+16
Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning. * jk/prune-mtime: sha1_file: only freshen packs once per run sha1_file: freshen pack objects before loose reachable: only mark local objects as recent
2015-05-11Merge branch 'jc/hash-object'Libravatar Junio C Hamano1-4/+22
"hash-object --literally" introduced in v2.2 was not prepared to take a really long object type name. * jc/hash-object: write_sha1_file(): do not use a separate sha1[] array t1007: add hash-object --literally tests hash-object --literally: fix buffer overrun with extra-long object type git-hash-object.txt: document --literally option
2015-05-11Merge branch 'jk/sha1-file-reduce-useless-warnings'Libravatar Junio C Hamano1-3/+1
* jk/sha1-file-reduce-useless-warnings: sha1_file: squelch "packfile cannot be accessed" warnings
2015-05-11Merge branch 'nd/multiple-work-trees'Libravatar Junio C Hamano1-1/+1
A replacement for contrib/workdir/git-new-workdir that does not rely on symbolic links and make sharing of objects and refs safer by making the borrowee and borrowers aware of each other. * nd/multiple-work-trees: (41 commits) prune --worktrees: fix expire vs worktree existence condition t1501: fix test with split index t2026: fix broken &&-chain t2026 needs procondition SANITY git-checkout.txt: a note about multiple checkout support for submodules checkout: add --ignore-other-wortrees checkout: pass whole struct to parse_branchname_arg instead of individual flags git-common-dir: make "modules/" per-working-directory directory checkout: do not fail if target is an empty directory t2025: add a test to make sure grafts is working from a linked checkout checkout: don't require a work tree when checking out into a new one git_path(): keep "info/sparse-checkout" per work-tree count-objects: report unused files in $GIT_DIR/worktrees/... gc: support prune --worktrees gc: factor out gc.pruneexpire parsing code gc: style change -- no SP before closing parenthesis checkout: clean up half-prepared directories in --to mode checkout: reject if the branch is already checked out elsewhere prune: strategies for linked checkouts checkout: support checking out into a new working directory ...
2015-05-06sha1_file: support reading from a loose object of unknown typeLibravatar Karthik Nayak1-22/+106
Update sha1_loose_object_info() to optionally allow it to read from a loose object file of unknown/bogus type; as the function usually returns the type of the object it read in the form of enum for known types, add an optional "typename" field to receive the name of the type in textual form and a flag to indicate the reading of a loose object file of unknown/bogus type. Add parse_sha1_header_extended() which acts as a wrapper around parse_sha1_header() allowing more information to be obtained. Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of unknown/corrupt objects which have a unknown sha1 header size to a strbuf structure. This was written by Junio C Hamano but tested by me. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Hepled-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-05Merge branch 'jk/prune-mtime'Libravatar Junio C Hamano1-4/+16
Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning. * jk/prune-mtime: sha1_file: only freshen packs once per run sha1_file: freshen pack objects before loose reachable: only mark local objects as recent
2015-05-05write_sha1_file(): do not use a separate sha1[] arrayLibravatar Junio C Hamano1-4/+1
In the beginning, write_sha1_file() did not have a way to tell the caller the name of the object it wrote to the caller. This was changed in d6d3f9d0 (This implements the new "recursive tree" write-tree., 2005-04-09) by adding the "returnsha1" parameter to the function so that the callers who are interested in the value can optionally pass a pointer to receive it. It turns out that all callers do want to know the name of the object it just has written. Nobody passes a NULL to this parameter, hence it is not necessary to use a separate sha1[] array to receive the result from write_sha1_file_prepare(), and copy the result to the returnsha1 supplied by the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-05hash-object --literally: fix buffer overrun with extra-long object typeLibravatar Eric Sunshine1-0/+21
"hash-object" learned in 5ba9a93 (hash-object: add --literally option, 2014-09-11) to allow crafting a corrupt/broken object of unknown type. When the user-provided type is particularly long, however, it can overflow the relatively small stack-based character array handed to write_sha1_file_prepare() by hash_sha1_file() and write_sha1_file(), leading to stack corruption (and crash). Introduce a custom helper to allow arbitrarily long typenames just for "hash-object --literally". [jc: Eric's original used a strbuf in the more common codepaths, and I rewrote it to avoid penalizing the non-literally code. Bugs are mine] Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20sha1_file: only freshen packs once per runLibravatar Jeff King1-1/+8
Since 33d4221 (write_sha1_file: freshen existing objects, 2014-10-15), we update the mtime of existing objects that we would have written out (had they not existed). For the common case in which many objects are packed, we may update the mtime on a single packfile repeatedly. This can result in a noticeable performance problem if calling utime() is expensive (e.g., because your storage is on NFS). We can fix this by keeping a per-pack flag that lets us freshen only once per program invocation. An alternative would be to keep the packed_git.mtime flag up to date as we freshen, and freshen only once every N seconds. In practice, it's not worth the complexity. We are racing against prune expiration times here, which inherently must be set to accomodate reasonable program running times (because they really care about the time between an object being written and it becoming referenced, and the latter is typically the last step a program takes). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20sha1_file: freshen pack objects before looseLibravatar Jeff King1-1/+1
When writing out an object file, we first check whether it already exists and if so optimize out the write. Prior to 33d4221, we did this by calling has_sha1_file(), which will check for packed objects followed by loose. Since that commit, we check loose objects first. For the common case of a repository whose objects are mostly packed, this means we will make a lot of extra access() system calls checking for loose objects. We should follow the same packed-then-loose order that all of our other lookups use. Reported-by: Stefan Saasen <ssaasen@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20reachable: only mark local objects as recentLibravatar Jeff King1-2/+7
When pruning and repacking a repository that has an alternate object store configured, we may traverse a large number of objects in the alternate. This serves no purpose, and may be expensive to do. A longer explanation is below. Commits d3038d2 and abcb865 taught prune and pack-objects (respectively) to treat "recent" objects as tips for reachability, so that we keep whole chunks of history. They built on the object traversal in 660c889 (sha1_file: add for_each iterators for loose and packed objects, 2014-10-15), which covers both local and alternate objects. In both cases, covering alternate objects is unnecessary, as both commands can only drop objects from the local repository. In the case of prune, we traverse only the local object directory. And in the case of repacking, while we may or may not include local objects in our pack, we will never reach into the alternate with "repack -d". The "-l" option is only a question of whether we are migrating objects from the alternate into our repository, or leaving them untouched. It is possible that we may drop an object that is depended upon by another object in the alternate. For example, imagine two repositories, A and B, with A pointing to B as an alternate. Now imagine a commit that is in B which references a tree that is only in A. Traversing from recent objects in B might prevent A from dropping that tree. But this case isn't worth covering. Repo B should take responsibility for its own objects. It would never have had the commit in the first place if it did not also have the tree, and assuming it is using the same "keep recent chunks of history" scheme, then it would itself keep the tree, as well. So checking the alternate objects is not worth doing, and come with a significant performance impact. In both cases, we skip any recent objects that have already been marked SEEN (i.e., that we know are already reachable for prune, or included in the pack for a repack). So there is a slight waste of time in opening the alternate packs at all, only to notice that we have already considered each object. But much worse, the alternate repository may have a large number of objects that are not reachable from the local repository at all, and we end up adding them to the traversal. We can fix this by considering only local unseen objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-30sha1_file: squelch "packfile cannot be accessed" warningsLibravatar Jeff King1-3/+1
When we find an object in a packfile index, we make sure we can still open the packfile itself (or that it is already open), as it might have been deleted by a simultaneous repack. If we can't access the packfile, we print a warning for the user and tell the caller that we don't have the object (we can then look in other packfiles, or find a loose version, before giving up). The warning we print to the user isn't really accomplishing anything, and it is potentially confusing to users. In the normal case, it is complete noise; we find the object elsewhere, and the user does not have to care that we racily saw a packfile index that became stale. It didn't affect the operation at all. A possibly more interesting case is when we later can't find the object, and report failure to the user. In this case the warning could be considered a clue toward that ultimate failure. But it's not really a useful clue in practice. We wouldn't even print it consistently (since we are racing with another process, we might not even see the .idx file, or we might win the race and open the packfile, completing the operation). This patch drops the warning entirely (not only from the fill_pack_entry site, but also from an identical use in pack-objects). If we did find the warning interesting in the error case, we could stuff it away and reveal it to the user when we later die() due to the broken object. But that complexity just isn't worth it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-23Merge branch 'rs/deflate-init-cleanup' into maintLibravatar Junio C Hamano1-1/+0
Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-17Merge branch 'rs/deflate-init-cleanup'Libravatar Junio C Hamano1-1/+0
Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-05zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}Libravatar René Scharfe1-1/+0
Clear the git_zstream variable at the start of git_deflate_init() etc. so that callers don't have to do that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-05Merge branch 'jk/prune-mtime' into maintLibravatar Junio C Hamano1-13/+31
In v2.2.0, we broke "git prune" that runs in a repository that borrows from an alternate object store. * jk/prune-mtime: sha1_file: fix iterating loose alternate objects for_each_loose_file_in_objdir: take an optional strbuf path
2015-02-22Merge branch 'jk/prune-mtime'Libravatar Junio C Hamano1-13/+31
In v2.2.0, we broke "git prune" that runs in a repository that borrows from an alternate object store. * jk/prune-mtime: sha1_file: fix iterating loose alternate objects for_each_loose_file_in_objdir: take an optional strbuf path
2015-02-09sha1_file: fix iterating loose alternate objectsLibravatar Jonathon Mah1-3/+10
The string in 'base' contains a path suffix to a specific object; when its value is used, the suffix must either be filled (as in stat_sha1_file, open_sha1_file, check_and_freshen_nonlocal) or cleared (as in prepare_packed_git) to avoid junk at the end. 660c889e (sha1_file: add for_each iterators for loose and packed objects, 2014-10-15) introduced loose_from_alt_odb(), but this did neither and treated 'base' as a complete path to the "base" object directory, instead of a pointer to the "base" of the full path string. The trailing path after 'base' is still initialized to NUL, hiding the bug in some common cases. Additionally the descendent for_each_file_in_obj_subdir() function swallows ENOENT, so an error only shows if the alternate's path was last filled with a valid object (where statting /path/to/existing/00/0bjectfile/00 fails). Signed-off-by: Jonathon Mah <me@JonathonMah.com> Helped-by: Kyle J. McKay <mackyle@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-09for_each_loose_file_in_objdir: take an optional strbuf pathLibravatar Jeff King1-10/+21
We feed a root "objdir" path to this iterator function, which then copies the result into a strbuf, so that it can repeatedly append the object sub-directories to it. Let's make it easy for callers to just pass us a strbuf in the first place. We leave the original interface as a convenience for callers who want to just pass a const string like the result of get_object_directory(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01path.c: make get_pathname() call sites return const char *Libravatar Nguyễn Thái Ngọc Duy1-1/+1
Before the previous commit, get_pathname returns an array of PATH_MAX length. Even if git_path() and similar functions does not use the whole array, git_path() caller can, in theory. After the commit, get_pathname() may return a buffer that has just enough room for the returned string and git_path() caller should never write beyond that. Make git_path(), mkpath() and git_path_submodule() return a const buffer to make sure callers do not write in it at all. This could have been part of the previous commit, but the "const" conversion is too much distraction from the core changes in path.c. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-25sort_string_list(): rename to string_list_sort()Libravatar Michael Haggerty1-1/+1
The new name is more consistent with the names of other string_list-related functions. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-29Merge branch 'jk/prune-mtime'Libravatar Junio C Hamano1-11/+198
Tighten the logic to decide that an unreachable cruft is sufficiently old by covering corner cases such as an ancient object becoming reachable and then going unreachable again, in which case its retention period should be prolonged. * jk/prune-mtime: (28 commits) drop add_object_array_with_mode revision: remove definition of unused 'add_object' function pack-objects: double-check options before discarding objects repack: pack objects mentioned by the index pack-objects: use argv_array reachable: use revision machinery's --indexed-objects code rev-list: add --indexed-objects option rev-list: document --reflog option t5516: test pushing a tag of an otherwise unreferenced blob traverse_commit_list: support pending blobs/trees with paths make add_object_array_with_context interface more sane write_sha1_file: freshen existing objects pack-objects: match prune logic for discarding objects pack-objects: refactor unpack-unreachable expiration check prune: keep objects reachable from recent objects sha1_file: add for_each iterators for loose and packed objects count-objects: use for_each_loose_file_in_objdir count-objects: do not use xsize_t when counting object size prune-packed: use for_each_loose_file_in_objdir reachable: mark index blobs as SEEN ...
2014-10-16write_sha1_file: freshen existing objectsLibravatar Jeff King1-7/+44
When we try to write a loose object file, we first check whether that object already exists. If so, we skip the write as an optimization. However, this can interfere with prune's strategy of using mtimes to mark files in progress. For example, if a branch contains a particular tree object and is deleted, that tree object may become unreachable, and have an old mtime. If a new operation then tries to write the same tree, this ends up as a noop; we notice we already have the object and do nothing. A prune running simultaneously with this operation will see the object as old, and may delete it. We can solve this by "freshening" objects that we avoid writing by updating their mtime. The algorithm for doing so is essentially the same as that of has_sha1_file. Therefore we provide a new (static) interface "check_and_freshen", which finds and optionally freshens the object. It's trivial to implement freshening and simple checking by tweaking a single parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16sha1_file: add for_each iterators for loose and packed objectsLibravatar Jeff King1-0/+62
We typically iterate over the reachable objects in a repository by starting at the tips and walking the graph. There's no easy way to iterate over all of the objects, including unreachable ones. Let's provide a way of doing so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16prune: factor out loose-object directory traversalLibravatar Jeff King1-0/+84
Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function. Note that this is not quite a straight code movement. The original code had strange behavior when it found a file of the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex digits. It executed a "break" from the loop, meaning that we stopped pruning in that directory (but still pruned other directories!). This was probably a bug; we do not want to process the file as an object, but we should keep going otherwise (and that is how the new code handles it). We are also a little more careful with loose object directories which fail to open. The original code silently ignored any failures, but the new code will complain about any problems besides ENOENT. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>