path: root/sha1_file.c
Age | Commit message | Author | Files | Lines
2017-07-05 | Merge branch 'jt/unify-object-info' | Junio C Hamano | 1 | -189/+196
Code clean-ups.
* jt/unify-object-info:
  sha1_file: refactor has_sha1_file_with_flags
  sha1_file: do not access pack if unneeded
  sha1_file: teach sha1_object_info_extended more flags
  sha1_file: refactor read_object
  sha1_file: move delta base cache code up
  sha1_file: rename LOOKUP_REPLACE_OBJECT
  sha1_file: rename LOOKUP_UNKNOWN_OBJECT
  sha1_file: teach packed_object_info about typename
2017-07-05 | Merge branch 'rs/sha1-name-readdir-optim' | Junio C Hamano | 1 | -15/+24
Optimize "what are the object names already taken in an alternate object database?" query that is used to derive the length of prefix an object name is uniquely abbreviated to.
* rs/sha1-name-readdir-optim:
  sha1_file: guard against invalid loose subdirectory numbers
  sha1_file: let for_each_file_in_obj_subdir() handle subdir names
  p4205: add perf test script for pretty log formats
  sha1_name: cache readdir(3) results in find_short_object_filename()
2017-06-26 | sha1_file: refactor has_sha1_file_with_flags | Jonathan Tan | 1 | -10/+2
has_sha1_file_with_flags() implements many mechanisms in common with sha1_object_info_extended(). Make has_sha1_file_with_flags() a convenience function for sha1_object_info_extended() instead. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 | sha1_file: do not access pack if unneeded | Jonathan Tan | 1 | -0/+11
Currently, regardless of the contents of the "struct object_info" passed to sha1_object_info_extended(), that function always accesses the packfile whenever it returns information about a packed object, since it needs to populate "u.packed". Add the ability to pass NULL, and use NULL-ness of the argument to activate an optimization in which sha1_object_info_extended() does not needlessly access the packfile. A subsequent patch will make use of this optimization. A similar optimization is not made for the cached and loose cases as it would not cause a significant performance improvement. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
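The NULL-argument optimization described above can be pictured with a small sketch. This is not Git's code: `struct info_sketch`, `read_size_from_pack()` and `object_info_sketch()` are hypothetical names, and the "pack access" is only a counter so that the effect is observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for "struct object_info"; not Git's real layout. */
struct info_sketch {
    unsigned long size;
};

/* Counts how often the simulated (expensive) pack access happens. */
static int pack_accesses;

/* Pretend to read an object's size from a packfile. */
static unsigned long read_size_from_pack(void)
{
    pack_accesses++;
    return 42;
}

/*
 * Returns 0 if the object exists, -1 otherwise. When "oi" is NULL the
 * caller only asks about existence, so the pack is not touched at all.
 */
static int object_info_sketch(int object_exists, struct info_sketch *oi)
{
    if (!object_exists)
        return -1;
    if (!oi)
        return 0; /* fast path: nothing to populate */
    oi->size = read_size_from_pack();
    return 0;
}
```

A caller that only needs an existence check would pass NULL; a caller that needs the size passes a struct and pays for the lookup.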
2017-06-26 | sha1_file: teach sha1_object_info_extended more flags | Jonathan Tan | 1 | -19/+24
Improve sha1_object_info_extended() by supporting additional flags. This allows has_sha1_file_with_flags() to be modified to use sha1_object_info_extended() in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24 | Merge branch 'ab/free-and-null' | Junio C Hamano | 1 | -2/+1
A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro.
* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
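For illustration, a macro with the behavior named in the last subject line ("free(ptr); ptr = NULL") could look like the sketch below. The macro body is written from that description and may differ in detail from the definition in git-compat-util.h; `struct buffer_sketch` and `buffer_release()` are hypothetical.

```c
#include <stdlib.h>

/* Free the memory and clear the pointer in one statement. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical struct, used only to show a call site. */
struct buffer_sketch {
    char *data;
};

static void buffer_release(struct buffer_sketch *b)
{
    /* Replaces the two-step pattern: free(b->data); b->data = NULL; */
    FREE_AND_NULL(b->data);
}
```

Because the pointer is cleared, calling `buffer_release()` twice is harmless: the second call passes NULL to free(), which is a no-op.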
2017-06-24 | Merge branch 'bw/config-h' | Junio C Hamano | 1 | -0/+1
Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file.
* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 | sha1_file: guard against invalid loose subdirectory numbers | René Scharfe | 1 | -1/+4
Loose object subdirectories have hexadecimal names based on the first byte of the hash of contained objects, thus their numerical representation can range from 0 (0x00) to 255 (0xff). Change the type of the corresponding variable in for_each_file_in_obj_subdir() and associated callback functions to unsigned int and add a range check. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24 | sha1_file: let for_each_file_in_obj_subdir() handle subdir names | René Scharfe | 1 | -8/+14
The function for_each_file_in_obj_subdir() takes an object subdirectory number and expects the name of the same subdirectory to be included in the path strbuf. Avoid this redundancy by letting the function append the hexadecimal subdirectory name itself. This makes it a bit easier and safer to use the function -- it becomes impossible to specify different subdirectories in subdir_nr and path. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
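A rough sketch of the idea: the callee derives the two-digit hexadecimal directory name from the number, together with a 0..255 range check like the one the neighboring commit adds. It uses a plain char buffer instead of Git's strbuf, and `append_subdir_name()` is a hypothetical helper, not a function from sha1_file.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append the two-digit hexadecimal name of loose-object subdirectory
 * "subdir_nr" to "path". Returns 0 on success, or -1 if the number is
 * outside 0..255 or the buffer is too small.
 */
static int append_subdir_name(char *path, size_t path_size,
                              unsigned int subdir_nr)
{
    size_t len = strlen(path);

    if (subdir_nr > 0xff)
        return -1;
    if (len + 3 > path_size) /* two hex digits plus the terminating NUL */
        return -1;
    snprintf(path + len, path_size - len, "%02x", subdir_nr);
    return 0;
}
```

Since the caller passes only the number, the number and the name cannot disagree.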
2017-06-22 | sha1_name: cache readdir(3) results in find_short_object_filename() | René Scharfe | 1 | -6/+6
Read each loose object subdirectory at most once when looking for unique abbreviated hashes. This speeds up commands like "git log --pretty=%h" considerably, which previously caused one readdir(3) call for each candidate, even for subdirectories that were visited before. The new cache is kept until the program ends and never invalidated. The same is already true for pack indexes. The inherent racy nature of finding unique short hashes makes it still fit for this purpose -- a conflicting new object may be added at any time. Tasks with higher consistency requirements should not use it, though. The cached object names are stored in an oid_array, which is quite compact. The bitmap for remembering which subdir was already read is stored as a char array, with one char per directory -- that's not quite as compact, but really simple and incurs only an overhead equivalent to 11 hashes after all. Suggested-by: Jeff King <peff@peff.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
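The "one char per directory" bookkeeping can be sketched as follows. This is only an illustration of the caching scheme under simplified assumptions: `scan_subdir()` stands in for the real readdir(3) loop, and the cached object names (an oid_array in the commit) are left out.

```c
#define NR_SUBDIRS 256

/* One char per loose-object subdirectory: nonzero once it has been read. */
static char subdir_seen[NR_SUBDIRS];

/* Counts simulated directory scans so the caching effect is observable. */
static int scans;

/* Hypothetical stand-in for the readdir(3) loop over one subdirectory. */
static void scan_subdir(unsigned int subdir_nr)
{
    (void)subdir_nr;
    scans++;
}

/* Make sure "subdir_nr" has been read, scanning it at most once. */
static int ensure_subdir_cached(unsigned int subdir_nr)
{
    if (subdir_nr >= NR_SUBDIRS)
        return -1;
    if (!subdir_seen[subdir_nr]) {
        scan_subdir(subdir_nr);
        subdir_seen[subdir_nr] = 1;
    }
    return 0;
}
```

As in the commit message, nothing here ever clears a flag, so the sketch shares the stated limitation: objects added after the first scan are not noticed.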
2017-06-21 | sha1_file: refactor read_object | Jonathan Tan | 1 | -42/+42
read_object() and sha1_object_info_extended() both implement mechanisms such as object replacement, retrying the packed store after failing to find the object in the packed store then the loose store, and being able to mark a packed object as bad and then retrying the whole process. Consolidating these mechanisms would be a great help to maintainability. Therefore, consolidate them by extending sha1_object_info_extended() to support the functionality needed, and then modifying read_object() to use sha1_object_info_extended(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 | sha1_file: move delta base cache code up | Jonathan Tan | 1 | -110/+110
In a subsequent patch, packed_object_info() will be modified to use the delta base cache, so move the relevant code to before packed_object_info(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 | sha1_file: rename LOOKUP_REPLACE_OBJECT | Jonathan Tan | 1 | -5/+9
The LOOKUP_REPLACE_OBJECT flag controls whether the lookup_replace_object() function is invoked by sha1_object_info_extended(), read_sha1_file_extended(), and lookup_replace_object_extended(), but it is not immediately clear which functions accept that flag. Therefore restrict this flag to only sha1_object_info_extended(), renaming it appropriately to OBJECT_INFO_LOOKUP_REPLACE and adding some documentation. Update read_sha1_file_extended() to have a boolean parameter instead, and delete lookup_replace_object_extended(). parse_sha1_header() also passes this flag to parse_sha1_header_extended() since commit 46f0344 ("sha1_file: support reading from a loose object of unknown type", 2015-05-03), but that has had no effect since that commit. Therefore this patch also removes this flag from that invocation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-21 | sha1_file: rename LOOKUP_UNKNOWN_OBJECT | Jonathan Tan | 1 | -2/+2
The LOOKUP_UNKNOWN_OBJECT flag was introduced in commit 46f0344 ("sha1_file: support reading from a loose object of unknown type", 2015-05-03) in order to support a feature in cat-file subsequently introduced in commit 39e4ae3 ("cat-file: teach cat-file a '--allow-unknown-type' option", 2015-05-03). Despite its name and location in cache.h, this flag is used neither in read_sha1_file_extended() nor in any of the lookup functions, but used only in sha1_object_info_extended(). Therefore rename this flag to OBJECT_INFO_ALLOW_UNKNOWN_TYPE, taking the name of the cat-file flag that invokes this feature, and move it closer to the declaration of sha1_object_info_extended(). Also add documentation for this flag. OBJECT_INFO_ALLOW_UNKNOWN_TYPE is defined to 2, not 1, to avoid conflicting with LOOKUP_REPLACE_OBJECT. Avoidance of this conflict is necessary because sha1_object_info_extended() supports both flags. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
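Why the value has to be 2 rather than 1 is easiest to see with the bits written out. The sketch below uses hypothetical `SKETCH_` names with the values given in the message (1 for the replace flag, 2 for the unknown-type flag); it does not reproduce the declarations in cache.h.

```c
/* Values follow the commit message: the two flags must not share a bit. */
#define SKETCH_LOOKUP_REPLACE     1u
#define SKETCH_ALLOW_UNKNOWN_TYPE 2u

/* Returns nonzero if "flag" is set in "flags". */
static int has_flag(unsigned int flags, unsigned int flag)
{
    return (flags & flag) != 0;
}
```

With distinct bits, a caller can combine both flags with `|` and the callee can test each one independently; if both were defined as 1, setting one would silently set the other.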
2017-06-16 | coccinelle: make use of the "type" FREE_AND_NULL() rule | Ævar Arnfjörð Bjarmason | 1 | -2/+1
Apply the result of the just-added coccinelle rule. This manually excludes a few occurrences, mostly things that resulted in many FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 | config: don't include config.h by default | Brandon Williams | 1 | -0/+1
Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 | sha1_file: teach packed_object_info about typename | Jonathan Tan | 1 | -17/+12
In commit 46f0344 ("sha1_file: support reading from a loose object of unknown type", 2015-05-03), "struct object_info" gained a "typename" field that could represent a type name from a loose object file, whether valid or invalid, as opposed to the existing "typep" which could only represent valid types. Some relatively complex manipulations were added to avoid breaking packed_object_info() without modifying it, but it is much easier to just teach packed_object_info() about the new field. Therefore, teach packed_object_info() as described above. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13 | convert: convert convert_to_git to take an index | Brandon Williams | 1 | -2/+2
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13 | convert: convert convert_to_git_filter_fd to take an index | Brandon Williams | 1 | -1/+1
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 | Merge branch 'jk/loose-object-fsck' | Junio C Hamano | 1 | -3/+0
Code cleanup.
* jk/loose-object-fsck:
  sha1_file: remove an unused fd variable
2017-04-19 | Merge branch 'jk/no-looking-at-dotgit-outside-repo' | Junio C Hamano | 1 | -0/+2
Clean up fallouts from recent tightening of the set-up sequence, where Git barfs when repository information is accessed without first ensuring that it was started in a repository.
* jk/no-looking-at-dotgit-outside-repo:
  test-read-cache: setup git dir
  has_sha1_file: don't bother if we are not in a repository
2017-04-19 | Merge branch 'bc/object-id' | Junio C Hamano | 1 | -3/+3
Conversion from unsigned char [20] to struct object_id continues.
* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
2017-04-16 | Merge branch 'jk/loose-object-info-report-error' | Junio C Hamano | 1 | -1/+1
Update error handling for codepath that deals with corrupt loose objects.
* jk/loose-object-info-report-error:
  index-pack: detect local corruption in collision check
  sha1_loose_object_info: return error for corrupted objects
2017-04-16 | sha1_file: remove an unused fd variable | Sebastian Schuberth | 1 | -3/+0
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-13 | has_sha1_file: don't bother if we are not in a repository | Jonathan Nieder | 1 | -0/+2
Most callers to this function already require that they are in a git repository, but there is an exception: "git apply" uses has_sha1_file to avoid work if the result of applying a binary patch is already present in the repository. When run outside any repository, this produces an error: fatal: BUG: setup_git_env called without repository Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 | sha1_loose_object_info: return error for corrupted objects | Jeff King | 1 | -1/+1
When sha1_loose_object_info() finds that a loose object file cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an error to the caller. However, if it found that the loose object file is corrupt and the object data cannot be used from it, it stuffs OBJ_BAD into "type" field of the object_info, but returns zero (i.e., success), which can confuse callers. This is due to 052fe5eac (sha1_loose_object_info: make type lookup optional, 2013-07-12), which switched the return to a strict success/error, rather than returning the type (but botched the return). Callers of regular sha1_object_info() don't notice the difference, as that function returns the type (which is OBJ_BAD in this case). However, direct callers of sha1_object_info_extended() see the function return success, but without setting any meaningful values in the object_info struct, leading them to access potentially uninitialized memory. The easiest way to see the bug is via "cat-file -s", which will happily ignore the corruption and report whatever value happened to be in the "size" variable. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
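The difference between the two return conventions can be shown with a reduced sketch. The types and functions below (`struct info_out`, `loose_info_buggy()`, `loose_info_fixed()`) are hypothetical and only mimic the shape of the problem described in the message: success is returned although "size" was never written.

```c
/* Hypothetical type codes; only "bad" versus "valid" matters here. */
enum type_sketch { TYPE_BAD = -1, TYPE_BLOB = 3 };

struct info_out {
    enum type_sketch type;
    unsigned long size;
};

/* Buggy shape: reports success even though "size" was never written. */
static int loose_info_buggy(int corrupt, struct info_out *oi)
{
    if (corrupt) {
        oi->type = TYPE_BAD;
        return 0; /* caller believes oi->size is valid */
    }
    oi->type = TYPE_BLOB;
    oi->size = 12;
    return 0;
}

/* Fixed shape: corruption is reported through the return value. */
static int loose_info_fixed(int corrupt, struct info_out *oi)
{
    if (corrupt) {
        oi->type = TYPE_BAD;
        return -1;
    }
    oi->type = TYPE_BLOB;
    oi->size = 12;
    return 0;
}
```

A caller that checks only the return value, as "cat-file -s" effectively does in the message, would read an unset size from the buggy variant and stop early with the fixed one.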
2017-03-30 | convert unchecked snprintf into xsnprintf | Jeff King | 1 | -2/+2
These calls to snprintf should always succeed, because their input is small and fixed. Let's use xsnprintf to make sure this is the case (and to make auditing for actual truncation easier). These could be candidates for turning into heap buffers, but they fall into a few broad categories that make it not worth doing:
  - formatting single numbers is simple enough that we can see the result should fit
  - the size of a sha1 is likewise well-known, and I didn't want to cause unnecessary conflicts with the ongoing process to convert these constants to GIT_MAX_HEXSZ
  - the interface for curl_errorstr is dictated by curl
Signed-off-by: Jeff King <peff@peff.net>
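A checked wrapper in the spirit of xsnprintf might look like the sketch below. `checked_snprintf()` is a hypothetical name and the error handling (a message followed by abort()) is an assumption; Git's own xsnprintf may report the failure differently.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Like snprintf(), but treat truncation or an encoding error as a fatal
 * programming error instead of silently continuing with a cut-off string.
 */
static int checked_snprintf(char *dst, size_t max, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(dst, max, fmt, ap);
    va_end(ap);

    if (len < 0 || (size_t)len >= max) {
        fprintf(stderr, "BUG: formatted output does not fit in %lu bytes\n",
                (unsigned long)max);
        abort();
    }
    return len;
}
```

For fixed-size inputs such as a single number, the check never fires; its value is that a later change which breaks the size assumption fails loudly.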
2017-03-28 | Merge branch 'jk/pack-name-cleanups' into maint | Junio C Hamano | 1 | -11/+6
Code clean-up.
* jk/pack-name-cleanups:
  index-pack: make pointer-alias fallbacks safer
  replace snprintf with odb_pack_name()
  odb_pack_keep(): stop generating keepfile name
  sha1_file.c: make pack-name helper globally accessible
  move odb_* declarations out of git-compat-util.h
2017-03-26 | Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ | brian m. carlson | 1 | -2/+2
Since we will likely be introducing a new hash function at some point, and that hash function might be longer than 20 bytes, use the constant GIT_MAX_RAWSZ, which is designed to be suitable for allocations, instead of GIT_SHA1_RAWSZ. This will ease the transition down the line by distinguishing between places where we need to allocate memory suitable for the largest hash from those where we need to handle the current hash. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 | Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ | brian m. carlson | 1 | -1/+1
Since we will likely be introducing a new hash function at some point, and that hash function might be longer than 40 hex characters, use the constant GIT_MAX_HEXSZ, which is designed to be suitable for allocations, instead of GIT_SHA1_HEXSZ. This will ease the transition down the line by distinguishing between places where we need to allocate memory suitable for the largest hash from those where we need to handle the current hash. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
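The distinction between "size for allocation" and "size of the current hash" can be sketched like this. The `SKETCH_` constants are hypothetical; they mirror the 20-byte and 40-hex-character sizes mentioned in these two messages and assume, for the sketch only, that the maximum currently equals the SHA-1 size.

```c
#include <string.h>

#define SKETCH_SHA1_RAWSZ 20                      /* size of the current hash */
#define SKETCH_SHA1_HEXSZ (2 * SKETCH_SHA1_RAWSZ)
#define SKETCH_MAX_RAWSZ  SKETCH_SHA1_RAWSZ       /* largest supported hash */
#define SKETCH_MAX_HEXSZ  (2 * SKETCH_MAX_RAWSZ)

/* Copy a hash of the current algorithm into storage sized for any hash. */
static void copy_hash(unsigned char dst[SKETCH_MAX_RAWSZ],
                      const unsigned char *src)
{
    memcpy(dst, src, SKETCH_SHA1_RAWSZ);
}
```

Buffers are declared with the MAX constants while copies and comparisons use the size of the hash in use, so a longer hash later only requires raising the MAX values.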
2017-03-21 | Merge branch 'jk/pack-name-cleanups' | Junio C Hamano | 1 | -11/+6
Code clean-up.
* jk/pack-name-cleanups:
  index-pack: make pointer-alias fallbacks safer
  replace snprintf with odb_pack_name()
  odb_pack_keep(): stop generating keepfile name
  sha1_file.c: make pack-name helper globally accessible
  move odb_* declarations out of git-compat-util.h
2017-03-21 | Merge branch 'rs/sha1-file-plug-fallback-base-leak' into maint | Junio C Hamano | 1 | -0/+4
A leak in a codepath to read from a packed object in (rare) cases has been plugged.
* rs/sha1-file-plug-fallback-base-leak:
  sha1_file: release fallback base's memory in unpack_entry()
2017-03-17 | Merge branch 'bc/object-id' | Junio C Hamano | 1 | -8/+19
"uchar [20]" to "struct object_id" conversion continues.
* bc/object-id:
  wt-status: convert to struct object_id
  builtin/merge-base: convert to struct object_id
  Convert object iteration callbacks to struct object_id
  sha1_file: introduce an nth_packed_object_oid function
  refs: simplify parsing of reflog entries
  refs: convert each_reflog_ent_fn to struct object_id
  reflog-walk: convert struct reflog_info to struct object_id
  builtin/replace: convert to struct object_id
  Convert remaining callers of resolve_refdup to object_id
  builtin/merge: convert to struct object_id
  builtin/clone: convert to struct object_id
  builtin/branch: convert to struct object_id
  builtin/grep: convert to struct object_id
  builtin/fmt-merge-message: convert to struct object_id
  builtin/fast-export: convert to struct object_id
  builtin/describe: convert to struct object_id
  builtin/diff-tree: convert to struct object_id
  builtin/commit: convert to struct object_id
  hex: introduce parse_oid_hex
2017-03-17 | Merge branch 'cc/split-index-config' | Junio C Hamano | 1 | -1/+1
The experimental "split index" feature has gained a few configuration variables to make it easier to use.
* cc/split-index-config: (22 commits)
  Documentation/git-update-index: explain splitIndex.*
  Documentation/config: add splitIndex.sharedIndexExpire
  read-cache: use freshen_shared_index() in read_index_from()
  read-cache: refactor read_index_from()
  t1700: test shared index file expiration
  read-cache: unlink old sharedindex files
  config: add git_config_get_expiry() from gc.c
  read-cache: touch shared index files when used
  sha1_file: make check_and_freshen_file() non static
  Documentation/config: add splitIndex.maxPercentChange
  t1700: add tests for splitIndex.maxPercentChange
  read-cache: regenerate shared index if necessary
  config: add git_config_get_max_percent_split_change()
  Documentation/git-update-index: talk about core.splitIndex config var
  Documentation/config: add information for core.splitIndex
  t1700: add tests for core.splitIndex
  update-index: warn in case of split-index incoherency
  read-cache: add and then use tweak_split_index()
  split-index: add {add,remove}_split_index() functions
  config: add git_config_get_split_index()
  ...
2017-03-16 | sha1_file.c: make pack-name helper globally accessible | Jeff King | 1 | -11/+6
We provide sha1_pack_name() and sha1_pack_index_name(), but the more generic form (which takes its own strbuf and an arbitrary extension) is only used to implement the other two. Let's make it available, but clean up a few things:
  1. Name it odb_pack_name(), as the original sha1_get_pack_name() is long but not all that descriptive.
  2. Switch the strbuf argument to the beginning, so that it matches similar path-building functions like git_path_buf().
  3. Clean up the out-dated docstring and move it to the public declaration.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-10 | Merge branch 'rs/sha1-file-plug-fallback-base-leak' | Junio C Hamano | 1 | -0/+4
A leak in a codepath to read from a packed object in (rare) cases has been plugged.
* rs/sha1-file-plug-fallback-base-leak:
  sha1_file: release fallback base's memory in unpack_entry()
2017-03-01 | sha1_file: make check_and_freshen_file() non static | Christian Couder | 1 | -1/+1
This function will be used in a commit soon, so let's make it available globally. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-27 | Merge branch 'mh/ref-remove-empty-directory' | Junio C Hamano | 1 | -1/+75
Deletion of a branch "foo/bar" could remove .git/refs/heads/foo once there no longer is any other branch whose name begins with "foo/", but we didn't do so so far. Now we do.
* mh/ref-remove-empty-directory: (23 commits)
  files_transaction_commit(): clean up empty directories
  try_remove_empty_parents(): teach to remove parents of reflogs, too
  try_remove_empty_parents(): don't trash argument contents
  try_remove_empty_parents(): rename parameter "name" -> "refname"
  delete_ref_loose(): inline function
  delete_ref_loose(): derive loose reference path from lock
  log_ref_write_1(): inline function
  log_ref_setup(): manage the name of the reflog file internally
  log_ref_write_1(): don't depend on logfile argument
  log_ref_setup(): pass the open file descriptor back to the caller
  log_ref_setup(): improve robustness against races
  log_ref_setup(): separate code for create vs non-create
  log_ref_write(): inline function
  rename_tmp_log(): improve error reporting
  rename_tmp_log(): use raceproof_create_file()
  lock_ref_sha1_basic(): use raceproof_create_file()
  lock_ref_sha1_basic(): inline constant
  raceproof_create_file(): new function
  safe_create_leading_directories(): set errno on SCLD_EXISTS
  safe_create_leading_directories_const(): preserve errno
  ...
2017-02-27 | sha1_file: release fallback base's memory in unpack_entry() | René Scharfe | 1 | -0/+4
If a pack entry that's used as a delta base is corrupt, unpack_entry() marks it as unusable and then searches the object again in the hope that it can be found in another pack or in a loose file. The memory for this external base object is never released. Free it after use. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 | Convert object iteration callbacks to struct object_id | brian m. carlson | 1 | -6/+6
Convert each_loose_object_fn and each_packed_object_fn to take a pointer to struct object_id. Update the various callbacks. Convert several 40-based constants to use GIT_SHA1_HEXSZ. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-22 | sha1_file: introduce an nth_packed_object_oid function | brian m. carlson | 1 | -3/+14
There are places in the code where we would like to provide a struct object_id *, yet read the hash directly from the pack. Provide an nth_packed_object_oid function that is similar to the nth_packed_object_sha1 function. In order to avoid a potentially invalid cast, nth_packed_object_oid provides a variable into which to store the value, which it returns on success; on error, it returns NULL, as nth_packed_object_sha1 does. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
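The calling convention described here (the caller supplies the storage, which is returned on success, and NULL is returned on error) can be sketched with a simplified table of raw hashes. All names below are hypothetical; a real pack index has a different layout.

```c
#include <stddef.h>
#include <string.h>

#define RAWSZ 20

/* Hypothetical stand-in for "struct object_id". */
struct oid_sketch {
    unsigned char hash[RAWSZ];
};

/* Hypothetical flat table of raw hashes, standing in for a pack index. */
struct table_sketch {
    const unsigned char *hashes; /* nr * RAWSZ bytes */
    size_t nr;
};

/* Pointer-returning lookup: NULL when "n" is out of range. */
static const unsigned char *nth_hash(const struct table_sketch *t, size_t n)
{
    if (n >= t->nr)
        return NULL;
    return t->hashes + n * RAWSZ;
}

/*
 * Struct-returning lookup: copies the hash into caller-provided storage
 * instead of casting the raw pointer, and returns "out" on success.
 */
static const struct oid_sketch *nth_oid(struct oid_sketch *out,
                                        const struct table_sketch *t,
                                        size_t n)
{
    const unsigned char *hash = nth_hash(t, n);

    if (!hash)
        return NULL;
    memcpy(out->hash, hash, RAWSZ);
    return out;
}
```

Copying into the caller's variable avoids treating arbitrary bytes inside the table as a struct, which is the potentially invalid cast the message refers to.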
2017-01-31 | Merge branch 'jk/clear-delta-base-cache-fix' | Junio C Hamano | 1 | -5/+4
A crashing bug introduced in v2.11 timeframe has been found (it is triggerable only in fast-import) and fixed.
* jk/clear-delta-base-cache-fix:
  clear_delta_base_cache(): don't modify hashmap while iterating
2017-01-31 | Merge branch 'jk/loose-object-fsck' | Junio C Hamano | 1 | -16/+164
"git fsck" inspects loose objects more carefully now.
* jk/loose-object-fsck:
  fsck: detect trailing garbage in all object types
  fsck: parse loose object paths directly
  sha1_file: add read_loose_object() function
  t1450: test fsck of packed objects
  sha1_file: fix error message for alternate objects
  t1450: refactor loose-object removal
2017-01-19 | clear_delta_base_cache(): don't modify hashmap while iterating | Jeff King | 1 | -5/+4
On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote:
> > I suspect the patch below may fix things for you. It works around it by walking over the lru list (either is fine, as they both contain all entries, and since we're clearing everything, we don't care about the order).
>
> Confirmed. With the patch applied, I can import the whole 55G in one go without any crashes or aborts. Thanks much!
Thanks. Here it is rolled up with a commit message.
-- >8 --
Subject: clear_delta_base_cache(): don't modify hashmap while iterating
Removing entries while iterating causes fast-import to access an already-freed `struct packed_git`, leading to various confusing errors.
What happens is that clear_delta_base_cache() drops the whole contents of the cache by iterating over the hashmap, calling release_delta_base_cache() on each entry. That function removes the item from the hashmap. The hashmap code may then shrink the table, but the hashmap_iter struct retains an offset from the old table. As a result, the next call to hashmap_iter_next() may claim that the iteration is done, even though some items haven't been visited.
The only caller of clear_delta_base_cache() is fast-import, which wants to clear the cache because it is discarding the packed_git struct for its temporary pack. So by failing to remove all of the entries, we still have references to the freed packed_git.
To make things even more confusing, this doesn't seem to trigger with the test suite, because it depends on complexities like the size of the hash table, which entries got cleared, whether we try to access them before they're evicted from the cache, etc. So I've been able to identify the problem with large imports like freebsd's svn import, or a fast-export of linux.git. But nothing that would be reasonable to run as part of the normal test suite.
We can fix this easily by iterating over the lru linked list instead of the hashmap. They both contain the same entries, and we can use the "safe" variant of the list iterator, which exists for exactly this case. Let's also add a warning to the hashmap API documentation to reduce the chances of getting bit by this again.
Reported-by: Ulrich Spörlein <uqs@freebsd.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
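The "safe" iteration idea mentioned in the fix, remembering the successor before the current entry is released, can be sketched with a plain singly linked list. This is not Git's list.h or its delta base cache; `struct entry_sketch`, `push_entry()` and `clear_entries()` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical cache entry linked into a simple list. */
struct entry_sketch {
    struct entry_sketch *next;
    int value;
};

/* Push a new entry on the front; returns the new head (old head on OOM). */
static struct entry_sketch *push_entry(struct entry_sketch *head, int value)
{
    struct entry_sketch *e = malloc(sizeof(*e));

    if (!e)
        return head;
    e->value = value;
    e->next = head;
    return e;
}

/*
 * Free every entry and return how many were released. The successor is
 * saved before the current node is freed, so removal during the walk
 * cannot make the iteration skip entries or touch freed memory.
 */
static int clear_entries(struct entry_sketch **head)
{
    struct entry_sketch *e = *head, *next;
    int freed = 0;

    while (e) {
        next = e->next; /* remember the successor first */
        free(e);
        e = next;
        freed++;
    }
    *head = NULL;
    return freed;
}
```

A list walk like this does not depend on any table being resized underneath it, which is the property the hashmap iterator lacked in the bug described above.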
2017-01-18 | Merge branch 'bw/grep-recurse-submodules' | Junio C Hamano | 1 | -1/+1
"git grep" has been taught to optionally recurse into submodules.
* bw/grep-recurse-submodules:
  grep: search history of moved submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: optionally recurse into submodules
  grep: add submodules as a grep source type
  submodules: load gitmodules file from commit sha1
  submodules: add helper to determine if a submodule is initialized
  submodules: add helper to determine if a submodule is populated
  real_path: canonicalize directory separators in root parts
  real_path: have callers use real_pathdup and strbuf_realpath
  real_path: create real_pathdup
  real_path: convert real_path_internal to strbuf_realpath
  real_path: resolve symlinks by hand
2017-01-17 | Merge branch 'jk/quote-env-path-list-component' into maint | Junio C Hamano | 1 | -11/+36
A recent update to receive-pack to make it easier to drop garbage objects made it clear that GIT_ALTERNATE_OBJECT_DIRECTORIES cannot have a pathname with a colon in it (no surprise!), and this in turn made it impossible to push into a repository at such a path. This has been fixed by introducing a quoting mechanism used when appending such a path to the colon-separated list.
* jk/quote-env-path-list-component:
  t5615-alternate-env: double-quotes in file names do not work on Windows
  t5547-push-quarantine: run the path separator test on Windows, too
  tmp-objdir: quote paths we add to alternates
  alternates: accept double-quoted paths
2017-01-15 | fsck: detect trailing garbage in all object types | Jeff King | 1 | -0/+5
When a loose tree or commit is read by fsck (or any git program), unpack_sha1_rest() checks whether there is extra cruft at the end of the object file, after the zlib data. Blobs that are streamed, however, do not have this check. For normal git operations, it's not a big deal. We know the sha1 and size checked out, so we have the object bytes we wanted. The trailing garbage doesn't affect what we're trying to do. But since the point of fsck is to find corruption or other problems, it should be more thorough. This patch teaches its loose-sha1 reader to detect extra bytes after the zlib stream and complain. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-15 | sha1_file: add read_loose_object() function | Jeff King | 1 | -3/+130
It's surprisingly hard to ask the sha1_file code to open a _specific_ incarnation of a loose object. Most of the functions take a sha1, and loop over the various object types (packed versus loose) and locations (local versus alternates) at a low level. However, some tools like fsck need to look at a specific file. This patch gives them a function they can use to open the loose object at a given path. The implementation unfortunately ends up repeating bits of related functions, but there's not a good way around it without some major refactoring of the whole sha1_file stack. We need to mmap the specific file, then partially read the zlib stream to know whether we're streaming or not, and then finally either stream it or copy the data to a buffer. We can do that by assembling some of the more arcane internal sha1_file functions, but we end up having to essentially reimplement unpack_sha1_file(), along with the streaming bits of check_sha1_signature(). Still, most of the ugliness is contained in the new function, and the interface is clean enough that it may be reusable (though it seems unlikely anything but git-fsck would care about opening a specific file). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-15 | sha1_file: fix error message for alternate objects | Jeff King | 1 | -15/+31
When we fail to open a corrupt loose object, we report an error and mention the filename via sha1_file_name(). However, that function will always give us a path in the local repository, whereas the corrupt object may have come from an alternate. The result is a very misleading error message. Teach the open_sha1_file() and stat_sha1_file() helpers to pass back the path they found, so that we can report it correctly. Note that the pointers we return go to static storage (e.g., from sha1_file_name()), which is slightly dangerous. However, these helpers are static local helpers, and the names are used for immediately generating error messages. The simplicity is an acceptable tradeoff for the danger. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-10 | Merge branch 'jc/git-open-cloexec' | Junio C Hamano | 1 | -27/+19
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git opens has been simplified. We may want to drop the tip one, but we'll see.
* jc/git-open-cloexec:
  sha1_file: stop opening files with O_NOATIME
  git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
  git_open(): untangle possible NOATIME and CLOEXEC interactions