Age | Commit message (Collapse) | Author | Files | Lines |
|
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (53 commits)
object: convert parse_object* to take struct object_id
tree: convert parse_tree_indirect to struct object_id
sequencer: convert do_recursive_merge to struct object_id
diff-lib: convert do_diff_cache to struct object_id
builtin/ls-tree: convert to struct object_id
merge: convert checkout_fast_forward to struct object_id
sequencer: convert fast_forward_to to struct object_id
builtin/ls-files: convert overlay_tree_on_cache to object_id
builtin/read-tree: convert to struct object_id
sha1_name: convert internals of peel_onion to object_id
upload-pack: convert remaining parse_object callers to object_id
revision: convert remaining parse_object callers to object_id
revision: rename add_pending_sha1 to add_pending_oid
http-push: convert process_ls_object and descendants to object_id
refs/files-backend: convert many internals to struct object_id
refs: convert struct ref_update to use struct object_id
ref-filter: convert some static functions to struct object_id
Convert struct ref_array_item to struct object_id
Convert the verify_pack callback to struct object_id
Convert lookup_tag to struct object_id
...
|
|
Some platforms have ulong that is smaller than time_t, and our
historical use of ulong for timestamp would mean they cannot
represent some timestamp that the platform allows. Invent a
separate and dedicated timestamp_t (so that we can distingiuish
timestamps and a vanilla ulongs, which along is already a good
move), and then declare uintmax_t is the type to be used as the
timestamp_t.
* js/larger-timestamps:
archive-tar: fix a sparse 'constant too large' warning
use uintmax_t for timestamps
date.c: abort if the system time cannot handle one of our timestamps
timestamp_t: a new data type for timestamps
PRItime: introduce a new "printf format" for timestamps
parse_timestamp(): specify explicitly where we parse timestamps
t0006 & t5000: skip "far in the future" test when time_t is too limited
t0006 & t5000: prepare for 64-bit timestamps
ref-filter: avoid using `unsigned long` for catch-all data type
|
|
Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id. Remove the temporary variables inserted
earlier, since they are no longer necessary. Transform all of the
callers using the following semantic patch:
@@
expression E1;
@@
- parse_object(E1.hash)
+ parse_object(&E1)
@@
expression E1;
@@
- parse_object(E1->hash)
+ parse_object(E1)
@@
expression E1, E2;
@@
- parse_object_or_die(E1.hash, E2)
+ parse_object_or_die(&E1, E2)
@@
expression E1, E2;
@@
- parse_object_or_die(E1->hash, E2)
+ parse_object_or_die(E1, E2)
@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1.hash, E2, E3, E4, E5)
+ parse_object_buffer(&E1, E2, E3, E4, E5)
@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1->hash, E2, E3, E4, E5)
+ parse_object_buffer(E1, E2, E3, E4, E5)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the verify_pack_callback take a pointer to struct object_id.
Change the pack checksum to use GIT_MAX_RAWSZ, even though it is not
strictly an object ID. Doing so ensures resilience against future hash
size changes, and allows us to remove hard-coded assumptions about how
big the buffer needs to be.
Also, use a union to convert the pointer from nth_packed_object_sha1 to
to a pointer to struct object_id. This behavior is compatible with GCC
and clang and explicitly sanctioned by C11. The alternatives are to
just perform a cast, which would run afoul of strict aliasing rules, but
should just work, and changing the pointer into an instance of struct
object_id and copying the value. The latter operation could seriously
bloat memory usage on fsck, which already uses a lot of memory on some
repositories.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert lookup_blob to take a pointer to struct object_id.
The commit was created with manual changes to blob.c and blob.h, plus
the following semantic patch:
@@
expression E1;
@@
- lookup_blob(E1.hash)
+ lookup_blob(&E1)
@@
expression E1;
@@
- lookup_blob(E1->hash)
+ lookup_blob(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert the sha1 member of struct cache_tree to struct object_id by
changing the definition and applying the following semantic patch, plus
the standard object_id transforms:
@@
struct cache_tree E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_tree *E1;
@@
- E1->sha1
+ E1->oid.hash
Fix up one reference to active_cache_tree which was not automatically
caught by Coccinelle. These changes are prerequisites for converting
parse_object.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).
So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.
By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.
As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently, Git's source code treats all timestamps as if they were
unsigned longs. Therefore, it is okay to write "%lu" when printing them.
There is a substantial problem with that, though: at least on Windows,
time_t is *larger* than unsigned long, and hence we will want to switch
away from the ill-specified `unsigned long` data type.
So let's introduce the pseudo format "PRItime" (currently simply being
defined to "lu") to make it easier to change the data type used for
timestamps.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.
Teach fsck to force this verification.
The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.
These effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert each_loose_object_fn and each_packed_object_fn to take a pointer
to struct object_id. Update the various callbacks. Convert several
40-based constants to use GIT_SHA1_HEXSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make each_reflog_ent_fn take two struct object_id pointers instead of
two pointers to unsigned char. Convert the various callbacks to use
struct object_id as well. Also, rename fsck_handle_reflog_sha1 to
fsck_handle_reflog_oid.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git fsck --connectivity-check" was not working at all.
* jk/fsck-connectivity-check-fix:
fsck: lazily load types under --connectivity-only
fsck: move typename() printing to its own function
t1450: use "mv -f" within loose object directory
fsck: check HAS_OBJ more consistently
fsck: do not fallback "git fsck <bogus>" to "git fsck"
fsck: tighten error-checks of "git fsck <head>"
fsck: prepare dummy objects for --connectivity-check
fsck: report trees as dangling
t1450: clean up sub-objects in duplicate-entry test
|
|
The recent fixes to "fsck --connectivity-only" load all of
the objects with their correct types. This keeps the
connectivity-only code path close to the regular one, but it
also introduces some unnecessary inefficiency. While getting
the type of an object is cheap compared to actually opening
and parsing the object (as the non-connectivity-only case
would do), it's still not free.
For reachable non-blob objects, we end up having to parse
them later anyway (to see what they point to), making our
type lookup here redundant.
For unreachable objects, we might never hit them at all in
the reachability traversal, making the lookup completely
wasted. And in some cases, we might have quite a few
unreachable objects (e.g., when alternates are used for
shared object storage between repositories, it's normal for
there to be objects reachable from other repositories but
not the one running fsck).
The comment in mark_object_for_connectivity() claims two
benefits to getting the type up front:
1. We need to know the types during fsck_walk(). (And not
explicitly mentioned, but we also need them when
printing the types of broken or dangling commits).
We can address this by lazy-loading the types as
necessary. Most objects never need this lazy-load at
all, because they fall into one of these categories:
a. Reachable from our tips, and are coerced into the
correct type as we traverse (e.g., a parent link
will call lookup_commit(), which converts OBJ_NONE
to OBJ_COMMIT).
b. Unreachable, but not at the tip of a chunk of
unreachable history. We only mention the tips as
"dangling", so an unreachable commit which links
to hundreds of other objects needs only report the
type of the tip commit.
2. It serves as a cross-check that the coercion in (1a) is
correct (i.e., we'll complain about a parent link that
points to a blob). But we get most of this for free
already, because right after coercing, we'll parse any
non-blob objects. So we'd notice then if we expected a
commit and got a blob.
The one exception is when we expect a blob, in which
case we never actually read the object contents.
So this is a slight weakening, but given that the whole
point of --connectivity-only is to sacrifice some data
integrity checks for speed, this seems like an
acceptable tradeoff.
Here are before and after timings for an extreme case with
~5M reachable objects and another ~12M unreachable (it's the
torvalds/linux repository on GitHub, connected to shared
storage for all of the other kernel forks):
[before]
$ time git fsck --no-dangling --connectivity-only
real 3m4.323s
user 1m25.121s
sys 1m38.710s
[after]
$ time git fsck --no-dangling --connectivity-only
real 0m51.497s
user 0m49.575s
sys 0m1.776s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When an object has a problem, we mention its type. But we do
so by feeding the result of typename() directly to
fprintf(). This is potentially dangerous because typename()
can return NULL for some type values (like OBJ_NONE).
It's doubtful that this can be triggered in practice with
the current code, so this is probably not fixing a bug. But
it future-proofs us against modifications that make things
like OBJ_NONE more likely (and gives future patches a
central point to handle them).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are two spots that call lookup_object() and assume
that a non-NULL result means we have the object:
1. When we're checking the objects given to us by the user
on the command line.
2. When we're checking if a reflog entry is valid.
This generally follows fsck's mental model that we will have
looked at and loaded a "struct object" for each object in
the repository. But it misses one case: if another object
_mentioned_ an object, but we didn't actually parse it or
verify that it exists, it will still have a struct.
It's not clear if this is a triggerable bug or not.
Certainly the later parts of the reachability check need to
be careful of this, and do so by checking the HAS_OBJ flag.
But both of these steps happen before we start traversing,
so probably we won't have followed any links yet. Still,
it's easy enough to be defensive here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.
Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of checking reachability from the refs, you can ask
fsck to check from a particular set of heads. However, the
error checking here is quite lax. In particular:
1. It claims lookup_object() will report an error, which
is not true. It only does a hash lookup, and the user
has no clue that their argument was skipped.
2. When either the name or sha1 cannot be resolved, we
continue to exit with a successful error code, even
though we didn't check what the user asked us to.
This patch fixes both of these cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.
Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.
However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:
1. mark_object() will not queue objects for examination,
so recursively following links from commits to trees,
etc, did nothing. I.e., we were checking the
reachability of hardly anything at all.
2. When a set of heads is given on the command-line, we
use lookup_object() to see if they exist. But without
the initial pass, we assume nothing exists.
3. When loading reflog entries, we do a similar
lookup_object() check, and complain that the reflog is
broken if the object doesn't exist in our hash.
So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.
One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).
Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.
And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.
One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.
Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.
That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).
Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.
We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.
In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.
The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The alternate_object_database struct uses a single buffer
both for storing the path to the alternate, and as a scratch
buffer for forming object names. This is efficient (since
otherwise we'd end up storing the path twice), but it makes
life hard for callers who just want to know the path to the
alternate. They have to remember to stop reading after
"alt->name - alt->base" bytes, and to subtract one for the
trailing '/'.
It would be much simpler if they could simply access a
NUL-terminated path string. We could encapsulate this in a
function which puts a NUL in the scratch buffer and returns
the string, but that opens up questions about the lifetime
of the result. The first time another caller uses the
alternate, the scratch buffer may get other data tacked onto
it.
Let's instead just store the root path separately from the
scratch buffer. There aren't enough alternates being stored
for the duplicated data to matter for performance, and this
keeps things simple and safe for the callers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since all of its callers have been updated, modify stream_blob_to_fd to
take a struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
|
|
When reporting broken links between commits/trees/blobs, it would be
quite helpful at times if the user would be told how the object is
supposed to be reachable.
With the new --name-objects option, git-fsck will try to do exactly
that: name the objects in a way that shows how they are reachable.
For example, when some reflog got corrupted and a blob is missing that
should not be, the user might want to remove the corresponding reflog
entry. This option helps them find that entry: `git fsck` will now
report something like this:
broken link from tree b5eb6ff... (refs/stash@{<date>}~37:)
to blob ec5cf80...
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We will need this in the next commit, where fsck will be taught to
optionally name the objects when reporting issues about them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In many places, we refer to objects via their SHA-1s. Let's abstract
that into a function.
For the moment, it does nothing else than what we did previously: print
out the 40-digit hex string. But that will change over the course of the
next patches.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.
For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit where it's not a problem, allocating a
lot more memory could result in kicking other parts of systems to swap
file, generating lots of I/O and slowing everything down.
For this particular operation, not unpacking the blob and letting
check_sha1_signature, which supports streaming interface, do the job
is sufficient. check_sha1_signature() is not shown in the diff,
unfortunately. But if will be called when "data_valid && !data" is
false.
We will call the callback function "fn" with NULL as "data". The only
callback of this function is fsck_obj_buffer(), which does not touch
"data" at all if it's a blob.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is never read, so we can pass NULL to resolve_ref_unsafe().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
|
|
struct object is one of the major data structures dealing with object
IDs. Convert it to use struct object_id instead of an unsigned char
array. Convert get_object_hash to refer to the new member as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
|
|
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
|
|
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
|
|
There were some classes of errors that "git fsck" diagnosed to its
standard error that did not cause it to exit with non-zero status.
* jc/fsck-dropped-errors:
fsck: exit with non-zero when problems are found
|
|
Since 27e1e22 (prune: factor out loose-object directory
traversal, 2014-10-15), we now have a generic callback
system for iterating over the loose object directories. This
is used by prune, count-objects, etc.
We did not convert git-fsck at the time because it
implemented an inode-sorting scheme that was not part of the
generic code. Now that the inode-sorting code is gone, we
can reuse the generic code. The result is shorter,
hopefully more readable, and drops some unchecked sprintf
calls.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fsck tries to access loose objects in order of inode number,
with the hope that this would make cold cache access faster
on a spinning disk. This dates back to 7e8c174 (fsck-cache:
sort entries by inode number, 2005-05-02), which predates
the invention of packfiles.
These days, there's not much point in trying to optimize
cold cache for a large number of loose objects. You are much
better off to simply pack the objects, which will reduce the
disk footprint _and_ provide better locality of data access.
So while you can certainly construct pathological cases
where this code might help, it is not worth the trouble
anymore.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When fsck-ing alternates, we make a copy of the alternate
directory in a fixed PATH_MAX buffer. We memcpy directly,
without any check whether we are overflowing the buffer.
This is OK if PATH_MAX is a true representation of the
maximum path on the system, because any path here will have
already been vetted by the alternates subsystem. But that is
not true on every system, so we should be more careful.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 02976bf (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) recently gave fsck an option to perform only a
subset of the checks, by skipping the fsck_object_dir()
call. However, it does so only for the local object
directory, and we still do expensive checks on any alternate
repos. We should skip them in this case, too.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After finding some problems (e.g. a ref refs/heads/X points at an
object that is not a commit) and issuing an error message, the
program failed to signal the fact that it found an error by a
non-zero exit status.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Because git_path uses a static buffer that is shared with
calls to git_path, mkpath, etc, it can be dangerous to
assign the result to a variable or pass it to a non-trivial
function. The value may change unexpectedly due to other
calls.
None of the cases changed here has a known bug, but they're
worth converting away from git_path because:
1. It's easy to use git_pathdup in these cases.
2. They use constructs (like assignment) that make it
hard to tell whether they're safe or not.
The extra malloc overhead should be trivial, as an
allocation should be an order of magnitude cheaper than a
system call (which we are clearly about to make, since we
are constructing a filename). The real cost is that we must
remember to free the result.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Allow ignoring fsck errors on specific set of known-to-be-bad
objects, and also tweaking warning level of various kinds of non
critical breakages reported.
* js/fsck-opt:
fsck: support ignoring objects in `git fsck` via fsck.skiplist
fsck: git receive-pack: support excluding objects from fsck'ing
fsck: introduce `git fsck --connectivity-only`
fsck: support demoting errors to warnings
fsck: document the new receive.fsck.<msg-id> options
fsck: allow upgrading fsck warnings to errors
fsck: optionally ignore specific fsck issues completely
fsck: disallow demoting grave fsck errors to warnings
fsck: add a simple test for receive.fsck.<msg-id>
fsck: make fsck_tag() warn-friendly
fsck: handle multiple authors in commits specially
fsck: make fsck_commit() warn-friendly
fsck: make fsck_ident() warn-friendly
fsck: report the ID of the error/warning
fsck (receive-pack): allow demoting errors to warnings
fsck: offer a function to demote fsck errors to warnings
fsck: provide a function to parse fsck message IDs
fsck: introduce identifiers for fsck messages
fsck: introduce fsck options
|
|
"git fsck" used to ignore missing or invalid objects recorded in reflog.
* mh/fsck-reflog-entries:
fsck: report errors if reflog entries point at invalid objects
fsck_handle_reflog_sha1(): new function
|
|
Identical to support in `git receive-pack for the config option
`receive.fsck.skiplist`, we now support ignoring given objects in
`git fsck` via `fsck.skiplist` altogether.
This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This option avoids unpacking each and all blob objects, and just
verifies the connectivity. In particular with large repositories, this
speeds up the operation, at the expense of missing corrupt blobs,
ignoring unreachable objects and other fsck issues, if any.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We already have support in `git receive-pack` to deal with some legacy
repositories which have non-fatal issues.
Let's make `git fsck` itself useful with such repositories, too, by
allowing users to ignore known issues, or at least demote those issues
to mere warnings.
Example: `git -c fsck.missingEmail=ignore fsck` would hide
problems with missing emails in author, committer and tagger lines.
In the same spirit that `git receive-pack`'s usage of the fsck machinery
differs from `git fsck`'s – some of the non-fatal warnings in `git fsck`
are fatal with `git receive-pack` when receive.fsckObjects = true, for
example – we strictly separate the fsck.<msg-id> from the
receive.fsck.<msg-id> settings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of specifying whether a message by the fsck machinery constitutes
an error or a warning, let's specify an identifier relating to the
concrete problem that was encountered. This is necessary for upcoming
support to be able to demote certain errors to warnings.
In the process, simplify the requirements on the calling code: instead of
having to handle full-blown varargs in every callback, we now send a
string buffer ready to be used by the callback.
We could use a simple enum for the message IDs here, but we want to
guarantee that the enum values are associated with the appropriate
message types (i.e. error or warning?). Besides, we want to introduce a
parser in the next commit that maps the string representation to the
enum value, hence we use the slightly ugly preprocessor construct that
is extensible for use with said parser.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Just like the diff machinery, we are about to introduce more settings,
therefore it makes sense to carry them around as a (pointer to a) struct
containing all of them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Previously, if a reflog entry's old or new SHA-1 was not resolvable to
an object, that SHA-1 was silently ignored. Instead, report such cases
as errors.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
New function, extracted from fsck_handle_reflog_ent(). The extra
is_null_sha1() test for the new reference is currently unnecessary, as
reflogs are deleted when the reference itself is deleted. But it
doesn't hurt, either.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|