Age | Commit message (Collapse) | Author | Files | Lines |
|
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.
* jk/fetch-quick-tag-following:
fetch: use "quick" has_sha1_file for tag following
|
|
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.
This has gone unnoticed for several years because it
requires a fairly unique setup to matter:
1. You need to have a lot of packs on the client side to
make reprepare_packed_git() expensive (the most
expensive part is finding duplicates in an unsorted
list, which is currently quadratic).
2. You need a large number of tag refs on the server side
that are candidates for auto-following (i.e., that the
client doesn't have). Each one triggers a re-read of
the pack directory.
3. Under normal circumstances, the client would
auto-follow those tags and after one large fetch, (2)
would no longer be true. But if those tags point to
history which is disconnected from what the client
otherwise fetches, then it will never auto-follow, and
those candidates will impact it on every fetch.
So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).
This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.
Here are results from the included perf script, which sets
up a situation similar to the one described above:
Test HEAD^ HEAD
----------------------------------------------------------
5550.4: fetch 11.21(10.42+0.78) 0.08(0.04+0.02) -99.3%
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Codepaths that read from an on-disk loose object were too loose in
validating what they are reading is a proper object file and
sometimes read past the data they read from the disk, which has
been corrected. H/t to Gustavo Grieco for reporting.
* jc/verify-loose-object-header:
unpack_sha1_header(): detect malformed object header
streaming: make sure to notice corrupt object
|
|
When opening a loose object file, we often do this sequence:
- prepare a short buffer for the object header (on stack)
- call unpack_sha1_header() and have early part of the object data
inflated, enough to fill the buffer
- parse that data in the short buffer, assuming that the first part
of the object is <typename> SP <length> NUL
Because the parsing function parse_sha1_header_extended() is not
given the number of bytes inflated into the header buffer, it you
craft a file whose early part inflates a garbage sequence without SP
or NUL, and replace a loose object with it, it will end up reading
past the end of the inflated data.
To correct this, do the following four things:
- rename unpack_sha1_header() to unpack_sha1_short_header() and
have unpack_sha1_header_to_strbuf() keep calling that as its
helper function. This will detect and report zlib errors, but is
not aware of the format of a loose object (as before).
- introduce unpack_sha1_header() that calls the same helper
function, and when zlib reports it inflated OK into the buffer,
check if the inflated data has NUL. This would ensure that
parsing function will terminate within the buffer that holds the
inflated header.
- update unpack_sha1_header_to_strbuf() to check if the resulting
buffer has NUL for the same effect.
- update parse_sha1_header_extended() to make sure that its loop to
find the SP that terminates the <typename> stops at NUL.
Essentially, this makes unpack_*() functions that are asked to
unpack a loose object header to be a bit more strict and detect an
input that cannot possibly be a valid object header, even before the
parsing function kicks in.
Reported-by: Gustavo Grieco <gustavo.grieco@imag.fr>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* vs/typofix:
Spelling fixes
|
|
<BAD> <CORRECTED>
accidently accidentally
commited committed
dependancy dependency
emtpy empty
existance existence
explicitely explicitly
git-upload-achive git-upload-archive
hierachy hierarchy
indegee indegree
intial initial
mulitple multiple
non-existant non-existent
precendence. precedence.
priviledged privileged
programatically programmatically
psuedo-binary pseudo-binary
soemwhere somewhere
successfull successful
transfering transferring
uncommited uncommitted
unkown unknown
usefull useful
writting writing
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.
* js/am-3-merge-recursive-direct:
merge-recursive: flush output buffer even when erroring out
merge_trees(): ensure that the callers release output buffer
merge-recursive: offer an option to retain the output in 'obuf'
merge-recursive: write the commit title in one go
merge-recursive: flush output buffer before printing error messages
am -3: use merge_recursive() directly again
merge-recursive: switch to returning errors instead of dying
merge-recursive: handle return values indicating errors
merge-recursive: allow write_tree_from_memory() to error out
merge-recursive: avoid returning a wholesale struct
merge_recursive: abort properly upon errors
prepare the builtins for a libified merge_recursive()
merge-recursive: clarify code in was_tracked()
die(_("BUG")): avoid translating bug messages
die("bug"): report bugs consistently
t5520: verify that `pull --rebase` shows the helpful advice when failing
|
|
"git pack-objects" has a few options that tell it not to pack
objects found in certain packfiles, which require it to scan .idx
files of all available packs. The codepaths involved in these
operations have been optimized for a common case of not having any
non-local pack and/or any .kept pack.
* jk/pack-objects-optim:
pack-objects: compute local/ignore_pack_keep early
pack-objects: break out of want_object loop early
find_pack_entry: replace last_found_pack with MRU cache
add generic most-recently-used list
sha1_file: drop free_pack_by_name
t/perf: add tests for many-pack scenarios
|
|
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
|
|
Each pack has an index for looking up entries in O(log n)
time, but if we have multiple packs, we have to scan through
them linearly. This can produce a measurable overhead for
some operations.
We dealt with this long ago in f7c22cc (always start looking
up objects in the last used pack first, 2007-05-30), which
keeps what is essentially a 1-element most-recently-used
cache. In theory, we should be able to do better by keeping
a similar but longer cache, that is the same length as the
pack-list itself.
Since we now have a convenient generic MRU structure, we can
plug it in and measure. Here are the numbers for running
p5303 against linux.git:
Test HEAD^ HEAD
------------------------------------------------------------------------
5303.3: rev-list (1) 31.56(31.28+0.27) 31.30(31.08+0.20) -0.8%
5303.4: repack (1) 40.62(39.35+2.36) 40.60(39.27+2.44) -0.0%
5303.6: rev-list (50) 31.31(31.06+0.23) 31.23(31.00+0.22) -0.3%
5303.7: repack (50) 58.65(69.12+1.94) 58.27(68.64+2.05) -0.6%
5303.9: rev-list (1000) 38.74(38.40+0.33) 31.87(31.62+0.24) -17.7%
5303.10: repack (1000) 367.20(441.80+4.62) 342.00(414.04+3.72) -6.9%
The main numbers of interest here are the rev-list ones
(since that is exercising the normal object lookup code
path). The single-pack case shouldn't improve at all; the
260ms speedup there is just part of the run-to-run noise
(but it's important to note that we didn't make anything
worse with the overhead of maintaining our cache). In the
50-pack case, we see similar results. There may be a slight
improvement, but it's mostly within the noise.
The 1000-pack case does show a big improvement, though. That
carries over to the repack case, as well. Even though we
haven't touched its pack-search loop yet, it does still do a
lot of normal object lookups (e.g., for the internal
revision walk), and so improves.
As a point of reference, I also ran the 1000-pack test
against a version of HEAD^ with the last_found_pack
optimization disabled. It takes ~60s, so that gives an
indication of how much even the single-element cache is
helping.
For comparison, here's a smaller repository, git.git:
Test HEAD^ HEAD
---------------------------------------------------------------------
5303.3: rev-list (1) 1.56(1.54+0.01) 1.54(1.51+0.02) -1.3%
5303.4: repack (1) 1.84(1.80+0.10) 1.82(1.80+0.09) -1.1%
5303.6: rev-list (50) 1.58(1.55+0.02) 1.59(1.57+0.01) +0.6%
5303.7: repack (50) 2.50(3.18+0.04) 2.50(3.14+0.04) +0.0%
5303.9: rev-list (1000) 2.76(2.71+0.04) 2.24(2.21+0.02) -18.8%
5303.10: repack (1000) 13.21(19.56+0.25) 11.66(18.01+0.21) -11.7%
You can see that the percentage improvement is similar.
That's because the lookup we are optimizing is roughly
O(nr_objects * nr_packs). Since the number of packs is
constant in both tests, we'd expect the improvement to be
linear in the number of objects. But the whole process is
also linear in the number of objects, so the improvement
is a constant factor.
The exact improvement does also depend on the contents of
the packs. In p5303, the extra packs all have 5 first-parent
commits in them, which is a reasonable simulation of a
pushed-to repository. But it also means that only 250
first-parent commits are in those packs (compared to almost
50,000 total in linux.git), and the rest are in the huge
"base" pack. So once we start looking at history in taht big
pack, that's where we'll find most everything, and even the
1-element cache gets close to 100% cache hits. You could
almost certainly show better numbers with a more
pathological case (e.g., distributing the objects more
evenly across the packs). But that's simply not that
realistic a scenario, so it makes more sense to focus on
these numbers.
The implementation itself is a straightforward application
of the MRU code. We provide an MRU-ordered list of packs
that shadows the packed_git list. This is easy to do because
we only create and revise the pack list in one place. The
"reprepare" code path actually drops the whole MRU and
replaces it for simplicity. It would be more efficient to
just add new entries, but there's not much point in
optimizing here; repreparing happens rarely, and only after
doing a lot of other expensive work. The key things to keep
optimized are traversal (which is just a normal linked list,
albeit with one extra level of indirection over the regular
packed_git list), and marking (which is a constant number of
pointer assignments, though slightly more than the old
last_found_pack was; it doesn't seem to create a measurable
slowdown, though).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The point of this function is to drop an entry from the
"packed_git" cache that points to a file we might be
overwriting, because our contents may not be the same (and
hence the only caller was pack-objects as it moved a
temporary packfile into place).
In older versions of git, this could happen because the
names of packfiles were derived from the set of objects they
contained, not the actual bits on disk. But since 1190a1a
(pack-objects: name pack files after trailer hash,
2013-12-05), the name reflects the actual bits on disk, and
any two packfiles with the same name can be used
interchangeably.
Dropping this function not only saves a few lines of code,
it makes the lifetime of "struct packed_git" much easier to
reason about: namely, we now do not ever free these structs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
|
|
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".
As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
On 32 bit systems with large file support, unsigned long is 32-bit
while the two offsets in the subtraction expression (pack-objects has
the exact same expression as in sha1_file.c but not shown in diff) are
in 64-bit. If an in-pack object is larger than 2^32 len/datalen is
truncated and we get a misleading "error: bad packed object CRC for
..." as a result.
Use off_t for len and datalen. check_pack_crc() already accepts this
argument as off_t and can deal with 4+ GB.
Noticed-by: Christoph Michelbach <michelbach94@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The experimental "multiple worktree" feature gains more safety to
forbid operations on a branch that is checked out or being actively
worked on elsewhere, by noticing that e.g. it is being rebased.
* nd/worktree-various-heads:
branch: do not rename a branch under bisect or rebase
worktree.c: check whether branch is bisected in another worktree
wt-status.c: split bisect detection out of wt_status_get_state()
worktree.c: check whether branch is rebased in another worktree
worktree.c: avoid referencing to worktrees[i] multiple times
wt-status.c: make wt_status_check_rebase() work on any worktree
wt-status.c: split rebase detection out of wt_status_get_state()
path.c: refactor and add worktree_git_path()
worktree.c: mark current worktree
worktree.c: make find_shared_symref() return struct worktree *
worktree.c: store "id" instead of "git_dir"
path.c: add git_common_path() and strbuf_git_common_path()
dir.c: rename str(n)cmp_icase to fspath(n)cmp
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These functions compare two paths that are taken from file system.
Depending on the running file system, paths may need to be compared
case-sensitively or not, and maybe even something else in future. The
current names do not convey that well.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to read the pack data using the offsets stored in the pack
idx file has been made more carefully check the validity of the
data in the idx.
* jk/pack-idx-corruption-safety:
sha1_file.c: mark strings for translation
use_pack: handle signed off_t overflow
nth_packed_object_offset: bounds-check extended offset
t5313: test bounds-checks of corrupted/malicious pack/idx files
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
|
|
A v2 pack index file can specify an offset within a packfile
of up to 2^64-1 bytes. On a system with a signed 64-bit
off_t, we can represent only up to 2^63-1. This means that a
corrupted .idx file can end up with a negative offset in the
pack code. Our bounds-checking use_pack function looks for
too-large offsets, but not for ones that have wrapped around
to negative. Let's do so, which fixes an out-of-bounds
access demonstrated in t5313.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If a pack .idx file has a corrupted offset for an object, we
may try to access an offset in the .idx or .pack file that
is larger than the file's size. For the .pack case, we have
use_pack() to protect us, which realizes the access is out
of bounds. But if the corrupted value asks us to look in the
.idx file's secondary 64-bit offset table, we blindly add it
to the mmap'd index data and access arbitrary memory.
We can fix this with a simple bounds-check compared to the
size we found when we opened the .idx file.
Note that there's similar code in index-pack that is
triggered only during "index-pack --verify". To support
both, we pull the bounds-check into a separate function,
which dies when it sees a corrupted file.
It would be nice if we could return an error, so that the
pack code could try to find a good copy of the object
elsewhere. Currently nth_packed_object_offset doesn't have
any way to return an error, but it could probably use "0" as
a sentinel value (since no object can start there). This is
the minimal fix, and we can improve the resilience later on
top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
$GIT_OBJECT_DIRECTORY/info/alternates is a text file that can be
edited with a DOS editor. We do not want to use the real path with
CR appended at the end.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago. No useful caller that uses other value has emerged.
By now, it is clear that the interface is overly broad without a
good reason. Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.
This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them. The changes contained in this patch are:
* introduction of these two functions in strbuf.[ch]
* mechanical conversion of all callers to strbuf_getline() with
either '\n' or '\0' as the third parameter to instead call the
respective thin wrapper.
After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller. An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"format-patch" has learned a new option to zero-out the commit
object name on the mbox "From " line.
* bc/format-patch-null-from-line:
format-patch: check that header line has expected format
format-patch: add an option to suppress commit hash
sha1_file.c: introduce a null_oid constant
|
|
The helper used to iterate over loose object directories to prune
stale objects did not closedir() immediately when it is done with a
directory--a callback such as the one used for "git prune" may want
to do rmdir(), but it would fail on open directory on platforms
such as WinXP.
* jk/prune-mtime:
prune: close directory earlier during loose-object directory traversal
|
|
null_oid is the struct object_id equivalent to null_sha1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add has_object_file, which is a wrapper around has_sha1_file, but for
struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
|
|
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.
* dk/gc-idx-wo-pack:
gc: remove garbage .idx files from pack dir
t5304: test cleaning pack garbage
prepare_packed_git(): refactor garbage reporting in pack directory
|
|
Various compilation fixes and squelching of warnings.
* js/misc-fixes:
Correct fscanf formatting string for I64u values
Silence GCC's "cast of pointer to integer of a different size" warning
Squelch warning about an integer overflow
|
|
When calculating hashes from pointers, it actually makes sense to cut
off the most significant bits. In that case, said warning does not make
a whole lot of sense.
So let's just work around it by casting the pointer first to intptr_t
and then casting up/down to the final integral type.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
|
|
"git clone --dissociate" runs a big "git repack" process at the
end, and it helps to close file descriptors that are open on the
packs and their idx files before doing so on filesystems that
cannot remove a file that is still open.
* js/clone-dissociate:
clone --dissociate: avoid locking pack files
sha1_file.c: add a function to release all packs
sha1_file: consolidate code to close a pack's file descriptor
t5700: demonstrate a Windows file locking issue with `git clone --dissociate`
|
|
On Windows, files that are in use cannot be removed or renamed. That
means that we have to release pack files when we are about to, say,
repack them. Let's introduce a convenient function to close all the
pack files and their idx files.
While at it, we consolidate the close windows/close fd/close index
stanza in `free_pack_by_name()` into the `close_pack()` function that
is used by the new `close_all_packs()` function to avoid repeated code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There was a lot of repeated code to close the file descriptor of
a given pack. Let's just refactor this code into a single function.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we are allocating a struct with a FLEX_ARRAY member, we
generally compute the size of the array and then sprintf or
strcpy into it. Normally we could improve a dynamic allocation
like this by using xstrfmt, but it doesn't work here; we
have to account for the size of the rest of the struct.
But we can improve things a bit by storing the length that
we use for the allocation, and then feeding it to xsnprintf
or memcpy, which makes it more obvious that we are not
writing more than the allocated number of bytes.
It would be nice if we had some kind of helper for
allocating generic flex arrays, but it doesn't work that
well:
- the call signature is a little bit unwieldy:
d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...);
You need offsetof here instead of just writing to the
end of the base size, because we don't know how the
struct is packed (partially this is because FLEX_ARRAY
might not be zero, though we can account for that; but
the size of the struct may actually be rounded up for
alignment, and we can't know that).
- some sites do clever things, like over-allocating because
they know they will write larger things into the buffer
later (e.g., struct packed_git here).
So we're better off to just write out each allocation (or
add type-specific helpers, though many of these are one-off
allocations anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When creating a loose object tempfile, we use a fixed
PATH_MAX-sized buffer, and strcpy directly into it. This
isn't buggy, because we do a rough check of the size, but
there's no verification that our guesstimate of the required
space is enough (in fact, it's several bytes too big for the
current naming scheme).
Let's switch to a strbuf, which makes this much easier to
verify. The allocation overhead should be negligible, since
we are replacing a static buffer with a static strbuf, and
we'll only need to allocate on the first call.
While we're here, we can also document a subtle interaction
with mkstemp that would be easy to overlook.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We do some manual memory computation here, and there's no
check that our 60 is not overflowed by the raw sprintf (it
isn't, because the "which" parameter is never longer than
"pack"). We can simplify this greatly with a strbuf.
Technically the end result is not identical, as the original
took care not to rewrite the object directory on each call
for performance reasons. We could do that here, too (by
saving the baselen and resetting to it), but it's not worth
the complexity; this function is not called a lot (generally
once per packfile that we open).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we want to convert "foo.pack" to "foo.idx", we do it by
duplicating the original string and then munging the bytes
in place. Let's use strip_suffix and xstrfmt instead, which
has several advantages:
1. It's more clear what the intent is.
2. It does not implicitly rely on the fact that
strlen(".idx") <= strlen(".pack") to avoid an overflow.
3. We communicate the assumption that the input file ends
with ".pack" (and get a run-time check that this is so).
4. We drop calls to strcpy, which makes auditing the code
base easier.
Likewise, we can do this to convert ".pack" to ".bitmap",
avoiding some manual memory computation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have the path "foo.idx", and we create a buffer big
enough to hold "foo.pack" and "foo.keep", and then strcpy
straight into it. This isn't a bug (we have enough space),
but it's very hard to tell from the strcpy that this is so.
Let's instead use strip_suffix to take off the ".idx",
record the size of our allocation, and use xsnprintf to make
sure we don't violate our assumptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We generally use 32-byte buffers to format git's "type size"
header fields. These should not generally overflow unless
you can produce some truly gigantic objects (and our types
come from our internal array of constant strings). But it is
a good idea to use xsnprintf to make sure this is the case.
Note that we slightly modify the interface to
write_sha1_file_prepare, which nows uses "hdrlen" as an "in"
parameter as well as an "out" (on the way in it stores the
allocated size of the header, and on the way out it returns
the ultimate size of the header).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
This function assumes that the relative_base path passed
into it is no larger than PATH_MAX, and writes into a
fixed-size buffer. However, this path may not have actually
come from the filesystem; for example, add_submodule_odb
generates a path using a strbuf and passes it in. This is
hard to trigger in practice, though, because the long
submodule directory would have to exist on disk before we
would try to open its info/alternates file.
We can easily avoid the bug, though, by simply creating the
filename on the heap.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When trying to see that an object does not exist, a state errno
leaked from our "first try to open a packfile with O_NOATIME and
then if it fails retry without it" logic on a system that refuses
O_NOATIME. This confused us and caused us to die, saying that the
packfile is unreadable, when we should have just reported that the
object does not exist in that packfile to the caller.
* cb/open-noatime-clear-errno:
git_open_noatime: return with errno=0 on success
|
|
When trying to see that an object does not exist, a state errno
leaked from our "first try to open a packfile with O_NOATIME and
then if it fails retry without it" logic on a system that refuses
O_NOATIME. This confused us and caused us to die, saying that the
packfile is unreadable, when we should have just reported that the
object does not exist in that packfile to the caller.
* cb/open-noatime-clear-errno:
git_open_noatime: return with errno=0 on success
|