Age | Commit message (Collapse) | Author | Files | Lines |
|
* jc/alternate-push:
push: receiver end advertises refs from alternate repositories
push: prepare sender to receive extended ref information from the receiver
receive-pack: make it a builtin
is_directory(): a generic helper function
|
|
* jc/safe-c-l-d:
safe_create_leading_directories(): make it about "leading" directories
|
|
* maint:
sha1_file: link() returns -1 on failure, not errno
Make git archive respect core.autocrlf when creating zip format archives
Add new test to demonstrate git archive core.autocrlf inconsistency
gitweb: avoid warnings for commits without body
Clarified gitattributes documentation regarding custom hunk header.
git-svn: fix handling of even funkier branch names
git-svn: Always create a new RA when calling do_switch for svn://
git-svn: factor out svnserve test code for later use
diff/diff-files: do not use --cc too aggressively
|
|
5723fe7 (Avoid cross-directory renames and linking on object creation,
2008-06-14) changed the call to use link() directly instead of through a
custom wrapper, but forgot that it returns 0 or -1, not 0 or errno.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Earlier, when pushing into a repository that borrows from alternate object
stores, we followed the longstanding design decision not to trust refs in
the alternate repository that houses the object store we are borrowing
from. If your public repository is borrowing from Linus's public
repository, you pushed into it long time ago, and now when you try to push
your updated history that is in sync with more recent history from Linus,
you will end up sending not just your own development, but also the
changes you acquired through Linus's tree, even though the objects needed
for the latter already exists at the receiving end. This is because the
receiving end does not advertise that the objects only reachable from the
borrowed repository (i.e. Linus's) are already available there.
This solves the issue by making the receiving end advertise refs from
borrowed repositories. They are not sent with their true names but with a
phoney name ".have" to make sure that the old senders will safely ignore
them (otherwise, the old senders will misbehave, trying to push matching
refs, and mirror push that deletes refs that only exist at the receiving
end).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A simple "grep -e stat --and -e S_ISDIR" revealed there are many
open-coded implementations of this function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We used to allow callers to pass "foo/bar/" to make sure both "foo" and
"foo/bar" exist and have good permissions, but this interface is too error
prone. If a caller mistakenly passes a path with trailing slashes
(perhaps it forgot to verify the user input) even when it wants to later
mkdir "bar" itself, it will find that it cannot mkdir "bar". If such a
caller does not bother to check the error for EEXIST, it may even
errorneously die().
Because we have no existing callers to use that obscure feature, this
patch removes it to avoid confusion.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/verify-pack:
discard revindex data when pack list changes
|
|
This is needed to fix verify-pack -v with multiple pack arguments.
Also, in theory, revindex data (if any) must be discarded whenever
reprepare_packed_git() is called. In practice this is hard to trigger
though.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* dp/hash-literally:
add --no-filters option to git hash-object
add --path option to git hash-object
use parse_options() in git hash-object
correct usage help string for git-hash-object
correct argument checking test for git hash-object
teach index_fd to work with pipes
|
|
When dealing with a repository with lots of loose objects, sha1_object_info
would rescan the packs directory every time an unpacked object was referenced
before finally giving up and looking for the loose object. This caused a lot
of extra unnecessary system calls during git pack-objects; the code was
rereading the entire pack directory once for each loose object file.
This patch looks for a loose object before falling back to rescanning the
pack directory, rather than the other way around.
Signed-off-by: Steven Grimm <koreth@midwinter.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
index_fd can now work with file descriptors that are not normal files
but any readable file. If the given file descriptor is a regular file
then mmap() is used; for other files, strbuf_read is used.
The path parameter, which has been used as hint for filters, can be
NULL now to indicate that the file should be hashed literally without
any filter.
The index_pipe function is removed as redundant.
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since commit 8eca0b47ff1598a6d163df9358c0e0c9bd92d4c8, it is possible
for read_sha1_file() to return NULL even with existing objects when they
are corrupted. Previously a corrupted object would have terminated the
program immediately, effectively making read_sha1_file() return NULL
only when specified object is not found.
Let's restore this behavior for all users of read_sha1_file() and
provide a separate function with the ability to not terminate when
bad objects are encountered.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* sp/maint-pack-memuse:
Correct pack memory leak causing git gc to try to exceed ulimit
Conflicts:
sha1_file.c
|
|
When recursing to unpack a delta base we must unuse_pack() so that
the pack window for the current object does not remain pinned in
memory while the delta base is itself being unpacked and materialized
for our use.
On a long delta chain of 50 objects we may need to access 6 different
windows from a very large (>3G) pack file in order to obtain all
of the delta base content. If the process ulimit permits us to
map/allocate only 1.5G we must release windows during this recursion
to ensure we stay within the ulimit and transition memory from pack
cache to standard malloc, or other mmap needs.
Inserting an unuse_pack() call prior to the recursion allows us to
avoid pinning the current window, making it available for garbage
collection if memory runs low.
This has been broken since at least before 1.5.1-rc1, and very
likely earlier than that. Its fixed now. :)
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When printing valuds of type uint32_t, we should use PRIu32, and should
not assume that it is unsigned int. On 32-bit platforms, it could be
defined as unsigned long. The same caution applies to ntohl().
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* j6t/mingw: (38 commits)
compat/pread.c: Add a forward declaration to fix a warning
Windows: Fix ntohl() related warnings about printf formatting
Windows: TMP and TEMP environment variables specify a temporary directory.
Windows: Make 'git help -a' work.
Windows: Work around an oddity when a pipe with no reader is written to.
Windows: Make the pager work.
When installing, be prepared that template_dir may be relative.
Windows: Use a relative default template_dir and ETC_GITCONFIG
Windows: Compute the fallback for exec_path from the program invocation.
Turn builtin_exec_path into a function.
Windows: Use a customized struct stat that also has the st_blocks member.
Windows: Add a custom implementation for utime().
Windows: Add a new lstat and fstat implementation based on Win32 API.
Windows: Implement a custom spawnve().
Windows: Implement wrappers for gethostbyname(), socket(), and connect().
Windows: Work around incompatible sort and find.
Windows: Implement asynchronous functions as threads.
Windows: Disambiguate DOS style paths from SSH URLs.
Windows: A rudimentary poll() emulation.
Windows: Implement start_command().
...
|
|
* lt/config-fsync:
Add config option to enable 'fsync()' of object files
Split up default "i18n" and "branch" config parsing into helper routines
Split up default "user" config parsing into helper routine
Split up default "core" config parsing into helper routine
|
|
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.
We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:
1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
creates bar, but this function just creates the leading
directories.
2. mkdir_p took a mode argument, but it was completely
ignored.
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Using find_pack_entry_one() to get object offsets is rather suboptimal
when nth_packed_object_offset() can be used directly.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.
We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:
1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
creates bar, but this function just creates the leading
directories.
2. mkdir_p took a mode argument, but it was completely
ignored.
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
New pack structures are currently allocated in 2 different places
and all members have to be initialized explicitly. This is prone
to errors leading to segmentation faults as found by Teemu Likonen.
Let's have a common place where this structure is allocated, and have
all members explicitly initialized to zero.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We should be able to fall back to loose objects or alternative packs when
a pack becomes corrupted. This is especially true when an object exists
in one pack only as a delta but its base object is corrupted. Currently
there is no way to retrieve the former object even if the later is
available in another pack or loose.
This patch allows for a delta to be resolved (with a performance cost)
using a base object from a source other than the pack where that delta
is located. Same thing for non-delta objects: rather than failing
outright, a search is made in other packs or used loose when the
currently active pack has it but corrupted.
Of course git will become extremely noisy with error messages when that
happens. However, if the operation succeeds nevertheless, a simple
'git repack -a -f -d' will "fix" the corrupted repository given that all
corrupted objects have a good duplicate somewhere in the object store,
possibly manually copied from another source.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The AIX mkstemp will modify it's template parameter to an empty string if
the call fails. This caused a subsequent mkdir to fail.
Signed-off-by: Patrick Higgins <patrick.higgins@cexp.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In this function we must be careful to handle drive-local paths else there
is a danger that it runs into an infinite loop.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
|
|
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
|
|
As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.
It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.
[*] Yes, I updated the docs for the thing. Hell really _has_ frozen
over, and the four horsemen are probably just beyond the horizon.
EVERYBODY PANIC!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It was implemented as a thin wrapper around an otherwise unused
helper function parse_pack_index_file(). The code becomes simpler
and easier to read by consolidating the two.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a shared repository, we should make sure adjust_shared_perm() is called
after creating the initial fan-out directories under objects/ directory.
Earlier an logico called the function only when mkdir() failed; we should
do so when mkdir() succeeded.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before even calling this, all callers have done a "has_sha1_file(sha1)"
or "has_loose_object(sha1)" check, so there is no point in doing a
second check.
If something races with us on object creation, we handle that in the
final link() that moves it to the right place.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that we've made the loose SHA1 file reading more careful and
streamlined, we only use the old find_sha1_file() function for checking
whether a loose object file exists at all.
As such, the whole 'return stat information' part of it was just
pointless (nobody cares any more), and the naming of the function is not
really all that relevant either.
So simplify it to not do a 'stat()', but just an existence check (which
is what the callers want), and rename it to 'has_loose_object()' which
matches the use.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We used to do 'stat()+open()+mmap()+close()' to read the loose object
file data, which does work fine, but has a couple of problems:
- it unnecessarily walks the filename twice (at 'stat()' time and then
again to open it)
- NFS generally has open-close consistency guarantees, which means that
the initial 'stat()' was technically done outside of the normal
consistency rules.
So change it to do 'open()+fstat()+mmap()+close()' instead, which avoids
both these issues.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Instead of creating new temporary objects in the top-level git object
directory, create them in the same directory they will finally end up in
anyway. This avoids making the final atomic "rename to stable name"
operation be a cross-directory event, which makes it a lot easier for
various filesystems.
Several filesystems do things like change the inode number when moving
files across directories (or refuse to do it entirely).
In particular, it can also cause problems for NFS implementations that
change the filehandle of a file when it moves to a different directory,
like the old user-space NFS server did, and like the Linux knfsd still
does if you don't export your filesystems with 'no_subtree_check' or if
you export a filesystem that doesn't have stable inode numbers across
renames).
This change also obviously implies creating the object fan-out
subdirectory at tempfile creation time, rather than at the final
move_temp_to_file() time. Which actually accounts for most of the size
of the patch.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
write_sha1_from_fd() and write_sha1_to_fd() were dead code nobody called,
neither the latter's helper repack_object() was.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This consolidates the common operations for closing the new temporary file
that we have written, before we move it into place with the final name.
There's some common code there (make it read-only and check for errors on
close), but more importantly, this also gives a single place to add an
fsync_or_die() call if we want to add a safe mode.
This was triggered due to Denis Bueno apparently twice being able to
corrupt his git repository on OS X due to an unlucky combination of kernel
crashes and a not-very-robust filesystem.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
An earlier commit 633f43e (Remove redundant code, eliminate one static
variable, 2008-05-24) had a thinko (perhaps an eyeno) that broke
sha1_pack_index_name() function. One symptom of this was that the http
walker is now completely broken.
This should fix it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* db/clone-in-c:
Add test for cloning with "--reference" repo being a subset of source repo
Add a test for another combination of --reference
Test that --reference actually suppresses fetching referenced objects
clone: fall back to copying if hardlinking fails
builtin-clone.c: Need to closedir() in copy_or_link_directory()
builtin-clone: fix initial checkout
Build in clone
Provide API access to init_db()
Add a function to set a non-default work tree
Allow for having for_each_ref() list extra refs
Have a constant extern refspec for "--tags"
Add a library function to add an alternate to the alternates file
Add a lockfile function to append to a file
Mark the list of refs to fetch as const
Conflicts:
cache.h
t/t5700-clone-reference.sh
|
|
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is meant to force the creation of a loose object even if it
already exists packed. Needed for the next commit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is in the core so that, if the alternates file has already been
read, the addition can be parsed and put into effect for the current
process.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently, when looking for a packed object from the pack idx, a
simple binary search is used.
A conventional binary search loop looks like this:
unsigned lo, hi;
do {
unsigned mi = (lo + hi) / 2;
int cmp = "entry pointed at by mi" minus "target";
if (!cmp)
return mi; "mi is the wanted one"
if (cmp > 0)
hi = mi; "mi is larger than target"
else
lo = mi+1; "mi is smaller than target"
} while (lo < hi);
"did not find what we wanted"
The invariants are:
- When entering the loop, 'lo' points at a slot that is never
above the target (it could be at the target), 'hi' points at
a slot that is guaranteed to be above the target (it can
never be at the target).
- We find a point 'mi' between 'lo' and 'hi' ('mi' could be
the same as 'lo', but never can be as high as 'hi'), and
check if 'mi' hits the target. There are three cases:
- if it is a hit, we have found what we are looking for;
- if it is strictly higher than the target, we set it to
'hi', and repeat the search.
- if it is strictly lower than the target, we update 'lo'
to one slot after it, because we allow 'lo' to be at the
target and 'mi' is known to be below the target.
If the loop exits, there is no matching entry.
When choosing 'mi', we do not have to take the "middle" but
anywhere in between 'lo' and 'hi', as long as lo <= mi < hi is
satisfied. When we somehow know that the distance between the
target and 'lo' is much shorter than the target and 'hi', we
could pick 'mi' that is much closer to 'lo' than (hi+lo)/2,
which a conventional binary search would pick.
This patch takes advantage of the fact that the SHA-1 is a good
hash function, and as long as there are enough entries in the
table, we can expect uniform distribution. An entry that begins
with for example "deadbeef..." is much likely to appear much
later than in the midway of a reasonably populated table. In
fact, it can be expected to be near 87% (222/256) from the top
of the table.
This is a work-in-progress and has switches to allow easier
experiments and debugging. Exporting GIT_USE_LOOKUP environment
variable enables this code.
On my admittedly memory starved machine, with a partial KDE
repository (3.0G pack with 95M idx):
$ GIT_USE_LOOKUP=t git log -800 --stat HEAD >/dev/null
3.93user 0.16system 0:04.09elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+55588minor)pagefaults 0swaps
Without the patch, the numbers are:
$ git log -800 --stat HEAD >/dev/null
4.00user 0.15system 0:04.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+60258minor)pagefaults 0swaps
In the same repository:
$ GIT_USE_LOOKUP=t git log -2000 HEAD >/dev/null
0.12user 0.00system 0:00.12elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4241minor)pagefaults 0swaps
Without the patch, the numbers are:
$ git log -2000 HEAD >/dev/null
0.05user 0.01system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8506minor)pagefaults 0swaps
There isn't much time difference, but the number of minor faults
seems to show that we are touching much smaller number of pages,
which is expected.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since commit eb32d236df0c16b936b04f0c5402addb61cdb311, there was a TODO
comment in packed_object_info_detail() about the SHA1 of base object to
OBJ_OFS_DELTA objects. So here it is at last.
While at it, providing the actual storage size information as well is now
trivial.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jk/empty-tree:
add--interactive: handle initial commit better
hard-code the empty tree object
|
|
* mk/maint-parse-careful:
peel_onion: handle NULL
check return value from parse_commit() in various functions
parse_commit: don't fail, if object is NULL
revision.c: handle tag->tagged == NULL
reachable.c::process_tree/blob: check for NULL
process_tag: handle tag->tagged == NULL
check results of parse_commit in merge_bases
list-objects.c::process_tree/blob: check for NULL
reachable.c::add_one_tree: handle NULL from lookup_tree
mark_blob/tree_uninteresting: check for NULL
get_sha1_oneline: check return value of parse_object
read_object_with_reference: don't read beyond the buffer
|
|
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now any commands may reference the empty tree object by its
sha1 (4b825dc642cb6eb9a060e54bf8d69288fbee4904). This is
useful for showing some diffs, especially for initial
commits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.
If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes. Right
after committing you still have the original file in your work
tree and this file is not yet corrupted. You can explicitly tell
git that this file is binary and git will handle the file
appropriately.
Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished. In both cases CRLFs are removed
in an irreversible way. For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.
This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert. The
mechanism is controlled by the variable core.safecrlf, with the
following values:
- false: disable safecrlf mechanism
- warn: warn about irreversible conversions
- true: refuse irreversible conversions
The default is to warn. Users are only affected by this default
if core.autocrlf is set. But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.
The safecrlf mechanism's details depend on the git command. The
general principles when safecrlf is active (not false) are:
- we warn/error out if files in the work tree can modified in an
irreversible way without giving the user a chance to backup the
original file.
- for read-only operations that do not modify files in the work tree
we do not not print annoying warnings.
There are exceptions. Even though...
- "git add" itself does not touch the files in the work tree, the
next checkout would, so the safety triggers;
- "git apply" to update a text file with a patch does touch the files
in the work tree, but the operation is about text files and CRLF
conversion is about fixing the line ending inconsistencies, so the
safety does not trigger;
- "git diff" itself does not touch the files in the work tree, it is
often run to inspect the changes you intend to next "git add". To
catch potential problems early, safety triggers.
The concept of a safety check was originally proposed in a similar
way by Linus Torvalds. Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
|
|
fast-import was relying on the fact that on most systems mmap() and
write() are synchronized by the filesystem's buffer cache. We were
relying on the ability to mmap() 20 bytes beyond the current end
of the file, then later fill in those bytes with a future write()
call, then read them through the previously obtained mmap() address.
This isn't always true with some implementations of NFS, but it is
especially not true with our NO_MMAP=YesPlease build time option used
on some platforms. If fast-import was built with NO_MMAP=YesPlease
we used the malloc()+pread() emulation and the subsequent write()
call does not update the trailing 20 bytes of a previously obtained
"mmap()" (aka malloc'd) address.
Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to
be unable to read an object header (or data) that has been unlucky
enough to be written to the packfile at a location such that it
is in the trailing 20 bytes of a window previously opened on that
same packfile.
This bug has gone unnoticed for a very long time as it is highly data
dependent. Not only does the object have to be placed at the right
position, but it also needs to be positioned behind some other object
that has been accessed due to a branch cache invalidation. In other
words the stars had to align just right, and if you did run into
this bug you probably should also have purchased a lottery ticket.
Fortunately the workaround is a lot easier than the bug explanation.
Before we allow unpack_entry() to read data from a pack window
that has also (possibly) been modified through write() we force
all existing windows on that packfile to be closed. By closing
the windows we ensure that any new access via the emulated mmap()
will reread the packfile, updating to the current file content.
This comes at a slight performance degredation as we cannot reuse
previously cached windows when we update the packfile. But it
is a fairly minor difference as the window closes happen at only
two points:
- When the packfile is finalized and its .idx is generated:
At this stage we are getting ready to update the refs and any
data access into the packfile is going to be random, and is
going after only the branch tips (to ensure they are valid).
Our existing windows (if any) are not likely to be positioned
at useful locations to access those final tip commits so we
probably were closing them before anyway.
- When the branch cache missed and we need to reload:
At this point fast-import is getting change commands for the next
commit and it needs to go re-read a tree object it previously
had written out to the packfile. What windows we had (if any)
are not likely to cover the tree in question so we probably were
closing them before anyway.
We do try to avoid unnecessarily closing windows in the second case
by checking to see if the packfile size has increased since the
last time we called unpack_entry() on that packfile. If the size
has not changed then we have not written additional data, and any
existing window is still vaild. This nicely handles the cases where
fast-import is going through a branch cache reload and needs to read
many trees at once. During such an event we are not likely to be
updating the packfile so we do not cycle the windows between reads.
With this change in place t9301-fast-export.sh (which was broken
by c3b0dec509fe136c5417422f31898b5a4e2d5e02) finally works again.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|