Age | Commit message (Collapse) | Author | Files | Lines |
|
This reverts most of commit a2430dde8ceaaaabf05937438249397b883ca77a.
That commit made the situation better for repositories with relatively
small number of objects. However with many objects and a small pack size
limit, the time required to complete the repack tends towards O(n^2),
or even much worse with long delta chains.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The value passed to --max-pack-size used to count in MiB which was
inconsistent with the corresponding configuration variable as well as
other command arguments which are defined to count in bytes with an
optional unit suffix. This brings --max-pack-size in line with the
rest of Git.
Also, in order not to cause havoc with people used to the previous
megabyte scale, and because this is a sane thing to do anyway, a
minimum size of 1 MiB is enforced to avoid an explosion of pack files.
Adjust and extend test suite accordingly.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Current handling of pack_size_limit is quite suboptimal. Let's consider
a list of objects to pack which contain alternatively big and small
objects (which pretty matches reality when big blobs are interlaced
with tree objects). Currently, the code simply close the pack and opens
a new one when the next object in line breaks the size limit.
The current code may degenerate into:
- small tree object => store into pack #1
- big blob object busting the pack size limit => store into pack #2
- small blob but pack #2 is over the limit already => pack #3
- big blob busting the size limit => pack #4
- small tree but pack #4 is over the limit => pack #5
- big blob => pack #6
- small tree => pack #7
- ... and so on.
The reality is that the content of packs 1, 3, 5 and 7 could well be
stored more efficiently (and delta compressed) together in pack #1 if
the big blobs were not forcing an immediate transition to a new pack.
Incidentally this can be fixed pretty easily by simply skipping over
those objects that are too big to fit in the current pack while trying
the whole list of unwritten objects, and then that list considered from
the beginning again when a new pack is opened. This creates much fewer
smallish pack files and help making more predictable test cases for the
test suite.
This change made one of the self sanity checks useless so it is removed
as well. That check was rather redundant already anyway.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the first piece of threaded code was introduced in commit 8ecce684, it
came with its own THREADED_DELTA_SEARCH Makefile option. Since this time,
more threaded code has come into the codebase and a NO_PTHREADS option has
also been added. Get rid of the original option as the newer, more generic
option covers everything we need.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This required some fairly trivial packfile function 'const' cleanup,
since the builtin commands get a const char *argv[] array.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/conflict-marker-size:
rerere: honor conflict-marker-size attribute
rerere: prepare for customizable conflict marker length
conflict-marker-size: new attribute
rerere: use ll_merge() instead of using xdl_merge()
merge-tree: use ll_merge() not xdl_merge()
xdl_merge(): allow passing down marker_size in xmparam_t
xdl_merge(): introduce xmparam_t for merge specific parameters
git_attr(): fix function signature
Conflicts:
builtin-merge-file.c
ll-merge.c
xdiff/xdiff.h
xdiff/xmerge.c
|
|
The function took (name, namelen) as its arguments, but all the public
callers wanted to pass a full string.
Demote the counted-string interface to an internal API status, and allow
public callers to just pass the string to the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This patch implements native to Windows subset of pthreads API used by Git.
It allows to remove Pthreads for Win32 dependency for MSVC, msysgit and
Cygwin.
[J6t: If the MinGW build was built as part of the msysgit build
environment, then threading was already enabled because the
pthreads-win32 package is available in msysgit. With this patch, we can now
enable threaded code unconditionally.]
Signed-off-by: Andrzej K. Haczewski <ahaczewski@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* maint:
pack-objects: split implications of --all-progress from progress activation
instaweb: restart server if already running
prune-packed: only show progress when stderr is a tty
Conflicts:
builtin-pack-objects.c
|
|
Currently the --all-progress flag is used to use force progress display
during the writing object phase even if output goes to stdout which is
primarily the case during a push operation. This has the unfortunate
side effect of forcing progress display even if stderr is not a
terminal.
Let's introduce the --all-progress-implied argument which has the same
intent except for actually forcing the activation of any progress
display. With this, progress display will be automatically inhibited
whenever stderr is not a terminal, or full progress display will be
included otherwise. This should let people use 'git push' within a cron
job without filling their logs with useless percentage displays.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Let's keep thread stuff close together if possible. And in this case,
this even reduces the #ifdef noise, and allows for skipping the
autodetection altogether if delta search is not needed (like with a pure
clone).
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These spaces immediately before the end of lines are unnecessary.
While at it, instead of using a single string literal with backslashes
at end of each line, split the lines into individual string literals
and tell the compiler to concatenate them.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* np/maint-1.6.3-deepen:
pack-objects: free preferred base memory after usage
make shallow repository deepening more network efficient
|
|
When adding objects for preferred delta base, the content from tree
objects leading to given paths is kept in a cache. This has the
potential to grow significantly, especially with large directories as
the whole tree object content is loaded in memory, even if in practice
the number of those objects is limited to the 256 cache entries plus the
$window root tree objects. Still, that can't hurt freeing that up after
object enumeration is done, and before more memory is needed for delta
search.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is one of only two places that we use C99 variable length array on
the stack, which some older compilers apparently are not happy with.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The majority of code in core git appears to use a single
space after if/for/while. This is an attempt to bring more
code to this standard. These are entirely cosmetic changes.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* cc/replace:
t6050: check pushing something based on a replaced commit
Documentation: add documentation for "git replace"
Add git-replace to .gitignore
builtin-replace: use "usage_msg_opt" to give better error messages
parse-options: add new function "usage_msg_opt"
builtin-replace: teach "git replace" to actually replace
Add new "git replace" command
environment: add global variable to disable replacement
mktag: call "check_sha1_signature" with the replacement sha1
replace_object: add a test case
object: call "check_sha1_signature" with the replacement sha1
sha1_file: add a "read_sha1_file_repl" function
replace_object: add mechanism to replace objects found in "refs/replace/"
refs: add a "for_each_replace_ref" function
|
|
I have 4GB of RAM on my system which should, in theory, be quite enough
to repack a 600 MB repository. However the unbounded delta cache size
always pushes it into swap, at which point everything virtually comes to
a halt. So unbounded caches are never a good idea.
A default of 256MB should be a good compromize between memory usage and
speed where medium sized repositories are still likely to fit in the
cache with a reasonable memory usage, and larger repositories are going
to take quite some time to repack already anyway.
While at it, clarify the associated config variable documentation
entries a bit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* js/maint-graft-unhide-true-parents:
git repack: keep commits hidden by a graft
Add a test showing that 'git repack' throws away grafted-away parents
Conflicts:
git-repack.sh
|
|
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.
So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.
As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tr/die_errno:
Use die_errno() instead of die() when checking syscalls
Convert existing die(..., strerror(errno)) to die_errno()
die_errno(): double % in strerror() output just in case
Introduce die_errno() that appends strerror(errno) to die()
|
|
Change calls to die(..., strerror(errno)) to use the new die_errno().
In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).
This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.
One example of this would be something like
unsigned long size;
unsigned char c;
size += c << 24;
where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.
Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.
I may have missed some, but this is the result of looking at
git grep '[^0-9 ][ ]*<<[ ][a-z]' -- '*.c' '*.h'
git grep '<<[ ]*24'
which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).
I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).
In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".
I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* lt/pack-object-memuse:
show_object(): push path_name() call further down
process_{tree,blob}: show objects without buffering
Conflicts:
builtin-pack-objects.c
builtin-rev-list.c
list-objects.c
list-objects.h
upload-pack.c
|
|
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow
- more clarity into who "owns" the name (ie now when we free the name in
the show_object callback, it's because we generated it ourselves by
calling path_name())
- not calling path_name() at all, either because we don't care about the
name in the first place, or because we are actually happy walking the
linked list of "struct name_path *" and the last component.
Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.
Why? We use that name for two things:
- add_preferred_base_object(), which actually _wants_ to traverse the
path, and now does it by looking for '/' characters!
- for 'name_hash()', which only cares about the last 16 characters of a
name, so again, generating the full name seems to be just unnecessary
work.
Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.
This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Here's a less trivial thing, and slightly more dubious one.
I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.
Now, there are possible downsides to this:
- the "buffer using object_array" _can_ in theory result in at least
better I-cache usage (two tight loops rather than one more spread out
one). I don't think this is a real issue, but in theory..
- this _does_ change the order of the objects printed. Instead of doing a
"process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
over the commits (which puts all the root trees _first_ in the object
list, this patch just adds them to the list of pending objects, and
then we'll traverse them in that order (and thus show each root tree
object together with the objects we discover under it)
I _think_ the new ordering actually makes more sense, but the object
ordering is actually a subtle thing when it comes to packing
efficiency, so any change in order is going to have implications for
packing. Good or bad, I dunno.
- There may be some reason why we did it that odd way with the object
array, that I have simply forgotten.
Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.
This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...
Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.
Or maybe there _used_ to be a reason, and no longer is.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* cc/bisect-filter: (21 commits)
rev-list: add "int bisect_show_flags" in "struct rev_list_info"
rev-list: remove last static vars used in "show_commit"
list-objects: add "void *data" parameter to show functions
bisect--helper: string output variables together with "&&"
rev-list: pass "int flags" as last argument of "show_bisect_vars"
t6030: test bisecting with paths
bisect: use "bisect--helper" and remove "filter_skipped" function
bisect: implement "read_bisect_paths" to read paths in "$GIT_DIR/BISECT_NAMES"
bisect--helper: implement "git bisect--helper"
bisect: use the new generic "sha1_pos" function to lookup sha1
rev-list: call new "filter_skip" function
patch-ids: use the new generic "sha1_pos" function to lookup sha1
sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
rev-list: pass "revs" to "show_bisect_vars"
rev-list: make "show_bisect_vars" non static
rev-list: move code to show bisect vars into its own function
rev-list: move bisect related code into its own file
rev-list: make "bisect_list" variable local to "cmd_rev_list"
refs: add "for_each_ref_in" function to refactor "for_each_*_ref" functions
quote: add "sq_dequote_to_argv" to put unwrapped args in an argv array
...
|
|
* maint:
GIT 1.6.2.3
State the effect of filter-branch on graft explicitly
process_{tree,blob}: Remove useless xstrdup calls
Conflicts:
GIT-VERSION-GEN
|
|
* maint-1.6.1:
State the effect of filter-branch on graft explicitly
process_{tree,blob}: Remove useless xstrdup calls
|
|
* maint-1.6.0:
State the effect of filter-branch on graft explicitly
process_{tree,blob}: Remove useless xstrdup calls
|
|
On Wed, 8 Apr 2009, Björn Steinbrink wrote:
>
> The name of the processed object was duplicated for passing it to
> add_object(), but that already calls path_name, which allocates a new
> string anyway. So the memory allocated by the xstrdup calls just went
> nowhere, leaking memory.
Ack, ack.
There's another easy 5% or so for the built-in object walker: once we've
created the hash from the name, the name isn't interesting any more, and
so something trivial like this can help a bit.
Does it matter? Probably not on its own. But a few more memory saving
tricks and it might all make a difference.
Linus
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the case of a small repository, pack-objects is smart enough to not
start more threads than necessary. However, the output to the user always
reports the value of the pack.threads configuration and not the real
number of threads to be used.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Conflicts:
t/t7700-repack.sh
|
|
The goal of this patch is to get rid of the "static struct rev_info
revs" static variable in "builtin-rev-list.c".
To do that, we need to pass the revs to the "show_commit" function
in "builtin-rev-list.c" and this in turn means that the
"traverse_commit_list" function in "list-objects.c" must be passed
functions pointers to functions with 2 parameters instead of one.
So we have to change all the callers and all the functions passed
to "traverse_commit_list".
Anyway this makes the code more clean and more generic, so it
should be a good thing in the long run.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
Conflicts:
t/t7700-repack.sh
|
|
* maint:
Increase the size of the die/warning buffer to avoid truncation
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
|
|
* maint-1.6.1:
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
|
|
* maint-1.6.0:
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
|
|
On a 32-bit system, the maximum possible size for an object is less than
4GB, while 64-bit systems may cope with larger objects. Due to this
limitation, variables holding object sizes are using an unsigned long
type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
When large objects are encountered, and/or people play with large delta
depth values, it is possible for the maximum allowed delta size
computation to overflow, especially on a 32-bit system. When this
occurs, surviving result bits may represent a value much smaller than
what it is supposed to be, or even zero. This prevents some objects
from being deltified although they do get deltified when a smaller depth
limit is used. Fix this by always performing a 64-bit multiplication.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-pack-directory:
Fix odb_mkstemp() on AIX
Make sure objects/pack exists before creating a new pack
Conflicts:
wrapper.c
|
|
If pack-objects is called with the --unpack-unreachable option then it
will unpack (i.e. loosen) all unreferenced objects from local not-kept
packs, including those that also exist in packs residing in an alternate
object database or a locally kept pack. The only user of this option is
git-repack.
In this case, repack will follow the call to pack-objects with a call to
prune-packed, which will delete these newly loosened objects, making the
act of loosening a waste of time. The unnecessary loosening can be
avoided by checking whether an object exists in a non-local pack or a
locally kept pack before loosening it.
This fixes the 'local packed unreachable obs that exist in alternate ODB
are not loosened' test in t7700.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack. It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These two features were invented for use by repack when repack will delete
the local packs that have been made redundant. The packs accessible
through alternates are not deleted by repack, so the objects contained in
them are still accessible after the local packs are deleted. They do not
need to be repacked into the new pack or loosened. For the case of
loosening they would immediately be deleted by the subsequent prune-packed
that is called by repack anyway.
This fixes the test
'packed unreachable obs in alternate ODB are not loosened' in t7700.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
|
|
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.
The new --kept-pack-only option means just that. When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-1.6.0-pack-directory:
Make sure objects/pack exists before creating a new pack
|