Age | Commit message (Collapse) | Author | Files | Lines |
|
When delta-islands are in use, we need to record the deepest path at
which we find each tree and blob. Our loop to do so counts slashes, so
"foo" is depth 0, "foo/bar" is depth 1, and so on.
However, this neglects root trees, which are represented by the empty
string. Those also have depth 0, but are at a layer above "foo". Thus,
"foo" should be 1, "foo/bar" at 2, and so on. We use this depth to
topo-sort the trees in resolve_tree_islands(). As a result, we may fail
to visit a root tree before the sub-trees it contains, and therefore not
correctly pass down the island marks.
That in turn could lead to missing some delta opportunities (objects are
in the same island, but we didn't realize it) or creating unwanted
cross-island deltas (one object is in an island another isn't, but we
don't realize). In practice, it seems to have only a small effect. Some
experiments on the real-world git/git fork network at GitHub showed an
improvement of only 0.14% in the resulting clone size.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) started maintaining a tree_depth array that
matches the "objects" array. We extend the array when:
1. The objects array is extended, in which case we use realloc to
extend the tree_depth array.
2. A caller asks to store a tree_depth for object N, and this is the
first such request; we create the array from scratch and store the
value for N.
In the latter case, though, we use regular xmalloc(), and the depth
values for any objects besides N is undefined. This happens to not
trigger a bug with the current code, but the reasons are quite subtle:
- we never ask about the depth for any object with index i < N. This is
because we store the depth immediately for all trees and blobs. So
any such "i" must be a non-tree, and therefore we will never need to
care about its depth (in fact, we really only care about the depth of
trees).
- there are no objects at this point with index i > N, because we
always fill in the depth for a tree immediately after its object
entry is created (we may still allocate uninitialized depth entries,
but they'll be initialized by packlist_alloc() when it initializes
the entry in the "objects" array).
So it works, but only by chance. To be defensive, let's zero the array,
which matches the "unset" values which would be handed out by
oe_tree_depth() already.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) dynamically manages a tree_depth array in
packing_data that maintains one of these invariants:
1. tree_depth is NULL (i.e., the requested options don't require us to
track tree depths)
2. tree_depth is non-NULL and has as many entries as the "objects"
array
We maintain (2) by:
a. When the objects array grows, grow tree_depth to the same size
(unless it's NULL, in which case we can leave it).
b. When a caller asks to set a depth via oe_set_tree_depth(), if
tree_depth is NULL we allocate it.
But in (b), we use the number of stored objects, _not_ the allocated
size of the objects array. So we can run into a situation like this:
1. packlist_alloc() needs to store the Nth object, so it grows the
objects array to M, where M > N.
2. oe_set_tree_depth() wants to store a depth, so it allocates an
array of length N. Now we've violated our invariant.
3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
grow the objects array, since N <= M still holds. We try to assign
to tree_depth[N+1], which is out of bounds.
That doesn't happen in our test scripts, because the repositories they
use are so small, but it's easy to trigger by running:
echo HEAD | git pack-objects --revs --delta-islands --stdout >/dev/null
in any reasonably-sized repo (like git.git).
We can fix it by always growing the array to match pack->nr_alloc, not
pack->nr_objects. Likewise for the "layer" array from fe0ac2fb7f
(pack-objects: move 'layer' into 'struct packing_data', 2018-08-16),
which has the same bug.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This reduces the size of 'struct object_entry' from 88 bytes
to 80 and therefore makes packing objects more efficient.
For example on a Linux repo with 12M objects,
`git pack-objects --all` needs extra 96MB memory even if the
layer feature is not used.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This reduces the size of 'struct object_entry' and therefore
makes packing objects more efficient.
This also renames cmp_tree_depth() into tree_depth_compare(),
as it is more modern to have the name of the compare functions
end with "compare".
Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Implement simple support for --delta-islands option and
repack.useDeltaIslands config variable in git repack.
This allows users to setup delta islands in their config and
get the benefit of less disk usage while cloning and fetching
is still quite fast and not much more CPU intensive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Implement support for delta islands in git pack-objects
and document how delta islands work in
"Documentation/git-pack-objects.txt" and Documentation/config.txt.
This allows users to setup delta islands in their config and
get the benefit of less disk usage while cloning and fetching
is still quite fast and not much more CPU intensive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In a following commit, as we will use delta islands, we will
have to compute the write order for different layers, not just
for one.
Let's prepare for that by refactoring the code that will be
used to compute the write order for a given layer into a new
compute_layer_order() function.
This will make it easier to see and understand what the
following changes are doing.
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.
Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.
This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.
Because the inefficiency primarily arises when an
object is deltified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.
So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.
This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.
A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.
The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The singleton commit-graph in-core instance is made per in-core
repository instance.
* jt/commit-graph-per-object-store:
commit-graph: add repo arg to graph readers
commit-graph: store graph in struct object_store
commit-graph: add free_commit_graph
commit-graph: add missing forward declaration
object-store: add missing include
commit-graph: refactor preparing commit graph
|
|
Look for broken "&&" chains that are hidden in subshell, many of
which have been found and corrected.
* es/chain-lint-in-subshell:
t/chainlint.sed: drop extra spaces from regex character class
t/chainlint: add chainlint "specialized" test cases
t/chainlint: add chainlint "complex" test cases
t/chainlint: add chainlint "cuddled" test cases
t/chainlint: add chainlint "loop" and "conditional" test cases
t/chainlint: add chainlint "nested subshell" test cases
t/chainlint: add chainlint "one-liner" test cases
t/chainlint: add chainlint "whitespace" test cases
t/chainlint: add chainlint "basic" test cases
t/Makefile: add machinery to check correctness of chainlint.sed
t/test-lib: teach --chain-lint to detect broken &&-chains in subshells
|
|
The lazy clone support had a few places where missing but promised
objects were not correctly tolerated, which have been fixed.
* jt/tags-to-promised-blobs-fix:
tag: don't warn if target is missing but promised
revision: tolerate promised targets of tags
|
|
Add a server-side knob to skip commits in exponential/fibbonacci
stride in an attempt to cover wider swath of history with a smaller
number of iterations, potentially accepting a larger packfile
transfer, instead of going back one commit a time during common
ancestor discovery during the "git fetch" transaction.
* jt/fetch-negotiator-skipping:
negotiator/skipping: skip commits during fetch
|
|
"git send-email" when using in a batched mode that limits the
number of messages sent in a single SMTP session lost the contents
of the variable used to choose between tls/ssl, unable to send the
second and later batches, which has been fixed.
* jm/send-email-tls-auth-on-batch:
send-email: fix tls AUTH when sending batch
|
|
"git rebase" started exporting GIT_DIR environment variable and
exposing it to hook scripts when part of it got rewritten in C.
Instead of matching the old scripted Porcelains' behaviour,
compensate by also exporting GIT_WORK_TREE environment as well to
lessen the damage. This can harm existing hooks that want to
operate on different repository, but the current behaviour is
already broken for them anyway.
* bc/sequencer-export-work-tree-as-well:
sequencer: pass absolute GIT_WORK_TREE to exec commands
|
|
Tests to cover conflict cases that involve submodules have been
added for merge-recursive.
* en/t7405-recursive-submodule-conflicts:
t7405: verify 'merge --abort' works after submodule/path conflicts
t7405: add a directory/submodule conflict
t7405: add a file/submodule conflict
|
|
Tests to cover various conflicting cases have been added for
merge-recursive.
* en/t6036-merge-recursive-tests:
t6036: add a failed conflict detection case: regular files, different modes
t6036: add a failed conflict detection case with conflicting types
t6036: add a failed conflict detection case with submodule add/add
t6036: add a failed conflict detection case with submodule modify/modify
t6036: add a failed conflict detection case with symlink add/add
t6036: add a failed conflict detection case with symlink modify/modify
|
|
The recursive merge strategy did not properly ensure there was no
change between HEAD and the index before performing its operation,
which has been corrected.
* en/dirty-merge-fixes:
merge: fix misleading pre-merge check documentation
merge-recursive: enforce rule that index matches head before merging
t6044: add more testcases with staged changes before a merge is invoked
merge-recursive: fix assumption that head tree being merged is HEAD
merge-recursive: make sure when we say we abort that we actually abort
t6044: add a testcase for index matching head, when head doesn't match HEAD
t6044: verify that merges expected to abort actually abort
index_has_changes(): avoid assuming operating on the_index
read-cache.c: move index_has_changes() from merge.c
|
|
"git rebase --rebase-merges" mode now handles octopus merges as
well.
* js/rebase-merge-octopus:
rebase --rebase-merges: adjust man page for octopus support
rebase --rebase-merges: add support for octopus merges
merge: allow reading the merge commit message from a file
|
|
"git grep" learned the "--only-matching" option.
* tb/grep-only-matching:
grep.c: teach 'git grep --only-matching'
grep.c: extract show_line_header()
|
|
"git gc --auto" opens file descriptors for the packfiles before
spawning "git repack/prune", which would upset Windows that does
not want a process to work on a file that is open by another
process. The issue has been worked around.
* kg/gc-auto-windows-workaround:
gc --auto: release pack files before auto packing
|
|
For a large tree, the index needs to hold many cache entries
allocated on heap. These cache entries are now allocated out of a
dedicated memory pool to amortize malloc(3) overhead.
* jm/cache-entry-from-mem-pool:
block alloc: add validations around cache_entry lifecyle
block alloc: allocate cache entries from mem_pool
mem-pool: fill out functionality
mem-pool: add life cycle management functions
mem-pool: only search head block for available space
block alloc: add lifecycle APIs for cache_entry structs
read-cache: teach make_cache_entry to take object_id
read-cache: teach refresh_cache_entry to take istate
|
|
"git fetch" learned a new option "--negotiation-tip" to limit the
set of commits it tells the other end as "have", to reduce wasted
bandwidth and cycles, which would be helpful when the receiving
repository has a lot of refs that have little to do with the
history at the remote it is fetching from.
* jt/fetch-nego-tip:
fetch-pack: support negotiation tip whitelist
|
|
Various glitches in the heuristics of merge-recursive strategy have
been documented in new tests.
* en/t6042-insane-merge-rename-testcases:
t6042: add testcase covering long chains of rename conflicts
t6042: add testcase covering rename/rename(2to1)/delete/delete conflict
t6042: add testcase covering rename/add/delete conflict type
|
|
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.
* sb/object-store-lookup: (32 commits)
commit.c: allow lookup_commit_reference to handle arbitrary repositories
commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
tag.c: allow deref_tag to handle arbitrary repositories
object.c: allow parse_object to handle arbitrary repositories
object.c: allow parse_object_buffer to handle arbitrary repositories
commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
commit.c: allow set_commit_buffer to handle arbitrary repositories
commit.c: migrate the commit buffer to the parsed object store
commit-slabs: remove realloc counter outside of slab struct
commit.c: allow parse_commit_buffer to handle arbitrary repositories
tag: allow parse_tag_buffer to handle arbitrary repositories
tag: allow lookup_tag to handle arbitrary repositories
commit: allow lookup_commit to handle arbitrary repositories
tree: allow lookup_tree to handle arbitrary repositories
blob: allow lookup_blob to handle arbitrary repositories
object: allow lookup_object to handle arbitrary repositories
object: allow object_as_type to handle arbitrary repositories
tag: add repository argument to deref_tag
tag: add repository argument to parse_tag_buffer
tag: add repository argument to lookup_tag
...
|
|
Parsing of -L[<N>][,[<M>]] parameters "git blame" and "git log"
take has been tweaked.
* is/parsing-line-range:
log: prevent error if line range ends past end of file
blame: prevent error if range ends past end of file
|
|
Code restructuring and a small fix to transport protocol v2 during
fetching.
* jt/fetch-pack-negotiator:
fetch-pack: introduce negotiator API
fetch-pack: move common check and marking together
fetch-pack: make negotiation-related vars local
fetch-pack: use ref adv. to prune "have" sent
fetch-pack: directly end negotiation if ACK ready
fetch-pack: clear marks before re-marking
fetch-pack: split up everything_local()
|
|
"git checkout" and "git worktree add" learned to honor
checkout.defaultRemote when auto-vivifying a local branch out of a
remote tracking branch in a repository with multiple remotes that
have tracking branches that share the same names.
* ab/checkout-default-remote:
checkout & worktree: introduce checkout.defaultRemote
checkout: add advice for ambiguous "checkout <branch>"
builtin/checkout.c: use "ret" variable for return
checkout: pass the "num_matches" up to callers
checkout.c: change "unique" member to "num_matches"
checkout.c: introduce an *_INIT macro
checkout.h: wrap the arguments to unique_tracking_name()
checkout tests: index should be clean after dwim checkout
|
|
"git diff --color-moved" feature has further been tweaked.
* sb/diff-color-move-more:
diff.c: offer config option to control ws handling in move detection
diff.c: add white space mode to move detection that allows indent changes
diff.c: factor advance_or_nullify out of mark_color_as_moved
diff.c: decouple white space treatment from move detection algorithm
diff.c: add a blocks mode for moved code detection
diff.c: adjust hash function signature to match hashmap expectation
diff.c: do not pass diff options as keydata to hashmap
t4015: avoid git as a pipe input
xdiff/xdiffi.c: remove unneeded function declarations
xdiff/xdiff.h: remove unused flags
|
|
Test clean-up and corrections.
* es/test-fixes: (26 commits)
t5608: fix broken &&-chain
t9119: fix broken &&-chains
t9000-t9999: fix broken &&-chains
t7000-t7999: fix broken &&-chains
t6000-t6999: fix broken &&-chains
t5000-t5999: fix broken &&-chains
t4000-t4999: fix broken &&-chains
t3030: fix broken &&-chains
t3000-t3999: fix broken &&-chains
t2000-t2999: fix broken &&-chains
t1000-t1999: fix broken &&-chains
t0000-t0999: fix broken &&-chains
t9814: simplify convoluted check that command correctly errors out
t9001: fix broken "invoke hook" test
t7810: use test_expect_code() instead of hand-rolled comparison
t7400: fix broken "submodule add/reconfigure --force" test
t7201: drop pointless "exit 0" at end of subshell
t6036: fix broken "merge fails but has appropriate contents" tests
t5505: modernize and simplify hard-to-digest test
t5406: use write_script() instead of birthing shell script manually
...
|
|
"git fsck" learns to make sure the optional commit-graph file is in
a sane state.
* ds/commit-graph-fsck: (23 commits)
coccinelle: update commit.cocci
commit-graph: update design document
gc: automatically write commit-graph files
commit-graph: add '--reachable' option
commit-graph: use string-list API for input
fsck: verify commit-graph
commit-graph: verify contents match checksum
commit-graph: test for corrupted octopus edge
commit-graph: verify commit date
commit-graph: verify generation number
commit-graph: verify parent list
commit-graph: verify root tree OIDs
commit-graph: verify objects exist
commit-graph: verify corrupt OID fanout and lookup
commit-graph: verify required chunks are present
commit-graph: verify catches corrupt signature
commit-graph: add 'verify' subcommand
commit-graph: load a root tree from specific graph
commit: force commit to parse from object database
commit-graph: parse commit from chosen graph
...
|
|
Recent "security fix" to pay attention to contents of ".gitmodules"
while accepting "git push" was a bit overly strict than necessary,
which has been adjusted.
* jk/fsck-gitmodules-gently:
fsck: downgrade gitmodulesParse default to "info"
fsck: split ".gitmodules too large" error from parse failure
fsck: silence stderr when parsing .gitmodules
config: add options parameter to git_config_from_mem
config: add CONFIG_ERROR_SILENT handler
config: turn die_on_error into caller-facing enum
|
|
Conversion from uchar[40] to struct object_id continues.
* bc/object-id:
pretty: switch hard-coded constants to the_hash_algo
sha1-file: convert constants to uses of the_hash_algo
log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
builtin/merge-recursive: make hash independent
builtin/merge: switch to use the_hash_algo
builtin/fmt-merge-msg: make hash independent
builtin/update-index: simplify parsing of cacheinfo
builtin/update-index: convert to using the_hash_algo
refs/files-backend: use the_hash_algo for writing refs
sha1-name: use the_hash_algo when parsing object names
strbuf: allocate space with GIT_MAX_HEXSZ
commit: express tree entry constants in terms of the_hash_algo
hex: switch to using the_hash_algo
tree-walk: replace hard-coded constants with the_hash_algo
cache: update object ID functions for the_hash_algo
|
|
Tests to cover more D/F conflict cases have been added for
merge-recursive.
* en/t6036-recursive-corner-cases:
t6036: fix broken && chain in sub-shell
t6036: add lots of detail for directory/file conflicts in recursive case
|
|
httpd tests saw occasional breakage due to the way its access log
gets inspected by the tests, which has been updated to make them
less flaky.
* sg/httpd-test-unflake:
t/lib-httpd: avoid occasional failures when checking access.log
t/lib-httpd: add the strip_access_log() helper function
t5541: clean up truncating access log
|
|
A test helper update for Windows.
* bp/test-drop-caches-for-windows:
handle lower case drive letters on Windows
|
|
"git pull --rebase" on a corrupt HEAD caused a segfault. In
general we substitute an empty tree object when running the in-core
equivalent of the diff-index command, and the codepath has been
corrected to do so as well to fix this issue.
* jk/has-uncommitted-changes-fix:
has_uncommitted_changes(): fall back to empty tree
|
|
This character class, like many others in this script, matches
horizontal whitespace consisting of spaces and tabs, however, a few
extra, entirely harmless, spaces somehow slipped into the expression.
Removing them is purely a cosmetic fix.
While at it, re-indent three lines with a single TAB each which were
incorrectly indented with six spaces. Also, a purely cosmetic fix.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Honor core.commentchar when preparing the list of commits to replay
in "rebase -i".
* as/sequencer-customizable-comment-char:
sequencer: use configured comment character
|
|
Code clean-up.
* sb/blame-color:
blame: prefer xsnprintf to strcpy for colors
|
|
Build doc update for Windows.
* nd/command-list:
vcbuild/README: update to accommodate for missing common-cmds.h
|
|
Look for broken use of "VAR=VAL shell_func" in test scripts as part
of test-lint.
* es/test-lint-one-shot-export:
t/check-non-portable-shell: detect "FOO=bar shell_func"
t/check-non-portable-shell: make error messages more compact
t/check-non-portable-shell: stop being so polite
t6046/t9833: fix use of "VAR=VAL cmd" with a shell function
|
|
"git rev-parse ':/substring'" did not consider the history leading
only to HEAD when looking for a commit with the given substring,
when the HEAD is detached. This has been fixed.
* wc/find-commit-with-pattern-on-detached-head:
sha1-name.c: for ":/", find detached HEAD commits
|
|
Correct a broken use of "VAR=VAL shell_func" in a test.
* jc/t3404-one-shot-export-fix:
t3404: fix use of "VAR=VAL cmd" with a shell function
|
|
"git reset --merge" (hence "git merge ---abort") and "git reset --hard"
had trouble working correctly in a sparsely checked out working
tree after a conflict, which has been corrected.
* mk/merge-in-sparse-checkout:
unpack-trees: do not fail reset because of unmerged skipped entry
|
|
Code clean-up.
* hs/push-cert-check-cleanup:
gpg-interface: make parse_gpg_output static and remove from interface header
builtin/receive-pack: use check_signature from gpg-interface
|
|
Handling of an empty range by "git cherry-pick" was inconsistent
depending on how the range ended up to be empty, which has been
corrected.
* jk/empty-pick-fix:
sequencer: don't say BUG on bogus input
sequencer: handle empty-set cases consistently
|