Age | Commit message (Collapse) | Author | Files | Lines |
|
These are in a helper function, so the usual chain-lint doesn't notice
them. This function is still not perfect, as it has some git invocations
on the left-hand-side of the pipe, but it's primary purpose is timing,
not finding bugs or correctness issues.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In an upcoming commit, 'git repack' will want to create a pack comprised
of all of the objects in some packs (the included packs) excluding any
objects in some other packs (the excluded packs).
This caller could iterate those packs themselves and feed the objects it
finds to 'git pack-objects' directly over stdin, but this approach has a
few downsides:
- It requires every caller that wants to drive 'git pack-objects' in
this way to implement pack iteration themselves. This forces the
caller to think about details like what order objects are fed to
pack-objects, which callers would likely rather not do.
- If the set of objects in included packs is large, it requires
sending a lot of data over a pipe, which is inefficient.
- The caller is forced to keep track of the excluded objects, too, and
make sure that it doesn't send any objects that appear in both
included and excluded packs.
But the biggest downside is the lack of a reachability traversal.
Because the caller passes in a list of objects directly, those objects
don't get a namehash assigned to them, which can have a negative impact
on the delta selection process, causing 'git pack-objects' to fail to
find good deltas even when they exist.
The caller could formulate a reachability traversal themselves, but the
only way to drive 'git pack-objects' in this way is to do a full
traversal, and then remove objects in the excluded packs after the
traversal is complete. This can be detrimental to callers who care
about performance, especially in repositories with many objects.
Introduce 'git pack-objects --stdin-packs' which remedies these four
concerns.
'git pack-objects --stdin-packs' expects a list of pack names on stdin,
where 'pack-xyz.pack' denotes that pack as included, and
'^pack-xyz.pack' denotes it as excluded. The resulting pack includes all
objects that are present in at least one included pack, and aren't
present in any excluded pack.
To address the delta selection problem, 'git pack-objects --stdin-packs'
works as follows. First, it assembles a list of objects that it is going
to pack, as above. Then, a reachability traversal is started, whose tips
are any commits mentioned in included packs. Upon visiting an object, we
find its corresponding object_entry in the to_pack list, and set its
namehash parameter appropriately.
To avoid the traversal visiting more objects than it needs to, the
traversal is halted upon encountering an object which can be found in an
excluded pack (by marking the excluded packs as kept in-core, and
passing --no-kept-objects=in-core to the revision machinery).
This can cause the traversal to halt early, for example if an object in
an included pack is an ancestor of ones in excluded packs. But stopping
early is OK, since filling in the namehash fields of objects in the
to_pack list is only additive (i.e., having it helps the delta selection
process, but leaving it blank doesn't impact the correctness of the
resulting pack).
Even still, it is unlikely that this hurts us much in practice, since
the 'git repack --geometric' caller (which is introduced in a later
commit) marks small packs as included, and large ones as excluded.
During ordinary use, the small packs usually represent pushes after a
large repack, and so are unlikely to be ancestors of objects that
already exist in the repository.
(I found it convenient while developing this patch to have 'git
pack-objects' report the number of objects which were visited and got
their namehash fields filled in during traversal. This is also included
in the below patch via trace2 data lines).
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A future caller will want to be able to perform a reachability traversal
which terminates when visiting an object found in a kept pack. The
closest existing option is '--honor-pack-keep', but this isn't quite
what we want. Instead of halting the traversal midway through, a full
traversal is always performed, and the results are only trimmed
afterwords.
Besides needing to introduce a new flag (since culling results
post-facto can be different than halting the traversal as it's
happening), there is an additional wrinkle handling the distinction
in-core and on-disk kept packs. That is: what kinds of kept pack should
stop the traversal?
Introduce '--no-kept-objects[=<on-disk|in-core>]' to specify which kinds
of kept packs, if any, should stop a traversal. This can be useful for
callers that want to perform a reachability analysis, but want to leave
certain packs alone (for e.g., when doing a geometric repack that has
some "large" packs which are kept in-core that it wants to leave alone).
Note that this option is not guaranteed to produce exactly the set of
objects that aren't in kept packs, since it's possible the traversal
order may end up in a situation where a non-kept ancestor was "cut off"
by a kept object (at which point we would stop traversing). But, we
don't care about absolute correctness here, since this will eventually
be used as a purely additive guide in an upcoming new repack mode.
Explicitly avoid documenting this new flag, since it is only used
internally. In theory we could avoid even adding it rev-list, but being
able to spell this option out on the command-line makes some special
cases easier to test without promising to keep it behaving consistently
forever. Those tricky cases are exercised in t6114.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Future callers will want a function to fill a 'struct pack_entry' for a
given object id but _only_ from its position in any kept pack(s).
In particular, an new 'git repack' mode which ensures the resulting
packs form a geometric progress by object count will mark packs that it
does not want to repack as "kept in-core", and it will want to halt a
reachability traversal as soon as it visits an object in any of the kept
packs. But, it does not want to halt the traversal at non-kept, or
.keep packs.
The obvious alternative is 'find_pack_entry()', but this doesn't quite
suffice since it only returns the first pack it finds, which may or may
not be kept (and the mru cache makes it unpredictable which one you'll
get if there are options).
Short of that, you could walk over all packs looking for the object in
each one, but it scales with the number of packs, which may be
prohibitive.
Introduce 'find_kept_pack_entry()', a function which is like
'find_pack_entry()', but only fills in objects in the kept packs.
Handle packs which have .keep files, as well as in-core kept packs
separately, since certain callers will want to distinguish one from the
other. (Though on-disk and in-core kept packs share the adjective
"kept", it is best to think of the two sets as independent.)
There is a gotcha when looking up objects that are duplicated in kept
and non-kept packs, particularly when the MIDX stores the non-kept
version and the caller asked for kept objects only. This could be
resolved by teaching the MIDX to resolve duplicates by always favoring
the kept pack (if one exists), but this breaks an assumption in existing
MIDXs, and so it would require a format change.
The benefit to changing the MIDX in this way is marginal, so we instead
have a more thorough check here which is explained with a comment.
Callers will be added in subsequent patches.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The error message given when a configuration variable that is
expected to have a boolean value has been improved.
* ak/config-bad-bool-error:
config: improve error message for boolean config
|
|
"git reflog expire --stale-fix" can be used to repair the reflog by
removing entries that refer to objects that have been pruned away,
but was not careful to tolerate missing objects.
* js/reflog-expire-stale-fix:
reflog expire --stale-fix: be generous about missing objects
|
|
When certain features (e.g. grafts) used in the repository are
incompatible with the use of the commit-graph, we used to silently
turned commit-graph off; we now tell the user what we are doing.
* js/commit-graph-warning:
commit-graph: when incompatible with graphs, indicate why
|
|
Test to make sure "git rev-parse one-thing one-thing" gives
the same thing twice (when one-thing is --since=X).
* ew/rev-parse-since-test:
t1500: ensure current --since= behavior remains
|
|
"git maintenance" tool learned a new "pack-refs" maintenance task.
* ds/maintenance-pack-refs:
maintenance: incremental strategy runs pack-refs weekly
maintenance: add pack-refs task
|
|
Avoid individual tests in t5411 from getting affected by each other
by forcing them to use separate output files during the test.
* jx/t5411-unique-filenames:
t5411: refactor check of refs using test_cmp_refs
t5411: use different out file to prevent overwriting
|
|
Fix "git fsck --name-objects" which apparently has not been used by
anybody who is motivated enough to report breakage.
* js/fsck-name-objects-fix:
fsck --name-objects: be more careful parsing generation numbers
t1450: robustify `remove_object()`
|
|
The .mailmap is documented to be read only from the root level of a
working tree, but a stray file in a bare repository also was read
by accident, which has been corrected.
* jk/mailmap-only-at-root:
mailmap: only look for .mailmap in work tree
|
|
"git grep --untracked" is meant to be "let's ALSO find in these
files on the filesystem" when looking for matches in the working
tree files, and does not make any sense if the primary search is
done against the index, or the tree objects. The "--cached" and
"--untracked" options have been marked as mutually incompatible.
* mt/grep-cached-untracked:
grep: error out if --untracked is used with --cached
|
|
"git mergetool" feeds three versions (base, local and remote) of
a conflicted path unmodified. The command learned to optionally
prepare these files with unconflicted parts already resolved.
* sh/mergetool-hideresolved:
mergetool: add per-tool support and overrides for the hideResolved flag
mergetool: break setup_tool out into separate initialization function
mergetool: add hideResolved configuration
|
|
Even though invocations of "die()" were logged to the trace2
system, "BUG()"s were not, which has been corrected.
* jt/trace2-BUG:
usage: trace2 BUG() invocations
|
|
The "git range-diff" command learned "--(left|right)-only" option
to show only one side of the compared range.
* js/range-diff-one-side-only:
range-diff: offer --left-only/--right-only options
range-diff: move the diffopt initialization down one layer
range-diff: combine all options in a single data structure
range-diff: simplify code spawning `git log`
range-diff: libify the read_patches() function again
range-diff: avoid leaking memory in two error code paths
|
|
There are other ways than ".." for a single token to denote a
"commit range", namely "<rev>^!" and "<rev>^-<n>", but "git
range-diff" did not understand them.
* js/range-diff-wo-dotdot:
range-diff(docs): explain how to specify commit ranges
range-diff/format-patch: handle commit ranges other than A..B
range-diff/format-patch: refactor check for commit range
|
|
"git clone" tries to locally check out the branch pointed at by
HEAD of the remote repository after it is done, but the protocol
did not convey the information necessary to do so when copying an
empty repository. The protocol v2 learned how to do so.
* jt/clone-unborn-head:
clone: respect remote unborn HEAD
connect, transport: encapsulate arg in struct
ls-refs: report unborn targets of symrefs
|
|
Piecemeal of rewrite of "git bisect" in C continues.
* mr/bisect-in-c-4:
bisect--helper: retire `--check-and-set-terms` subcommand
bisect--helper: reimplement `bisect_skip` shell function in C
bisect--helper: retire `--bisect-auto-next` subcommand
bisect--helper: use `res` instead of return in BISECT_RESET case option
bisect--helper: retire `--bisect-write` subcommand
bisect--helper: reimplement `bisect_replay` shell function in C
bisect--helper: reimplement `bisect_log` shell function in C
|
|
Fix incremental update of commit-graph file around corrected commit
date data.
* ds/commit-graph-genno-fix:
commit-graph: prepare commit graph
commit-graph: be extra careful about mixed generations
commit-graph: compute generations separately
commit-graph: validate layers for generation data
commit-graph: always parse before commit_graph_data_at()
commit-graph: use repo_parse_commit
|
|
The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.
* ak/corrected-commit-date:
doc: add corrected commit date info
commit-reach: use corrected commit dates in paint_down_to_common()
commit-graph: use generation v2 only if entire chain does
commit-graph: implement generation data chunk
commit-graph: implement corrected commit date
commit-graph: return 64-bit generation number
commit-graph: add a slab to store topological levels
t6600-test-reach: generalize *_three_modes
commit-graph: consolidate fill_commit_graph_info
revision: parse parent in indegree_walk_step()
commit-graph: fix regression when computing Bloom filters
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When commands are started from a subdirectory, they may have to
compare the path to the subdirectory (called prefix and found out
from $(pwd)) with the tracked paths. On macOS, $(pwd) and
readdir() yield decomposed path, while the tracked paths are
usually normalized to the precomposed form, causing mismatch. This
has been fixed by taking the same approach used to normalize the
command line arguments.
* tb/precompose-prefix-too:
MacOS: precompose_argv_prefix()
|
|
The command line completion (in contrib/) completed "git branch -d"
with branch names, but "git branch -D" offered tagnames in addition,
which has been corrected. "git branch -M" had the same problem.
* jk/complete-branch-force-delete:
doc/git-branch: fix awkward wording for "-c"
completion: handle other variants of "branch -m"
completion: treat "branch -D" the same way as "branch -d"
|
|
Fix in passing custom args from "git clone" to "upload-pack" on the
other side.
* jv/upload-pack-filter-spec-quotefix:
t5544: clarify 'hook works with partial clone' test
upload-pack.c: fix filter spec quoting bug
|
|
Introduce an on-disk file to record revindex for packdata, which
traditionally was always created on the fly and only in-core.
* tb/pack-revindex-on-disk:
t5325: check both on-disk and in-memory reverse index
pack-revindex: ensure that on-disk reverse indexes are given precedence
t: support GIT_TEST_WRITE_REV_INDEX
t: prepare for GIT_TEST_WRITE_REV_INDEX
Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
builtin/pack-objects.c: respect 'pack.writeReverseIndex'
builtin/index-pack.c: write reverse indexes
builtin/index-pack.c: allow stripping arbitrary extensions
pack-write.c: prepare to write 'pack-*.rev' files
packfile: prepare for the existence of '*.rev' files
|
|
Various test updates.
* ab/tests-various-fixup:
rm tests: actually test for SIGPIPE in SIGPIPE test
archive tests: use a cheaper "zipinfo -h" invocation to get header
upload-pack tests: avoid a non-zero "grep" exit status
git-svn tests: rewrite brittle tests to use "--[no-]merges".
git svn mergeinfo tests: refactor "test -z" to use test_must_be_empty
git svn mergeinfo tests: modernize redirection & quoting style
cache-tree tests: explicitly test HEAD and index differences
cache-tree tests: use a sub-shell with less indirection
cache-tree tests: remove unused $2 parameter
cache-tree tests: refactor for modern test style
|
|
|
|
The "ort" merge strategy.
* en/merge-ort-perf:
merge-ort: begin performance work; instrument with trace2_region_* calls
merge-ort: ignore the directory rename split conflict for now
merge-ort: fix massive leak
|
|
ORT merge strategy learns to infer "renamed directory" while
merging.
* en/ort-directory-rename:
merge-ort: fix a directory rename detection bug
merge-ort: process_renames() now needs more defensiveness
merge-ort: implement apply_directory_rename_modifications()
merge-ort: add a new toplevel_dir field
merge-ort: implement handle_path_level_conflicts()
merge-ort: implement check_for_directory_rename()
merge-ort: implement apply_dir_rename() and check_dir_renamed()
merge-ort: implement compute_collisions()
merge-ort: modify collect_renames() for directory rename handling
merge-ort: implement handle_directory_level_conflicts()
merge-ort: implement compute_rename_counts()
merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
merge-ort: add outline of get_provisional_directory_renames()
merge-ort: add outline for computing directory renames
merge-ort: collect which directories are removed in dirs_removed
merge-ort: initialize and free new directory rename data structures
merge-ort: add new data structures for directory rename detection
|
|
* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic
|
|
Currently invalid boolean config values return messages about 'bad
numeric', which is slightly misleading when the error was due to a
boolean value. We can improve the developer experience by returning a
boolean error message when we know the value is neither a bool text or
int.
before with an invalid boolean value of `non-boolean`, its unclear what
numeric is referring to:
fatal: bad numeric config value 'non-boolean' for 'commit.gpgsign': invalid unit
now the error message mentions `non-boolean` is a bad boolean value:
fatal: bad boolean config value 'non-boolean' for 'commit.gpgsign'
Signed-off-by: Andrew Klotz <agc.klotz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When `gc.writeCommitGraph = true`, it is possible that the commit-graph
is _still_ not written: replace objects, grafts and shallow repositories
are incompatible with the commit-graph feature.
Under such circumstances, we need to indicate to the user why the
commit-graph was not written instead of staying silent about it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Whenever a user runs `git reflog expire --stale-fix`, the most likely
reason is that their repository is at least _somewhat_ corrupt. Which
means that it is more than just possible that some objects are missing.
If that is the case, that can currently let the command abort through
the phase where it tries to mark all reachable objects.
Instead of adding insult to injury, let's be gentle and continue as best
as we can in such a scenario, simply by ignoring the missing objects and
moving on.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The version of Ubuntu Linux used by default at GitHub Actions CI
has been updated to one that lack coccinelle; until it gets fixed,
work it around by sticking to the previous release (18.04).
* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.
* ab/detox-gettext-tests:
tests: remove uses of GIT_TEST_GETTEXT_POISON=false
tests: remove support for GIT_TEST_GETTEXT_POISON
ci: remove GETTEXT_POISON jobs
|
|
Update support for invalid UTF-8 in PCRE2.
* ab/grep-pcre-invalid-utf8:
grep/pcre2: better support invalid UTF-8 haystacks
grep/pcre2 tests: don't rely on invalid UTF-8 data test
|
|
The support for deprecated PCRE1 library has been dropped.
* ab/retire-pcre1:
Remove support for v1 of the PCRE library
config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag
|
|
Some pretty-format specifiers do not need the data in commit object
(e.g. "%H"), but we were over-eager to load and parse it, which has
been made even lazier.
* jk/pretty-lazy-load-commit:
pretty: lazy-load commit data when expanding user-format
|
|
Cleaning various codepaths up.
* ds/more-index-cleanups:
t1092: test interesting sparse-checkout scenarios
test-lib: test_region looks for trace2 regions
sparse-checkout: load sparse-checkout patterns
name-hash: use trace2 regions for init
repository: add repo reference to index_state
fsmonitor: de-duplicate BUG()s around dirty bits
cache-tree: extract subtree_pos()
cache-tree: simplify verify_cache() prototype
cache-tree: clean up cache_tree_update()
|
|
`git worktree list` now annotates worktrees as prunable, shows
locked and prunable attributes in --porcelain mode, and gained
a --verbose option.
* rs/worktree-list-verbose:
worktree: teach `list` verbose mode
worktree: teach `list` to annotate prunable worktree
worktree: teach `list --porcelain` to annotate locked worktree
t2402: ensure locked worktree is properly cleaned up
worktree: teach worktree_lock_reason() to gently handle main worktree
worktree: teach worktree to lazy-load "prunable" reason
worktree: libify should_prune_worktree()
|
|
When "git rebase -i" processes "fixup" insn, there is no reason to
clean up the commit log message, but we did the usual stripspace
processing. This has been corrected.
* js/rebase-i-commit-cleanup-fix:
rebase -i: do leave commit message intact in fixup! chains
|
|
Code clean-up.
* jk/t0000-cleanups:
t0000: consistently use single quotes for outer tests
t0000: run cleaning test inside sub-test
t0000: run prereq tests inside sub-test
t0000: keep clean-up tests together
|
|
Test fix.
* sg/t7800-difftool-robustify:
t7800-difftool: don't accidentally match tmp dirs
|
|
Lose the debugging aid that may have been useful in the past, but
no longer is, in the "grep" codepaths.
* ab/lose-grep-debug:
grep/log: remove hidden --debug and --grep-debug options
|
|
Code clean-up to ensure our use of hashtables using object names as
keys use the "struct object_id" objects, not the raw hash values.
* jk/use-oid-pos:
oid_pos(): access table through const pointers
hash_pos(): convert to oid_pos()
rerere: use strmap to store rerere directories
rerere: tighten rr-cache dirname check
rerere: check dirname format while iterating rr_cache directory
commit_graft_pos(): take an oid instead of a bare hash
|
|
This behavior of git-rev-parse is observed since git 1.8.3.1
at least(*), and likely earlier versions.
At least one git-reliant project in-the-wild relies on this
current behavior of git-rev-parse being able to handle multiple
--since= arguments without squeezing identical results together.
So add a test to prevent the potential for regression in
downstream projects.
(*) 1.8.3.1 the version packaged for CentOS 7.x
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When trying to find a .mailmap file, we will always look for it in the
current directory. This makes sense in a repository with a working tree,
since we'd always go to the toplevel directory at startup. But for a
bare repository, it can be confusing. With an option like --git-dir (or
$GIT_DIR in the environment), we don't chdir at all, and we'd read
.mailmap from whatever directory you happened to be in before starting
Git.
(Note that --git-dir without specifying a working tree historically
means "the current directory is the root of the working tree", but most
bare repositories will have core.bare set these days, meaning they will
realize there is no working tree at all).
The documentation for gitmailmap(5) says:
If the file `.mailmap` exists at the toplevel of the repository[...]
which likewise reinforces the notion that we are looking in the working
tree.
This patch prevents us from looking for such a file when we're in a bare
repository. This does break something that used to work:
cd bare.git
git cat-file blob HEAD:.mailmap >.mailmap
git shortlog
But that was never advertised in the documentation. And these days we
have mailmap.blob (which defaults to HEAD:.mailmap) to do the same thing
in a much cleaner way.
However, there's one more interesting case: we might not have a
repository at all! The git-shortlog command can be run with git-log
output fed on its stdin, and it will apply the mailmap. In that case, it
probably does make sense to read .mailmap from the current directory.
This patch will continue to do so.
That leads to one even weirder case: if you run git-shortlog to process
stdin, the input _could_ be from a different repository entirely. Should
we respect the in-tree .mailmap then? Probably yes. Whatever the source
of the input, if shortlog is running in a repository, the documentation
claims that we'd read the .mailmap from its top-level (and of course
it's reasonably likely that it _is_ from the same repo, and the user
just preferred to run git-log and git-shortlog separately for whatever
reason).
The included test covers these cases, and we now document the "no repo"
case explicitly.
We also add a test that confirms we find a top-level ".mailmap" even
when we start in a subdirectory of the working tree. This worked both
before and after this commit, but we never tested it explicitly (it
works because we always chdir to the top-level of the working tree if
there is one).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|