summaryrefslogtreecommitdiff
path: root/t
AgeCommit message (Collapse)AuthorFilesLines
2017-01-27pack-objects: enforce --depth limit in reused deltasLibravatar Jeff King1-0/+93
Since 898b14c (pack-objects: rework check_delta_limit usage, 2007-04-16), we check the delta depth limit only when figuring out whether we should make a new delta. We don't consider it at all when reusing deltas, which means that packing once with --depth=250, and then again with --depth=50, the second pack may still contain chains larger than 50. This is generally considered a feature, as the results of earlier high-depth repacks are carried forward, used for serving fetches, etc. However, since we started using cross-pack deltas in c9af708b1 (pack-objects: use mru list when iterating over packs, 2016-08-11), we are no longer bounded by the length of an existing delta chain in a single pack. Here's one particular pathological case: a sequence of N packs, each with 2 objects, the base of which is stored as a delta in a previous pack. If we chain all the deltas together, we have a cycle of length N. We break the cycle, but the tip delta is still at depth N-1. This is less unlikely than it might sound. See the included test for a reconstruction based on real-world actions. I ran into such a case in the wild, where a client was rapidly sending packs, and we had accumulated 10,000 before doing a server-side repack. The pack that "git repack" tried to generate had a very deep chain, which caused pack-objects to run out of stack space in the recursive write_one(). This patch bounds the length of delta chains in the output pack based on --depth, regardless of whether they are caused by cross-pack deltas or existed in the input packs. This fixes the problem, but does have two possible downsides: 1. High-depth aggressive repacks followed by "normal" repacks will throw away the high-depth chains. In the long run this is probably OK; investigation showed that high-depth repacks aren't actually beneficial, and we dropped the aggressive depth default to match the normal case in 07e7dbf0d (gc: default aggressive depth to 50, 2016-08-11). 2. If you really do want to store high-depth deltas on disk, they may be discarded and new delta computed when serving a fetch, unless you set pack.depth to match your high-depth size. The implementation uses the existing search for delta cycles. That lets us compute the depth of any node based on the depth of its base, because we know the base is DFS_DONE by the time we look at it (modulo any cycles in the graph, but we know there cannot be any because we break them as we see them). There is some subtlety worth mentioning, though. We record the depth of each object as we compute it. It might seem like we could save the per-object storage space by just keeping track of the depth of our traversal (i.e., have break_delta_chains() report how deep it went). But we may visit an object through multiple delta paths, and on subsequent paths we want to know its depth immediately, without having to walk back down to its final base (doing so would make our graph walk quadratic rather than linear). Likewise, one could try to record the depth not from the base, but from our starting point (i.e., start recursion_depth at 0, and pass "recursion_depth + 1" to each invocation of break_delta_chains()). And then when recursion_depth gets too big, we know that we must cut the delta chain. But that technique is wrong if we do not visit the nodes in topological order. In a chain A->B->C, it if we visit "C", then "B", then "A", we will never recurse deeper than 1 link (because we see at each node that we have already visited it). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17Merge branch 'ls/filter-process' into maintLibravatar Junio C Hamano2-26/+15
Doc update. * ls/filter-process: t0021: fix flaky test docs: warn about possible '=' in clean/smudge filter process values
2017-01-17Merge branch 'sb/t3600-cleanup' into maintLibravatar Junio C Hamano1-73/+51
Code cleanup. * sb/t3600-cleanup: t3600: slightly modernize style t3600: remove useless redirect
2017-01-17Merge branch 'sb/unpack-trees-grammofix' into maintLibravatar Junio C Hamano1-1/+1
* sb/unpack-trees-grammofix: unpack-trees: fix grammar for untracked files in directories
2017-01-17Merge branch 'ls/t0021-fixup' into maintLibravatar Junio C Hamano1-3/+2
* ls/t0021-fixup: t0021: minor filter process test cleanup
2017-01-17Merge branch 'ak/lazy-prereq-mktemp' into maintLibravatar Junio C Hamano1-1/+2
Test code clean-up. * ak/lazy-prereq-mktemp: t7610: clean up foo.XXXXXX tmpdir
2017-01-17Merge branch 'dt/smart-http-detect-server-going-away' into maintLibravatar Junio C Hamano1-0/+52
When the http server gives an incomplete response to a smart-http rpc call, it could lead to client waiting for a full response that will never come. Teach the client side to notice this condition and abort the transfer. An improvement counterproposal has failed. cf. <20161114194049.mktpsvgdhex2f4zv@sigill.intra.peff.net> * dt/smart-http-detect-server-going-away: upload-pack: optionally allow fetching any sha1 remote-curl: don't hang when a server dies before any output
2017-01-17Merge branch 'gv/p4-multi-path-commit-fix' into maintLibravatar Junio C Hamano1-1/+21
"git p4" that tracks multile p4 paths imported a single changelist that touches files in these multiple paths as one commit, followed by many empty commits. This has been fixed. * gv/p4-multi-path-commit-fix: git-p4: fix multi-path changelist empty commits
2017-01-17Merge branch 'ld/p4-compare-dir-vs-symlink' into maintLibravatar Junio C Hamano1-0/+43
"git p4" misbehaved when swapping a directory and a symbolic link. * ld/p4-compare-dir-vs-symlink: git-p4: avoid crash adding symlinked directory
2017-01-17Merge branch 'jc/push-default-explicit' into maintLibravatar Junio C Hamano1-0/+10
A lazy "git push" without refspec did not internally use a fully specified refspec to perform 'current', 'simple', or 'upstream' push, causing unnecessary "ambiguous ref" errors. * jc/push-default-explicit: push: test pushing ambiguously named branches push: do not use potentially ambiguous default refspec
2017-01-17Merge branch 'jk/index-pack-wo-repo-from-stdin' into maintLibravatar Junio C Hamano6-43/+34
"git index-pack --stdin" needs an access to an existing repository, but "git index-pack file.pack" to generate an .idx file that corresponds to a packfile does not. * jk/index-pack-wo-repo-from-stdin: index-pack: skip collision check when not in repository t: use nongit() function where applicable index-pack: complain when --stdin is used outside of a repo t5000: extract nongit function to test-lib-functions.sh
2017-01-17Merge branch 'jk/quote-env-path-list-component' into maintLibravatar Junio C Hamano2-0/+43
A recent update to receive-pack to make it easier to drop garbage objects made it clear that GIT_ALTERNATE_OBJECT_DIRECTORIES cannot have a pathname with a colon in it (no surprise!), and this in turn made it impossible to push into a repository at such a path. This has been fixed by introducing a quoting mechanism used when appending such a path to the colon-separated list. * jk/quote-env-path-list-component: t5615-alternate-env: double-quotes in file names do not work on Windows t5547-push-quarantine: run the path separator test on Windows, too tmp-objdir: quote paths we add to alternates alternates: accept double-quoted paths
2017-01-17Merge branch 'sb/sequencer-abort-safety' into maintLibravatar Junio C Hamano1-0/+10
Unlike "git am --abort", "git cherry-pick --abort" moved HEAD back to where cherry-pick started while picking multiple changes, when the cherry-pick stopped to ask for help from the user, and the user did "git reset --hard" to a different commit in order to re-attempt the operation. * sb/sequencer-abort-safety: Revert "sequencer: remove useless get_dir() function" sequencer: remove useless get_dir() function sequencer: make sequencer abort safer t3510: test that cherry-pick --abort does not unsafely change HEAD am: change safe_to_abort()'s not rewinding error into a warning am: fix filename in safe_to_abort() error message
2017-01-17Merge branch 'jc/pull-rebase-ff' into maintLibravatar Junio C Hamano1-0/+17
"git pull --rebase", when there is no new commits on our side since we forked from the upstream, should be able to fast-forward without invoking "git rebase", but it didn't. * jc/pull-rebase-ff: pull: fast-forward "pull --rebase=true"
2017-01-17Merge branch 'ak/commit-only-allow-empty' into maintLibravatar Junio C Hamano1-0/+9
"git commit --allow-empty --only" (no pathspec) with dirty index ought to be an acceptable way to create a new commit that does not change any paths, but it was forbidden, perhaps because nobody needed it so far. * ak/commit-only-allow-empty: commit: remove 'Clever' message for --only --amend commit: make --only --allow-empty work without paths
2017-01-17Merge branch 'da/difftool-dir-diff-fix' into maintLibravatar Junio C Hamano1-3/+41
"git difftool --dir-diff" had a minor regression when started from a subdirectory, which has been fixed. * da/difftool-dir-diff-fix: difftool: fix dir-diff index creation when in a subdirectory
2017-01-17Merge branch 'jb/diff-no-index-no-abbrev' into maintLibravatar Junio C Hamano7-0/+34
"git diff --no-index" did not take "--no-abbrev" option. * jb/diff-no-index-no-abbrev: diff: handle --no-abbrev in no-index case
2017-01-17Merge branch 'jk/stash-disable-renames-internally' into maintLibravatar Junio C Hamano1-0/+9
When diff.renames configuration is on (and with Git 2.9 and later, it is enabled by default, which made it worse), "git stash" misbehaved if a file is removed and another file with a very similar content is added. * jk/stash-disable-renames-internally: stash: prefer plumbing over git-diff
2017-01-17Merge branch 'jk/http-walker-limit-redirect' into maintLibravatar Junio C Hamano4-0/+80
Update the error messages from the dumb-http client when it fails to obtain loose objects; we used to give sensible error message only upon 404 but we now forbid unexpected redirects that needs to be reported with something sensible. * jk/http-walker-limit-redirect: http-walker: complain about non-404 loose object errors http: treat http-alternates like redirects http: make redirects more obvious remote-curl: rename shadowed options variable http: always update the base URL for redirects http: simplify update_url_from_redirect
2017-01-17Merge branch 'jc/renormalize-merge-kill-safer-crlf' into maintLibravatar Junio C Hamano1-0/+12
Fix a corner case in merge-recursive regression that crept in during 2.10 development cycle. * jc/renormalize-merge-kill-safer-crlf: convert: git cherry-pick -Xrenormalize did not work merge-recursive: handle NULL in add_cacheinfo() correctly cherry-pick: demonstrate a segmentation fault
2017-01-17Merge branch 'ls/p4-empty-file-on-lfs' into maintLibravatar Junio C Hamano1-0/+2
"git p4" LFS support was broken when LFS stores an empty blob. * ls/p4-empty-file-on-lfs: git-p4: fix empty file processing for large file system backend GitLFS
2017-01-17Merge branch 'nd/worktree-list-fixup' into maintLibravatar Junio C Hamano1-0/+40
The output from "git worktree list" was made in readdir() order, and was unstable. * nd/worktree-list-fixup: worktree list: keep the list sorted worktree.c: get_worktrees() takes a new flag argument get_worktrees() must return main worktree as first item even on error worktree: reorder an if statement worktree.c: zero new 'struct worktree' on allocation
2017-01-17Merge branch 'bw/push-dry-run' into maintLibravatar Junio C Hamano1-0/+24
"git push --dry-run --recurse-submodule=on-demand" wasn't "--dry-run" in the submodules. * bw/push-dry-run: push: fix --dry-run to not push submodules push: --dry-run updates submodules when --recurse-submodules=on-demand
2017-01-17Merge branch 'dt/empty-submodule-in-merge' into maintLibravatar Junio C Hamano2-5/+2
An empty directory in a working tree that can simply be nuked used to interfere while merging or cherry-picking a change to create a submodule directory there, which has been fixed.. * dt/empty-submodule-in-merge: submodules: allow empty working-tree dirs in merge/cherry-pick
2017-01-17Merge branch 'jk/rev-parse-symbolic-parents-fix' into maintLibravatar Junio C Hamano1-0/+18
"git rev-parse --symbolic" failed with a more recent notation like "HEAD^-1" and "HEAD^!". * jk/rev-parse-symbolic-parents-fix: rev-parse: fix parent shorthands with --symbolic
2016-12-21t5615-alternate-env: double-quotes in file names do not work on WindowsLibravatar Johannes Sixt1-1/+1
Protect a recently added test case with !MINGW. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-19git-p4: fix multi-path changelist empty commitsLibravatar George Vanburgh1-1/+21
When importing from multiple perforce paths - we may attempt to import a changelist that contains files from two (or more) of these depot paths. Currently, this results in multiple git commits - one containing the changes, and the other(s) as empty commit(s). This behavior was introduced in commit 1f90a64891 ("git-p4: reduce number of server queries for fetches", 2015-12-19). Reproduction Steps: 1. Have a git repo cloned from a perforce repo using multiple depot paths (e.g. //depot/foo and //depot/bar). 2. Submit a single change to the perforce repo that makes changes in both //depot/foo and //depot/bar. 3. Run "git p4 sync" to sync the change from #2. Change is synced as multiple commits, one for each depot path that was affected. Using a set, instead of a list inside p4ChangesForPaths() ensures that each changelist is unique to the returned list, and therefore only a single commit is generated for each changelist. Reported-by: James Farwell <jfarwell@vmware.com> Signed-off-by: George Vanburgh <gvanburgh@bloomberg.net> Reviewed-by: Luke Diamand <luke@diamand.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-18git-p4: avoid crash adding symlinked directoryLibravatar Luke Diamand1-0/+43
When submitting to P4, if git-p4 came across a symlinked directory, then during the generation of the submit diff, it would try to open it as a normal file and fail. Spot symlinks (of any type) and output a description of the symlink instead. Add a test case. Signed-off-by: Luke Diamand <luke@diamand.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-18t0021: fix flaky testLibravatar Lars Schneider1-16/+1
t0021.15 creates files, adds them to the index, and commits them. All this usually happens in a test run within the same second and Git cannot know if the files have been changed between `add` and `commit`. Thus, Git has to run the clean filter in both operations. Sometimes these invocations spread over two different seconds and Git can infer that the files were not changed between `add` and `commit` based on their modification timestamp. The test would fail as it expects the filter invocation. Remove this expectation to make the test stable. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16t: use nongit() function where applicableLibravatar Jeff King3-29/+5
Many tests want to run a command outside of any git repo; with the nongit() function this is now a one-liner. It saves a few lines, but more importantly, it's immediately obvious what the code is trying to accomplish. This doesn't convert every such case in the test suite; it just covers those that want to do a one-off command. Other cases, such as the ones in t4035, are part of a larger scheme of outside-repo files, and it's less confusing for them to stay consistent with the surrounding tests. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16index-pack: complain when --stdin is used outside of a repoLibravatar Jeff King1-0/+15
The index-pack builtin is marked as RUN_SETUP_GENTLY, because it's perfectly fine to index a pack in the filesystem outside of any repository. However, --stdin mode will write the result to the object database, which does not make sense outside of a repository. Doing so creates a bogus ".git" directory with nothing in it except the newly-created pack and its index. Instead, let's flag this as an error and abort. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16t5000: extract nongit function to test-lib-functions.shLibravatar Jeff King2-14/+14
This function abstracts the idea of running a command outside of any repository (which is slightly awkward to do because even if you make a non-repo directory, git may keep walking up outside of the trash directory). There are several scripts that use the same technique, so let's make the function available for everyone. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-13t5547-push-quarantine: run the path separator test on Windows, tooLibravatar Johannes Sixt1-4/+10
To perform the test case on Windows in a way that corresponds to the POSIX version, inject the semicolon in a directory name. Typically, an absolute POSIX style path, such as the one in $PWD, is translated into a Windows style path by bash when it invokes git.exe. However, the presence of the semicolon suppresses this translation; but the untranslated POSIX style path is useless for git.exe. Therefore, instead of $PWD pass the Windows style path that $(pwd) produces. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12t3600: slightly modernize styleLibravatar Stefan Beller1-73/+51
Remove the space between redirection and file name. Also remove unnecessary invocations of subshells, such as (cd submod && echo X >untracked ) && as there is no point of having the shell for functional purposes. In case of a single Git command use the `-C` option to let Git cd into the directory. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12tmp-objdir: quote paths we add to alternatesLibravatar Jeff King1-0/+19
Commit 722ff7f87 (receive-pack: quarantine objects until pre-receive accepts, 2016-10-03) regressed pushes to repositories with colon (or semi-colon in Windows in them) because it adds the repository's main object directory to GIT_ALTERNATE_OBJECT_DIRECTORIES. The receiver interprets the colon as a delimiter, not as part of the path, and index-pack is unable to find objects which it needs to resolve deltas. The previous commit introduced a quoting mechanism for the alternates list; let's use it here to cover this case. We'll avoid quoting when we can, though. This alternate setup is also used when calling hooks, so it's possible that the user may call older git implementations which don't understand the quoting mechanism. By quoting only when necessary, this setup will continue to work unless the user _also_ has a repository whose path contains the delimiter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12alternates: accept double-quoted pathsLibravatar Jeff King1-0/+18
We read lists of alternates from objects/info/alternates files (delimited by newline), as well as from the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable (delimited by colon or semi-colon, depending on the platform). There's no mechanism for quoting the delimiters, so it's impossible to specify an alternate path that contains a colon in the environment, or one that contains a newline in a file. We've lived with that restriction for ages because both alternates and filenames with colons are relatively rare, and it's only a problem when the two meet. But since 722ff7f87 (receive-pack: quarantine objects until pre-receive accepts, 2016-10-03), which builds on the alternates system, every push causes the receiver to set GIT_ALTERNATE_OBJECT_DIRECTORIES internally. It would be convenient to have some way to quote the delimiter so that we can represent arbitrary paths. The simplest thing would be an escape character before a quoted delimiter (e.g., "\:" as a literal colon). But that creates a backwards compatibility problem: any path which uses that escape character is now broken, and we've just shifted the problem. We could choose an unlikely escape character (e.g., something from the non-printable ASCII range), but that's awkward to use. Instead, let's treat names as unquoted unless they begin with a double-quote, in which case they are interpreted via our usual C-stylke quoting rules. This also breaks backwards-compatibility, but in a smaller way: it only matters if your file has a double-quote as the very _first_ character in the path (whereas an escape character is a problem anywhere in the path). It's also consistent with many other parts of git, which accept either a bare pathname or a double-quoted one, and the sender can choose to quote or not as required. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-12Merge branch 'jk/alt-odb-cleanup' into jk/quote-env-path-list-componentLibravatar Junio C Hamano1-0/+71
* jk/alt-odb-cleanup: alternates: re-allow relative paths from environment
2016-12-09sequencer: make sequencer abort saferLibravatar Stephan Beyer1-1/+1
In contrast to "git am --abort", a sequencer abort did not check whether the current HEAD is the one that is expected. This can lead to loss of work (when not spotted and resolved using reflog before the garbage collector chimes in). This behavior is now changed by mimicking "git am --abort". The abortion is done but HEAD is not changed when the current HEAD is not the expected HEAD. A new file "sequencer/abort-safety" is added to save the expected HEAD. The new behavior is only active when --abort is invoked on multiple picks. The problem does not occur for the single-pick case because it is handled differently. Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-09t3510: test that cherry-pick --abort does not unsafely change HEADLibravatar Stephan Beyer1-0/+10
Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-08diff: handle --no-abbrev in no-index caseLibravatar Jack Bates7-0/+34
There are two different places where the --no-abbrev option is parsed, and two different places where SHA-1s are abbreviated. We normally parse --no-abbrev with setup_revisions(), but in the no-index case, "git diff" calls diff_opt_parse() directly, and diff_opt_parse() didn't handle --no-abbrev until now. (It did handle --abbrev, however.) We normally abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666 ("diff: handle sha1 abbreviations outside of repository, 2016-10-20) recently introduced a special case when you run "git diff" outside of a repository. setup_revisions() does also call diff_opt_parse(), but not for --abbrev or --no-abbrev, which it handles itself. setup_revisions() sets rev_info->abbrev, and later copies that to diff_options->abbrev. It handles --no-abbrev by setting abbrev to zero. (This change doesn't touch that.) Setting abbrev to zero was broken in the outside-of-a-repository special case, which until now resulted in a truly zero-length SHA-1, rather than taking zero to mean do not abbreviate. The only way to trigger this bug, however, was by running "git diff --raw" without either the --abbrev or --no-abbrev options, because 1) without --raw it doesn't respect abbrev (which is bizarre, but has been that way forever), 2) we silently clamp --abbrev=0 to MINIMUM_ABBREV, and 3) --no-abbrev wasn't handled until now. The outside-of-a-repository case is one of three no-index cases. The other two are when one of the files you're comparing is outside of the repository you're in, and the --no-index option. Signed-off-by: Jack Bates <jack@nottheoilrig.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-08difftool: fix dir-diff index creation when in a subdirectoryLibravatar David Aguilar1-3/+41
9ec26e7977 (difftool: fix argument handling in subdirs, 2016-07-18) corrected how path arguments are handled in a subdirectory, but it introduced a regression in how entries outside of the subdirectory are handled by dir-diff. When preparing the right-side of the diff we only include the changed paths in the temporary area. The left side of the diff is constructed from a temporary index that is built from the same set of changed files, but it was being constructed from within the subdirectory. This is a problem because the indexed paths are toplevel-relative, and thus they were not getting added to the index. Teach difftool to chdir to the toplevel of the repository before preparing its temporary indexes. This ensures that all of the toplevel-relative paths are valid. Add test cases to more thoroughly exercise this scenario. Reported-by: Frank Becker <fb@mooflu.com> Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06stash: prefer plumbing over git-diffLibravatar Jeff King1-0/+9
When creating a stash, we need to look at the diff between the working tree and HEAD, and do so using the git-diff porcelain. Because git-diff enables porcelain config like renames by default, this causes at least one problem. The --name-only format will not mention the source side of a rename, meaning we will fail to stash a deletion that is part of a rename. We could fix that case by passing --no-renames, but this is a symptom of a larger problem. We should be using the diff-index plumbing here, which does not have renames enabled by default, and also does not respect any potentially confusing config options. Reported-by: Matthew Patey <matthew.patey2167@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06http: treat http-alternates like redirectsLibravatar Jeff King1-0/+38
The previous commit made HTTP redirects more obvious and tightened up the default behavior. However, there's another way for a server to ask a git client to fetch arbitrary content: by having an http-alternates file (or a regular alternates file, which is used as a backup). Similar to the HTTP redirect case, a malicious server can claim to have refs pointing at object X, return a 404 when the client asks for X, but point to some other URL via http-alternates, which the client will transparently fetch. The end result is that it looks from the user's perspective like the objects came from the malicious server, as the other URL is not mentioned at all. Worse, because we feed the new URL to curl ourselves, the usual protocol restrictions do not kick in (neither curl's default of disallowing file://, nor the protocol whitelisting in f4113cac0 (http: limit redirection to protocol-whitelist, 2015-09-22). Let's apply the same rules here as we do for HTTP redirects. Namely: - unless http.followRedirects is set to "always", we will not follow remote redirects from http-alternates (or alternates) at all - set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS restrict ourselves to a known-safe set and respect any user-provided whitelist. - mention alternate object stores on stderr so that the user is aware another source of objects may be involved The first item may prove to be too restrictive. The most common use of alternates is to point to another path on the same server. While it's possible for a single-server redirect to be an attack, it takes a fairly obscure setup (victim and evil repository on the same host, host speaks dumb http, and evil repository has access to edit its own http-alternates file). So we could make the checks more specific, and only cover cross-server redirects. But that means parsing the URLs ourselves, rather than letting curl handle them. This patch goes for the simpler approach. Given that they are only used with dumb http, http-alternates are probably pretty rare. And there's an escape hatch: the user can allow redirects on a specific server by setting http.<url>.followRedirects to "always". Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06http: make redirects more obviousLibravatar Jeff King2-0/+29
We instruct curl to always follow HTTP redirects. This is convenient, but it creates opportunities for malicious servers to create confusing situations. For instance, imagine Alice is a git user with access to a private repository on Bob's server. Mallory runs her own server and wants to access objects from Bob's repository. Mallory may try a few tricks that involve asking Alice to clone from her, build on top, and then push the result: 1. Mallory may simply redirect all fetch requests to Bob's server. Git will transparently follow those redirects and fetch Bob's history, which Alice may believe she got from Mallory. The subsequent push seems like it is just feeding Mallory back her own objects, but is actually leaking Bob's objects. There is nothing in git's output to indicate that Bob's repository was involved at all. The downside (for Mallory) of this attack is that Alice will have received Bob's entire repository, and is likely to notice that when building on top of it. 2. If Mallory happens to know the sha1 of some object X in Bob's repository, she can instead build her own history that references that object. She then runs a dumb http server, and Alice's client will fetch each object individually. When it asks for X, Mallory redirects her to Bob's server. The end result is that Alice obtains objects from Bob, but they may be buried deep in history. Alice is less likely to notice. Both of these attacks are fairly hard to pull off. There's a social component in getting Mallory to convince Alice to work with her. Alice may be prompted for credentials in accessing Bob's repository (but not always, if she is using a credential helper that caches). Attack (1) requires a certain amount of obliviousness on Alice's part while making a new commit. Attack (2) requires that Mallory knows a sha1 in Bob's repository, that Bob's server supports dumb http, and that the object in question is loose on Bob's server. But we can probably make things a bit more obvious without any loss of functionality. This patch does two things to that end. First, when we encounter a whole-repo redirect during the initial ref discovery, we now inform the user on stderr, making attack (1) much more obvious. Second, the decision to follow redirects is now configurable. The truly paranoid can set the new http.followRedirects to false to avoid any redirection entirely. But for a more practical default, we will disallow redirects only after the initial ref discovery. This is enough to thwart attacks similar to (2), while still allowing the common use of redirects at the repository level. Since c93c92f30 (http: update base URLs when we see redirects, 2013-09-28) we re-root all further requests from the redirect destination, which should generally mean that no further redirection is necessary. As an escape hatch, in case there really is a server that needs to redirect individual requests, the user can set http.followRedirects to "true" (and this can be done on a per-server basis via http.*.followRedirects config). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06http: always update the base URL for redirectsLibravatar Jeff King3-0/+13
If a malicious server redirects the initial ref advertisement, it may be able to leak sha1s from other, unrelated servers that the client has access to. For example, imagine that Alice is a git user, she has access to a private repository on a server hosted by Bob, and Mallory runs a malicious server and wants to find out about Bob's private repository. Mallory asks Alice to clone an unrelated repository from her over HTTP. When Alice's client contacts Mallory's server for the initial ref advertisement, the server issues an HTTP redirect for Bob's server. Alice contacts Bob's server and gets the ref advertisement for the private repository. If there is anything to fetch, she then follows up by asking the server for one or more sha1 objects. But who is the server? If it is still Mallory's server, then Alice will leak the existence of those sha1s to her. Since commit c93c92f30 (http: update base URLs when we see redirects, 2013-09-28), the client usually rewrites the base URL such that all further requests will go to Bob's server. But this is done by textually matching the URL. If we were originally looking for "http://mallory/repo.git/info/refs", and we got pointed at "http://bob/other.git/info/refs", then we know that the right root is "http://bob/other.git". If the redirect appears to change more than just the root, we punt and continue to use the original server. E.g., imagine the redirect adds a URL component that Bob's server will ignore, like "http://bob/other.git/info/refs?dummy=1". We can solve this by aborting in this case rather than silently continuing to use Mallory's server. In addition to protecting from sha1 leakage, it's arguably safer and more sane to refuse a confusing redirect like that in general. For example, part of the motivation in c93c92f30 is avoiding accidentally sending credentials over clear http, just to get a response that says "try again over https". So even in a non-malicious case, we'd prefer to err on the side of caution. The downside is that it's possible this will break a legitimate but complicated server-side redirection scheme. The setup given in the newly added test does work, but it's convoluted enough that we don't need to care about it. A more plausible case would be a server which redirects a request for "info/refs?service=git-upload-pack" to just "info/refs" (because it does not do smart HTTP, and for some reason really dislikes query parameters). Right now we would transparently downgrade to dumb-http, but with this patch, we'd complain (and the user would have to set GIT_SMART_HTTP=0 to fetch). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06docs: warn about possible '=' in clean/smudge filter process valuesLibravatar Lars Schneider2-12/+16
A pathname value in a clean/smudge filter process "key=value" pair can contain the '=' character (introduced in edcc858). Make the user aware of this issue in the docs, add a corresponding test case, and fix the issue in filter process value parser of the example implementation in contrib. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-05t0021: minor filter process test cleanupLibravatar Lars Schneider1-3/+2
Remove superfluous .gitignore pattern and invalid '.' in `git commit` calls. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-05git-p4: fix empty file processing for large file system backend GitLFSLibravatar Lars Schneider1-0/+2
If git-p4 tried to store an empty file in GitLFS then it crashed while parsing the pointer file: oid = re.search(r'^oid \w+:(\w+)', pointerFile, re.MULTILINE).group(1) AttributeError: 'NoneType' object has no attribute 'group' This happens because GitLFS does not create a pointer file for an empty file. Teach git-p4 this behavior to fix the problem and add a test case. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-05commit: make --only --allow-empty work without pathsLibravatar Andreas Krey1-0/+9
--only is implied when paths are present, and required them unless --amend. But with --allow-empty it should be allowed as well - it is the only way to create an empty commit in the presence of staged changes. Signed-off-by: Andreas Krey <a.krey@gmx.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-05t3600: remove useless redirectLibravatar Stefan Beller1-1/+1
In the next line the `actual` is overwritten again, so no need to redirect the output of checkout into that file. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>