path: root/t/t5550-http-fetch-dumb.sh
Age | Commit message | Author | Files | Lines
2016-12-27 | Merge branch 'bw/transport-protocol-policy' | Junio C Hamano | 1 | -0/+10

Finer-grained control of what protocols are allowed for transports during clone/fetch/push has been enabled via a new configuration mechanism.

* bw/transport-protocol-policy:
  http: respect protocol.*.allow=user for http-alternates
  transport: add from_user parameter to is_transport_allowed
  http: create function to get curl allowed protocols
  transport: add protocol policy config option
  http: always warn if libcurl version is too old
  lib-proto-disable: variable name fix
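As a rough illustration of the configuration mechanism mentioned above, the sketch below sets a default policy and two per-protocol overrides. The `protocol.allow` and `protocol.<name>.allow` keys and the values `always`, `never` and `user` are the ones Git documents; the particular policy chosen here is only an assumed example, not a recommendation from the commit.

```shell
# Assumed example policy: deny every protocol by default ...
git config --global protocol.allow never
# ... then always allow https ...
git config --global protocol.https.allow always
# ... and allow plain http only when the URL is treated as coming
# directly from the user (GIT_PROTOCOL_FROM_USER unset or set to 1).
git config --global protocol.http.allow user
```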
2016-12-19 | Merge branch 'jk/http-walker-limit-redirect-2.9' | Junio C Hamano | 1 | -0/+61

Transport with dumb http can be fooled into following foreign URLs that the end user does not intend to, especially with the server side redirects and http-alternates mechanism, which can lead to security issues. Tighten the redirection and make it more obvious to the end user when it happens.

* jk/http-walker-limit-redirect-2.9:
  http: treat http-alternates like redirects
  http: make redirects more obvious
  remote-curl: rename shadowed options variable
  http: always update the base URL for redirects
  http: simplify update_url_from_redirect
2016-12-15 | http: respect protocol.*.allow=user for http-alternates | Jeff King | 1 | -0/+10

The http-walker may fetch the http-alternates (or alternates) file from a remote in order to find more objects. This should count as a "not from the user" use of the protocol. But because we implement the redirection ourselves and feed the new URL to curl, it will use the CURLOPT_PROTOCOLS rules, not the more restrictive CURLOPT_REDIR_PROTOCOLS.

The ideal solution would be for each curl request we make to know whether or not it is directly from the user or part of an alternates redirect, and then set CURLOPT_PROTOCOLS as appropriate. However, that would require plumbing that information through all of the various layers of the http code.

Instead, let's check the protocol at the source: when we are parsing the remote http-alternates file. The only downside is that if there's any mismatch between what protocol we think it is versus what curl thinks it is, it could violate the policy. To address this, we'll make the parsing err on the picky side, and only allow protocols that it can parse definitively. So for example, you can't elude the "http" policy by asking for "HTTP://", even though curl might handle it; we would reject it as unknown.

The only unsafe case would be if you have a URL that starts with "http://" but curl interprets as another protocol. That seems like an unlikely failure mode (and we are still protected by our base CURLOPT_PROTOCOLS setting, so the worst you could do is trigger one of https, ftp, or ftps).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
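The "picky" parsing described above can be sketched in shell as a case-sensitive prefix match. This is only an illustration of the idea, not Git's C implementation: the function name `alternate_protocol` is hypothetical, and the accepted set (http, https, ftp, ftps) is an assumption based on the protocols the message names.

```shell
# Hypothetical helper: classify an alternates URL by exact, lower-case prefix.
# Anything that does not match definitively is reported as "unknown".
alternate_protocol() {
	case "$1" in
	http://*)  echo http ;;
	https://*) echo https ;;
	ftp://*)   echo ftp ;;
	ftps://*)  echo ftps ;;
	*)         echo unknown ;;
	esac
}

alternate_protocol 'HTTP://example.com/objects'
```

Because `case` patterns are case-sensitive, "HTTP://" falls through to the default branch, which is the behavior the message asks for: a caller would then refuse the alternate rather than guess what curl might do with it.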
2016-12-06 | http: treat http-alternates like redirects | Jeff King | 1 | -0/+38

The previous commit made HTTP redirects more obvious and tightened up the default behavior. However, there's another way for a server to ask a git client to fetch arbitrary content: by having an http-alternates file (or a regular alternates file, which is used as a backup).

Similar to the HTTP redirect case, a malicious server can claim to have refs pointing at object X, return a 404 when the client asks for X, but point to some other URL via http-alternates, which the client will transparently fetch. The end result is that it looks from the user's perspective like the objects came from the malicious server, as the other URL is not mentioned at all.

Worse, because we feed the new URL to curl ourselves, the usual protocol restrictions do not kick in (neither curl's default of disallowing file://, nor the protocol whitelisting in f4113cac0 (http: limit redirection to protocol-whitelist, 2015-09-22)).

Let's apply the same rules here as we do for HTTP redirects. Namely:

  - unless http.followRedirects is set to "always", we will not follow remote redirects from http-alternates (or alternates) at all

  - set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS to restrict ourselves to a known-safe set and respect any user-provided whitelist.

  - mention alternate object stores on stderr so that the user is aware another source of objects may be involved

The first item may prove to be too restrictive. The most common use of alternates is to point to another path on the same server. While it's possible for a single-server redirect to be an attack, it takes a fairly obscure setup (victim and evil repository on the same host, host speaks dumb http, and evil repository has access to edit its own http-alternates file).

So we could make the checks more specific, and only cover cross-server redirects. But that means parsing the URLs ourselves, rather than letting curl handle them. This patch goes for the simpler approach. Given that they are only used with dumb http, http-alternates are probably pretty rare. And there's an escape hatch: the user can allow redirects on a specific server by setting http.<url>.followRedirects to "always".

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06 | http: make redirects more obvious | Jeff King | 1 | -0/+23

We instruct curl to always follow HTTP redirects. This is convenient, but it creates opportunities for malicious servers to create confusing situations. For instance, imagine Alice is a git user with access to a private repository on Bob's server. Mallory runs her own server and wants to access objects from Bob's repository.

Mallory may try a few tricks that involve asking Alice to clone from her, build on top, and then push the result:

  1. Mallory may simply redirect all fetch requests to Bob's server. Git will transparently follow those redirects and fetch Bob's history, which Alice may believe she got from Mallory. The subsequent push seems like it is just feeding Mallory back her own objects, but is actually leaking Bob's objects. There is nothing in git's output to indicate that Bob's repository was involved at all.

     The downside (for Mallory) of this attack is that Alice will have received Bob's entire repository, and is likely to notice that when building on top of it.

  2. If Mallory happens to know the sha1 of some object X in Bob's repository, she can instead build her own history that references that object. She then runs a dumb http server, and Alice's client will fetch each object individually. When it asks for X, Mallory redirects her to Bob's server. The end result is that Alice obtains objects from Bob, but they may be buried deep in history. Alice is less likely to notice.

Both of these attacks are fairly hard to pull off. There's a social component in getting Mallory to convince Alice to work with her. Alice may be prompted for credentials in accessing Bob's repository (but not always, if she is using a credential helper that caches). Attack (1) requires a certain amount of obliviousness on Alice's part while making a new commit. Attack (2) requires that Mallory knows a sha1 in Bob's repository, that Bob's server supports dumb http, and that the object in question is loose on Bob's server.

But we can probably make things a bit more obvious without any loss of functionality. This patch does two things to that end.

First, when we encounter a whole-repo redirect during the initial ref discovery, we now inform the user on stderr, making attack (1) much more obvious.

Second, the decision to follow redirects is now configurable. The truly paranoid can set the new http.followRedirects to false to avoid any redirection entirely. But for a more practical default, we will disallow redirects only after the initial ref discovery. This is enough to thwart attacks similar to (2), while still allowing the common use of redirects at the repository level. Since c93c92f30 (http: update base URLs when we see redirects, 2013-09-28) we re-root all further requests from the redirect destination, which should generally mean that no further redirection is necessary.

As an escape hatch, in case there really is a server that needs to redirect individual requests, the user can set http.followRedirects to "true" (and this can be done on a per-server basis via http.*.followRedirects config).

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
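The settings discussed in the two redirect commits above might be applied as in the following sketch. The values `false`, `true` and `initial` are the ones Git documents for `http.followRedirects` (`initial` being the name of the default behavior the message describes, and `true` the setting the preceding commit calls "always"); `https://example.com/` is a placeholder server, not one taken from the commits.

```shell
# Refuse every HTTP redirect.
git config --global http.followRedirects false

# The default described above: follow redirects only during the
# initial ref discovery, not for later per-object requests.
git config --global http.followRedirects initial

# Escape hatch for a single server that really needs per-request
# redirects (placeholder URL).
git config --global http.https://example.com/.followRedirects true
```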
2016-09-15 | Merge branch 'jk/fix-remote-curl-url-wo-proto' | Junio C Hamano | 1 | -0/+8

"git fetch http::/site/path" did not die correctly and segfaulted instead.

* jk/fix-remote-curl-url-wo-proto:
  remote-curl: handle URLs without protocol
2016-09-08 | remote-curl: handle URLs without protocol | Jeff King | 1 | -0/+8

Generally remote-curl would never see a URL that did not have "proto:" at the beginning, as that is what tells git to run the "git-remote-proto" helper (and git-remote-http, etc, are aliases for git-remote-curl).

However, the special syntax "proto::something" will run git-remote-proto with only "something" as the URL. So a malformed URL like:

	http::/example.com/repo.git

will feed the URL "/example.com/repo.git" to git-remote-http. The resulting URL has no protocol, but the code added by 372370f (http: use credential API to handle proxy authentication, 2016-01-26) does not handle this case and segfaults.

For the purposes of this code, we don't really care what the exact protocol is; only whether or not it is https. So let's just assume that a missing protocol is not, and curl will handle the real error (which is that the URL is nonsense).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
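The fallback described above (a URL with no protocol simply counts as "not https") can be sketched as a small shell predicate. The name `url_is_https` is hypothetical and the real fix lives in Git's C code; this only illustrates the decision.

```shell
# Hypothetical helper: succeed only for URLs that definitely use https.
# A URL with no protocol at all is treated as "not https" instead of
# being an error here; curl reports the real problem later.
url_is_https() {
	case "$1" in
	https://*) return 0 ;;
	*)         return 1 ;;
	esac
}

if url_is_https '/example.com/repo.git'
then
	echo "https"
else
	echo "not https"
fi
```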
2016-09-07 | t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var | Elia Pinto | 1 | -5/+5

Use the new GIT_TRACE_CURL environment variable instead of the deprecated GIT_CURL_VERBOSE.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
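For readers who have not used the variable, a minimal invocation might look like the sketch below. The repository URL and the log file name are placeholders; setting the variable to a true value sends the trace to stderr, which is captured here with a redirection.

```shell
# Placeholder URL and file name; the curl trace is written to stderr.
GIT_TRACE_CURL=true git clone https://example.com/repo.git 2>curl-trace.log
```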
2016-04-28 | submodule: use prepare_submodule_repo_env consistently | Jeff King | 1 | -0/+11

Before 14111fc (git: submodule honor -c credential.* from command line, 2016-02-29), it was sufficient for code which spawned a process in a submodule to just set the child process's "env" field to "local_repo_env" to clear the environment of any repo-specific variables.

That commit introduced a more complicated procedure, in which we clear most variables but allow through sanitized config. For C code, we used that procedure only for cloning, but not for any of the programs spawned by submodule.c. As a result, things like "git fetch --recurse-submodules" behave differently than "git clone --recursive"; the former will not pass through the sanitized config.

We can fix this by using prepare_submodule_repo_env() everywhere in submodule.c.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28 | submodule: export sanitized GIT_CONFIG_PARAMETERS | Jeff King | 1 | -0/+17

Commit 14111fc (git: submodule honor -c credential.* from command line, 2016-02-29) taught git-submodule.sh to save the sanitized value of $GIT_CONFIG_PARAMETERS when clearing the environment for a submodule. However, it failed to export the result, meaning that it had no effect for any sub-programs.

We didn't catch this in our initial tests because we checked only the "clone" case, which does not go through the shell script at all. Provoking "git submodule update" to do a fetch demonstrates the bug.

Noticed-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28 | t5550: break submodule config test into multiple sub-tests | Jeff King | 1 | -2/+6

Right now we test only the cloning case, but there are other interesting cases (e.g., fetching). Let's pull the setup bits into their own test, which will make things flow more logically once we start adding more tests which use the setup.

Let's also introduce some whitespace to the clone-test to split the two parts: making sure it fails without our cmdline config, and that it succeeds with it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28 | t5550: fix typo in $HTTPD_URL | Jeff King | 1 | -1/+1

Commit 14111fc (git: submodule honor -c credential.* from command line, 2016-02-29) accidentally wrote $HTTP_URL. It happened to work because we ended up with "credential..helper", which we treat the same as "credential.helper", applying it to all URLs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-01 | git: submodule honor -c credential.* from command line | Jacob Keller | 1 | -0/+17

Due to the way that the git-submodule code works, it clears all local git environment variables before entering submodules. This is normally a good thing since we want to clear settings such as GIT_WORK_TREE and other variables which would affect the operation of submodule commands. However, GIT_CONFIG_PARAMETERS is special, and we actually do want to preserve these settings. At the same time, we do not want to preserve all configuration, as many things should be left specific to the parent project.

Add a git submodule--helper function, sanitize-config, which shall be used to sanitize GIT_CONFIG_PARAMETERS, removing all key/value pairs except a small subset that are known to be safe and necessary.

Replace all the calls to clear_local_git_env with a wrapped function that filters GIT_CONFIG_PARAMETERS using the new helper and then restores it to the filtered subset after clearing the rest of the environment.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-04 | t/t5550-http-fetch-dumb.sh: use the $( ... ) construct for command substitution | Elia Pinto | 1 | -4/+4

The Git CodingGuidelines prefer the $(...) construct for command substitution instead of using the backquotes `...`.

The backquoted form is the traditional method for command substitution, and is supported by POSIX. However, all but the simplest uses become complicated quickly. In particular, embedded command substitutions and/or the use of double quotes require careful escaping with the backslash character.

The patch was generated by:

	for _f in $(find . -name "*.sh")
	do
		perl -i -pe 'BEGIN{undef $/;} s/`(.+?)`/\$(\1)/smg' "${_f}"
	done

and then carefully proof-read.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
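The escaping problem mentioned above shows up as soon as one substitution is nested inside another. The short comparison below is an illustrative sketch; the path `/a/b/c` and the variable names are arbitrary.

```shell
# Backquotes: the inner pair has to be escaped with backslashes.
old=`basename \`dirname /a/b/c\``

# $(...): nests without any escaping, and inner double quotes are fine.
new=$(basename "$(dirname /a/b/c)")

echo "$old $new"
```

Both forms compute the same value; the second simply stays readable as the nesting gets deeper.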
2015-05-22 | Merge branch 'jk/skip-http-tests-under-no-curl' | Junio C Hamano | 1 | -6/+0

Test clean-up.

* jk/skip-http-tests-under-no-curl:
  tests: skip dav http-push tests under NO_EXPAT=NoThanks
  t/lib-httpd.sh: skip tests if NO_CURL is defined
2015-05-07 | t/lib-httpd.sh: skip tests if NO_CURL is defined | Jeff King | 1 | -6/+0

If we built git without curl, we can't actually test against an http server. In fact, all of the test scripts which include lib-httpd.sh already perform this check, with one exception: t5540. For those scripts, this is a noop, and for t5540, this is a bugfix (it used to fail when built with NO_CURL, though it could go unnoticed if you had a stale git-remote-https in your build directory).

Noticed-by: Junio C Hamano <junio@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 | t: use test_might_fail for diff and grep | Jeff King | 1 | -2/+2

Some tests run diff or grep to produce an output, and then compare the output to an expected value. We know the exit code we expect these processes to have (e.g., grep yields 0 if it produced output and 1 otherwise), so it would not make the test wrong to look for it. But the difference between their output and the expected output (e.g., shown by test_cmp) is much more useful to somebody debugging the test than the test just bailing out.

These tests break the &&-chain to skip the exit-code check of the process. However, we can get the same effect by using test_might_fail. Note that in some cases the test did use "|| return 1", which meant the test was not wrong, but it did fool --chain-lint.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 | t: fix trivial &&-chain breakage | Jeff King | 1 | -1/+1

These are tests which are missing a link in their &&-chain, but during a setup phase. We may fail to notice failure in commands that build the test environment, but these are typically not expected to fail at all (but it's still good to double-check that our test environment is what we expect).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-18 | Merge branch 'ye/http-accept-language' | Junio C Hamano | 1 | -0/+42

Using environment variable LANGUAGE and friends on the client side, HTTP-based transports now send Accept-Language when making requests.

* ye/http-accept-language:
  http: add Accept-Language header if possible
2015-02-17 | Merge branch 'jk/dumb-http-idx-fetch-fix' | Junio C Hamano | 1 | -0/+18

A broken pack .idx file in the receiving repository prevented the dumb http transport from fetching a good copy of it from the other side.

* jk/dumb-http-idx-fetch-fix:
  dumb-http: do not pass NULL path to parse_pack_index
2015-01-28 | http: add Accept-Language header if possible | Yi EungJun | 1 | -0/+42

Add an Accept-Language header which indicates the user's preferred languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG.

Examples:

  LANGUAGE= -> ""
  LANGUAGE=ko:en -> "Accept-Language: ko, en;q=0.9, *;q=0.1"
  LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *;q=0.1"
  LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *;q=0.1"

This gives git servers a chance to display remote error messages in the user's preferred language.

Limit the number of languages to 1,000 because the q-value must not be smaller than 0.001, and limit the length of the Accept-Language header to 4,000 bytes for some HTTP servers which cannot accept such a long header.

Signed-off-by: Yi EungJun <eungjun.yi@navercorp.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-27 | dumb-http: do not pass NULL path to parse_pack_index | Jeff King | 1 | -0/+18

Once upon a time, dumb http always fetched .idx files directly into their final location, and then checked their validity with parse_pack_index. This was refactored in commit 750ef42 (http-fetch: Use temporary files for pack-*.idx until verified, 2010-04-19), which uses the following logic:

  1. If we have the idx already in place, see if it's valid (using parse_pack_index). If so, use it.

  2. Otherwise, fetch the .idx to a tempfile, check that, and if so move it into place.

  3. Either way, fetch the pack itself if necessary.

However, it got step 1 wrong. We pass a NULL path parameter to parse_pack_index, so an existing .idx file always looks broken. Worse, we do not treat this broken .idx as an opportunity to re-fetch, but instead return an error, ignoring the pack entirely. This can lead to a dumb-http fetch failing to retrieve the necessary objects.

This doesn't come up much in practice, because it must be a packfile that we found out about (and whose .idx we stored) during an earlier dumb-http fetch, but whose packfile we _didn't_ fetch. I.e., we did a partial clone of a repository, didn't need some packfiles, and now a followup fetch needs them.

Discovery and tests by Charles Bailey <charles@hashpling.org>.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-17 | http: fix charset detection of extract_content_type() | Yi EungJun | 1 | -0/+5

extract_content_type() could not extract a charset parameter if the parameter was not the first one and there was whitespace followed by a semicolon just before the parameter. For example:

  text/plain; format=fixed ;charset=utf-8

It also could not correctly handle some other cases, such as:

  text/plain; charset=utf-8; format=fixed
  text/plain; some-param="a long value with ;semicolons;"; charset=utf-8

Thanks-to: Jeff King <peff@peff.net>
Signed-off-by: Yi EungJun <eungjun.yi@navercorp.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27 | remote-curl: reencode http error messages | Jeff King | 1 | -0/+5

We currently recognize an error message with a content-type "text/plain; charset=utf-16" as text, but we ignore the charset parameter entirely. Let's encode it to log_output_encoding, which is presumably something the user's terminal can handle.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27 | http: extract type/subtype portion of content-type | Jeff King | 1 | -0/+5

When we get a content-type from curl, we get the whole header line, including any parameters, and without any normalization (like downcasing or whitespace) applied. If we later try to match it with strcmp() or even strcasecmp(), we may get false negatives.

This could cause two visible behaviors:

  1. We might fail to recognize a smart-http server by its content-type.

  2. We might fail to relay text/plain error messages to users (especially if they contain a charset parameter).

This patch teaches the http code to extract and normalize just the type/subtype portion of the string. This is technically passing out less information to the callers, who can no longer see the parameters. But none of the current callers cares, and a future patch will add back an easier-to-use method for accessing those parameters.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
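The normalization described above (keep only type/subtype, drop parameters, remove whitespace, downcase) can be approximated with standard text tools. This is a sketch of the idea only: the function name `content_type_base` is hypothetical, and Git performs the equivalent work in C.

```shell
# Hypothetical helper: reduce a raw Content-Type value to a normalized
# "type/subtype" string.
content_type_base() {
	printf '%s\n' "$1" |
	sed 's/;.*//' |                  # drop everything from the first ";"
	tr -d '[:space:]' |              # remove stray whitespace
	tr '[:upper:]' '[:lower:]'       # downcase
}

content_type_base 'Text/Plain ; charset=utf-8'
```

With the value reduced this way, an exact string comparison against "text/plain" no longer misses variants that differ only in case, spacing or parameters.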
2014-05-23 | t5550: test display of remote http error messages | Jeff King | 1 | -0/+10

Since commit 426e70d (remote-curl: show server content on http errors, 2013-04-05), we relay any text/plain error messages from the remote server to the user. However, we never tested it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-10 | test: rename http fetch and push test files | Nguyễn Thái Ngọc Duy | 1 | -0/+175

Make it clear from the file names which test is for the dumb protocol and which is for the smart one.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>