summaryrefslogtreecommitdiff
path: root/t/t5550-http-fetch-dumb.sh
AgeCommit message (Collapse)AuthorFilesLines
2022-03-17http tests: use "test_hook" for "smart" and "dumb" http testsLibravatar Ævar Arnfjörð Bjarmason1-15/+10
Change the http tests to use "test_hook" insteadd of "write_script". In both cases we can get rid of sub-shelling. For "t/t5550-http-fetch-dumb.sh" add a trivial helper which sets up the hook and calls "update-server-info". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22t5550: require REFFILESLibravatar Han-Wen Nienhuys1-0/+7
The dumb HTTP protocol exposes ref storage details as part of the protocol, so it only works with the FILES refstorage backend Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'jt/transfer-fsck-across-packs'Libravatar Junio C Hamano1-1/+4
The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args
2021-02-22http-fetch: allow custom index-pack argsLibravatar Jonathan Tan1-1/+4
This is the next step in teaching fetch-pack to pass its index-pack arguments when processing packfiles referenced by URIs. The "--keep" in fetch-pack.c will be replaced with a full message in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19t55[4-9]*: adjust the references to the default branch name "main"Libravatar Johannes Schindelin1-11/+11
This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -e 's/retsam/niam/g' \ -- t55[4-9]*.sh t556x*) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Note that t5541 uses the reversed `master` name: `retsam`. We replace it by the equivalent for `main`: `niam`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19tests: mark tests relying on the current default for `init.defaultBranch`Libravatar Johannes Schindelin1-0/+3
In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-06Merge branch 'bc/sha-256-part-2'Libravatar Junio C Hamano1-0/+18
SHA-256 migration work continues. * bc/sha-256-part-2: (44 commits) remote-testgit: adapt for object-format bundle: detect hash algorithm when reading refs t5300: pass --object-format to git index-pack t5704: send object-format capability with SHA-256 t5703: use object-format serve option t5702: offer an object-format capability in the test t/helper: initialize the repository for test-sha1-array remote-curl: avoid truncating refs with ls-remote t1050: pass algorithm to index-pack when outside repo builtin/index-pack: add option to specify hash algorithm remote-curl: detect algorithm for dumb HTTP by size builtin/ls-remote: initialize repository based on fetch t5500: make hash independent serve: advertise object-format capability for protocol v2 connect: parse v2 refs with correct hash algorithm connect: pass full packet reader when parsing v2 refs Documentation/technical: document object-format for protocol v2 t1302: expect repo format version 1 for SHA-256 builtin/show-index: provide options to determine hash algo t5302: modernize test formatting ...
2020-06-19remote-curl: detect algorithm for dumb HTTP by sizeLibravatar brian m. carlson1-0/+18
When reading the info/refs file for a repository, we have no explicit way to detect which hash algorithm is in use because the file doesn't provide one. Detect the hash algorithm in use by the size of the first object ID. If we have an empty repository, we don't know what the hash algorithm is on the remote side, so default to whatever the local side has configured. Without doing this, we cannot clone an empty repository since we don't know its hash algorithm. Test this case appropriately, since we currently have no tests for cloning an empty repository with the dumb HTTP protocol. We anonymize the URL like elsewhere in the function in case the user has decided to include a secret in the URL. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10http-fetch: support fetching packfiles by URLLibravatar Jonathan Tan1-0/+30
Teach http-fetch the ability to download packfiles directly, given a URL, and to verify them. The http_pack_request suite has been augmented with a function that takes a URL directly. With this function, the hash is only used to determine the name of the temporary file. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-22Merge branch 'dl/test-must-fail-fixes-3'Libravatar Junio C Hamano1-3/+1
Test clean-up continues. * dl/test-must-fail-fixes-3: t5801: teach compare_refs() to accept ! t5612: stop losing return codes of git commands t5612: don't use `test_must_fail test_cmp` t5607: reorder `nongit test_must_fail` t5550: simplify no matching line check t5512: stop losing return codes of git commands t5512: stop losing git exit code in here-docs t5512: don't use `test_must_fail test_cmp`
2020-04-19Git 2.22.4Libravatar Jonathan Nieder1-5/+11
This merges up the security fix from v2.17.5. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2020-04-19Git 2.18.4Libravatar Jonathan Nieder1-5/+11
This merges up the security fix from v2.17.5. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2020-04-19credential: treat URL with empty scheme as invalidLibravatar Jonathan Nieder1-0/+9
Until "credential: refuse to operate when missing host or protocol", Git's credential handling code interpreted URLs with empty scheme to mean "give me credentials matching this host for any protocol". Luckily libcurl does not recognize such URLs (it tries to look for a protocol named "" and fails). Just in case that changes, let's reject them within Git as well. This way, credential_from_url is guaranteed to always produce a "struct credential" with protocol and host set. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2020-04-19credential: treat URL without scheme as invalidLibravatar Jonathan Nieder1-5/+2
libcurl permits making requests without a URL scheme specified. In this case, it guesses the URL from the hostname, so I can run git ls-remote http::ftp.example.com/path/to/repo and it would make an FTP request. Any user intentionally using such a URL is likely to have made a typo. Unfortunately, credential_from_url is not able to determine the host and protocol in order to determine appropriate credentials to send, and until "credential: refuse to operate when missing host or protocol", this resulted in another host's credentials being leaked to the named host. Teach credential_from_url_gently to consider such a URL to be invalid so that fsck can detect and block gitmodules files with such URLs, allowing server operators to avoid serving them to downstream users running older versions of Git. This also means that when such URLs are passed on the command line, Git will print a clearer error so affected users can switch to the simpler URL that explicitly specifies the host and protocol they intend. One subtlety: .gitmodules files can contain relative URLs, representing a URL relative to the URL they were cloned from. The relative URL resolver used for .gitmodules can follow ".." components out of the path part and past the host part of a URL, meaning that such a relative URL can be used to traverse from a https://foo.example.com/innocent superproject to a https::attacker.example.com/exploit submodule. Fortunately a leading ':' in the first path component after a series of leading './' and '../' components is unlikely to show up in other contexts, so we can catch this by detecting that pattern. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Reviewed-by: Jeff King <peff@peff.net>
2020-03-27t5550: simplify no matching line checkLibravatar Denton Liu1-3/+1
In the 'did not use upload-pack service' test, we have a complicated song-and-dance to ensure that there are no "/git-upload-pack" lines in "$HTTPD_ROOT_PATH/access.log". Simplify this by just checking that grep returns a non-zero exit code. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'sg/test-atexit'Libravatar Junio C Hamano1-1/+0
Test framework update to more robustly clean up leftover files and processes after tests are done. * sg/test-atexit: t9811-git-p4-label-import: fix pipeline negation git p4 test: disable '-x' tracing in the p4d watchdog loop git p4 test: simplify timeout handling git p4 test: clean up the p4d cleanup functions git p4 test: use 'test_atexit' to kill p4d and the watchdog process t0301-credential-cache: use 'test_atexit' to stop the credentials helper tests: use 'test_atexit' to stop httpd git-daemon: use 'test_atexit` to stop 'git-daemon' test-lib: introduce 'test_atexit' t/lib-git-daemon: make sure to kill the 'git-daemon' process test-lib: fix interrupt handling with 'dash' and '--verbose-log -x'
2019-03-24http: normalize curl results for dumb loose and alternates fetchesLibravatar Jeff King1-0/+16
If the dumb-http walker encounters a 404 when fetching a loose object, it then looks at any http-alternates for the object. The 404 check is implemented by missing_target(), which checks not only the http code, but also that we got an http error from the CURLcode. That broke when we stopped using CURLOPT_FAILONERROR in 17966c0a63 (http: avoid disconnecting on 404s for loose objects, 2016-07-11), since our CURLcode will now be CURLE_OK. As a result, fetching over dumb-http from a repository with alternates could result in Git printing "Unable to find abcd1234..." and aborting. We could probably fix this just by loosening missing_target(). However, there's other code which looks at the curl result, and it would have to be tweaked as well. Instead, let's just normalize the result the same way the smart-http code does. There's a similar case in processing the alternates (where we failover from "info/http-alternates" to "info/alternates"). We'll give it the same treatment. After this patch, we should be hitting all code paths that need this normalization (notably absent here is the http_pack_request path, but it does not use FAILONERROR, nor missing_target()). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-14tests: use 'test_atexit' to stop httpdLibravatar SZEDER Gábor1-1/+0
Use 'test_atexit' to run cleanup commands to stop httpd at the end of the test script or upon interrupt or failure, as it is shorter, simpler, and more robust than registering such cleanup commands in the trap on EXIT in the test scripts. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-24http-fetch: make `-a` standard behaviourLibravatar Martin Ågren1-0/+11
This is a follow-up to a6c786fce8 (Mark http-fetch without -a as deprecated, 2011-08-23). For more than six years, we have been warning when `-a` is not provided, and the documentation has been saying that `-a` will become the default. It is a bit unclear what "default" means here. There is no such thing as `http-fetch --no-a`. But according to my searches, no-one has been asking on the mailing list how they should silence the warning and prepare for overriding the flipped default. So let's assume that everybody is happy with `-a`. They should be, since not using it may break the repo in such a way that Git itself is unable to fix it. Always behave as if `-a` was given. Since `-a` implies `-c` (get commit objects) and `-t` (get trees), all three options are now unnecessary. Document all of these as historical artefacts that have no effect. Leave no-op code for handling these options in http-fetch.c. The options-handling is currently rather loose. If someone tightens it, we will not want these ignored options to accidentally turn into hard errors. Since `-a` was the only safe and sane usage and we have been pushing people towards it for a long time, refrain from warning when it is used "unnecessarily" now. Similarly, do not add anything scary-looking to the man-page about how it will be removed in the future. We can always do so later. (It is not like we are in desperate need of freeing up one-letter arguments.) Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-18t5550: use write_script to generate post-update hookLibravatar Brandon Williams1-2/+3
The post-update hooks created in t5550-http-fetch-dumb.sh is missing the "!#/bin/sh" line which can cause issues with portability. Instead create the hook using the 'write_script' function which includes the proper "#!" line. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-10Merge branch 'jt/http-base-url-update-upon-redirect'Libravatar Junio C Hamano1-0/+9
When a redirected http transport gets an error during the redirected request, we ignored the error we got from the server, and ended up giving a not-so-useful error message. * jt/http-base-url-update-upon-redirect: http: attempt updating base URL only if no error
2017-02-28http: attempt updating base URL only if no errorLibravatar Jonathan Tan1-0/+9
http.c supports HTTP redirects of the form http://foo/info/refs?service=git-upload-pack -> http://anything -> http://bar/info/refs?service=git-upload-pack (that is to say, as long as the Git part of the path and the query string is preserved in the final redirect destination, the intermediate steps can have any URL). However, if one of the intermediate steps results in an HTTP exception, a confusing "unable to update url base from redirection" message is printed instead of a Curl error message with the HTTP exception code. This was introduced by 2 commits. Commit c93c92f ("http: update base URLs when we see redirects", 2013-09-28) introduced a best-effort optimization that required checking if only the "base" part of the URL differed between the initial request and the final redirect destination, but it performed the check before any HTTP status checking was done. If something went wrong, the normal code path was still followed, so this did not cause any confusing error messages until commit 6628eb4 ("http: always update the base URL for redirects", 2016-12-06), which taught http to die if the non-"base" part of the URL differed. Therefore, teach http to check the HTTP status before attempting to check if only the "base" part of the URL differed. This commit teaches http_request_reauth to return early without updating options->base_url upon an error; the only invoker of this function that passes a non-NULL "options" is remote-curl.c (through "http_get_strbuf"), which only uses options->base_url for an informational message in the situations that this commit cares about (that is, when the return value is not HTTP_OK). The included test checks that the redirect scheme at the beginning of this commit message works, and that returning a 502 in the middle of the redirect scheme produces the correct result. Note that this is different from the test in commit 6628eb4 ("http: always update the base URL for redirects", 2016-12-06) in that this commit tests that a Git-shaped URL (http://.../info/refs?service=git-upload-pack) works, whereas commit 6628eb4 tests that a non-Git-shaped URL (http://.../info/refs/foo?service=git-upload-pack) does not work (even though Git is processing that URL) and is an error that is fatal, not silently swallowed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-27Merge branch 'jn/remote-helpers-with-git-dir'Libravatar Junio C Hamano1-0/+9
"git ls-remote" and "git archive --remote" are designed to work without being in a directory under Git's control. However, recent updates revealed that we randomly look into a directory called .git/ without actually doing necessary set-up when working in a repository. Stop doing so. * jn/remote-helpers-with-git-dir: remote helpers: avoid blind fall-back to ".git" when setting GIT_DIR remote: avoid reading $GIT_DIR config in non-repo
2017-02-14remote helpers: avoid blind fall-back to ".git" when setting GIT_DIRLibravatar Jonathan Nieder1-0/+9
To push from or fetch to the current repository, remote helpers need to know what repository that is. Accordingly, Git sets the GIT_DIR environment variable to the path to the current repository when invoking remote helpers. There is a special case it does not handle: "git ls-remote" and "git archive --remote" can be run to inspect a remote repository without being run from any local repository. GIT_DIR is not useful in this scenario: - if we are not in a repository, we don't need to set GIT_DIR to override an existing GIT_DIR value from the environment. If GIT_DIR is present then we would be in a repository if it were valid and would have called die() if it weren't. - not setting GIT_DIR may cause a helper to do the usual discovery walk to find the repository. But we know we're not in one, or we would have found it ourselves. So in the worst case it may expend a little extra effort to try to find a repository and fail (for example, remote-curl would do this to try to find repository-level configuration). So leave GIT_DIR unset in this case. This makes GIT_DIR easier to understand for remote helper authors and makes transport code less of a special case for repository discovery. Noticed using b1ef400e (setup_git_env: avoid blind fall-back to ".git", 2016-10-20) from 'next': $ cd /tmp $ git ls-remote https://kernel.googlesource.com/pub/scm/git/git fatal: BUG: setup_git_env called without repository Helped-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-17Merge branch 'jk/http-walker-limit-redirect' into maintLibravatar Junio C Hamano1-0/+61
Update the error messages from the dumb-http client when it fails to obtain loose objects; we used to give sensible error message only upon 404 but we now forbid unexpected redirects that needs to be reported with something sensible. * jk/http-walker-limit-redirect: http-walker: complain about non-404 loose object errors http: treat http-alternates like redirects http: make redirects more obvious remote-curl: rename shadowed options variable http: always update the base URL for redirects http: simplify update_url_from_redirect
2016-12-27Merge branch 'bw/transport-protocol-policy'Libravatar Junio C Hamano1-0/+10
Finer-grained control of what protocols are allowed for transports during clone/fetch/push have been enabled via a new configuration mechanism. * bw/transport-protocol-policy: http: respect protocol.*.allow=user for http-alternates transport: add from_user parameter to is_transport_allowed http: create function to get curl allowed protocols transport: add protocol policy config option http: always warn if libcurl version is too old lib-proto-disable: variable name fix
2016-12-19Merge branch 'jk/http-walker-limit-redirect-2.9'Libravatar Junio C Hamano1-0/+61
Transport with dumb http can be fooled into following foreign URLs that the end user does not intend to, especially with the server side redirects and http-alternates mechanism, which can lead to security issues. Tighten the redirection and make it more obvious to the end user when it happens. * jk/http-walker-limit-redirect-2.9: http: treat http-alternates like redirects http: make redirects more obvious remote-curl: rename shadowed options variable http: always update the base URL for redirects http: simplify update_url_from_redirect
2016-12-15http: respect protocol.*.allow=user for http-alternatesLibravatar Jeff King1-0/+10
The http-walker may fetch the http-alternates (or alternates) file from a remote in order to find more objects. This should count as a "not from the user" use of the protocol. But because we implement the redirection ourselves and feed the new URL to curl, it will use the CURLOPT_PROTOCOLS rules, not the more restrictive CURLOPT_REDIR_PROTOCOLS. The ideal solution would be for each curl request we make to know whether or not is directly from the user or part of an alternates redirect, and then set CURLOPT_PROTOCOLS as appropriate. However, that would require plumbing that information through all of the various layers of the http code. Instead, let's check the protocol at the source: when we are parsing the remote http-alternates file. The only downside is that if there's any mismatch between what protocol we think it is versus what curl thinks it is, it could violate the policy. To address this, we'll make the parsing err on the picky side, and only allow protocols that it can parse definitively. So for example, you can't elude the "http" policy by asking for "HTTP://", even though curl might handle it; we would reject it as unknown. The only unsafe case would be if you have a URL that starts with "http://" but curl interprets as another protocol. That seems like an unlikely failure mode (and we are still protected by our base CURLOPT_PROTOCOL setting, so the worst you could do is trigger one of https, ftp, or ftps). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06http: treat http-alternates like redirectsLibravatar Jeff King1-0/+38
The previous commit made HTTP redirects more obvious and tightened up the default behavior. However, there's another way for a server to ask a git client to fetch arbitrary content: by having an http-alternates file (or a regular alternates file, which is used as a backup). Similar to the HTTP redirect case, a malicious server can claim to have refs pointing at object X, return a 404 when the client asks for X, but point to some other URL via http-alternates, which the client will transparently fetch. The end result is that it looks from the user's perspective like the objects came from the malicious server, as the other URL is not mentioned at all. Worse, because we feed the new URL to curl ourselves, the usual protocol restrictions do not kick in (neither curl's default of disallowing file://, nor the protocol whitelisting in f4113cac0 (http: limit redirection to protocol-whitelist, 2015-09-22). Let's apply the same rules here as we do for HTTP redirects. Namely: - unless http.followRedirects is set to "always", we will not follow remote redirects from http-alternates (or alternates) at all - set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS restrict ourselves to a known-safe set and respect any user-provided whitelist. - mention alternate object stores on stderr so that the user is aware another source of objects may be involved The first item may prove to be too restrictive. The most common use of alternates is to point to another path on the same server. While it's possible for a single-server redirect to be an attack, it takes a fairly obscure setup (victim and evil repository on the same host, host speaks dumb http, and evil repository has access to edit its own http-alternates file). So we could make the checks more specific, and only cover cross-server redirects. But that means parsing the URLs ourselves, rather than letting curl handle them. This patch goes for the simpler approach. Given that they are only used with dumb http, http-alternates are probably pretty rare. And there's an escape hatch: the user can allow redirects on a specific server by setting http.<url>.followRedirects to "always". Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-06http: make redirects more obviousLibravatar Jeff King1-0/+23
We instruct curl to always follow HTTP redirects. This is convenient, but it creates opportunities for malicious servers to create confusing situations. For instance, imagine Alice is a git user with access to a private repository on Bob's server. Mallory runs her own server and wants to access objects from Bob's repository. Mallory may try a few tricks that involve asking Alice to clone from her, build on top, and then push the result: 1. Mallory may simply redirect all fetch requests to Bob's server. Git will transparently follow those redirects and fetch Bob's history, which Alice may believe she got from Mallory. The subsequent push seems like it is just feeding Mallory back her own objects, but is actually leaking Bob's objects. There is nothing in git's output to indicate that Bob's repository was involved at all. The downside (for Mallory) of this attack is that Alice will have received Bob's entire repository, and is likely to notice that when building on top of it. 2. If Mallory happens to know the sha1 of some object X in Bob's repository, she can instead build her own history that references that object. She then runs a dumb http server, and Alice's client will fetch each object individually. When it asks for X, Mallory redirects her to Bob's server. The end result is that Alice obtains objects from Bob, but they may be buried deep in history. Alice is less likely to notice. Both of these attacks are fairly hard to pull off. There's a social component in getting Mallory to convince Alice to work with her. Alice may be prompted for credentials in accessing Bob's repository (but not always, if she is using a credential helper that caches). Attack (1) requires a certain amount of obliviousness on Alice's part while making a new commit. Attack (2) requires that Mallory knows a sha1 in Bob's repository, that Bob's server supports dumb http, and that the object in question is loose on Bob's server. But we can probably make things a bit more obvious without any loss of functionality. This patch does two things to that end. First, when we encounter a whole-repo redirect during the initial ref discovery, we now inform the user on stderr, making attack (1) much more obvious. Second, the decision to follow redirects is now configurable. The truly paranoid can set the new http.followRedirects to false to avoid any redirection entirely. But for a more practical default, we will disallow redirects only after the initial ref discovery. This is enough to thwart attacks similar to (2), while still allowing the common use of redirects at the repository level. Since c93c92f30 (http: update base URLs when we see redirects, 2013-09-28) we re-root all further requests from the redirect destination, which should generally mean that no further redirection is necessary. As an escape hatch, in case there really is a server that needs to redirect individual requests, the user can set http.followRedirects to "true" (and this can be done on a per-server basis via http.*.followRedirects config). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-15Merge branch 'jk/fix-remote-curl-url-wo-proto'Libravatar Junio C Hamano1-0/+8
"git fetch http::/site/path" did not die correctly and segfaulted instead. * jk/fix-remote-curl-url-wo-proto: remote-curl: handle URLs without protocol
2016-09-08remote-curl: handle URLs without protocolLibravatar Jeff King1-0/+8
Generally remote-curl would never see a URL that did not have "proto:" at the beginning, as that is what tells git to run the "git-remote-proto" helper (and git-remote-http, etc, are aliases for git-remote-curl). However, the special syntax "proto::something" will run git-remote-proto with only "something" as the URL. So a malformed URL like: http::/example.com/repo.git will feed the URL "/example.com/repo.git" to git-remote-http. The resulting URL has no protocol, but the code added by 372370f (http: use credential API to handle proxy authentication, 2016-01-26) does not handle this case and segfaults. For the purposes of this code, we don't really care what the exact protocol; only whether or not it is https. So let's just assume that a missing protocol is not, and curl will handle the real error (which is that the URL is nonsense). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment varLibravatar Elia Pinto1-5/+5
Use the new GIT_TRACE_CURL environment variable instead of the deprecated GIT_CURL_VERBOSE. Signed-off-by: Elia Pinto <gitter.spiros@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28submodule: use prepare_submodule_repo_env consistentlyLibravatar Jeff King1-0/+11
Before 14111fc (git: submodule honor -c credential.* from command line, 2016-02-29), it was sufficient for code which spawned a process in a submodule to just set the child process's "env" field to "local_repo_env" to clear the environment of any repo-specific variables. That commit introduced a more complicated procedure, in which we clear most variables but allow through sanitized config. For C code, we used that procedure only for cloning, but not for any of the programs spawned by submodule.c. As a result, things like "git fetch --recurse-submodules" behave differently than "git clone --recursive"; the former will not pass through the sanitized config. We can fix this by using prepare_submodule_repo_env() everywhere in submodule.c. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28submodule: export sanitized GIT_CONFIG_PARAMETERSLibravatar Jeff King1-0/+17
Commit 14111fc (git: submodule honor -c credential.* from command line, 2016-02-29) taught git-submodule.sh to save the sanitized value of $GIT_CONFIG_PARAMETERS when clearing the environment for a submodule. However, it failed to export the result, meaning that it had no effect for any sub-programs. We didn't catch this in our initial tests because we checked only the "clone" case, which does not go through the shell script at all. Provoking "git submodule update" to do a fetch demonstrates the bug. Noticed-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28t5550: break submodule config test into multiple sub-testsLibravatar Jeff King1-2/+6
Right now we test only the cloning case, but there are other interesting cases (e.g., fetching). Let's pull the setup bits into their own test, which will make things flow more logically once we start adding more tests which use the setup. Let's also introduce some whitespace to the clone-test to split the two parts: making sure it fails without our cmdline config, and that it succeeds with it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28t5550: fix typo in $HTTPD_URLLibravatar Jeff King1-1/+1
Commit 14111fc (git: submodule honor -c credential.* from command line, 2016-02-29) accidentally wrote $HTTP_URL. It happened to work because we ended up with "credential..helper", which we treat the same as "credential.helper", applying it to all URLs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-01git: submodule honor -c credential.* from command lineLibravatar Jacob Keller1-0/+17
Due to the way that the git-submodule code works, it clears all local git environment variables before entering submodules. This is normally a good thing since we want to clear settings such as GIT_WORKTREE and other variables which would affect the operation of submodule commands. However, GIT_CONFIG_PARAMETERS is special, and we actually do want to preserve these settings. However, we do not want to preserve all configuration as many things should be left specific to the parent project. Add a git submodule--helper function, sanitize-config, which shall be used to sanitize GIT_CONFIG_PARAMETERS, removing all key/value pairs except a small subset that are known to be safe and necessary. Replace all the calls to clear_local_git_env with a wrapped function that filters GIT_CONFIG_PARAMETERS using the new helper and then restores it to the filtered subset after clearing the rest of the environment. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-04t/t5550-http-fetch-dumb.sh: use the $( ... ) construct for command substitutionLibravatar Elia Pinto1-4/+4
The Git CodingGuidelines prefer the $(...) construct for command substitution instead of using the backquotes `...`. The backquoted form is the traditional method for command substitution, and is supported by POSIX. However, all but the simplest uses become complicated quickly. In particular, embedded command substitutions and/or the use of double quotes require careful escaping with the backslash character. The patch was generated by: for _f in $(find . -name "*.sh") do perl -i -pe 'BEGIN{undef $/;} s/`(.+?)`/\$(\1)/smg' "${_f}" done and then carefully proof-read. Signed-off-by: Elia Pinto <gitter.spiros@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-22Merge branch 'jk/skip-http-tests-under-no-curl'Libravatar Junio C Hamano1-6/+0
Test clean-up. * jk/skip-http-tests-under-no-curl: tests: skip dav http-push tests under NO_EXPAT=NoThanks t/lib-httpd.sh: skip tests if NO_CURL is defined
2015-05-07t/lib-httpd.sh: skip tests if NO_CURL is definedLibravatar Jeff King1-6/+0
If we built git without curl, we can't actually test against an http server. In fact, all of the test scripts which include lib-httpd.sh already perform this check, with one exception: t5540. For those scripts, this is a noop, and for t5540, this is a bugfix (it used to fail when built with NO_CURL, though it could go unnoticed if you had a stale git-remote-https in your build directory). Noticed-by: Junio C Hamano <junio@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20t: use test_might_fail for diff and grepLibravatar Jeff King1-2/+2
Some tests run diff or grep to produce an output, and then compare the output to an expected value. We know the exit code we expect these processes to have (e.g., grep yields 0 if it produced output and 1 otherwise), so it would not make the test wrong to look for it. But the difference between their output and the expected output (e.g., shown by test_cmp) is much more useful to somebody debugging the test than the test just bailing out. These tests break the &&-chain to skip the exit-code check of the process. However, we can get the same effect by using test_might_fail. Note that in some cases the test did use "|| return 1", which meant the test was not wrong, but it did fool --chain-lint. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20t: fix trivial &&-chain breakageLibravatar Jeff King1-1/+1
These are tests which are missing a link in their &&-chain, but during a setup phase. We may fail to notice failure in commands that build the test environment, but these are typically not expected to fail at all (but it's still good to double-check that our test environment is what we expect). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-18Merge branch 'ye/http-accept-language'Libravatar Junio C Hamano1-0/+42
Using environment variable LANGUAGE and friends on the client side, HTTP-based transports now send Accept-Language when making requests. * ye/http-accept-language: http: add Accept-Language header if possible
2015-02-17Merge branch 'jk/dumb-http-idx-fetch-fix'Libravatar Junio C Hamano1-0/+18
A broken pack .idx file in the receiving repository prevented the dumb http transport from fetching a good copy of it from the other side. * jk/dumb-http-idx-fetch-fix: dumb-http: do not pass NULL path to parse_pack_index
2015-01-28http: add Accept-Language header if possibleLibravatar Yi EungJun1-0/+42
Add an Accept-Language header which indicates the user's preferred languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG. Examples: LANGUAGE= -> "" LANGUAGE=ko:en -> "Accept-Language: ko, en;q=0.9, *;q=0.1" LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *;q=0.1" LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *;q=0.1" This gives git servers a chance to display remote error messages in the user's preferred language. Limit the number of languages to 1,000 because q-value must not be smaller than 0.001, and limit the length of Accept-Language header to 4,000 bytes for some HTTP servers which cannot accept such long header. Signed-off-by: Yi EungJun <eungjun.yi@navercorp.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-27dumb-http: do not pass NULL path to parse_pack_indexLibravatar Jeff King1-0/+18
Once upon a time, dumb http always fetched .idx files directly into their final location, and then checked their validity with parse_pack_index. This was refactored in commit 750ef42 (http-fetch: Use temporary files for pack-*.idx until verified, 2010-04-19), which uses the following logic: 1. If we have the idx already in place, see if it's valid (using parse_pack_index). If so, use it. 2. Otherwise, fetch the .idx to a tempfile, check that, and if so move it into place. 3. Either way, fetch the pack itself if necessary. However, it got step 1 wrong. We pass a NULL path parameter to parse_pack_index, so an existing .idx file always looks broken. Worse, we do not treat this broken .idx as an opportunity to re-fetch, but instead return an error, ignoring the pack entirely. This can lead to a dumb-http fetch failing to retrieve the necessary objects. This doesn't come up much in practice, because it must be a packfile that we found out about (and whose .idx we stored) during an earlier dumb-http fetch, but whose packfile we _didn't_ fetch. I.e., we did a partial clone of a repository, didn't need some packfiles, and now a followup fetch needs them. Discovery and tests by Charles Bailey <charles@hashpling.org>. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-17http: fix charset detection of extract_content_type()Libravatar Yi EungJun1-0/+5
extract_content_type() could not extract a charset parameter if the parameter is not the first one and there is a whitespace and a following semicolon just before the parameter. For example: text/plain; format=fixed ;charset=utf-8 And it also could not handle correctly some other cases, such as: text/plain; charset=utf-8; format=fixed text/plain; some-param="a long value with ;semicolons;"; charset=utf-8 Thanks-to: Jeff King <peff@peff.net> Signed-off-by: Yi EungJun <eungjun.yi@navercorp.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27remote-curl: reencode http error messagesLibravatar Jeff King1-0/+5
We currently recognize an error message with a content-type "text/plain; charset=utf-16" as text, but we ignore the charset parameter entirely. Let's encode it to log_output_encoding, which is presumably something the user's terminal can handle. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27http: extract type/subtype portion of content-typeLibravatar Jeff King1-0/+5
When we get a content-type from curl, we get the whole header line, including any parameters, and without any normalization (like downcasing or whitespace) applied. If we later try to match it with strcmp() or even strcasecmp(), we may get false negatives. This could cause two visible behaviors: 1. We might fail to recognize a smart-http server by its content-type. 2. We might fail to relay text/plain error messages to users (especially if they contain a charset parameter). This patch teaches the http code to extract and normalize just the type/subtype portion of the string. This is technically passing out less information to the callers, who can no longer see the parameters. But none of the current callers cares, and a future patch will add back an easier-to-use method for accessing those parameters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>