summaryrefslogtreecommitdiff
path: root/builtin/upload-archive.c
AgeCommit message (Collapse)AuthorFilesLines
2015-09-25upload-archive: convert sprintf to strbufLibravatar Jeff King1-5/+4
When we report an error to the client, we format it into a fixed-size buffer using vsprintf(). This can't actually overflow in practice, since we only format a very tame subset of strings (mostly strerror() output). However, it's hard to tell immediately, so let's just use a strbuf so readers do not have to wonder. We do add an allocation here, but the performance is not important; the next step is to call die() anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Libravatar Christian Couder1-1/+1
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20pkt-line: provide a LARGE_PACKET_MAX static bufferLibravatar Jeff King1-5/+2
Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because: 1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters. 2. When sending ref lines, git-core always limits itself to 1000 byte packets. However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers. This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this: 1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names. 2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now. Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20pkt-line: teach packet_read_line to chomp newlinesLibravatar Jeff King1-4/+0
The packets sent during ref negotiation are all terminated by newline; even though the code to chomp these newlines is short, we end up doing it in a lot of places. This patch teaches packet_read_line to auto-chomp the trailing newline; this lets us get rid of a lot of inline chomping code. As a result, some call-sites which are not reading line-oriented data (e.g., when reading chunks of packfiles alongside sideband) transition away from packet_read_line to the generic packet_read interface. This patch converts all of the existing callsites. Since the function signature of packet_read_line does not change (but its behavior does), there is a possibility of new callsites being introduced in later commits, silently introducing an incompatibility. However, since a later patch in this series will change the signature, such a commit would have to be merged directly into this commit, not to the tip of the series; we can therefore ignore the issue. This is an internal cleanup and should produce no change of behavior in the normal case. However, there is one corner case to note. Callers of packet_read_line have never been able to tell the difference between a flush packet ("0000") and an empty packet ("0004"), as both cause packet_read_line to return a length of 0. Readers treat them identically, even though Documentation/technical/protocol-common.txt says we must not; it also says that implementations should not send an empty pkt-line. By stripping out the newline before the result gets to the caller, we will now treat the newline-only packet ("0005\n") the same as an empty packet, which in turn gets treated like a flush packet. In practice this doesn't matter, as neither empty nor newline-only packets are part of git's protocols (at least not for the line-oriented bits, and readers who are not expecting line-oriented packets will be calling packet_read directly, anyway). But even if we do decide to care about the distinction later, it is orthogonal to this patch. The right place to tighten would be to stop treating empty packets as flush packets, and this change does not make doing so any harder. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20upload-archive: use argv_array to store client argumentsLibravatar Jeff King1-21/+14
The current parsing scheme for upload-archive is to pack arguments into a fixed-size buffer, separated by NULs, and put a pointer to each argument in the buffer into a fixed-size argv array. This works fine, and the limits are high enough that nobody reasonable is going to hit them, but it makes the code hard to follow. Instead, let's just stuff the arguments into an argv_array, which is much simpler. That lifts the "all arguments must fit inside 4K together" limit. We could also trivially lift the MAX_ARGS limitation (in fact, we have to keep extra code to enforce it). But that would mean a client could force us to allocate an arbitrary amount of memory simply by sending us "argument" lines. By limiting the MAX_ARGS, we limit an attacker to about 4 megabytes (64 times a maximum 64K packet buffer). That may sound like a lot compared to the 4K limit, but it's not a big deal compared to what git-archive will actually allocate while working (e.g., to load blobs into memory). The important thing is that it is bounded. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20upload-archive: do not copy repo nameLibravatar Jeff King1-7/+2
According to the comment, enter_repo will modify its input. However, this has not been the case since 1c64b48 (enter_repo: do not modify input, 2011-10-04). Drop the now-useless copy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-21upload-archive: use start_command instead of forkLibravatar Jeff King1-31/+12
The POSIX-function fork is not supported on Windows. Use our start_command API instead, respawning ourselves in a special "writer" mode to follow the alternate code path. Remove the NOT_MINGW-prereq for t5000, as git-archive --remote now works. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15Revert "upload-archive: use start_command instead of fork"Libravatar Junio C Hamano1-21/+47
This reverts commit c09cd77ea2fe3580b33918a99fe138d239ac2aaf, expecting a better version to be rerolled soon.
2011-10-30upload-archive: use start_command instead of forkLibravatar Erik Faye-Lund1-47/+21
The POSIX-function fork is not supported on Windows. Use our start_command API instead. As this is the last call-site that depends on the fork-stub in compat/mingw.h, remove that as well. Add an undocumented flag to git-archive that tells it that the action originated from a remote, so features can be disabled. Thanks to Jeff King for work on this part. Remove the NOT_MINGW-prereq for t5000, as git-archive --remote now works. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-22upload-archive: allow user to turn off filtersLibravatar Jeff King1-1/+1
Some tar filters may be very expensive to run, so sites do not want to expose them via upload-archive. This patch lets users configure tar.<filter>.remote to turn them off. By default, gzip filters are left on, as they are about as expensive as creating zip archives. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-22archive: move file extension format-guessing lowerLibravatar Jeff King1-1/+1
The process for guessing an archive output format based on the filename is something like this: a. parse --output in cmd_archive; check the filename against a static set of mapping heuristics (right now it just matches ".zip" for zip files). b. if found, stick a fake "--format=zip" at the beginning of the arguments list (if the user did specify a --format manually, the later option will override our fake one) c. if it's a remote call, ship the arguments to the remote (including the fake), which will call write_archive on their end d. if it's local, ship the arguments to write_archive locally There are two problems: 1. The set of mappings is static and at too high a level. The write_archive level is going to check config for user-defined formats, some of which will specify extensions. We need to delay lookup until those are parsed, so we can match against them. 2. For a remote archive call, our set of mappings (or formats) may not match the remote side's. This is OK in practice right now, because all versions of git understand "zip" and "tar". But as new formats are added, there is going to be a mismatch between what the client can do and what the remote server can do. To fix (1), this patch refactors the location guessing to happen at the write_archive level, instead of the cmd_archive level. So instead of sticking a fake --format field in the argv list, we actually pass a "name hint" down the callchain; this hint is used at the appropriate time to guess the format (if one hasn't been given already). This patch leaves (2) unfixed. The name_hint is converted to a "--format" option as before, and passed to the remote. This means the local side's idea of how extensions map to formats will take precedence. Another option would be to pass the name hint to the remote side and let the remote choose. This isn't a good idea for two reasons: 1. There's no room in the protocol for passing that information. We can pass a new argument, but older versions of git on the server will choke on it. 2. Letting the remote side decide creates a silent inconsistency in user experience. Consider the case that the locally installed git knows about the "tar.gz" format, but a remote server doesn't. Running "git archive -o foo.tar.gz" will use the tar.gz format. If we use --remote, and the local side chooses the format, then we send "--format=tar.gz" to the remote, which will complain about the unknown format. But if we let the remote side choose the format, then it will realize that it doesn't know about "tar.gz" and output uncompressed tar without even issuing a warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22Move 'builtin-*' into a 'builtin/' subdirectoryLibravatar Linus Torvalds1-0/+167
This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '*.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editory, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>