For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.
For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit systems, where address space is not a
problem, allocating a lot more memory could push other parts of the
system into swap, generating lots of I/O and slowing everything down.
For this particular operation, it is sufficient not to unpack the
blob, and to let check_sha1_signature(), which supports a streaming
interface, do the job. check_sha1_signature() is not shown in the
diff, unfortunately, but it will be called when "data_valid && !data"
is false.
We will call the callback function "fn" with NULL as "data". The only
callback passed in here is fsck_obj_buffer(), which does not touch
"data" at all when the object is a blob.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
For some unknown reason, the dd on my Windows box segfaults randomly;
recently it has been doing so much more often than it used to, which
makes running the test suite burdensome.
Use printf to write large files instead of dd. To emphasize that three
of the large blobs are exact copies, use cp to create them.
The new code makes the files a bit smaller, and they are not sparse
anymore, but the tests do not depend on these properties. We avoid
test-genrandom here (which is used to generate large files elsewhere
in t1050) so that the files compress well, which keeps the run-time
short.
The files are now large text files, not binary files. But since they
are larger than core.bigfilethreshold they are diagnosed as binary
by Git. For this reason, the 'git diff' tests that check the output
for "Binary files differ" still pass.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
When running a required clean filter, we do not have to mmap the
original before feeding the filter. Instead, stream the file
contents directly to the filter and process its output (see the
sketch after the list below).
* sp/stream-clean-filter:
sha1_file: don't convert off_t to size_t too early to avoid potential die()
convert: stream from fd to required clean filter to reduce used address space
copy_fd(): do not close the input file descriptor
mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
config.c: add git_env_ulong() to parse environment variable
convert: drop arguments other than 'path' from would_convert_to_git()
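A hypothetical setup that exercises this code path (the filter name
and command are made up):
    git config filter.scrub.clean "tr -d '\r'"   # any clean command works
    git config filter.scrub.required true        # a required filter must succeed
    echo '*.dat filter=scrub' >>.gitattributes
    git add big.dat   # contents are streamed from the fd to the filter, not mmapped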
|
GIT_ALLOC_LIMIT limits the size of allocations that xmalloc() will
grant, and that size is of type size_t. Better to use git_env_ulong()
to parse the environment variable, so that the suffixes 'k', 'm', and
'g' can be used; and use size_t to store the limit, for consistency.
The change to size_t has no direct practical impact, because the
environment variable is only meant to be used by our own tests, and we
use it to test small sizes.
The cast of size in the call to die() is changed to uintmax_t to
match the format string PRIuMAX.
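With this change the variable accepts the usual size suffixes; for
example (the commands are placeholders; previously the value was taken
as a bare number of kilobytes):
    GIT_ALLOC_LIMIT=1500k git checkout -f other
    GIT_ALLOC_LIMIT=2m git add large-file   # any single allocation above 2MB dies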
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
If we are given two SHA-1s and asked to determine whether they are
different (but not _what_ the differences are), we know right away
just by comparing the SHA-1s.
A side effect of this patch is that, because large files are marked
binary, diff-tree will not need to unpack them; 'diff-index --cached'
will not either. But 'diff-files' still does.
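For instance (the threshold and revisions are illustrative), with a
lowered threshold:
    git -c core.bigfilethreshold=1m diff-tree -p HEAD~ HEAD      # big blobs compared by SHA-1 only
    git -c core.bigfilethreshold=1m diff-index --cached -p HEAD  # likewise, no unpacking
    git -c core.bigfilethreshold=1m diff-files -p                # still reads the worktree file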
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
Very large files may lead to a failure to allocate memory. If that
happens here, it could impact quite a few commands that involve
diff. Moreover, very large files are inefficient to compare anyway
(and most likely non-text), so mark them binary and skip looking at
their content.
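The observable effect (the threshold and path below are made up) is
that diff reports the files as binary instead of loading their
content:
    git -c core.bigfilethreshold=1m diff HEAD~ HEAD -- big.txt
    # -> Binary files a/big.txt and b/big.txt differ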
Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
Fewer die() calls give better control to the caller, provided that the
caller _can_ handle the failure. And in unpack_compressed_entry()'s
case, it can, because unpack_compressed_entry() already returns NULL
when it fails to inflate data.
A side effect of this is that fsck continues to run when very large
blobs are present (and do not fit in memory).
Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
The Git CodingGuidelines prefer the $(...) construct for command
substitution instead of using the backquotes `...`.
The backquoted form is the traditional method for command
substitution, and is supported by POSIX. However, all but the
simplest uses become complicated quickly. In particular, embedded
command substitutions and/or the use of double quotes require
careful escaping with the backslash character.
The patch was generated by:
    for _f in $(find . -name "*.sh")
    do
            sed -i 's@`\(.*\)`@$(\1)@g' ${_f}
    done
and then carefully proof-read (the greedy `.*` in the pattern can
mis-convert lines that contain more than one backquoted span).
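For example, the conversion turns the first line below into the
second:
    branch=`git rev-parse --abbrev-ref HEAD`    # old: backquotes
    branch=$(git rev-parse --abbrev-ref HEAD)   # new: $() nests and quotes cleanly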
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
"pack-objects" learned to read large loose blobs using the streaming API,
without the need to hold everything in core at once.
|
git usually streams large blobs directly to packs. But there are cases
where git can create large loose blobs (unpack-objects, or hash-object
over a pipe), or they can come from other git implementations.
core.bigfilethreshold can also be lowered, introducing a new wave of
large loose blobs.
Use the streaming interface to read/compress/write these blobs in one
go. Fall back to the normal way if for some reason the streaming
interface cannot be used.
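For illustration (the size is made up), this is one way such a large
loose blob arises and then gets streamed into a pack:
    printf '%10000000s' X | git hash-object -w --stdin   # a ~10MB *loose* blob
    git -c core.bigfilethreshold=1m repack -ad           # pack-objects streams it into the pack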
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
unpack_raw_entry() will not allocate and return decompressed blobs if
they are larger than core.bigFileThreshold, so sha1_object() may not
be called on those objects at that point because there is no actual
content.
sha1_object() is called later on those objects, where we can safely
use get_data_from_pack() to retrieve the blob content for checking.
However, we do that only when we definitely need the blob content, and
we often don't.
There are two cases in which we may need the object content. The first
is when we find an in-repo blob with the same SHA-1: we need to do a
byte-by-byte collision test, and for that the blob must be loaded into
memory (i.e. no streaming). Normally (e.g. in fetch/pull/clone) this
does not happen, because git avoids sending objects that the client
already has.
The other case is when --strict is specified and the object in
question is not a blob, which can't happen in reality because we deal
with large _blobs_ here.
Note: --verify (or git-verify-pack) on a pack from the current
repository will trigger the collision test on every object in the
pack, which effectively disables this optimization. This can easily be
worked around by setting GIT_DIR to an imaginary place with no packs.
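That workaround could look like this (the paths are placeholders):
    git index-pack --verify big.pack                 # collision-tests every object; no streaming
    git init --bare /tmp/empty.git                   # a repository with no packs
    GIT_DIR=/tmp/empty.git git index-pack big.pack   # nothing to collide with; large blobs stream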
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
After an entry has been streamed out, its CRC and sizes are written as
part of a data descriptor.
For simplicity, we make the buffer for the compressed chunks twice as
big as for the uncompressed ones, to be sure the result fits even if
deflate makes the data bigger.
t5000 verifies the output. t1050 makes sure the command always
respects core.bigfilethreshold.
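For example (the threshold is artificially low just to trigger
streaming):
    # large entries are deflated and written chunk by chunk
    git -c core.bigfilethreshold=1m archive --format=zip HEAD >demo.zip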
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
Write a data descriptor containing the CRC of the entry and its sizes
after streaming it out. For simplicity, do that only if we're storing
files (option -0) for now.
t5000 verifies the output. t1050 makes sure the command always
respects core.bigfilethreshold.
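For example (again with an artificially low threshold):
    # -0 stores entries uncompressed; large ones are streamed out
    git -c core.bigfilethreshold=1m archive --format=zip -0 HEAD >demo.zip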
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
t5000 verifies the output, while t1050 makes sure the command always
respects core.bigfilethreshold.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
This command indirectly calls check_sha1_signature() (add_info_ref ->
deref_tag -> parse_object -> ...), which may put a whole blob in
memory if the blob's size is under core.bigfilethreshold. As the
config is not read, the threshold is always the default 512MB.
Respect the user's settings here.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
New test cases list commands that should work when memory is
limited. All memory allocation functions (*) learn to reject any
allocation larger than $GIT_ALLOC_LIMIT if set.
(*) Not exactly all. Some places do not use the x* functions, but
malloc/calloc directly, notably diff-delta. These code paths should
never be run on large blobs.
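The knob is meant for tests such as t1050; usage is along these lines
(the branch name is a placeholder; at this point the value is a bare
number of kilobytes, before a later commit, shown earlier in this log,
switches it to git_env_ulong() and k/m/g suffixes):
    GIT_ALLOC_LIMIT=1500 git checkout -f other   # any single allocation over ~1.5MB dies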
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but the
new bulk-checkin API replaces it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
When adding new content to the repository, we have always slurped
the blob in its entirety in-core first, computed the object name,
and compressed it into a loose object file. Handling large binary
files (e.g. video and audio assets for games) has been problematic
because of this design.
In the middle of the "git add" callchain is an internal API,
index_fd(), that takes an open file descriptor to read from the
working tree file being added, together with its size. Teach it to
call out to fast-import when adding a large blob.
The write-out codepath in entry.c::write_entry() should also be
taught to stream, instead of reading everything in core. This should
not be too hard to implement, especially if we limit ourselves to
loose object files and non-delta representations in packfiles.
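With the threshold lowered (the value and file name are illustrative),
the effect is visible in the object database:
    git -c core.bigfilethreshold=1m add big.bin   # the blob goes straight into its own packfile
    ls .git/objects/pack/                         # a new pack/idx pair; no huge loose object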
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|