summaryrefslogtreecommitdiff
path: root/sha1_file.c
AgeCommit message (Collapse)AuthorFilesLines
2011-12-16Merge branch 'jc/stream-to-pack'Libravatar Junio C Hamano1-63/+4
* jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h
2011-12-05Merge branch 'nd/misc-cleanups'Libravatar Junio C Hamano1-1/+2
* nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()
2011-12-01bulk-checkin: replace fast-import based implementationLibravatar Junio C Hamano1-63/+4
This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15sha1_file: don't mix enum with intLibravatar Ramkumar Ramachandra1-1/+1
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27unpack_object_header_buffer(): clear the size field upon errorLibravatar Junio C Hamano1-1/+2
The callers do not use the returned size when the function says it did not use any bytes and sets the type to OBJ_BAD, so this should not matter in practice, but it is a good code hygiene anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-21Merge branch 'jk/maint-pack-objects-compete-with-delete'Libravatar Junio C Hamano1-2/+2
* jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs
2011-10-14downgrade "packfile cannot be accessed" errors to warningsLibravatar Jeff King1-1/+1
These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-14pack-objects: protect against disappearing packsLibravatar Jeff King1-1/+1
It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since c715f78, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. We look up each object as we find out about it, and the only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-05Merge branch 'wh/normalize-alt-odb-path'Libravatar Junio C Hamano1-17/+20
* wh/normalize-alt-odb-path: sha1_file: normalize alt_odb path before comparing and storing
2011-09-07sha1_file: normalize alt_odb path before comparing and storingLibravatar Hui Wang1-17/+20
When it needs to compare and add an alt object path to the alt_odb_list, we normalize this path first since comparing normalized path is easy to get correct result. Use strbuf to replace some string operations, since it is cleaner and safer. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Hui Wang <Hui.Wang@windriver.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-28Merge branch 'jc/maint-clone-alternates'Libravatar Junio C Hamano1-1/+1
* jc/maint-clone-alternates: clone: clone from a repository with relative alternates clone: allow more than one --reference Conflicts: builtin/clone.c
2011-08-23Merge branch 'rt/zlib-smaller-window'Libravatar Junio C Hamano1-6/+26
* rt/zlib-smaller-window: test: consolidate definition of $LF Tolerate zlib deflation with window size < 32Kb
2011-08-23clone: clone from a repository with relative alternatesLibravatar Junio C Hamano1-1/+1
Cloning from a local repository blindly copies or hardlinks all the files under objects/ hierarchy. This results in two issues: - If the repository cloned has an "objects/info/alternates" file, and the command line of clone specifies --reference, the ones specified on the command line get overwritten by the copy from the original repository. - An entry in a "objects/info/alternates" file can specify the object stores it borrows objects from as a path relative to the "objects/" directory. When cloning a repository with such an alternates file, if the new repository is not sitting next to the original repository, such relative paths needs to be adjusted so that they can be used in the new repository. This updates add_to_alternates_file() to take the path to the alternate object store, including the "/objects" part at the end (earlier, it was taking the path to $GIT_DIR and was adding "/objects" itself), as it is technically possible to specify in objects/info/alternates file the path of a directory whose name does not end with "/objects". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11Tolerate zlib deflation with window size < 32KbLibravatar Roberto Tyley1-6/+26
Git currently reports loose objects as 'corrupt' if they've been deflated using a window size less than 32Kb, because the experimental_loose_object() function doesn't recognise the header byte as a zlib header. This patch makes the function tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice it's accuracy in distingushing the standard loose-object format from the experimental (now abandoned) format. On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated appear 'corrupt', which is why Agit is a read-only Git client at this time - I don't want my client to generate Git repos that other clients treat as broken :( This patch makes Git tolerant of different deflate settings - it might appear that it changes experimental_loose_object() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true: 1) object size is exactly 8 bytes when uncompressed (bitmask) 2) [single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header] % 31 = 0 (checksum) As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which, due the fields all Commit objects must contain, could never be as small as 8 bytes in size. Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version. The alternative to this patch is simply removing support for the experimental format, which I am also totally cool with. References: Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53 Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177 Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-05Merge branch 'jc/pack-order-tweak'Libravatar Junio C Hamano1-0/+21
* jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened
2011-08-01Merge branch 'jc/legacy-loose-object' into maintLibravatar Junio C Hamano1-29/+33
* jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format
2011-07-19Merge branch 'jc/index-pack'Libravatar Junio C Hamano1-55/+0
* jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c
2011-07-19Merge branch 'jc/zlib-wrap'Libravatar Junio C Hamano1-14/+14
* jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c
2011-07-13Merge branch 'jc/legacy-loose-object'Libravatar Junio C Hamano1-29/+33
* jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format
2011-07-06core: log offset pack data accesses happenedLibravatar Junio C Hamano1-0/+21
In a workload other than "git log" (without pathspec nor any option that causes us to inspect trees and blobs), the recency pack order is said to cause the access jump around quite a bit. Add a hook to allow us observe how bad it is. "git config core.logpackaccess /var/tmp/pal.txt" will give you the log in the specified file. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: zlib can only process 4GB at a timeLibravatar Junio C Hamano1-9/+9
The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10zlib: wrap deflate side of the APILibravatar Junio C Hamano1-5/+5
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-08sha1_file.c: "legacy" is really the current formatLibravatar Junio C Hamano1-29/+33
Every time I look at the read-loose-object codepath, legacy_loose_object() function makes my brain go through mental contortion. When we were playing with the experimental loose object format, it may have made sense to call the traditional format "legacy", in the hope that the experimental one will some day replace it to become official, but it never happened. This renames the function (and negates its return value) to detect if we are looking at the experimental format, and move the code around in its caller which used to do "if we are looing at legacy, do this special case, otherwise the normal case is this". The codepath to read from the loose objects in experimental format is the "unlikely" case. Someday after Git 2.0, we should drop the support of this format. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05verify-pack: use index-pack --verifyLibravatar Junio C Hamano1-55/+0
This finally gets rid of the inefficient verify-pack implementation that walks objects in the packfile in their object name order and replaces it with a call to index-pack --verify. As a side effect, it also removes packed_object_info_detail() API which is rather expensive. As this changes the way errors are reported (verify-pack used to rely on the usual runtime error detection routine unpack_entry() to diagnose the CRC errors in an entry in the *.idx file; index-pack --verify checks the whole *.idx file in one go), update a test that expected the string "CRC" to appear in the error message. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26sha1_file: use the correct type (ssize_t, not size_t) for read-style functionLibravatar Jim Meyering1-1/+1
Using an unsigned type, we would fail to detect a read error and then proceed to try to write (size_t)-1 bytes. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-25Merge branch 'jc/bigfile'Libravatar Junio C Hamano1-24/+123
* jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag
2011-05-20sha1_file.c: expose helpers to read loose objectsLibravatar Junio C Hamano1-3/+3
Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header() available to the streaming read API by exporting them via cache.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20unpack_object_header(): make it publicLibravatar Junio C Hamano1-4/+4
This function is used to read and skip over the per-object header in a packfile. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20sha1_object_info_extended(): hint about objects in delta-base cacheLibravatar Junio C Hamano1-0/+9
An object found in the delta-base cache is not guaranteed to stay there, but we know it came from a pack and it is likely to give us a quick access if we read_sha1_file() it right now, which is a piece of useful information. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-19Merge branch 'jc/replacing'Libravatar Junio C Hamano1-9/+7
* jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h Conflicts: environment.c
2011-05-19sha1_object_info_extended(): expose a bit more infoLibravatar Junio C Hamano1-11/+31
The original interface for sha1_object_info() takes an object name and gives back a type and its size (the latter is given only when it was asked). The new interface wraps its implementation and exposes a bit more pieces of information that the interface used to discard, namely: - where the object is stored (loose? cached? packed?) - if packed, where in which packfile? Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * In the earlier round, this used u.pack.delta to record the length of the delta chain, but the caller is not necessarily interested in the length of the delta chain per-se, but may only want to know if it is a delta against another object or is stored as a deflated data. Calling packed_object_info_detail() involves walking the reverse index chain to compute the store size of the object and is unnecessarily expensive. We could resurrect the code if a new caller wants to know, but I doubt it.
2011-05-16packed_object_info_detail(): do not return a stringLibravatar Junio C Hamano1-2/+2
Instead return an integer that can be given to typename() if the caller wants a string, just like everybody else does. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streamingLibravatar Junio C Hamano1-33/+130
* jc/convert: convert: make it harder to screw up adding a conversion attribute convert: make it safer to add conversion attributes convert: give saner names to crlf/eol variables, types and functions convert: rename the "eol" global variable to "core_eol" * jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag * jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h
2011-05-15git_open_noatime(): drop unused parameterLibravatar Junio C Hamano1-8/+7
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28), the extra parameter added in f2e872aa (Work around EMFILE when there are too many pack files, 2010-11-01) is not used anymore. Remove it. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15sha1_file: typofixLibravatar Junio C Hamano1-1/+1
The number zero is spelled "zero", not "zer0". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15read_sha1_file(): allow selective bypassing of replacement mechanismLibravatar Junio C Hamano1-4/+6
The way "object replacement" mechanism was tucked to the read_sha1_file() interface was suboptimal in a couple of ways: - Callers that want it to die with useful diagnosis upon seeing a corrupt object does not have a way to say that they do not want any object replacement. - Callers who do not want it to die but want to handle the errors themselves are told to arrange to call read_object(), but the function does not use the replacement mechanism, and also it is a file scope static function that not many callers can call to begin with. This adds a read_sha1_file_extended() that takes a set of flags; the callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask for object replacement mechanism to kick in. Later, we could add another flag bit to tell the function to return an error instead of dying and then remove the misguided "call read_object() yourself". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15read_sha1_file(): get rid of read_sha1_file_repl() madnessLibravatar Junio C Hamano1-8/+4
Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided. Remove the function and make only the few callers that want the name of the replacement object find it themselves. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13Bigfile: teach "git add" to send a large file straight to a packLibravatar Junio C Hamano1-1/+83
When adding a new content to the repository, we have always slurped the blob in its entirety in-core first, and computed the object name and compressed it into a loose object file. Handling large binary files (e.g. video and audio asset for games) has been problematic because of this design. At the middle level of "git add" callchain is an internal API index_fd() that takes an open file descriptor to read from the working tree file being added with its size. Teach it to call out to fast-import when adding a large blob. The write-out codepath in entry.c::write_entry() should be taught to stream, instead of reading everything in core. This should not be so hard to implement, especially if we limit ourselves only to loose object files and non-delta representation in packfiles. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09index_fd(): split into two helper functionsLibravatar Junio C Hamano1-11/+31
Split out the case where we do not know the size of the input (hence we read everything into a strbuf before doing anything) to index_pipe(), and the other case where we mmap or read the whole data to index_bulk(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09index_fd(): turn write_object and format_check arguments into one flagLibravatar Junio C Hamano1-16/+13
The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13remove doubled words, e.g., s/to to/to/, and fix related typosLibravatar Jim Meyering1-1/+1
I found that some doubled words had snuck back into projects from which I'd already removed them, so now there's a "syntax-check" makefile rule in gnulib to help prevent recurrence. Running the command below spotted a few in git, too: git ls-files | xargs perl -0777 -n \ -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \ -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \ -e 'print "$ARGV:$n:$v\n"}' Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-26Merge branch 'jc/maint-rerere-in-workdir'Libravatar Junio C Hamano1-0/+29
* jc/maint-rerere-in-workdir: rerere: make sure it works even in a workdir attached to a young repository
2011-03-23rerere: make sure it works even in a workdir attached to a young repositoryLibravatar Junio C Hamano1-0/+29
The git-new-workdir script in contrib/ makes a new work tree by sharing many subdirectories of the .git directory with the original repository. When rerere.enabled is set in the original repository, but the user has not encountered any conflicts yet, the original repository may not yet have .git/rr-cache directory. When rerere wants to run in a new work tree created from such a young original repository, it fails to mkdir(2) .git/rr-cache that is a symlink to a yet-to-be-created directory. There are three possible approaches to this: - A naive solution is not to create a symlink in the git-new-workdir script to a directory the original does not have (yet). This is not a solution, as we tend to lazily create subdirectories of .git/, and having rerere.enabled configuration set is a strong indication that the user _wants_ to have this lazy creation to happen; - We could always create .git/rr-cache upon repository creation. This is tempting but will not help people with existing repositories. - Detect this case by seeing that mkdir(2) failed with EEXIST, checking that the path is a symlink, and try running mkdir(2) on the link target. This patch solves the issue by doing the third one. Strictly speaking, this is incomplete. It does not attempt to handle relative symbolic link that points into the original repository, but this is good enough to help people who use contrib/workdir/git-new-workdir script. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-23Merge branch 'jn/maint-c99-format'Libravatar Junio C Hamano1-7/+2
* jn/maint-c99-format: unbreak and eliminate NO_C99_FORMAT mktag: avoid %td in format string
2011-03-17unbreak and eliminate NO_C99_FORMATLibravatar Jonathan Nieder1-7/+2
In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from git-compat-util.h on all platforms instead of C99-specific formats like %zu with dangerous fallbacks to %u or %lu. So now C99-challenged platforms can build git without provoking warnings or errors from printf, even if pointers do not have the same size as an int or long. The need for a fallback PRIuMAX is detected in git-compat-util.h with "#ifndef PRIuMAX". So while at it, simplify the Makefile and configure script by eliminating the NO_C99_FORMAT knob altogether. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-15Merge branch 'sp/maint-fd-limit'Libravatar Junio C Hamano1-18/+66
* sp/maint-fd-limit: sha1_file.c: Don't retain open fds on small packs mingw: add minimum getrlimit() compatibility stub Limit file descriptors used by packs
2011-03-02sha1_file.c: Don't retain open fds on small packsLibravatar Shawn O. Pearce1-5/+36
If a pack file is small enough that its entire contents fits within one mmap window, mmap the file and then immediately close its file descriptor. This reduces the number of file descriptors that are needed to read from repositories with many tiny pack files, such as one that has received 1000 pushes (and created 1000 small pack files) since its last repack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28Limit file descriptors used by packsLibravatar Shawn O. Pearce1-13/+30
Rather than using 'errno == EMFILE' after a failed open() call to indicate the process is out of file descriptors and an LRU pack window should be closed, place a hard upper limit on the number of open packs based on the actual rlimit of the process. By using a hard upper limit that is below the rlimit of the current process it is not necessary to check for EMFILE on every single fd-allocating system call. Instead reserving 25 file descriptors makes it safe to assume the system call won't fail due to being over the filedescriptor limit. Here 25 is chosen as a WAG, but considers 3 for stdin/stdout/stderr, and at least a few for other Git code to operate on temporary files. An additional 20 is reserved as it is not known what the C library needs to perform other services on Git's behalf, such as nsswitch or name resolution. This fixes a case where running `git gc --auto` in a repository with more than 1024 packs (but an rlimit of 1024 open fds) fails due to the temporary output file not being able to allocate a file descriptor. The output file is opened by pack-objects after object enumeration and delta compression are done, both of which have already opened all of the packs and fully populated the file descriptor table. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27Merge branch 'nd/hash-object-sanity'Libravatar Junio C Hamano1-7/+47
* nd/hash-object-sanity: Make hash-object more robust against malformed objects Conflicts: cache.h
2011-02-14correct type of EMPTY_TREE_SHA1_BINLibravatar Jonathan Nieder1-1/+1
Functions such as hashcmp that expect a binary SHA-1 value take parameters of type "unsigned char *" to avoid accepting a textual SHA-1 passed by mistake. Unfortunately, this means passing the string literal EMPTY_TREE_SHA1_BIN requires an ugly cast. Tweak the definition of EMPTY_TREE_SHA1_BIN to produce a value of more convenient type. In the future the definition might change to extern const unsigned char empty_tree_sha1_bin[20]; #define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>