path: root/sha1_file.c
Age | Commit message | Author | Files | Lines
2011-05-25 | Merge branch 'jc/bigfile' | Junio C Hamano | 1 | -24/+123
* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag
2011-05-19 | Merge branch 'jc/replacing' | Junio C Hamano | 1 | -9/+7
* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h

Conflicts:
  environment.c
2011-05-15 | git_open_noatime(): drop unused parameter | Junio C Hamano | 1 | -8/+7
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28), the extra parameter added in f2e872aa (Work around EMFILE when there are too many pack files, 2010-11-01) is not used anymore. Remove it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15 | sha1_file: typofix | Junio C Hamano | 1 | -1/+1
The number zero is spelled "zero", not "zer0".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 | read_sha1_file(): allow selective bypassing of replacement mechanism | Junio C Hamano | 1 | -4/+6
The way the "object replacement" mechanism was tucked into the read_sha1_file() interface was suboptimal in a couple of ways:

- Callers that want it to die with a useful diagnosis upon seeing a corrupt object do not have a way to say that they do not want any object replacement.
- Callers who do not want it to die but want to handle the errors themselves are told to arrange to call read_object(), but the function does not use the replacement mechanism, and also it is a file scope static function that not many callers can call to begin with.

This adds a read_sha1_file_extended() that takes a set of flags; the callers of read_sha1_file() pass a flag READ_SHA1_FILE_REPLACE to ask for the object replacement mechanism to kick in. Later, we could add another flag bit to tell the function to return an error instead of dying and then remove the misguided "call read_object() yourself".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 | read_sha1_file(): get rid of read_sha1_file_repl() madness | Junio C Hamano | 1 | -8/+4
Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided.

Remove the function and make only the few callers that want the name of the replacement object find it themselves.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 | Bigfile: teach "git add" to send a large file straight to a pack | Junio C Hamano | 1 | -1/+83
When adding new content to the repository, we have always slurped the blob in its entirety in-core first, and computed the object name and compressed it into a loose object file. Handling large binary files (e.g. video and audio assets for games) has been problematic because of this design.

At the middle level of the "git add" callchain is an internal API index_fd() that takes an open file descriptor to read from the working tree file being added, along with its size. Teach it to call out to fast-import when adding a large blob.

The write-out codepath in entry.c::write_entry() should be taught to stream, instead of reading everything in core. This should not be so hard to implement, especially if we limit ourselves only to loose object files and non-delta representation in packfiles.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 | index_fd(): split into two helper functions | Junio C Hamano | 1 | -11/+31
Split out the case where we do not know the size of the input (hence we read everything into a strbuf before doing anything) to index_pipe(), and the other case where we mmap or read the whole data to index_bulk().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 | index_fd(): turn write_object and format_check arguments into one flag | Junio C Hamano | 1 | -16/+13
The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
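The refactoring described above can be sketched in C as follows. This is only an illustration of folding two boolean parameters into one flags word; the names `EX_WRITE_OBJECT`, `EX_FORMAT_CHECK`, `struct ex_options` and `ex_decode_flags` are hypothetical and are not taken from sha1_file.c.

```c
/* Hypothetical flag bits; illustrative names, not copied from git. */
#define EX_WRITE_OBJECT (1u << 0)
#define EX_FORMAT_CHECK (1u << 1)

/* The two booleans that a single "flags" parameter replaces. */
struct ex_options {
	int write_object;
	int format_check;
};

/* Unpack one flags word into the two booleans. */
static struct ex_options ex_decode_flags(unsigned flags)
{
	struct ex_options o;

	o.write_object = !!(flags & EX_WRITE_OBJECT);
	o.format_check = !!(flags & EX_FORMAT_CHECK);
	return o;
}
```

A caller would pass `EX_WRITE_OBJECT | EX_FORMAT_CHECK` instead of two separate positional arguments, so adding a third option later does not change the function signature.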
2011-04-13 | remove doubled words, e.g., s/to to/to/, and fix related typos | Jim Meyering | 1 | -1/+1
I found that some doubled words had snuck back into projects from which I'd already removed them, so now there's a "syntax-check" makefile rule in gnulib to help prevent recurrence.

Running the command below spotted a few in git, too:

    git ls-files | xargs perl -0777 -n \
      -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \
      -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \
      -e 'print "$ARGV:$n:$v\n"}'

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-26 | Merge branch 'jc/maint-rerere-in-workdir' | Junio C Hamano | 1 | -0/+29
* jc/maint-rerere-in-workdir:
  rerere: make sure it works even in a workdir attached to a young repository
2011-03-23 | rerere: make sure it works even in a workdir attached to a young repository | Junio C Hamano | 1 | -0/+29
The git-new-workdir script in contrib/ makes a new work tree by sharing many subdirectories of the .git directory with the original repository. When rerere.enabled is set in the original repository, but the user has not encountered any conflicts yet, the original repository may not yet have a .git/rr-cache directory.

When rerere wants to run in a new work tree created from such a young original repository, it fails to mkdir(2) .git/rr-cache, which is a symlink to a yet-to-be-created directory.

There are three possible approaches to this:

- A naive solution is not to create a symlink in the git-new-workdir script to a directory the original does not have (yet). This is not a solution, as we tend to lazily create subdirectories of .git/, and having the rerere.enabled configuration set is a strong indication that the user _wants_ this lazy creation to happen;
- We could always create .git/rr-cache upon repository creation. This is tempting but will not help people with existing repositories.
- Detect this case by seeing that mkdir(2) failed with EEXIST, checking that the path is a symlink, and try running mkdir(2) on the link target.

This patch solves the issue by doing the third one.

Strictly speaking, this is incomplete. It does not attempt to handle a relative symbolic link that points into the original repository, but this is good enough to help people who use the contrib/workdir/git-new-workdir script.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
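The third approach can be sketched in C roughly as below. This is a minimal illustration under stated assumptions, not the code from this commit: the helper name `ex_mkdir_through_symlink` and the fixed 4096-byte buffer are hypothetical, and, like the commit, the sketch does not resolve a relative link target against the directory that holds the link.

```c
#define _POSIX_C_SOURCE 200809L /* for lstat() and readlink() */
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a directory at "path"; if "path" already
 * exists as a symlink, create the link target instead.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int ex_mkdir_through_symlink(const char *path, mode_t mode)
{
	struct stat st;
	char target[4096]; /* assumed large enough for this sketch */
	ssize_t len;

	if (!mkdir(path, mode))
		return 0;
	if (errno != EEXIST)
		return -1;

	/* EEXIST: only a symlink is worth a second attempt. */
	if (lstat(path, &st) || !S_ISLNK(st.st_mode)) {
		errno = EEXIST;
		return -1;
	}
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0'; /* readlink() does not NUL-terminate */

	return mkdir(target, mode);
}
```

Calling it on a dangling `.git/rr-cache` symlink would create the directory the link points to, after which the link resolves normally.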
2011-03-23 | Merge branch 'jn/maint-c99-format' | Junio C Hamano | 1 | -7/+2
* jn/maint-c99-format:
  unbreak and eliminate NO_C99_FORMAT
  mktag: avoid %td in format string
2011-03-17 | unbreak and eliminate NO_C99_FORMAT | Jonathan Nieder | 1 | -7/+2
In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from git-compat-util.h on all platforms instead of C99-specific formats like %zu with dangerous fallbacks to %u or %lu.

So now C99-challenged platforms can build git without provoking warnings or errors from printf, even if pointers do not have the same size as an int or long.

The need for a fallback PRIuMAX is detected in git-compat-util.h with "#ifndef PRIuMAX". So while at it, simplify the Makefile and configure script by eliminating the NO_C99_FORMAT knob altogether.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
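As a small illustration of the PRIuMAX idiom described above, a size_t can be printed portably by casting it to uintmax_t. The helper name `ex_format_size` is hypothetical; the sketch uses the standard `<inttypes.h>` header rather than git-compat-util.h.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size without relying on "%zu": widen to uintmax_t and use
 * the PRIuMAX conversion, which matches uintmax_t by definition.
 */
static int ex_format_size(char *buf, size_t buflen, size_t n)
{
	return snprintf(buf, buflen, "%" PRIuMAX " bytes", (uintmax_t)n);
}
```

The cast is what keeps the call correct on platforms where size_t is narrower or wider than unsigned long.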
2011-03-15 | Merge branch 'sp/maint-fd-limit' | Junio C Hamano | 1 | -18/+66
* sp/maint-fd-limit:
  sha1_file.c: Don't retain open fds on small packs
  mingw: add minimum getrlimit() compatibility stub
  Limit file descriptors used by packs
2011-03-02 | sha1_file.c: Don't retain open fds on small packs | Shawn O. Pearce | 1 | -5/+36
If a pack file is small enough that its entire contents fits within one mmap window, mmap the file and then immediately close its file descriptor.

This reduces the number of file descriptors that are needed to read from repositories with many tiny pack files, such as one that has received 1000 pushes (and created 1000 small pack files) since its last repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
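The map-then-close technique can be sketched as below. It relies on the POSIX guarantee that a mapping stays valid after its file descriptor is closed. The helper `ex_map_small_file` and its `window` parameter are hypothetical stand-ins, not the pack window machinery in sha1_file.c.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only if it fits in "window" bytes, then close
 * the descriptor right away. Returns NULL if the file is empty, too
 * large, or cannot be opened or mapped.
 */
static void *ex_map_small_file(const char *path, size_t window, size_t *len_out)
{
	struct stat st;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) || st.st_size <= 0 || (size_t)st.st_size > window) {
		close(fd);
		return NULL;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping, if any, outlives the descriptor */
	if (map == MAP_FAILED)
		return NULL;
	*len_out = (size_t)st.st_size;
	return map;
}
```

The caller releases the mapping with `munmap(map, len)`; no descriptor is held in the meantime.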
2011-02-28 | Limit file descriptors used by packs | Shawn O. Pearce | 1 | -13/+30
Rather than using 'errno == EMFILE' after a failed open() call to indicate the process is out of file descriptors and an LRU pack window should be closed, place a hard upper limit on the number of open packs based on the actual rlimit of the process.

By using a hard upper limit that is below the rlimit of the current process it is not necessary to check for EMFILE on every single fd-allocating system call. Instead reserving 25 file descriptors makes it safe to assume the system call won't fail due to being over the file descriptor limit. Here 25 is chosen as a WAG, but considers 3 for stdin/stdout/stderr, and at least a few for other Git code to operate on temporary files. An additional 20 is reserved as it is not known what the C library needs to perform other services on Git's behalf, such as nsswitch or name resolution.

This fixes a case where running `git gc --auto` in a repository with more than 1024 packs (but an rlimit of 1024 open fds) fails due to the temporary output file not being able to allocate a file descriptor. The output file is opened by pack-objects after object enumeration and delta compression are done, both of which have already opened all of the packs and fully populated the file descriptor table.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
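The budget computation described above might look like the following sketch. The reserve of 25 comes from the commit message; the function names, the clamp to UINT_MAX, and the fallback when getrlimit() fails are assumptions made for illustration.

```c
#include <limits.h>
#include <sys/resource.h>

#define EX_RESERVED_FDS 25u /* the reserve named in the commit message */

/* Number of descriptors packs may use; always leaves room for one. */
static unsigned int ex_pack_fd_budget(unsigned int max_fds)
{
	return max_fds > EX_RESERVED_FDS ? max_fds - EX_RESERVED_FDS : 1;
}

/* Query the soft RLIMIT_NOFILE limit of the current process. */
static unsigned int ex_process_fd_limit(void)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_NOFILE, &lim))
		return 1; /* assumed conservative fallback */
	if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur > UINT_MAX)
		return UINT_MAX;
	return (unsigned int)lim.rlim_cur;
}
```

With an rlimit of 1024, `ex_pack_fd_budget(1024)` leaves 999 descriptors for packs, so the temporary output file in the `git gc --auto` scenario can still be opened.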
2011-02-27 | Merge branch 'nd/hash-object-sanity' | Junio C Hamano | 1 | -7/+47
* nd/hash-object-sanity:
  Make hash-object more robust against malformed objects

Conflicts:
  cache.h
2011-02-14 | correct type of EMPTY_TREE_SHA1_BIN | Jonathan Nieder | 1 | -1/+1
Functions such as hashcmp that expect a binary SHA-1 value take parameters of type "unsigned char *" to avoid accepting a textual SHA-1 passed by mistake. Unfortunately, this means passing the string literal EMPTY_TREE_SHA1_BIN requires an ugly cast. Tweak the definition of EMPTY_TREE_SHA1_BIN to produce a value of more convenient type.

In the future the definition might change to

    extern const unsigned char empty_tree_sha1_bin[20];
    #define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
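One way to hide the cast inside the macro is sketched below. The `EX_`-prefixed names and the `ex_hashcmp` helper are hypothetical stand-ins for the real definitions; the 20 bytes are the well-known SHA-1 of the empty tree, 4b825dc642cb6eb9a060e54bf8d69288fbee4904.

```c
#include <string.h>

/* Raw bytes of the empty tree's SHA-1, kept as a plain string literal. */
#define EX_EMPTY_TREE_BIN_LITERAL \
	"\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
	"\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"

/* The cast lives in one place, so callers never need it. */
#define EX_EMPTY_TREE_BIN \
	((const unsigned char *)EX_EMPTY_TREE_BIN_LITERAL)

/* Stand-in for hashcmp(): compare two binary 20-byte SHA-1 values. */
static int ex_hashcmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, 20);
}

static int ex_is_empty_tree(const unsigned char *sha1)
{
	return !ex_hashcmp(sha1, EX_EMPTY_TREE_BIN);
}
```

Because the parameters are `const unsigned char *`, passing a plain hexadecimal string by mistake still draws a compiler diagnostic, which is the safety property the commit message wants to keep.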
2011-02-07 | sha1_object_info: examine cached_object store too | Nguyễn Thái Ngọc Duy | 1 | -0/+8
Cached object store was added in d66b37b (Add pretend_sha1_file() interface. - 2007-02-04) as a way to temporarily inject some objects to object store.

But only read_sha1_file() knows about this store. While it will return an object from this store, sha1_object_info() will happily say "object not found".

Teach sha1_object_info() about the cached store for consistency.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 | sha1_file.c: move find_cached_object up so sha1_object_info can use it | Nguyễn Thái Ngọc Duy | 1 | -35/+35
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 | Make hash-object more robust against malformed objects | Nguyễn Thái Ngọc Duy | 1 | -7/+47
Commits, trees and tags have structure. Don't let users feed git with malformed ones. Sooner or later git will die() when encountering them.

Note that this patch does not check semantics. A tree that points to non-existent objects is perfectly OK (and should be so, users may choose to add commit first, then its associated tree for example).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-20 | Correctly report corrupted objects | Björn Steinbrink | 1 | -1/+1
The errno check added in commit 3ba7a06 "A loose object is not corrupt if it cannot be read due to EMFILE" only checked for whether errno is not ENOENT and thus incorrectly treated "no error" as an error condition.

Because of that, it never reached the code path that would report that the object is corrupted and instead caused funny errors like:

    fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success

So we have to extend the check to cover the case in which the object file was successfully read, but its contents are corrupted.

Reported-by: Will Palmer <wmpalmer@gmail.com>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
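The distinction the fix draws can be sketched as a three-way classification of the errno value left behind by a failed read. The enum and function names are hypothetical, and the real code path does more (for example, it also consults packs); this only shows why "errno is not ENOENT" is not enough on its own.

```c
#include <errno.h>

/* Hypothetical outcome categories for a failed loose-object read. */
enum ex_read_failure {
	EX_NOT_FOUND,    /* the object file does not exist */
	EX_SYSTEM_ERROR, /* e.g. EMFILE: the file may be fine, we could not read it */
	EX_CORRUPT       /* file was read, but its contents are bad */
};

static enum ex_read_failure ex_classify_read_failure(int saved_errno)
{
	if (saved_errno == ENOENT)
		return EX_NOT_FOUND;
	if (saved_errno != 0)
		return EX_SYSTEM_ERROR;
	/* errno == 0: no system call failed, so the data itself is bad. */
	return EX_CORRUPT;
}
```

Testing only `saved_errno != ENOENT` would send the `0` case down the system-error path, which is how the "failed to read object ...: Success" message arose.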
2010-12-03 | Merge branch 'jn/thinner-wrapper' | Junio C Hamano | 1 | -0/+26
* jn/thinner-wrapper:
  Remove pack file handling dependency from wrapper.o
  pack-objects: mark file-local variable static
  wrapper: give zlib wrappers their own translation unit
  strbuf: move strbuf_branchname to sha1_name.c
  path helpers: move git_mkstemp* to wrapper.c
  wrapper: move odb_* to environment.c
  wrapper: move xmmap() to sha1_file.c
2010-11-10 | Remove pack file handling dependency from wrapper.o | Jonathan Nieder | 1 | -0/+11
As v1.7.0-rc0~43 (slim down "git show-index", 2010-01-21) explains, use of xmalloc() brings in a dependency on zlib, the sha1 lib, and the rest of git's object file access machinery via try_to_free_pack_memory. That is overkill when xmalloc is just being used as a convenience wrapper to exit when no memory is available.

So defer setting try_to_free_pack_memory as try_to_free_routine until the first packfile is opened in add_packed_git().

After this change, a simple program using xmalloc() and no other functions will not pull in any code from libgit.a aside from wrapper.o and usage.o.

Improved-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 | wrapper: move xmmap() to sha1_file.c | Jonathan Nieder | 1 | -0/+15
wrapper.o depends on sha1_file.o for a number of reasons. One is release_pack_memory().

xmmap function calls mmap, discarding unused pack windows when necessary to relieve memory pressure. Simple git programs using wrapper.o as a friendly libc do not need this functionality.

So move xmmap to sha1_file.o, where release_pack_memory() is.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 | Work around EMFILE when there are too many pack files | Shawn O. Pearce | 1 | -16/+27
When opening any files in the object database, release unused pack windows if the open(2) syscall fails due to EMFILE (too many open files in this process).

This allows Git to degrade gracefully on a repository with thousands of pack files, and a commit stored in a loose object in the middle of the history.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
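The retry-on-EMFILE pattern can be sketched as follows. The `release_one` callback is a hypothetical stand-in for "release an unused pack window"; it is assumed to return nonzero when it managed to free a descriptor. This is an illustration of the approach, not the code the commit added.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Open "path" (flags must not include O_CREAT, since no mode is
 * passed). On EMFILE, ask the caller-supplied callback to free a
 * descriptor and try again; give up once it has nothing left to free.
 */
static int ex_open_with_retry(const char *path, int flags,
			      int (*release_one)(void))
{
	for (;;) {
		int fd = open(path, flags);
		int saved_errno;

		if (fd >= 0)
			return fd;
		if (errno != EMFILE)
			return -1;
		saved_errno = errno;
		if (!release_one || !release_one()) {
			errno = saved_errno; /* callback may have clobbered it */
			return -1;
		}
	}
}
```

Errors other than EMFILE are returned immediately, so a missing file still reports ENOENT without touching any cached resources.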
2010-11-03 | Use git_open_noatime when accessing pack data | Shawn O. Pearce | 1 | -4/+6
This utility function avoids an unnecessary update of the access time for a loose object file. Just as the atime isn't useful on a loose object, it's not useful on the pack or the corresponding idx file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 | A loose object is not corrupt if it cannot be read due to EMFILE | Junio C Hamano | 1 | -1/+6
"git fsck" bails out with a claim that a loose object that exists on the filesystem but cannot be read is corrupt, which is wrong when read_object() failed due to e.g. EMFILE.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 | read_sha1_file(): report correct name of packfile with a corrupt object | Junio C Hamano | 1 | -17/+24
Clarify the error reporting logic by moving the normal codepath (i.e. we read the object we wanted to read correctly) up and return early.

The logic to report the name of the packfile with a corrupt object, introduced by e8b15e6 (sha1_file: Show the the type and path to corrupt objects, 2010-06-10), was totally bogus. The function that knows which bad object came from what packfile is has_packed_and_bad(); make it report in which packfile the problem was found.

"Corrupt" is already an adjective, e.g. an object is "corrupt"; we do not have to say "corrupted object".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-14 | sha1_file: Show the the type and path to corrupt objects | Ævar Arnfjörð Bjarmason | 1 | -2/+11
Change the error message that's displayed when we encounter corrupt objects to be more specific. We now print the type (loose or packed) of corrupted objects, along with the full path to the file in question.

Before:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted

After:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted

Knowing the path helps to quickly analyze what's wrong:

    $ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df
    .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-13 | Merge branch 'jk/maint-sha1-file-name-fix' | Junio C Hamano | 1 | -13/+15
* jk/maint-sha1-file-name-fix:
  remove over-eager caching in sha1_file_name
2010-05-25 | remove over-eager caching in sha1_file_name | Jeff King | 1 | -13/+15
This function takes a sha1 and produces a loose object filename. It caches the location of the object directory so that it can fill the sha1 information directly without allocating a new buffer (and in its original incarnation, without calling getenv(), though these days we cache that with the code in environment.c).

This cached base directory can become stale, however, if in a single process git changes the location of the object directory (e.g., by running setup_work_tree, which will chdir to the new worktree).

In most cases this isn't a problem, because we tend to set up the git repository location and do any chdir()s before actually looking up any objects, so the first lookup will cache the correct location. In the case of reset --hard, however, we do something like:

  1. look up the commit object
  2. notice we are doing --hard, run setup_work_tree
  3. look up the tree object to reset

Step (3) fails because our cache object directory value is bogus.

This patch simply removes the caching. We use a static buffer instead of allocating one each time (the original version treated the malloc'd buffer as a static, so there is no change in calling semantics).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
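The shape of a cache-free path builder can be sketched as below: the object directory is passed in on every call and the result is written into a static buffer, so nothing can go stale between calls. The function name, the explicit `objdir` parameter, and the buffer size are assumptions for illustration and differ from the real sha1_file_name().

```c
#include <stddef.h>
#include <string.h>

/*
 * Build "<objdir>/xx/yyyy..." for a 20-byte binary SHA-1. Returns a
 * pointer to a static buffer that is overwritten by the next call, or
 * NULL if the result would not fit.
 */
static const char *ex_loose_object_path(const char *objdir,
					const unsigned char *sha1)
{
	static char buf[4096]; /* assumed size for this sketch */
	static const char hex[] = "0123456789abcdef";
	size_t len = strlen(objdir);
	char *p;
	int i;

	/* two '/' separators + 40 hex digits + NUL = 43 extra bytes */
	if (len + 43 > sizeof(buf))
		return NULL;
	memcpy(buf, objdir, len);
	p = buf + len;
	*p++ = '/';
	for (i = 0; i < 20; i++) {
		if (i == 1)
			*p++ = '/'; /* first byte names the fan-out directory */
		*p++ = hex[sha1[i] >> 4];
		*p++ = hex[sha1[i] & 0x0f];
	}
	*p = '\0';
	return buf;
}
```

Since the directory is looked up afresh by the caller each time, a chdir() or a changed object directory between two lookups is picked up automatically.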
2010-05-21 | Merge branch 'sp/maint-dumb-http-pack-reidx' | Junio C Hamano | 1 | -4/+10
* sp/maint-dumb-http-pack-reidx:
  http.c::new_http_pack_request: do away with the temp variable filename
  http-fetch: Use temporary files for pack-*.idx until verified
  http-fetch: Use index-pack rather than verify-pack to check packs
  Allow parse_pack_index on temporary files
  Extract verify_pack_index for reuse from verify_pack
  Introduce close_pack_index to permit replacement
  http.c: Remove unnecessary strdup of sha1_to_hex result
  http.c: Don't store destination name in request structures
  http.c: Drop useless != NULL test in finish_http_pack_request
  http.c: Tiny refactoring of finish_http_pack_request
  t5550-http-fetch: Use subshell for repository operations
  http.c: Remove bad free of static block
2010-05-18 | Merge branch 'maint' | Junio C Hamano | 1 | -3/+4
* maint:
  Documentation/gitdiffcore: fix order in pickaxe description
  Documentation: fix minor inconsistency
  Documentation: rebase -i ignores options passed to "git am"
  hash_object: correction for zero length file
2010-05-18 | hash_object: correction for zero length file | Dmitry Potapov | 1 | -3/+4
The check whether size is zero was done after the check for size <= SMALL_FILE_SIZE; as a result, the zero size case was never triggered. Instead, a zero length file was treated as any other small file. This did not cause any problem, but if we have a special case for size equal to zero, it is better to make it work and avoid a redundant malloc().

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
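The ordering issue can be illustrated with a small sketch in which the zero-size test comes first, so it is not shadowed by the small-file test. The enum, the function name, and the 32 KiB threshold are assumptions made for the example; the actual value of SMALL_FILE_SIZE is not stated in the commit message.

```c
#include <stddef.h>

#define EX_SMALL_FILE_SIZE (32 * 1024) /* assumed threshold for this sketch */

enum ex_strategy {
	EX_EMPTY, /* nothing to read, no buffer needed */
	EX_READ,  /* small file: read() into a malloc'd buffer */
	EX_MMAP   /* larger file: mmap() it */
};

static enum ex_strategy ex_pick_strategy(size_t size)
{
	/* Must come first: 0 also satisfies "size <= EX_SMALL_FILE_SIZE". */
	if (!size)
		return EX_EMPTY;
	if (size <= EX_SMALL_FILE_SIZE)
		return EX_READ;
	return EX_MMAP;
}
```

Swapping the first two tests reproduces the behavior the commit corrects: an empty file would take the read() branch and allocate a buffer it never needs.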
2010-04-19 | Allow parse_pack_index on temporary files | Shawn O. Pearce | 1 | -2/+1
The easiest way to verify a pack index is to open it through the standard parse_pack_index function, permitting the header check to happen when the file is mapped. However, the dumb HTTP client needs to verify a pack index before it's moved into its proper file name within the objects/pack directory, to prevent a corrupt index from being made available. So permit the caller to specify the exact path of the index file.

For now we're still using the final destination name within the sole call site in http.c, but eventually we will start to parse the temporary path instead.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 | Introduce close_pack_index to permit replacement | Shawn O. Pearce | 1 | -2/+9
By closing the pack index, a caller can later overwrite the index with an updated index file, possibly after converting from v1 to the v2 format. Because p->index_data is NULL after close, on the next access the index will be opened again and the other members will be updated with new data.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 | make commit_tree a library function | Jeff King | 1 | -0/+10
Until now, this has been part of the commit-tree builtin. However, it is already used by other builtins (like commit, merge, and notes), and it would be useful to access it from library code.

The check_valid helper has to come along, too, but is given a more library-ish name of "assert_sha1_type".

Otherwise, the code is unchanged. There are still a few rough edges for a library function, like printing the utf8 warning to stderr, but we can address those if and when they come up as inappropriate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 | fix const-correctness of write_sha1_file | Jeff King | 1 | -3/+3
These should take const buffers as input data, but zlib's next_in pointer is not const-correct. Let's fix it at the zlib level, though, so the cast happens in one obvious place. This should be safe, as a similar cast is used in zlib's example code for a const array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-08 | Merge branch 'mm/mkstemps-mode-for-packfiles' into maint | Junio C Hamano | 1 | -3/+3
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-07 | Merge branch 'nd/root-git' | Junio C Hamano | 1 | -7/+0
* nd/root-git:
  Add test for using Git at root of file system
  Support working directory located at root
  Move offset_1st_component() to path.c
  init-db, rev-parse --git-dir: do not append redundant slash
  make_absolute_path(): Do not append redundant slash

Conflicts:
  setup.c
  sha1_file.c
2010-03-07 | Merge branch 'mm/mkstemps-mode-for-packfiles' | Junio C Hamano | 1 | -3/+3
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-04 | Merge branch 'dp/read-not-mmap-small-loose-object' into maint | Junio C Hamano | 1 | -0/+10
* dp/read-not-mmap-small-loose-object:
  hash-object: don't use mmap() for small files
2010-03-02 | Merge branch 'np/compress-loose-object-memsave' | Junio C Hamano | 1 | -14/+19
* np/compress-loose-object-memsave:
  sha1_file: be paranoid when creating loose objects
  sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 | Use git_mkstemp_mode instead of plain mkstemp to create object files | Matthieu Moy | 1 | -3/+3
We used to unnecessarily give the read permission to group and others, regardless of the umask, which isn't serious because the objects are still protected by their containing directory, but isn't necessary either.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 | sha1_file: be paranoid when creating loose objects | Nicolas Pitre | 1 | -0/+9
We don't want the data being deflated and stored into loose objects to be different from what we expect. While the deflated data is protected by a CRC which is good enough for safe data retrieval operations, we still want to be doubly sure that the source data used at object creation time is still what we expected once that data has been deflated and its CRC32 computed.

The most plausible data corruption may occur if the source file is modified while Git is deflating and writing it out in a loose object. Or Git itself could have a bug causing memory corruption. Or even bad RAM could cause trouble. So it is best to make sure everything is coherent and checksum protected from beginning to end.

To do so we compute the SHA1 of the data being deflated _after_ the deflate operation has consumed that data, and make sure it matches with the expected SHA1. This way we can rely on the CRC32 checked by the inflate operation to provide a good indication that the data is still coherent with its SHA1 hash. One pathological case we ignore is when the data is modified before (or during) deflate call, but changed back before it is hashed.

There is some overhead of course. Using 'git add' on a set of large files:

Before:

    real 0m25.210s
    user 0m23.783s
    sys  0m1.408s

After:

    real 0m26.537s
    user 0m25.175s
    sys  0m1.358s

The overhead is around 5% for full data coherency guarantee.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 | hash-object: don't use mmap() for small files | Dmitry Potapov | 1 | -0/+10
Using read() instead of mmap() can be a 39% speed-up for 1Kb files and is a 1% speed-up for 1Mb files. For larger files, it is better to use mmap(), because the difference between the two is not significant, and when there is not enough memory, mmap() performs much better, because it avoids swapping.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 | sha1_file: don't malloc the whole compressed result when writing out objects | Nicolas Pitre | 1 | -14/+10
There is no real advantage to malloc the whole output buffer and deflate the data in a single pass when writing loose objects. That is like only 1% faster while using more memory, especially with large files where memory usage is far more. It is best to deflate and write the data out in small chunks reusing the same memory instead.

For example, using 'git add' on a few large files averaging 40 MB ...

Before:

    21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+828040outputs (0major+142640minor)pagefaults 0swaps

After:

    21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+828040outputs (0major+104408minor)pagefaults 0swaps

While the runtime stayed relatively the same, the number of minor page faults went down significantly.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-16 | Move offset_1st_component() to path.c | Nguyễn Thái Ngọc Duy | 1 | -7/+0
The implementation is also lightly modified to use is_dir_sep() instead of hardcoding '/'.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>