summaryrefslogtreecommitdiff
path: root/sha1_file.c
AgeCommit message (Collapse)AuthorFilesLines
2008-06-14Simplify and rename find_sha1_file()Libravatar Linus Torvalds1-10/+8
Now that we've made the loose SHA1 file reading more careful and streamlined, we only use the old find_sha1_file() function for checking whether a loose object file exists at all. As such, the whole 'return stat information' part of it was just pointless (nobody cares any more), and the naming of the function is not really all that relevant either. So simplify it to not do a 'stat()', but just an existence check (which is what the callers want), and rename it to 'has_loose_object()' which matches the use. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14Make loose object file reading more carefulLibravatar Linus Torvalds1-26/+44
We used to do 'stat()+open()+mmap()+close()' to read the loose object file data, which does work fine, but has a couple of problems: - it unnecessarily walks the filename twice (at 'stat()' time and then again to open it) - NFS generally has open-close consistency guarantees, which means that the initial 'stat()' was technically done outside of the normal consistency rules. So change it to do 'open()+fstat()+mmap()+close()' instead, which avoids both these issues. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14Avoid cross-directory renames and linking on object creationLibravatar Linus Torvalds1-41/+42
Instead of creating new temporary objects in the top-level git object directory, create them in the same directory they will finally end up in anyway. This avoids making the final atomic "rename to stable name" operation be a cross-directory event, which makes it a lot easier for various filesystems. Several filesystems do things like change the inode number when moving files across directories (or refuse to do it entirely). In particular, it can also cause problems for NFS implementations that change the filehandle of a file when it moves to a different directory, like the old user-space NFS server did, and like the Linux knfsd still does if you don't export your filesystems with 'no_subtree_check' or if you export a filesystem that doesn't have stable inode numbers across renames). This change also obviously implies creating the object fan-out subdirectory at tempfile creation time, rather than at the final move_temp_to_file() time. Which actually accounts for most of the size of the patch. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-13sha1_file.c: dead code removalLibravatar Junio C Hamano1-142/+0
write_sha1_from_fd() and write_sha1_to_fd() were dead code nobody called, neither the latter's helper repack_object() was. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-10Consolidate SHA1 object file closeLibravatar Linus Torvalds1-6/+11
This consolidates the common operations for closing the new temporary file that we have written, before we move it into place with the final name. There's some common code there (make it read-only and check for errors on close), but more importantly, this also gives a single place to add an fsync_or_die() call if we want to add a safe mode. This was triggered due to Denis Bueno apparently twice being able to corrupt his git repository on OS X due to an unlucky combination of kernel crashes and a not-very-robust filesystem. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-28fix sha1_pack_index_name()Libravatar Junio C Hamano1-4/+5
An earlier commit 633f43e (Remove redundant code, eliminate one static variable, 2008-05-24) had a thinko (perhaps an eyeno) that broke sha1_pack_index_name() function. One symptom of this was that the http walker is now completely broken. This should fix it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-25Merge branch 'db/clone-in-c'Libravatar Junio C Hamano1-0/+12
* db/clone-in-c: Add test for cloning with "--reference" repo being a subset of source repo Add a test for another combination of --reference Test that --reference actually suppresses fetching referenced objects clone: fall back to copying if hardlinking fails builtin-clone.c: Need to closedir() in copy_or_link_directory() builtin-clone: fix initial checkout Build in clone Provide API access to init_db() Add a function to set a non-default work tree Allow for having for_each_ref() list extra refs Have a constant extern refspec for "--tags" Add a library function to add an alternate to the alternates file Add a lockfile function to append to a file Mark the list of refs to fetch as const Conflicts: cache.h t/t5700-clone-reference.sh
2008-05-24Remove redundant code, eliminate one static variableLibravatar Heikki Orsila1-27/+17
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-13add a force_object_loose() functionLibravatar Nicolas Pitre1-13/+47
This is meant to force the creation of a loose object even if it already exists packed. Needed for the next commit. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-04Add a library function to add an alternate to the alternates fileLibravatar Daniel Barkalow1-0/+12
This is in the core so that, if the alternates file has already been read, the addition can be parsed and put into effect for the current process. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-03Cleanup xread() loops to use read_in_full()Libravatar Heikki Orsila1-10/+4
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09sha1-lookup: more memory efficient search in sorted list of SHA-1Libravatar Junio C Hamano1-3/+32
Currently, when looking for a packed object from the pack idx, a simple binary search is used. A conventional binary search loop looks like this: unsigned lo, hi; do { unsigned mi = (lo + hi) / 2; int cmp = "entry pointed at by mi" minus "target"; if (!cmp) return mi; "mi is the wanted one" if (cmp > 0) hi = mi; "mi is larger than target" else lo = mi+1; "mi is smaller than target" } while (lo < hi); "did not find what we wanted" The invariants are: - When entering the loop, 'lo' points at a slot that is never above the target (it could be at the target), 'hi' points at a slot that is guaranteed to be above the target (it can never be at the target). - We find a point 'mi' between 'lo' and 'hi' ('mi' could be the same as 'lo', but never can be as high as 'hi'), and check if 'mi' hits the target. There are three cases: - if it is a hit, we have found what we are looking for; - if it is strictly higher than the target, we set it to 'hi', and repeat the search. - if it is strictly lower than the target, we update 'lo' to one slot after it, because we allow 'lo' to be at the target and 'mi' is known to be below the target. If the loop exits, there is no matching entry. When choosing 'mi', we do not have to take the "middle" but anywhere in between 'lo' and 'hi', as long as lo <= mi < hi is satisfied. When we somehow know that the distance between the target and 'lo' is much shorter than the target and 'hi', we could pick 'mi' that is much closer to 'lo' than (hi+lo)/2, which a conventional binary search would pick. This patch takes advantage of the fact that the SHA-1 is a good hash function, and as long as there are enough entries in the table, we can expect uniform distribution. An entry that begins with for example "deadbeef..." is much likely to appear much later than in the midway of a reasonably populated table. In fact, it can be expected to be near 87% (222/256) from the top of the table. This is a work-in-progress and has switches to allow easier experiments and debugging. Exporting GIT_USE_LOOKUP environment variable enables this code. On my admittedly memory starved machine, with a partial KDE repository (3.0G pack with 95M idx): $ GIT_USE_LOOKUP=t git log -800 --stat HEAD >/dev/null 3.93user 0.16system 0:04.09elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+55588minor)pagefaults 0swaps Without the patch, the numbers are: $ git log -800 --stat HEAD >/dev/null 4.00user 0.15system 0:04.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+60258minor)pagefaults 0swaps In the same repository: $ GIT_USE_LOOKUP=t git log -2000 HEAD >/dev/null 0.12user 0.00system 0:00.12elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+4241minor)pagefaults 0swaps Without the patch, the numbers are: $ git log -2000 HEAD >/dev/null 0.05user 0.01system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+8506minor)pagefaults 0swaps There isn't much time difference, but the number of minor faults seems to show that we are touching much smaller number of pages, which is expected. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-01fix unimplemented packed_object_info_detail() featuresLibravatar Nicolas Pitre1-3/+7
Since commit eb32d236df0c16b936b04f0c5402addb61cdb311, there was a TODO comment in packed_object_info_detail() about the SHA1 of base object to OBJ_OFS_DELTA objects. So here it is at last. While at it, providing the actual storage size information as well is now trivial. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-20Merge branch 'jk/empty-tree'Libravatar Junio C Hamano1-0/+11
* jk/empty-tree: add--interactive: handle initial commit better hard-code the empty tree object
2008-02-18Merge branch 'mk/maint-parse-careful'Libravatar Junio C Hamano1-1/+2
* mk/maint-parse-careful: peel_onion: handle NULL check return value from parse_commit() in various functions parse_commit: don't fail, if object is NULL revision.c: handle tag->tagged == NULL reachable.c::process_tree/blob: check for NULL process_tag: handle tag->tagged == NULL check results of parse_commit in merge_bases list-objects.c::process_tree/blob: check for NULL reachable.c::add_one_tree: handle NULL from lookup_tree mark_blob/tree_uninteresting: check for NULL get_sha1_oneline: check return value of parse_object read_object_with_reference: don't read beyond the buffer
2008-02-18read_object_with_reference: don't read beyond the bufferLibravatar Martin Koegler1-1/+2
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-13hard-code the empty tree objectLibravatar Jeff King1-0/+11
Now any commands may reference the empty tree object by its sha1 (4b825dc642cb6eb9a060e54bf8d69288fbee4904). This is useful for showing some diffs, especially for initial commits. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-06safecrlf: Add mechanism to warn about irreversible crlf conversionsLibravatar Steffen Prohaska1-1/+2
CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de>
2008-01-17Fix random fast-import errors when compiled with NO_MMAPLibravatar Shawn O. Pearce1-0/+16
fast-import was relying on the fact that on most systems mmap() and write() are synchronized by the filesystem's buffer cache. We were relying on the ability to mmap() 20 bytes beyond the current end of the file, then later fill in those bytes with a future write() call, then read them through the previously obtained mmap() address. This isn't always true with some implementations of NFS, but it is especially not true with our NO_MMAP=YesPlease build time option used on some platforms. If fast-import was built with NO_MMAP=YesPlease we used the malloc()+pread() emulation and the subsequent write() call does not update the trailing 20 bytes of a previously obtained "mmap()" (aka malloc'd) address. Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to be unable to read an object header (or data) that has been unlucky enough to be written to the packfile at a location such that it is in the trailing 20 bytes of a window previously opened on that same packfile. This bug has gone unnoticed for a very long time as it is highly data dependent. Not only does the object have to be placed at the right position, but it also needs to be positioned behind some other object that has been accessed due to a branch cache invalidation. In other words the stars had to align just right, and if you did run into this bug you probably should also have purchased a lottery ticket. Fortunately the workaround is a lot easier than the bug explanation. Before we allow unpack_entry() to read data from a pack window that has also (possibly) been modified through write() we force all existing windows on that packfile to be closed. By closing the windows we ensure that any new access via the emulated mmap() will reread the packfile, updating to the current file content. This comes at a slight performance degredation as we cannot reuse previously cached windows when we update the packfile. But it is a fairly minor difference as the window closes happen at only two points: - When the packfile is finalized and its .idx is generated: At this stage we are getting ready to update the refs and any data access into the packfile is going to be random, and is going after only the branch tips (to ensure they are valid). Our existing windows (if any) are not likely to be positioned at useful locations to access those final tip commits so we probably were closing them before anyway. - When the branch cache missed and we need to reload: At this point fast-import is getting change commands for the next commit and it needs to go re-read a tree object it previously had written out to the packfile. What windows we had (if any) are not likely to cover the tree in question so we probably were closing them before anyway. We do try to avoid unnecessarily closing windows in the second case by checking to see if the packfile size has increased since the last time we called unpack_entry() on that packfile. If the size has not changed then we have not written additional data, and any existing window is still vaild. This nicely handles the cases where fast-import is going through a branch cache reload and needs to read many trees at once. During such an event we are not likely to be updating the packfile so we do not cycle the windows between reads. With this change in place t9301-fast-export.sh (which was broken by c3b0dec509fe136c5417422f31898b5a4e2d5e02) finally works again. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-03Fix grammar nits in documentation and in code comments.Libravatar Jim Meyering1-1/+1
Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-28sha1_file.c: Fix size_t related printf format warningsLibravatar Steffen Prohaska1-4/+6
The old way of fixing warnings did not succeed on MinGW. MinGW does not support C99 printf format strings for size_t [1]. But gcc on MinGW issues warnings if C99 printf format is not used. Hence, the old stragegy to avoid warnings fails. [1] http://www.mingw.org/MinGWiki/index.php/C99 This commits passes arguments of type size_t through a tiny helper functions that casts to the type expected by the format string. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-14Use is_absolute_path() in sha1_file.c.Libravatar Johannes Sixt1-4/+4
There are some places that test for an absolute path. Use the helper function to ease porting. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-29Merge branch 'maint'Libravatar Junio C Hamano1-7/+9
* maint: RelNotes-1.5.3.5: describe recent fixes merge-recursive.c: mrtree in merge() is not used before set sha1_file.c: avoid gcc signed overflow warnings Fix a small memory leak in builtin-add honor the http.sslVerify option in shell scripts
2007-10-29sha1_file.c: avoid gcc signed overflow warningsLibravatar Junio C Hamano1-7/+9
With the recent gcc, we get: sha1_file.c: In check_packed_git_: sha1_file.c:527: warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false sha1_file.c:527: warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false for a piece of code that tries to make sure that off_t is large enough to hold more than 2^32 offset. The test tried to make sure these do not wrap-around: /* make sure we can deal with large pack offsets */ off_t x = 0x7fffffffUL, y = 0xffffffffUL; if (x > (x + 1) || y > (y + 1)) { but gcc assumes it can do whatever optimization it wants for a signed overflow (undefined behaviour) and warns about this construct. Follow Linus's suggestion to check sizeof(off_t) instead to work around the problem. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-03Merge branch 'ph/strbuf'Libravatar Junio C Hamano1-65/+15
* ph/strbuf: (44 commits) Make read_patch_file work on a strbuf. strbuf_read_file enhancement, and use it. strbuf change: be sure ->buf is never ever NULL. double free in builtin-update-index.c Clean up stripspace a bit, use strbuf even more. Add strbuf_read_file(). rerere: Fix use of an empty strbuf.buf Small cache_tree_write refactor. Make builtin-rerere use of strbuf nicer and more efficient. Add strbuf_cmp. strbuf_setlen(): do not barf on setting length of an empty buffer to 0 sq_quote_argv and add_to_string rework with strbuf's. Full rework of quote_c_style and write_name_quoted. Rework unquote_c_style to work on a strbuf. strbuf API additions and enhancements. nfv?asprintf are broken without va_copy, workaround them. Fix the expansion pattern of the pseudo-static path buffer. builtin-for-each-ref.c::copy_name() - do not overstep the buffer. builtin-apply.c: fix a tiny leak introduced during xmemdupz() conversion. Use xmemdupz() in many places. ...
2007-09-29strbuf change: be sure ->buf is never ever NULL.Libravatar Pierre Habouzit1-2/+1
For that purpose, the ->buf is always initialized with a char * buf living in the strbuf module. It is made a char * so that we can sloppily accept things that perform: sb->buf[0] = '\0', and because you can't pass "" as an initializer for ->buf without making gcc unhappy for very good reasons. strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf anymore. as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying ->buf isn't an option anymore, if ->buf is going to escape from the scope, and eventually be free'd. API changes: * strbuf_setlen now always works, so just make strbuf_reset a convenience macro. * strbuf_detatch takes a size_t* optional argument (meaning it can be NULL) to copy the buffer's len, as it was needed for this refactor to make the code more readable, and working like the callers. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-18Use xmemdupz() in many places.Libravatar Pierre Habouzit1-9/+3
Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-17Export matches_pack_name() and fix its return valueLibravatar Junio C Hamano1-7/+7
The function sounds boolean; make it behave as one, not "0 for success, non-zero for failure". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16Now that cache.h needs strbuf.h, remove useless includes.Libravatar Pierre Habouzit1-1/+0
Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16Rewrite convert_to_{git,working_tree} to use strbuf's.Libravatar Pierre Habouzit1-5/+5
* Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char *src, size_t len, struct strbuf *sb) { int ret = 0; ret |= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret |= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret | filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-10Replace all read_fd use with strbuf_read, and get rid of it.Libravatar Pierre Habouzit1-51/+9
This brings builtin-stripspace, builtin-tag and mktag to use strbufs. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-25Don't segfault if we failed to inflate a packed deltaLibravatar Shawn O. Pearce1-0/+4
Under some types of packfile corruption the zlib stream holding the data for a delta within a packfile may fail to inflate, due to say a CRC failure within the compressed data itself. When this occurs the unpack_compressed_entry function will return NULL as a signal to the caller that the data is not available. Unfortunately we then tried to use that NULL as though it referenced a memory location where a delta was stored and tried to apply it to the delta base. Loading a byte from the NULL address typically causes a SIGSEGV. cate on #git noticed this failure in `git fsck --full` where the call to verify_pack() first noticed that the packfile was corrupt by finding that the packfile's SHA-1 did not match the raw data of the file. After finding this fsck went ahead and tried to verify every object within the packfile, even though the packfile was already known to be bad. If we are going to shovel bad data at the delta unpacking code, we better handle it correctly. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-14Avoid ambiguous error message if pack.idx header is wrongLibravatar Luiz Fernando N. Capitulino1-2/+2
Print the index version when an error occurs so the user knows what type of header (and size) we thought the index should have had. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-18Rename read_pipe() with read_fd() and make its buffer nul-terminated.Libravatar Carlos Rica1-6/+15
The new name is closer to the purpose of the function. A NUL-terminated buffer makes things easier when callers need that. Since the function returns only the memory written with data, almost always allocating more space than needed because final size is unknown, an extra NUL terminating the buffer is harmless. It is not included in the returned size, so the function remains working as before. Also, now the function allows the buffer passed to be NULL at first, and alloc_nr is now used for growing the buffer, instead size=*2. Signed-off-by: Carlos Rica <jasampler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-03Merge branch 'maint'Libravatar Junio C Hamano1-3/+13
* maint: Document -<n> for git-format-patch glossary: add 'reflog' diff --no-index: fix --name-status with added files Don't smash stack when $GIT_ALTERNATE_OBJECT_DIRECTORIES is too long
2007-07-03Don't smash stack when $GIT_ALTERNATE_OBJECT_DIRECTORIES is too longLibravatar Jim Meyering1-3/+13
There is no restriction on the length of the name returned by get_object_directory, other than the fact that it must be a stat'able git object directory. That means its name may have length up to PATH_MAX-1 (i.e., often 4095) not counting the trailing NUL. Combine that with the assumption that the concatenation of that name and suffixes like "/info/alternates" and "/pack/---long-name---.idx" will fit in a buffer of length PATH_MAX, and you see the problem. Here's a fix: sha1_file.c (prepare_packed_git_one): Lengthen "path" buffer so we are guaranteed to be able to append "/pack/" without checking. Skip any directory entry that is too long to be appended. (read_info_alternates): Protect against a similar buffer overrun. Before this change, using the following admittedly contrived environment setting would cause many git commands to clobber their stack and segfault on a system with PATH_MAX == 4096: t=$(perl -e '$s=".git/objects";$n=(4096-6-length($s))/2;print "./"x$n . $s') export GIT_ALTERNATE_OBJECT_DIRECTORIES=$t touch g ./git-update-index --add g If you run the above commands, you'll soon notice that many git commands now segfault, so you'll want to do this: unset GIT_ALTERNATE_OBJECT_DIRECTORIES Signed-off-by: Jim Meyering <jim@meyering.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-26Merge branch 'maint'Libravatar Junio C Hamano1-1/+4
* maint: config: Change output of --get-regexp for valueless keys config: Complete documentation of --get-regexp cleanup merge-base test script Fix zero-object version-2 packs Ignore submodule commits when fetching over dumb protocols
2007-06-26Fix zero-object version-2 packsLibravatar Linus Torvalds1-1/+4
A pack-file can get created without any objects in it (to transfer "no data" - which can happen if you use a reference git repo, for example, or just otherwise just end up transferring only branch head information and already have all the objects themselves). And while we probably should never create an index for such a pack, if we do (and we do), the index file size sanity checking was incorrect. This fixes it. Reported-and-tested-by: Jocke Tjernlund <tjernlund@tjernlund.se> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-13More staticLibravatar Junio C Hamano1-1/+1
There still are quite a few symbols that ought to be static. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-13-Wold-style-definition fixLibravatar Junio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07War on whitespaceLibravatar Junio C Hamano1-4/+4
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-02Merge branch 'sp/pack'Libravatar Junio C Hamano1-5/+38
* sp/pack: Style nit - don't put space after function names Ensure the pack index is opened before access Simplify index access condition in count-objects, pack-redundant Test for recent rev-parse $abbrev_sha1 regression rev-parse: Identify short sha1 sums correctly. Attempt to delay prepare_alt_odb during get_sha1 Micro-optimize prepare_alt_odb Lazily open pack index files on demand
2007-05-31Merge branch 'maint'Libravatar Junio C Hamano1-1/+1
* maint: git-config: Improve documentation of git-config file handling git-config: Various small fixes to asciidoc documentation decode_85(): fix missing return. fix signed range problems with hex conversions
2007-05-31Merge branch 'maint-1.5.1' into maintLibravatar Junio C Hamano1-1/+1
* maint-1.5.1: git-config: Improve documentation of git-config file handling git-config: Various small fixes to asciidoc documentation decode_85(): fix missing return. fix signed range problems with hex conversions
2007-05-30always start looking up objects in the last used pack firstLibravatar Nicolas Pitre1-4/+18
Jon Smirl said: | Once an object reference hits a pack file it is very likely that | following references will hit the same pack file. So first place to | look for an object is the same place the previous object was found. This is indeed a good heuristic so here it is. The search always start with the pack where the last object lookup succeeded. If the wanted object is not available there then the search continues with the normal pack ordering. To test this I split the Linux repository into 66 packs and performed a "time git-rev-list --objects --all > /dev/null". Best results are as follows: Pack Sort w/o this patch w/ this patch ------------------------------------------------------------- recent objects last 26.4s 20.9s recent objects first 24.9s 18.4s This shows that the pack order based on object age has some influence, but that the last-used-pack heuristic is even more significant in reducing object lookup. Signed-off-by: Nicolas Pitre <nico@cam.org> --- Note: the --max-pack-size to git-repack currently produces packs with old objects after those containing recent objects. The pack sort based on filesystem timestamp is therefore backward for those. This needs to be fixed of course, but at least it made me think about this variable for the test. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-30fix signed range problems with hex conversionsLibravatar Linus Torvalds1-1/+1
Make hexval_table[] "const". Also make sure that the accessor function hexval() does not access the table with out-of-range values by declaring its parameter "unsigned char", instead of "unsigned int". With this, gcc can just generate: movzbl (%rdi), %eax movsbl hexval_table(%rax),%edx movzbl 1(%rdi), %eax movsbl hexval_table(%rax),%eax sall $4, %edx orl %eax, %edx for the code to generate a byte from two hex characters. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-29Style nit - don't put space after function namesLibravatar Shawn O. Pearce1-1/+1
Our style is to not put a space after a function name. I did here, and Junio applied the patch with the incorrect formatting. So I'm cleaning up after myself since I noticed it upon review. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-26Micro-optimize prepare_alt_odbLibravatar Shawn O. Pearce1-2/+3
Calling getenv() is not that expensive, but its also not free, and its certainly not cheaper than testing to see if alt_odb_tail is not null. Because we are calling prepare_alt_odb() from within find_sha1_file every time we cannot find an object file locally we want to skip out of prepare_alt_odb() as early as possible once we have initialized our alternate list. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-26Lazily open pack index files on demandLibravatar Shawn O. Pearce1-3/+35
In some repository configurations the user may have many packfiles, but all of the recent commits/trees/tags/blobs are likely to be in the most recent packfile (the one with the newest mtime). It is therefore common to be able to complete an entire operation by accessing only one packfile, even if there are 25 packfiles available to the repository. Rather than opening and mmaping the corresponding .idx file for every pack found, we now only open and map the .idx when we suspect there might be an object of interest in there. Of course we cannot known in advance which packfile contains an object, so we still need to scan the entire packed_git list to locate anything. But odds are users want to access objects in the most recently created packfiles first, and that may be all they ever need for the current operation. Junio observed in b867092f that placing recent packfiles before older ones can slightly improve access times for recent objects, without degrading it for historical object access. This change improves upon Junio's observations by trying even harder to avoid the .idx files that we won't need. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-20Merge branch 'np/pack'Libravatar Junio C Hamano1-36/+11
* np/pack: deprecate the new loose object header format make "repack -f" imply "pack-objects --no-reuse-object" allow for undeltified objects not to be reused