summaryrefslogtreecommitdiff
path: root/read-cache.c
AgeCommit message (Collapse)AuthorFilesLines
2013-06-20Merge branch 'rs/discard-index-discard-array'Libravatar Junio C Hamano1-2/+3
* rs/discard-index-discard-array: read-cache: free cache in discard_index read-cache: add simple performance test
2013-06-11Merge branch 'fc/do-not-use-the-index-in-add-to-index'Libravatar Junio C Hamano1-3/+3
* fc/do-not-use-the-index-in-add-to-index: read-cache: trivial style cleanups read-cache: fix wrong 'the_index' usage
2013-06-09read-cache: free cache in discard_indexLibravatar René Scharfe1-2/+3
discard_cache doesn't have to free the array of cache entries, because the next call of read_cache can simply reuse it, as they all operate on the global variable the_index. discard_index on the other hand does have to free it, because it can be used e.g. with index_state variables on the stack, in which case a missing free would cause an unrecoverable leak. This patch releases the memory and removes a comment that was relevant for discard_cache but has become outdated. Since discard_cache is just a wrapper around discard_index nowadays, we lose the optimization that avoids reallocation of that array within loops of read_cache and discard_cache. That doesn't cause a performance regression for me, however (HEAD = this patch, HEAD^ = master + p0002): Test // HEAD^ HEAD ---------------\\----------------------------------------------------- 0002.1: read_ca// 1000 times 0.62(0.58+0.04) 0.61(0.58+0.02) -1.6% Suggested-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03read-cache: trivial style cleanupsLibravatar Felipe Contreras1-2/+2
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03read-cache: fix wrong 'the_index' usageLibravatar Felipe Contreras1-1/+1
We are dealing with the 'istate' index, not 'the_index'. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02read-cache: mark cache_entry pointers constLibravatar René Scharfe1-8/+10
ie_match_stat and ie_modified only derefence their struct cache_entry pointers for reading. Add const to the parameter declaration here and do the same for the static helper function used by them, as it's the same there as well. This allows callers to pass in const pointers. Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-21Merge branch 'lf/read-blob-data-from-index'Libravatar Junio C Hamano1-0/+34
Reduce duplicated code between convert.c and attr.c. * lf/read-blob-data-from-index: convert.c: remove duplicate code read_blob_data_from_index(): optionally return the size of blob data attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-17read_blob_data_from_index(): optionally return the size of blob dataLibravatar Lukas Fleischer1-1/+3
This allows for optionally getting the size of the returned data and will be used in a follow-up patch. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17attr.c: extract read_index_data() as read_blob_data_from_index()Libravatar Lukas Fleischer1-0/+32
Extract the read_index_data() function from attr.c and move it to read-cache.c; rename it to read_blob_data_from_index() and update the function signature of it to align better with index/cache API functions. This allows for reusing the function in convert.c later. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-03Merge branch 'kb/name-hash' into maint-1.8.1Libravatar Junio C Hamano1-5/+4
* kb/name-hash: name-hash.c: fix endless loop with core.ignorecase=true
2013-04-01Merge branch 'kb/name-hash'Libravatar Junio C Hamano1-5/+4
The code to keep track of what directory names are known to Git on platforms with case insensitive filesystems can get confused upon a hash collision between these pathnames and looped forever. * kb/name-hash: name-hash.c: fix endless loop with core.ignorecase=true
2013-03-19Merge branch 'nd/doc-index-format'Libravatar Junio C Hamano1-1/+1
Update the index format documentation to mention the v4 format. * nd/doc-index-format: update-index: list supported idx versions and their features read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr() index-format.txt: mention of v4 is missing in some places
2013-02-27name-hash.c: fix endless loop with core.ignorecase=trueLibravatar Karsten Blees1-5/+4
With core.ignorecase=true, name-hash.c builds a case insensitive index of all tracked directories. Currently, the existing cache entry structures are added multiple times to the same hashtable (with different name lengths and hash codes). However, there's only one dir_next pointer, which gets completely messed up in case of hash collisions. In the worst case, this causes an endless loop if ce == ce->dir_next (see t7062). Use a separate hashtable and separate structures for the directory index so that each directory entry has its own next pointer. Use reference counting to track which directory entry contains files. There are only slight changes to the name-hash.c API: - new free_name_hash() used by read_cache.c::discard_index() - remove_name_hash() takes an additional index_state parameter - index_name_exists() for a directory (trailing '/') may return a cache entry that has been removed (CE_UNHASHED). This is not a problem as the return value is only used to check if the directory exists (dir.c) or to normalize casing of directory names (read-cache.c). Getting rid of cache_entry.dir_next reduces memory consumption, especially with core.ignorecase=false (which doesn't use that member at all). With core.ignorecase=true, building the directory index is slightly faster as we add / check the parent directory first (instead of going through all directory levels for each file in the index). E.g. with WebKit (~200k files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms to 130ms. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-22read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr()Libravatar Nguyễn Thái Ngọc Duy1-1/+1
9d22778 (read-cache.c: write prefix-compressed names in the index - 2012-04-04) defined these. Interestingly, they were not used by read-cache.c, or anywhere in that patch. They were used in builtin/update-index.c later for checking supported index versions. Use them here too. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-22Enable minimal stat checkingLibravatar Robin Rosenberg1-10/+14
Specifically the fields uid, gid, ctime, ino and dev are set to zero by JGit. Other implementations, eg. Git in cygwin are allegedly also somewhat incompatible with Git For Windows and on *nix platforms the resolution of the timestamps may differ. Any stat checking by git will then need to check content, which may be very slow, particularly on Windows. Since mtime and size is typically enough we should allow the user to tell git to avoid checking these fields if they are set to zero in the index. This change introduces a core.checkstat config option where the the user can select to check all fields (default), or just size and the whole second part of mtime (minimal). Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15read-cache.c: mark a private file-scope symbol as staticLibravatar Junio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27Merge branch 'jk/maint-null-in-trees'Libravatar Junio C Hamano1-0/+2
We do not want a link to 0{40} object stored anywhere in our objects. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value
2012-08-22Merge branch 'nd/index-errno'Libravatar Junio C Hamano1-5/+1
Assignments to errno before calling system functions that used to matter in the old code were left behind after the code structure changed sufficiently to make them useless. * nd/index-errno: read_index_from: remove bogus errno assignments
2012-08-06read_index_from: remove bogus errno assignmentsLibravatar Nguyễn Thái Ngọc Duy1-5/+1
These assignments comes from the very first commit e83c516 (Initial revision of "git", the information manager from hell - 2005-04-07). Back then we did not die() when errors happened so correct errno was required. Since 5d1a5c0 ([PATCH] Better error reporting for "git status" - 2005-10-01), read_index_from() learned to die rather than just return -1 and these assignments became irrelevant. Remove them. While at it, move die_errno() next to xmmap() call because it's the mmap's error code that we care about. Otherwise if close(fd); fails, it could overwrite mmap's errno. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29do not write null sha1s to on-disk indexLibravatar Jeff King1-0/+2
We should never need to write the null sha1 into an index entry (short of the 1 in 2^160 chance that somebody actually has content that hashes to it). If we attempt to do so, it is much more likely that it is a bug, since we use the null sha1 as a sentinel value to mean "not valid". The presence of null sha1s in the index (which can come from, among other things, "update-index --cacheinfo", or by reading a corrupted tree) can cause problems for later readers, because they cannot distinguish the literal null sha1 from its use a sentinel value. For example, "git diff-files" on such an entry would make it appear as if it is stat-dirty, and until recently, the diff code assumed such an entry meant that we should be diffing a working tree file rather than a blob. Ideally, we would stop such entries from entering even our in-core index. However, we do sometimes legitimately add entries with null sha1s in order to represent these sentinel situations; simply forbidding them in add_index_entry breaks a lot of the existing code. However, we can at least make sure that our in-core sentinel representation never makes it to disk. To be thorough, we will test an attempt to add both a blob and a submodule entry. In the former case, we might run into problems anyway because we will be missing the blob object. But in the latter case, we do not enforce connectivity across gitlink entries, making this our only point of enforcement. The current implementation does not care which type of entry we are seeing, but testing both cases helps future-proof the test suite in case that changes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-23Merge branch 'tg/ce-namelen-field'Libravatar Junio C Hamano1-28/+39
Split lower bits of ce_flags field and creates a new ce_namelen field in the in-core index structure. * tg/ce-namelen-field: Strip namelen out of ce_flags into a ce_namelen field
2012-07-15Merge branch 'tg/maint-cache-name-compare'Libravatar Junio C Hamano1-4/+9
Even though the index can record pathnames longer than 1<<12 bytes, in some places we were not comparing them in full, potentially replacing index entries instead of adding. * tg/maint-cache-name-compare: cache_name_compare(): do not truncate while comparing paths
2012-07-11Strip namelen out of ce_flags into a ce_namelen fieldLibravatar Thomas Gummerer1-28/+39
Strip the name length from the ce_flags field and move it into its own ce_namelen field in struct cache_entry. This will both give us a tiny bit of a performance enhancement when working with long pathnames and is a refactoring for more readability of the code. It enhances readability, by making it more clear what is a flag, and where the length is stored and make it clear which functions use stages in comparisions and which only use the length. It also makes CE_NAMEMASK private, so that users don't mistakenly write the name length in the flags. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11Merge branch 'tg/maint-cache-name-compare' into tg/ce-namelen-fieldLibravatar Junio C Hamano1-4/+9
* tg/maint-cache-name-compare: cache_name_compare(): do not truncate while comparing paths
2012-07-11cache_name_compare(): do not truncate while comparing pathsLibravatar Junio C Hamano1-4/+9
We failed to use ce_namelen() equivalent and instead only compared up to the CE_NAMEMASK bytes by mistake. Adding an overlong path that shares the same common prefix as an existing entry in the index did not add a new entry, but instead replaced the existing one, as the result. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08Replace strlen() with ce_namelen()Libravatar Thomas Gummerer1-2/+2
Replace strlen(ce->name) with ce_namelen() in a couple of places which gives us some additional bits of performance. Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-02Merge branch 'jc/index-v4'Libravatar Junio C Hamano1-42/+223
Trivially shrinks the on-disk size of the index file to save both I/O and checksum overhead. The topic should give a solid base to build on further updates, with the code refactoring in its earlier parts, and the backward compatibility mechanism in its later parts. * jc/index-v4: index-v4: document the entry format unpack-trees: preserve the index file version of original update-index: upgrade/downgrade on-disk index version read-cache.c: write prefix-compressed names in the index read-cache.c: read prefix-compressed names in index on-disk version v4 read-cache.c: move code to copy incore to ondisk cache to a helper function read-cache.c: move code to copy ondisk to incore cache to a helper function read-cache.c: report the header version we do not understand read-cache.c: make create_from_disk() report number of bytes it consumed read-cache.c: allow unaligned mapping of the index file cache.h: hide on-disk index details varint: make it available outside the context of pack
2012-04-04read-cache.c: write prefix-compressed names in the indexLibravatar Junio C Hamano1-10/+54
Teach the code to write the index in the v4 on-disk format. Record the format version of the on-disk index we read from in the index_state, and use the format when writing the new index out. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03read-cache.c: read prefix-compressed names in index on-disk version v4Libravatar Junio C Hamano1-7/+51
Because the entries are sorted by path, adjacent entries in the index tend to share the leading components of them, and it makes sense to only store the differences in later entries. In the v4 on-disk format of the index, each on-disk cache entry stores the number of bytes to be stripped from the end of the previous name, and the bytes to append to the result, to come up with its name. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03read-cache.c: move code to copy incore to ondisk cache to a helper functionLibravatar Junio C Hamano1-9/+17
This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03read-cache.c: move code to copy ondisk to incore cache to a helper functionLibravatar Junio C Hamano1-19/+25
This makes the change in a later patch look less scary. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03read-cache.c: report the header version we do not understandLibravatar Junio C Hamano1-2/+4
Instead of just saying "bad index version", report the value we read from the disk. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03read-cache.c: make create_from_disk() report number of bytes it consumedLibravatar Junio C Hamano1-3/+6
The function is the one that is reading from the data stream. It only is natural to make it responsible for reporting this number, not the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03read-cache.c: allow unaligned mapping of the index fileLibravatar Junio C Hamano1-12/+32
Both the on-disk format v2 and v3 pads the "name" field to the multiple of eight to make sure that various quantities in network long/short type can be accessed with ntohl/ntohs without having to worry about alignment, but this forces us to waste disk I/O bandwidth. Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were the regular ntohs()/ntohl() on a field that may not be aligned correctly. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03cache.h: hide on-disk index detailsLibravatar Junio C Hamano1-0/+54
The on-disk format of the index file is a detail whose implementation is neatly encapsulated in read-cache.c; there is no need to expose it to the general public that include the cache.h header file. Also add a prominent mark to read-cache.c to delineate the parts that deal with the index file I/O routines from the remainder of the file. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-23make is_empty_blob_sha1 available everywhereLibravatar Jeff King1-10/+0
The read-cache implementation defines this static function, but it is a generally useful concept in git. Let's give the empty blob the same treatment as the empty tree, providing both hex and binary forms of the sha1. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17refresh_index: do not show unmerged path that is outside pathspecLibravatar Junio C Hamano1-2/+9
When running "git add --refresh <pathspec>", we incorrectly showed the path that is unmerged even if it is outside the specified pathspec, even though we did honor pathspec and refreshed only the paths that matched. Note that this cange does not affect "git update-index --refresh"; for hysterical raisins, it does not take a pathspec (it takes real paths) and more importantly itss command line options are parsed and executed one by one as they are encountered, so "git update-index --refresh foo" means "first refresh the index, and then update the entry 'foo' by hashing the contents in file 'foo'", not "refresh only entry 'foo'". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-09Merge branch 'rs/allocate-cache-entry-individually'Libravatar Junio C Hamano1-51/+32
* rs/allocate-cache-entry-individually: cache.h: put single NUL at end of struct cache_entry read-cache.c: allocate index entries individually Conflicts: read-cache.c
2011-11-18refresh_index: make porcelain output more specificLibravatar Jeff King1-2/+21
If you have a deleted file and a porcelain refreshes the cache, we print: Unstaged changes after reset: M file This is technically correct, in that the file is modified, but it's friendlier to the user if we further differentiate the case of a deleted file (especially because this output looks a lot like "diff --name-status", which would also make the distinction). Similarly, we can distinguish typechanges ("T") and intent-to-add files ("A"), both of which appear as just "M" in the current output. The plumbing output for all cases remains "needs update" for historical compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18refresh_index: rename format variablesLibravatar Jeff King1-6/+6
When refreshing the index, for modified (or unmerged) files we will print "needs update" (or "needs merge") for plumbing, or line similar to the output from "diff --name-status" for porcelain. The variables holding which type of message to show are named after the plumbing messages. However, as we begin to differentiate more cases at the porcelain level (with the plumbing message staying the same), that naming scheme will become awkward. Instead, name the variables after which case we found (modified or unmerged), not what we will output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18read-cache: let refresh_cache_ent pass up changed flagsLibravatar Jeff King1-3/+6
This will enable refresh_cache to differentiate more cases of modification (such as typechange) when telling the user what isn't fresh. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26read-cache.c: allocate index entries individuallyLibravatar René Scharfe1-50/+31
The code to estimate the in-memory size of the index based on its on-disk representation is subtly wrong for certain architecture-dependent struct layouts. Instead of fixing it, replace the code to keep the index entries in a single large block of memory and allocate each entry separately instead. This is both simpler and more flexible, as individual entries can now be freed. Actually using that added flexibility is left for a later patch. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26read-cache.c: fix index memory allocationLibravatar René Scharfe1-3/+3
estimate_cache_size() tries to guess how much memory is needed for the in-memory representation of an index file. It does that by using the file size, the number of entries and the difference of the sizes of the on-disk and in-memory structs -- without having to check the length of the name of each entry, which varies for each entry, but their sums are the same no matter the representation. Except there can be a difference. First of all, the size is really calculated by ce_size and ondisk_ce_size based on offsetof(..., name), not sizeof, which can be different. And entries are padded with 1 to 8 NULs at the end (after the variable name) to make their total length a multiple of eight. So in order to allocate enough memory to hold the index, change the delta calculation to be based on offsetof(..., name) and round up to the next multiple of eight. On a 32-bit Linux, this delta was used before: sizeof(struct cache_entry) == 72 sizeof(struct ondisk_cache_entry) == 64 --- 8 The actual difference for an entry with a filename length of one was, however (find the definitions are in cache.h): offsetof(struct cache_entry, name) == 72 offsetof(struct ondisk_cache_entry, name) == 62 ce_size == (72 + 1 + 8) & ~7 == 80 ondisk_ce_size == (62 + 1 + 8) & ~7 == 64 --- 16 So eight bytes less had been allocated for such entries. The new formula yields the correct delta: (72 - 62 + 7) & ~7 == 16 Reported-by: John Hsing <tsyj2007@gmail.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-25Merge branch 'maint'Libravatar Junio C Hamano1-1/+1
* maint: whitespace: have SP on both sides of an assignment "=" update-ref: whitespace fix
2011-08-25whitespace: have SP on both sides of an assignment "="Libravatar Junio C Hamano1-1/+1
I've deliberately excluded the borrowed code in compat/nedmalloc directory. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-29Merge branch 'ef/maint-win-verify-path'Libravatar Junio C Hamano1-14/+11
* ef/maint-win-verify-path: verify_dotfile(): do not assume '/' is the path seperator verify_path(): simplify check at the directory boundary verify_path: consider dos drive prefix real_path: do not assume '/' is the path seperator A Windows path starting with a backslash is absolute
2011-06-08verify_dotfile(): do not assume '/' is the path seperatorLibravatar Theo Niessink1-3/+4
verify_dotfile() currently assumes that the path seperator is '/', but on Windows it can also be '\\', so use is_dir_sep() instead. Signed-off-by: Theo Niessink <theo@taletn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-07verify_path(): simplify check at the directory boundaryLibravatar Junio C Hamano1-10/+3
We simply want to say "At a directory boundary, be careful with a name that begins with a dot, forbid a name that ends with the boundary character or has duplicated bounadry characters". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-27verify_path: consider dos drive prefixLibravatar Erik Faye-Lund1-1/+4
If someone manage to create a repo with a 'C:' entry in the root-tree, files can be written outside of the working-dir. This opens up a can-of-worms of exploits. Fix it by explicitly checking for a dos drive prefix when verifying a paht. While we're at it, make sure that paths beginning with '\' is considered absolute as well. Noticed-by: Theo Niessink <theo@taletn.com> Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09index_fd(): turn write_object and format_check arguments into one flagLibravatar Junio C Hamano1-2/+2
The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>