path: root/Documentation/technical/api-hashmap.txt
2017-03-28  Merge branch 'jh/memihash-opt'  (Junio C Hamano; 1 file, +22/-0)
The name-hash used for detecting paths that differ only in case (which
matters on case-insensitive filesystems) has been optimized to take
advantage of multi-threading when it makes sense.

* jh/memihash-opt:
  name-hash: add test-lazy-init-name-hash to .gitignore
  name-hash: add perf test for lazy_init_name_hash
  name-hash: add test-lazy-init-name-hash
  name-hash: perf improvement for lazy_init_name_hash
  hashmap: document memihash_cont, hashmap_disallow_rehash api
  hashmap: add disallow_rehash setting
  hashmap: allow memihash computation to be continued
  name-hash: specify initial size for istate.dir_hash table
2017-03-23  hashmap: document memihash_cont, hashmap_disallow_rehash api  (Jeff Hostetler; 1 file, +22/-0)
Document memihash_cont() and hashmap_disallow_rehash() in
Documentation/technical/api-hashmap.txt.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
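To make the two documented calls concrete, a minimal sketch assuming
the hashmap.h signatures of this era (the dir/name buffers, cmp_fn,
and the size hint are illustrative, not from the commit):

    #include "hashmap.h"

    static void example(struct hashmap *map, hashmap_cmp_fn cmp_fn,
                        const char *dir, size_t dirlen,
                        const char *name, size_t namelen)
    {
            /* Continue a case-insensitive hash across two buffers:
             * hash the directory prefix once, then extend that state
             * with the entry name instead of rehashing the whole path. */
            unsigned int hash = memihash(dir, dirlen);
            hash = memihash_cont(hash, name, namelen);

            /* Size the table up front and forbid automatic rehashing,
             * so threads doing concurrent accesses (with their own
             * locking) never see the bucket array reallocated. */
            hashmap_init(map, cmp_fn, 1024 /* expected entries */);
            hashmap_disallow_rehash(map, 1);

            (void)hash; /* would be passed to hashmap_entry_init() */
    }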
2017-01-19  clear_delta_base_cache(): don't modify hashmap while iterating  (Jeff King; 1 file, +3/-1)
On Thu, Jan 19, 2017 at 03:03:46PM +0100, Ulrich Spörlein wrote:

> > I suspect the patch below may fix things for you. It works around it
> > by walking over the lru list (either is fine, as they both contain
> > all entries, and since we're clearing everything, we don't care
> > about the order).
>
> Confirmed. With the patch applied, I can import the whole 55G in one
> go without any crashes or aborts. Thanks much!

Thanks. Here it is rolled up with a commit message.

-- >8 --
Subject: clear_delta_base_cache(): don't modify hashmap while iterating

Removing entries while iterating causes fast-import to access an
already-freed `struct packed_git`, leading to various confusing
errors.

What happens is that clear_delta_base_cache() drops the whole contents
of the cache by iterating over the hashmap, calling
release_delta_base_cache() on each entry. That function removes the
item from the hashmap. The hashmap code may then shrink the table, but
the hashmap_iter struct retains an offset from the old table. As a
result, the next call to hashmap_iter_next() may claim that the
iteration is done, even though some items haven't been visited.

The only caller of clear_delta_base_cache() is fast-import, which
wants to clear the cache because it is discarding the packed_git
struct for its temporary pack. So by failing to remove all of the
entries, we still have references to the freed packed_git.

To make things even more confusing, this doesn't seem to trigger with
the test suite, because it depends on complexities like the size of
the hash table, which entries got cleared, whether we try to access
them before they're evicted from the cache, etc. So I've been able to
identify the problem with large imports like freebsd's svn import, or
a fast-export of linux.git. But nothing that would be reasonable to
run as part of the normal test suite.

We can fix this easily by iterating over the lru linked list instead
of the hashmap. They both contain the same entries, and we can use the
"safe" variant of the list iterator, which exists for exactly this
case.

Let's also add a warning to the hashmap API documentation to reduce
the chances of getting bit by this again.

Reported-by: Ulrich Spörlein <uqs@freebsd.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
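The difference between the two iteration strategies, sketched against
git's list.h macros (the entry type and release_ent() are illustrative
stand-ins, not the exact code from the commit):

    #include "hashmap.h"
    #include "list.h"

    struct cache_ent {
            struct hashmap_entry ent; /* lives in the hashmap */
            struct list_head lru;     /* same object, also on the lru list */
    };

    void release_ent(struct cache_ent *e); /* removes e from the map, frees it */

    /* BROKEN: each release_ent() call removes an entry, which may
     * shrink the table while `iter` still indexes the old bucket
     * array, so hashmap_iter_next() can stop early. */
    static void clear_broken(struct hashmap *map)
    {
            struct hashmap_iter iter;
            struct cache_ent *e;
            hashmap_iter_init(map, &iter);
            while ((e = hashmap_iter_next(&iter)))
                    release_ent(e);
    }

    /* OK: walk the lru list instead; list_for_each_safe() reads the
     * next pointer before the body frees the current node. */
    static void clear_safe(struct list_head *lru_head)
    {
            struct list_head *pos, *tmp;
            list_for_each_safe(pos, tmp, lru_head)
                    release_ent(list_entry(pos, struct cache_ent, lru));
    }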
2016-08-02  hashmap: clarify that hashmap_entry can safely be discarded  (Junio C Hamano; 1 file, +5/-0)
The API documentation said that the hashmap_entry structure to be
embedded in the caller's structure is to be treated as opaque, which
left the reader wondering if it can safely be discarded when it is no
longer necessary.

If the hashmap_entry structure had references to external resources
such as allocated memory or an open file descriptor, merely free(3)ing
the containing structure (when the caller's structure is on the heap)
or letting it go out of scope (when it is on the stack) would end up
leaking the external resource.

Document that there is no need for a hashmap_entry_clear() that
corresponds to hashmap_entry_init(), to give the API users a little
bit of peace of mind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
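In other words, disposal is just ordinary memory management; a sketch
with an illustrative type (not from the commit):

    #include <stdlib.h>
    #include "hashmap.h"

    struct my_item {
            struct hashmap_entry ent; /* opaque, but owns no external resources */
            char *name;               /* resources the *caller* allocated */
    };

    static void drop_item(struct hashmap *map, struct my_item *item)
    {
            hashmap_remove(map, item, NULL);
            free(item->name); /* release what you allocated yourself... */
            free(item);       /* ...then the container; no hashmap_entry_clear()
                               * counterpart to hashmap_entry_init() is needed */
    }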
2014-07-07  hashmap: add string interning API  (Karsten Blees; 1 file, +15/-0)
Interning short strings with high probability of duplicates can reduce
the memory footprint and speed up comparisons.

Add strintern() and memintern() APIs that use a hashmap to manage the
pool of unique, interned strings.

Note: strintern(getenv()) could be used to sanitize git's use of
getenv(), in case we ever encounter a platform where a call to
getenv() invalidates previous getenv() results (which is allowed by
POSIX).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
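A short usage sketch (the ref names are arbitrary examples; the
declarations are assumed to come in via hashmap.h):

    #include <assert.h>
    #include <string.h>
    #include "hashmap.h"

    static void intern_demo(void)
    {
            /* Duplicates collapse to one pooled copy, so equality
             * checks become pointer comparisons. */
            const char *a = strintern("refs/heads/master");
            const char *b = strintern("refs/heads/master");
            assert(a == b);

            /* memintern() is the length-delimited variant; the pooled
             * copy is NUL-terminated (strintern() relies on this). */
            const char *c = memintern("master\n", 6);
            assert(!strcmp(c, "master"));
    }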
2014-07-07  hashmap: add simplified hashmap_get_from_hash() API  (Karsten Blees; 1 file, +14/-0)
Hashmap entries are typically looked up by just a key. The
hashmap_get() API expects an initialized entry structure instead, to
support compound keys. This flexibility is currently only needed by
find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the
msysgit fork).

All other (currently five) call sites of hashmap_get() have to set up
a near empty entry structure, resulting in duplicate code like this:

    struct hashmap_entry keyentry;
    hashmap_entry_init(&keyentry, hash(key));
    return hashmap_get(map, &keyentry, key);

Add a hashmap_get_from_hash() API that allows hashmap lookups by just
specifying the key and its hash code, i.e.:

    return hashmap_get_from_hash(map, hash(key), key);

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07  hashmap: improve struct hashmap member documentation  (Karsten Blees; 1 file, +12/-4)
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07  hashmap: factor out getting a hash code from a SHA1  (Karsten Blees; 1 file, +9/-0)
Copying the first bytes of a SHA1 is duplicated in six places;
however, the implication (that the actual value depends on the
endianness of the platform) is documented only once.

Add a properly documented API for this.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
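A sketch of such a helper; the essential trick is a memcpy of the
leading bytes (the name sha1hash is an assumption of this sketch):

    #include <string.h>

    /* Derive a hash code from the first bytes of a SHA-1; they are
     * already uniformly distributed, so no mixing is required. The
     * memcpy() keeps this safe on platforms that lack unaligned
     * reads; the resulting value is endian-dependent, as noted. */
    static inline unsigned int sha1hash(const unsigned char *sha1)
    {
            unsigned int hash;
            memcpy(&hash, sha1, sizeof(hash));
            return hash;
    }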
2014-05-19  Documentation/technical/api-hashmap: remove source highlighting  (Anders Kaseorg; 1 file, +0/-1)
The highlighting was pretty, but unfortunately, the failure mode when
source-highlight is not installed was that the entire code block
disappears. See https://bugs.debian.org/745591,
https://bugs.launchpad.net/bugs/1316810.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24  hashmap.h: use 'unsigned int' for hash-codes everywhere  (Karsten Blees; 1 file, +1/-1)
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18  add a hashtable implementation that supports O(1) removal  (Karsten Blees; 1 file, +235/-0)
The existing hashtable implementation (in hash.[ch]) uses open
addressing (i.e. it resolves hash collisions by distributing entries
across the table). Thus, removal is difficult to implement with less
than O(n) complexity. Resolving collisions of entries with identical
hashes (e.g. via chaining) is left to the client code.

Add a hashtable implementation that supports O(1) removal and is
slightly easier to use due to builtin entry chaining. Supports all
basic operations: init, free, get, add, remove and iteration.

Also includes ready-to-use hash functions based on the public domain
FNV-1 algorithm (http://www.isthe.com/chongo/tech/comp/fnv).

The per-entry data structure (hashmap_entry) is piggybacked in front
of the client's data structure to save memory. See test-hashmap.c for
usage examples.

The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.
Lookups are also slightly faster, because entries are strictly
confined to their bucket (i.e. no data of other buckets needs to be
traversed).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
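For a feel of the API, a minimal add/get/remove sketch assuming the
interface this commit introduces (the pool_entry type and its
comparison function are illustrative; test-hashmap.c has the real
examples):

    #include <stdlib.h>
    #include <string.h>
    #include "hashmap.h"

    struct pool_entry {
            struct hashmap_entry ent; /* piggybacked in front of client data */
            const char *key;
    };

    static int pool_entry_cmp(const void *entry, const void *entry_or_key,
                              const void *keydata)
    {
            const struct pool_entry *e1 = entry;
            const struct pool_entry *e2 = entry_or_key;
            /* keydata is the raw key when the caller looked up by key */
            return strcmp(e1->key, keydata ? keydata : e2->key);
    }

    static void demo(void)
    {
            struct hashmap map;
            struct hashmap_entry key;
            struct pool_entry *e;

            hashmap_init(&map, pool_entry_cmp, 0);

            /* add: entries chain within their bucket, which is what
             * makes later removal O(1) */
            e = malloc(sizeof(*e));
            hashmap_entry_init(e, strhash("hello"));
            e->key = "hello";
            hashmap_add(&map, e);

            /* get: look up with a bare entry carrying the same hash */
            hashmap_entry_init(&key, strhash("hello"));
            e = hashmap_get(&map, &key, "hello");

            /* remove returns the entry so the caller can free it */
            e = hashmap_remove(&map, &key, "hello");
            free(e);

            hashmap_free(&map, 0);
    }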