path: root/hash.c
Age | Commit message | Author | Files | Lines
2008-02-22 | hash: fix lookup_hash semantics | Jeff King | 1 | -1/+1
We were returning the _address of_ the stored item (or NULL) instead of the item itself. While this sort of indirection is useful for insertion (since you can lookup and then modify), it is unnecessary for read-only lookup. Since the hash code splits these functions between the internal lookup_hash_entry function and the public lookup_hash function, it makes sense for the latter to provide what users of the library expect.

The result of this was that the index caching returned bogus results on lookup. We unfortunately didn't catch this because we were returning a "struct cache_entry **" as a "void *", and accidentally assigning it to a "struct cache_entry *".

As it happens, this actually _worked_ most of the time, because the entries were defined as:

    struct cache_entry {
        struct cache_entry *next;
        ...
    };

meaning that interpreting a "struct cache_entry **" as a "struct cache_entry *" would yield an entry where all fields were totally bogus _except_ for the next pointer, which pointed to the actual cache entry. When walking the list, we would look at the bogus "name" field, which was unlikely to match our lookup, and then proceed to the "real" entry.

The reading of bogus data was silently ignored most of the time, but could cause a segfault for some data (which seems to be more common on OS X).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
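
The difference between the two return conventions can be sketched with a toy table. This is a minimal illustration, not the code from hash.c: every `toy_*` name, the fixed table size, and the linear probing are assumptions made for the example. It shows a read-only lookup that returns the item, next to a pre-fix style lookup that returns the address of the stored pointer disguised as a `void *`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SIZE 8  /* hypothetical fixed capacity, for illustration only */

/* Hypothetical item type; "next" comes first, as in the commit message. */
struct toy_item {
    struct toy_item *next;
    const char *name;
};

struct toy_slot {
    unsigned int hash;
    void *ptr;  /* NULL means the slot is empty */
};

struct toy_table {
    unsigned int nr;  /* number of occupied slots */
    struct toy_slot slots[TOY_SIZE];
};

/* Internal helper: linear probing; returns the matching slot or the
 * first empty one. */
static struct toy_slot *toy_lookup_entry(struct toy_table *t, unsigned int hash)
{
    unsigned int nr = hash % TOY_SIZE;

    while (t->slots[nr].ptr && t->slots[nr].hash != hash)
        nr = (nr + 1) % TOY_SIZE;
    return &t->slots[nr];
}

/* Insertion: returns NULL when the item was stored, or the address of
 * the existing pointer so the caller can look up and then modify. */
static void **toy_insert(struct toy_table *t, unsigned int hash, void *item)
{
    struct toy_slot *slot;

    /* Keep at least one slot empty so that probing always terminates. */
    assert(t->nr < TOY_SIZE - 1);
    slot = toy_lookup_entry(t, hash);
    if (slot->ptr)
        return &slot->ptr;
    slot->hash = hash;
    slot->ptr = item;
    t->nr++;
    return NULL;
}

/* Read-only lookup with the fixed semantics: the item itself, or NULL. */
static void *toy_lookup(struct toy_table *t, unsigned int hash)
{
    return toy_lookup_entry(t, hash)->ptr;
}

/* Pre-fix style lookup: the address of the stored pointer, typed as
 * void *, so a caller can assign it to an item pointer without any
 * compiler diagnostic. */
static void *toy_lookup_buggy(struct toy_table *t, unsigned int hash)
{
    return &toy_lookup_entry(t, hash)->ptr;
}
```

With one item inserted under hash 42, `toy_lookup(&t, 42)` yields the item, while `toy_lookup_buggy(&t, 42)` yields a different address whose first pointer-sized word is the item's address. Read as a `struct toy_item *`, that word lands in the `next` field, which is why walking the list could still reach the real entry.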
2007-10-26 | Do linear-time/space rename logic for exact renames | Linus Torvalds | 1 | -0/+110
This implements a smarter rename detector for exact renames, which rather than doing a pairwise comparison (time O(m*n)) will just hash the files into a hash-table (size O(n+m)), and only do pairwise comparisons to renames that have the same hash (time O(n+m) except for unrealistic hash collisions, which we just cull aggressively).

Admittedly the exact rename case is not nearly as interesting as the generic case, but it's an important case nonetheless. A similar general approach should work for the generic case too, but even then you do need to handle the exact renames/copies separately (to avoid the inevitable added cost factor that comes from the _size_ of the file), so this is worth doing.

In the expectation that we will indeed do the same hashing trick for the general rename case, this code uses a generic hash-table implementation that can be used for other things too. In fact, we might be able to consolidate some of our existing hash tables with the new generic code in hash.[ch].

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>