diff options
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-hash.txt | 52 | ||||
-rw-r--r-- | Documentation/technical/api-hashmap.txt | 234 | ||||
-rw-r--r-- | Documentation/technical/api-parse-options.txt | 4 | ||||
-rw-r--r-- | Documentation/technical/api-strbuf.txt | 10 | ||||
-rw-r--r-- | Documentation/technical/bitmap-format.txt | 164 | ||||
-rw-r--r-- | Documentation/technical/http-protocol.txt | 6 |
6 files changed, 406 insertions, 64 deletions
diff --git a/Documentation/technical/api-hash.txt b/Documentation/technical/api-hash.txt deleted file mode 100644 index e5061e0677..0000000000 --- a/Documentation/technical/api-hash.txt +++ /dev/null @@ -1,52 +0,0 @@ -hash API -======== - -The hash API is a collection of simple hash table functions. Users are expected -to implement their own hashing. - -Data Structures ---------------- - -`struct hash_table`:: - - The hash table structure. The `array` member points to the hash table - entries. The `size` member counts the total number of valid and invalid - entries in the table. The `nr` member keeps track of the number of - valid entries. - -`struct hash_table_entry`:: - - An opaque structure representing an entry in the hash table. The `hash` - member is the entry's hash key and the `ptr` member is the entry's - value. - -Functions ---------- - -`init_hash`:: - - Initialize the hash table. - -`free_hash`:: - - Release memory associated with the hash table. - -`insert_hash`:: - - Insert a pointer into the hash table. If an entry with that hash - already exists, a pointer to the existing entry's value is returned. - Otherwise NULL is returned. This allows callers to implement - chaining, etc. - -`lookup_hash`:: - - Lookup an entry in the hash table. If an entry with that hash exists - the entry's value is returned. Otherwise NULL is returned. - -`for_each_hash`:: - - Call a function for each entry in the hash table. The function is - expected to take the entry's value as its only argument and return an - int. If the function returns a negative int the loop is aborted - immediately. Otherwise, the return value is accumulated and the sum - returned upon completion of the loop. diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt new file mode 100644 index 0000000000..b977ae8bbb --- /dev/null +++ b/Documentation/technical/api-hashmap.txt @@ -0,0 +1,234 @@ +hashmap API +=========== + +The hashmap API is a generic implementation of hash-based key-value mappings. + +Data Structures +--------------- + +`struct hashmap`:: + + The hash table structure. ++ +The `size` member keeps track of the total number of entries. The `cmpfn` +member is a function used to compare two entries for equality. The `table` and +`tablesize` members store the hash table and its size, respectively. + +`struct hashmap_entry`:: + + An opaque structure representing an entry in the hash table, which must + be used as first member of user data structures. Ideally it should be + followed by an int-sized member to prevent unused memory on 64-bit + systems due to alignment. ++ +The `hash` member is the entry's hash code and the `next` member points to the +next entry in case of collisions (i.e. if multiple entries map to the same +bucket). + +`struct hashmap_iter`:: + + An iterator structure, to be used with hashmap_iter_* functions. + +Types +----- + +`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`:: + + User-supplied function to test two hashmap entries for equality. Shall + return 0 if the entries are equal. ++ +This function is always called with non-NULL `entry` / `entry_or_key` +parameters that have the same hash code. When looking up an entry, the `key` +and `keydata` parameters to hashmap_get and hashmap_remove are always passed +as second and third argument, respectively. Otherwise, `keydata` is NULL. + +Functions +--------- + +`unsigned int strhash(const char *buf)`:: +`unsigned int strihash(const char *buf)`:: +`unsigned int memhash(const void *buf, size_t len)`:: +`unsigned int memihash(const void *buf, size_t len)`:: + + Ready-to-use hash functions for strings, using the FNV-1 algorithm (see + http://www.isthe.com/chongo/tech/comp/fnv). ++ +`strhash` and `strihash` take 0-terminated strings, while `memhash` and +`memihash` operate on arbitrary-length memory. ++ +`strihash` and `memihash` are case insensitive versions. + +`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`:: + + Initializes a hashmap structure. ++ +`map` is the hashmap to initialize. ++ +The `equals_function` can be specified to compare two entries for equality. +If NULL, entries are considered equal if their hash codes are equal. ++ +If the total number of entries is known in advance, the `initial_size` +parameter may be used to preallocate a sufficiently large table and thus +prevent expensive resizing. If 0, the table is dynamically resized. + +`void hashmap_free(struct hashmap *map, int free_entries)`:: + + Frees a hashmap structure and allocated memory. ++ +`map` is the hashmap to free. ++ +If `free_entries` is true, each hashmap_entry in the map is freed as well +(using stdlib's free()). + +`void hashmap_entry_init(void *entry, unsigned int hash)`:: + + Initializes a hashmap_entry structure. ++ +`entry` points to the entry to initialize. ++ +`hash` is the hash code of the entry. + +`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`:: + + Returns the hashmap entry for the specified key, or NULL if not found. ++ +`map` is the hashmap structure. ++ +`key` is a hashmap_entry structure (or user data structure that starts with +hashmap_entry) that has at least been initialized with the proper hash code +(via `hashmap_entry_init`). ++ +If an entry with matching hash code is found, `key` and `keydata` are passed +to `hashmap_cmp_fn` to decide whether the entry matches the key. + +`void *hashmap_get_next(const struct hashmap *map, const void *entry)`:: + + Returns the next equal hashmap entry, or NULL if not found. This can be + used to iterate over duplicate entries (see `hashmap_add`). ++ +`map` is the hashmap structure. ++ +`entry` is the hashmap_entry to start the search from, obtained via a previous +call to `hashmap_get` or `hashmap_get_next`. + +`void hashmap_add(struct hashmap *map, void *entry)`:: + + Adds a hashmap entry. This allows to add duplicate entries (i.e. + separate values with the same key according to hashmap_cmp_fn). ++ +`map` is the hashmap structure. ++ +`entry` is the entry to add. + +`void *hashmap_put(struct hashmap *map, void *entry)`:: + + Adds or replaces a hashmap entry. If the hashmap contains duplicate + entries equal to the specified entry, only one of them will be replaced. ++ +`map` is the hashmap structure. ++ +`entry` is the entry to add or replace. ++ +Returns the replaced entry, or NULL if not found (i.e. the entry was added). + +`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`:: + + Removes a hashmap entry matching the specified key. If the hashmap + contains duplicate entries equal to the specified key, only one of + them will be removed. ++ +`map` is the hashmap structure. ++ +`key` is a hashmap_entry structure (or user data structure that starts with +hashmap_entry) that has at least been initialized with the proper hash code +(via `hashmap_entry_init`). ++ +If an entry with matching hash code is found, `key` and `keydata` are +passed to `hashmap_cmp_fn` to decide whether the entry matches the key. ++ +Returns the removed entry, or NULL if not found. + +`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`:: +`void *hashmap_iter_next(struct hashmap_iter *iter)`:: +`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`:: + + Used to iterate over all entries of a hashmap. ++ +`hashmap_iter_init` initializes a `hashmap_iter` structure. ++ +`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no +more entries. ++ +`hashmap_iter_first` is a combination of both (i.e. initializes the iterator +and returns the first entry, if any). + +Usage example +------------- + +Here's a simple usage example that maps long keys to double values. +------------ +struct hashmap map; + +struct long2double { + struct hashmap_entry ent; /* must be the first member! */ + long key; + double value; +}; + +static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused) +{ + return !(e1->key == e2->key); +} + +void long2double_init(void) +{ + hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0); +} + +void long2double_free(void) +{ + hashmap_free(&map, 1); +} + +static struct long2double *find_entry(long key) +{ + struct long2double k; + hashmap_entry_init(&k, memhash(&key, sizeof(long))); + k.key = key; + return hashmap_get(&map, &k, NULL); +} + +double get_value(long key) +{ + struct long2double *e = find_entry(key); + return e ? e->value : 0; +} + +void set_value(long key, double value) +{ + struct long2double *e = find_entry(key); + if (!e) { + e = malloc(sizeof(struct long2double)); + hashmap_entry_init(e, memhash(&key, sizeof(long))); + e->key = key; + hashmap_add(&map, e); + } + e->value = value; +} +------------ + +Using variable-sized keys +------------------------- + +The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary +`hashmap_entry` structure as key to find the correct entry. If the key data is +variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable +to create a full-fledged entry structure on the heap and copy all the key data +into the structure. + +In this case, the `keydata` parameter can be used to pass +variable-sized key data directly to the comparison function, and the `key` +parameter can be a stripped-down, fixed size entry structure allocated on the +stack. + +See test-hashmap.c for an example using arbitrary-length strings as keys. diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt index be50cf4de3..1f2db31312 100644 --- a/Documentation/technical/api-parse-options.txt +++ b/Documentation/technical/api-parse-options.txt @@ -160,10 +160,6 @@ There are some macros to easily define options: `int_var` is set to `integer` with `--option`, and reset to zero with `--no-option`. -`OPT_SET_PTR(short, long, &ptr_var, description, ptr)`:: - Introduce a boolean option. - If used, set `ptr_var` to `ptr`. - `OPT_STRING(short, long, &str_var, arg_str, description)`:: Introduce an option with string argument. The string argument is put into `str_var`. diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt index 3350d97dda..1d00e4d596 100644 --- a/Documentation/technical/api-strbuf.txt +++ b/Documentation/technical/api-strbuf.txt @@ -7,10 +7,10 @@ use the mem* functions than a str* one (memchr vs. strchr e.g.). Though, one has to be careful about the fact that str* functions often stop on NULs and that strbufs may have embedded NULs. -An strbuf is NUL terminated for convenience, but no function in the +A strbuf is NUL terminated for convenience, but no function in the strbuf API actually relies on the string being free of NULs. -strbufs has some invariants that are very important to keep in mind: +strbufs have some invariants that are very important to keep in mind: . The `buf` member is never NULL, so it can be used in any usual C string operations safely. strbuf's _have_ to be initialized either by @@ -56,8 +56,8 @@ Data structures * `struct strbuf` This is the string buffer structure. The `len` member can be used to -determine the current length of the string, and `buf` member provides access to -the string itself. +determine the current length of the string, and `buf` member provides +access to the string itself. Functions --------- @@ -184,7 +184,7 @@ strbuf_addstr(sb, "immediate string"); `strbuf_addbuf`:: - Copy the contents of an other buffer at the end of the current one. + Copy the contents of another buffer at the end of the current one. `strbuf_adddup`:: diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt new file mode 100644 index 0000000000..f8c18a0f7a --- /dev/null +++ b/Documentation/technical/bitmap-format.txt @@ -0,0 +1,164 @@ +GIT bitmap v1 format +==================== + + - A header appears at the beginning: + + 4-byte signature: {'B', 'I', 'T', 'M'} + + 2-byte version number (network byte order) + The current implementation only supports version 1 + of the bitmap index (the same one as JGit). + + 2-byte flags (network byte order) + + The following flags are supported: + + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the bitmap + index has been generated for a packfile with full closure + (i.e. where every single object in the packfile can find + its parent links inside the same packfile). This is a + requirement for the bitmap index format, also present in JGit, + that greatly reduces the complexity of the implementation. + + - BITMAP_OPT_HASH_CACHE (0x4) + If present, the end of the bitmap file contains + `N` 32-bit name-hash values, one per object in the + pack. The format and meaning of the name-hash is + described below. + + 4-byte entry count (network byte order) + + The total count of entries (bitmapped commits) in this bitmap index. + + 20-byte checksum + + The SHA1 checksum of the pack this bitmap index belongs to. + + - 4 EWAH bitmaps that act as type indexes + + Type indexes are serialized after the hash cache in the shape + of four EWAH bitmaps stored consecutively (see Appendix A for + the serialization format of an EWAH bitmap). + + There is a bitmap for each Git object type, stored in the following + order: + + - Commits + - Trees + - Blobs + - Tags + + In each bitmap, the `n`th bit is set to true if the `n`th object + in the packfile is of that type. + + The obvious consequence is that the OR of all 4 bitmaps will result + in a full set (all bits set), and the AND of all 4 bitmaps will + result in an empty bitmap (no bits set). + + - N entries with compressed bitmaps, one for each indexed commit + + Where `N` is the total amount of entries in this bitmap index. + Each entry contains the following: + + - 4-byte object position (network byte order) + The position **in the index for the packfile** where the + bitmap for this commit is found. + + - 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positive, and hence entries are always xor'ed + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. + + - 1-byte flags for this bitmap + At the moment the only available flag is `0x1`, which hints + that this bitmap can be re-used when rebuilding bitmap indexes + for the repository. + + - The compressed bitmap itself, see Appendix A. + +== Appendix A: Serialization format for an EWAH bitmap + +Ewah bitmaps are serialized in the same protocol as the JAVAEWAH +library, making them backwards compatible with the JGit +implementation: + + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap + + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field + + This is the actual content of the compressed bitmap. + + - 4-byte position of the current RLW for the compressed + bitmap + +All words are stored in network byte order for their corresponding +sizes. + +The compressed bitmap is stored in a form of run-length encoding, as +follows. It consists of a concatenation of an arbitrary number of +chunks. Each chunk consists of one or more 64-bit words + + H L_1 L_2 L_3 .... L_M + +H is called RLW (run length word). It consists of (from lower to higher +order bits): + + - 1 bit: the repeated bit B + + - 32 bits: repetition count K (unsigned) + + - 31 bits: literal word count M (unsigned) + +The bitstream represented by the above chunk is then: + + - K repetitions of B + + - The bits stored in `L_1` through `L_M`. Within a word, bits at + lower order come earlier in the stream than those at higher + order. + +The next word after `L_M` (if any) must again be a RLW, for the next +chunk. For efficient appending to the bitstream, the EWAH stores a +pointer to the last RLW in the stream. + + +== Appendix B: Optional Bitmap Sections + +These sections may or may not be present in the `.bitmap` file; their +presence is indicated by the header flags section described above. + +Name-hash cache +--------------- + +If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains +a cache of 32-bit values, one per object in the pack. The value at +position `i` is the hash of the pathname at which the `i`th object +(counting in index order) in the pack can be found. This can be fed +into the delta heuristics to compare objects with similar pathnames. + +The hash algorithm used is: + + hash = 0; + while ((c = *name++)) + if (!isspace(c)) + hash = (hash >> 2) + (c << 24); + +Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag. +If implementations want to choose a different hashing scheme, they are +free to do so, but MUST allocate a new header flag (because comparing +hashes made under two different schemes would be pointless). diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt index 544373b16f..58c7e872d7 100644 --- a/Documentation/technical/http-protocol.txt +++ b/Documentation/technical/http-protocol.txt @@ -60,7 +60,7 @@ Because Git repositories are accessed by standard path components server administrators MAY use directory based permissions within their HTTP server to control repository access. -Clients SHOULD support Basic authentication as described by RFC 2616. +Clients SHOULD support Basic authentication as described by RFC 2617. Servers SHOULD support Basic authentication by relying upon the HTTP server placed in front of the Git server software. @@ -500,7 +500,7 @@ TODO: Document this further. References ---------- -link:http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)] -link:http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1] +http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)] +http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1] link:technical/pack-protocol.html link:technical/protocol-capabilities.html |