summaryrefslogtreecommitdiff
path: root/fast-import.c
AgeCommit message (Collapse)AuthorFilesLines
2014-06-20fast-import: refactor parsing of spacesLibravatar Jeff King1-13/+7
When we see a file change in a commit, we expect one of: 1. A mark. 2. An "inline" keyword. 3. An object sha1. The handling of spaces is inconsistent between the three options. Option 1 calls a sub-function which checks for the space, but doesn't parse past it. Option 2 parses the space, then deliberately avoids moving the pointer past it. Option 3 detects the space locally but doesn't move past it. This is confusing, because it looks like option 1 forgets to check for the space (it's just buried). And option 2 checks for "inline ", but only moves strlen("inline") characters forward, which looks like a bug but isn't. We can make this more clear by just having each branch move past the space as it is checked (and we can replace the doubled use of "inline" with a call to skip_prefix). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20fast-import: use skip_prefix for parsing inputLibravatar Jeff King1-72/+51
Fast-import does a lot of parsing of commands and dispatching to sub-functions. For example, given "option foo", we might recognize "option " using starts_with, and then hand it off to parse_option() to do the rest. However, we do not let parse_option know that we have parsed the first part already. It gets the full buffer, and has to skip past the uninteresting bits. Some functions simply add a magic constant: char *option = command_buf.buf + 7; Others use strlen: char *option = command_buf.buf + strlen("option "); And others use strchr: char *option = strchr(command_buf.buf, ' ') + 1; All of these are brittle and easy to get wrong (especially given that the starts_with call and the code that assumes the presence of the prefix are far apart). Instead, we can use skip_prefix, and just pass each handler a pointer to its arguments. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20use skip_prefix to avoid magic numbersLibravatar Jeff King1-32/+37
It's a common idiom to match a prefix and then skip past it with a magic number, like: if (starts_with(foo, "bar")) foo += 3; This is easy to get wrong, since you have to count the prefix string yourself, and there's no compiler check if the string changes. We can use skip_prefix to avoid the magic numbers here. Note that some of these conversions could be much shorter. For example: if (starts_with(arg, "--foo=")) { bar = arg + 6; continue; } could become: if (skip_prefix(arg, "--foo=", &bar)) continue; However, I have left it as: if (skip_prefix(arg, "--foo=", &v)) { bar = v; continue; } to visually match nearby cases which need to actually process the string. Like: if (skip_prefix(arg, "--foo=", &v)) { bar = atoi(v); continue; } Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20fast-import: fix read of uninitialized argv memoryLibravatar Jeff King1-5/+8
Fast-import shares code between its command-line parser and the "option" command. To do so, it strips the "--" from any command-line options and passes them to the option parser. However, it does not confirm that the option even begins with "--" before blindly passing "arg + 2". It does confirm that the option starts with "-", so the only affected case was: git fast-import - which would read uninitialized memory after the argument. We can fix it by using skip_prefix and checking the result. As a bonus, this gets rid of some magic numbers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21fast-import: add support to delete refsLibravatar Felipe Contreras1-3/+10
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-10use strchrnul() in place of strchr() and strlen()Libravatar Rohit Mani1-21/+12
Avoid scanning strings twice, once with strchr() and then with strlen(), by using strchrnul(). Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rohit Mani <rohit.mani@outlook.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Libravatar Christian Couder1-40/+40
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-20Merge branch 'bk/refs-multi-update'Libravatar Junio C Hamano1-1/+1
Give "update-refs" a "--stdin" option to read multiple update requests and perform them in an all-or-none fashion. * bk/refs-multi-update: update-ref: add test cases covering --stdin signature update-ref: support multiple simultaneous updates refs: add update_refs for multiple simultaneous updates refs: add function to repack without multiple refs refs: factor delete_ref loose ref step into a helper refs: factor update_ref steps into helpers refs: report ref type from lock_any_ref_for_update reset: rename update_refs to reset_refs
2013-09-17Merge branch 'rh/ishes-doc'Libravatar Junio C Hamano1-10/+10
We liberally use "committish" and "commit-ish" (and "treeish" and "tree-ish"); as these are non-words, let's unify these terms to their dashed form. More importantly, clarify the documentation on object peeling using these terms. * rh/ishes-doc: glossary: fix and clarify the definition of 'ref' revisions.txt: fix and clarify <rev>^{<type>} glossary: more precise definition of tree-ish (a.k.a. treeish) use 'commit-ish' instead of 'committish' use 'tree-ish' instead of 'treeish' glossary: define commit-ish (a.k.a. committish) glossary: mention 'treeish' as an alternative to 'tree-ish'
2013-09-04use 'commit-ish' instead of 'committish'Libravatar Richard Hansen1-8/+8
Replace 'committish' in documentation and comments with 'commit-ish' to match gitglossary(7) and to be consistent with 'tree-ish'. The only remaining instances of 'committish' are: * variable, function, and macro names * "(also committish)" in the definition of commit-ish in gitglossary[7] Signed-off-by: Richard Hansen <rhansen@bbn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04use 'tree-ish' instead of 'treeish'Libravatar Richard Hansen1-2/+2
Replace 'treeish' in documentation and comments with 'tree-ish' to match gitglossary(7). The only remaining instances of 'treeish' are: * variable, function, and macro names * "(also treeish)" in the definition of tree-ish in gitglossary(7) Signed-off-by: Richard Hansen <rhansen@bbn.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-04Merge branch 'jk/fast-import-empty-ls'Libravatar Junio C Hamano1-20/+38
* jk/fast-import-empty-ls: fast-import: allow moving the root tree fast-import: allow ls or filecopy of the root tree fast-import: set valid mode on root tree in "ls" t9300: document fast-import empty path issues
2013-08-30refs: report ref type from lock_any_ref_for_updateLibravatar Brad King1-1/+1
Expose lock_ref_sha1_basic's type_p argument to callers of lock_any_ref_for_update. Update all call sites to ignore it by passing NULL for now. Signed-off-by: Brad King <brad.king@kitware.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23fast-import: allow moving the root treeLibravatar John Keeping1-7/+14
Because fast-import.c::tree_content_remove does not check for the empty path, it is not possible to move the root tree to a subdirectory. Instead the error "Path not in branch" is produced (note the double space where the empty path has been inserted). Fix this by explicitly checking for the empty path and handling it. Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23fast-import: allow ls or filecopy of the root treeLibravatar John Keeping1-13/+22
Commit 178e1de (fast-import: don't allow 'ls' of path with empty components, 2012-03-09) restricted paths which: . contain an empty directory component (e.g. foo//bar is invalid), . end with a directory separator (e.g. foo/ is invalid), . start with a directory separator (e.g. /foo is invalid). However, the implementation also caught the empty path, which should represent the root tree. Relax this restriction so that the empty path is explicitly allowed and refers to the root tree. Reported-by: Dave Abrahams <dave@boostpro.com> Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-23fast-import: set valid mode on root tree in "ls"Libravatar John Keeping1-0/+2
This prevents a failure later when we lift the restriction on ls with the empty path. Signed-off-by: John Keeping <john@keeping.me.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02Merge branch 'fc/fast-export-persistent-marks'Libravatar Junio C Hamano1-1/+1
Optimization for fast-export by avoiding unnecessarily resolving arbitrary object name and parsing object when only presence and type information is necessary, etc. * fc/fast-export-persistent-marks: fast-{import,export}: use get_sha1_hex() to read from marks file fast-export: don't parse commits while reading marks file fast-export: do not parse non-commit objects while reading marks file
2013-05-07fast-{import,export}: use get_sha1_hex() to read from marks fileLibravatar Felipe Contreras1-1/+1
It's wrong to call get_sha1() if they should be SHA-1s, plus inefficient. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-28sparse: Fix mingw_main() argument number/type errorsLibravatar Ramsay Jones1-2/+2
Sparse issues 68 errors (two errors for each main() function) such as the following: SP git.c git.c:510:5: error: too many arguments for function mingw_main git.c:510:5: error: symbol 'mingw_main' redeclared with different type \ (originally declared at git.c:510) - different argument counts The errors are caused by the 'main' macro used by the MinGW build to provide a replacement main() function. The original main function is effectively renamed to 'mingw_main' and is called from the new main function. The replacement main is used to execute certain actions common to all git programs on MinGW (e.g. ensure the standard I/O streams are in binary mode). In order to suppress the errors, we change the macro to include the parameters in the declaration of the mingw_main function. Unfortunately, this change provokes both sparse and gcc to complain about 9 calls to mingw_main(), such as the following: CC git.o git.c: In function 'main': git.c:510: warning: passing argument 2 of 'mingw_main' from \ incompatible pointer type git.c:510: note: expected 'const char **' but argument is of \ type 'char **' In order to suppress these warnings, since both of the main functions need to be declared with the same prototype, we change the declaration of the 9 main functions, thus: int main(int argc, char **argv) Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-29fast-import: Fix an gcc -Wuninitialized warningLibravatar Ramsay Jones1-0/+1
Commit cbfd5e1c ("drop some obsolete "x = x" compiler warning hacks", 21-03-2013) removed a gcc hack that suppressed an "might be used uninitialized" warning issued by older versions of gcc. However, commit 3aa99df8 ('fast-import: clarify "inline" logic in file_change_m', 21-03-2013) addresses an (almost) identical issue (with very similar code), but includes additional code in it's resolution. The solution used by this commit, unlike that used by commit cbfd5e1c, also suppresses the -Wuninitialized warning on older versions of gcc. In order to suppress the warning (against the 'oe' symbol) in the note_change_n() function, we adopt the same solution used by commit 3aa99df8. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21fast-import: clarify "inline" logic in file_change_mLibravatar Jeff King1-1/+2
When we read a fast-import line like: M 100644 :1 foo.c we point the local object_entry variable "oe" to the object named by the mark ":1". When the input uses the "inline" construct, however, we do not have such an object_entry. The current code is careful not to access "oe" in the inline case, but we can make the assumption even more obvious (and catch violations of it) by setting oe to NULL and adding a comment. As a bonus, this also squelches an over-zealous gcc -Wuninitialized warning, which means we can drop the "oe = oe" initialization hack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21drop some obsolete "x = x" compiler warning hacksLibravatar Jeff King1-1/+1
In cases where the setting and access of a variable are protected by the same conditional flag, older versions of gcc would generate a "might be used unitialized" warning. We silence the warning by initializing the variable to itself, a hack that gcc recognizes. Modern versions of gcc are smart enough to get this right, going back to at least version 4.3.5. gcc 4.1 does get it wrong in both cases, but is sufficiently old that we probably don't need to care about it anymore. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-21fast-import: use pointer-to-pointer to keep list tailLibravatar Jeff King1-6/+4
This is shorter, idiomatic, and it means the compiler does not get confused about whether our "e" pointer is valid, letting us drop the "e = e" hack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-11Merge branch 'jc/merge-bases'Libravatar Junio C Hamano1-1/+1
Optimise the "merge-base" computation a bit, and also update its users that do not need the full merge-base information to call a cheaper subset. * jc/merge-bases: reduce_heads(): reimplement on top of remove_redundant() merge-base: "--is-ancestor A B" get_merge_bases_many(): walk from many tips in parallel in_merge_bases(): use paint_down_to_common() merge_bases_many(): split out the logic to paint history in_merge_bases(): omit unnecessary redundant common ancestor reduction http-push: use in_merge_bases() for fast-forward check receive-pack: use in_merge_bases() for fast-forward check in_merge_bases(): support only one "other" commit
2012-08-27in_merge_bases(): support only one "other" commitLibravatar Junio C Hamano1-1/+1
In early days of its life, I planned to make it possible to compute "is a commit contained in all of these other commits?" with this function, but it turned out that no caller needed it. Just make it take two commit objects and add a comment to say what these two functions do. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-10fast-import: tighten parsing of datarefsLibravatar Pete Wyckoff1-33/+77
The syntax for the use of mark references in fast-import demands either a SP (space) or LF (end-of-line) after a mark reference. Fast-import does not complain when garbage appears after a mark reference in some cases. Factor out parsing of mark references and complain if errant characters are found. Also be a little more careful when parsing "inline" and SHA1s, complaining if extra characters appear or if the form of the dataref is unrecognized. Buggy input can cause fast-import to produce the wrong output, silently, without error. This makes it difficult to track down buggy generators of fast-import streams. An example is seen in the last line of this commit command: commit refs/heads/S2 committer Name <name@example.com> 1112912893 -0400 data <<COMMIT commit message COMMIT from :1M 100644 :103 hello.c It is missing a newline and should be: [...] from :1 M 100644 :103 hello.c What fast-import does is to produce a commit with the same contents for hello.c as in refs/heads/S2^. What the buggy program was expecting was the contents of blob :103. While the resulting commit graph looked correct, the contents in some commits were wrong. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-26Merge branch 'jn/maint-fast-import-empty-ls' into maintLibravatar Junio C Hamano1-0/+4
* jn/maint-fast-import-empty-ls: fast-import: don't allow 'ls' of path with empty components fast-import: leakfix for 'ls' of dirty trees
2012-03-09fast-import: don't allow 'ls' of path with empty componentsLibravatar Jonathan Nieder1-0/+2
As the fast-import manual explains: The value of <path> must be in canonical form. That is it must not: . contain an empty directory component (e.g. foo//bar is invalid), . end with a directory separator (e.g. foo/ is invalid), . start with a directory separator (e.g. /foo is invalid), Unfortunately the "ls" command accepts these invalid syntaxes and responds by declaring that the indicated path is missing. This is too subtle and causes importers to silently misbehave; better to error out so the operator knows what's happening. The C, R, and M commands already error out for such paths. Reported-by: Andrew Sayers <andrew-git@pileofstuff.org> Analysis-by: David Barr <davidbarr@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-03-09fast-import: leakfix for 'ls' of dirty treesLibravatar Jonathan Nieder1-0/+2
When the chosen directory has changed since it was last written to pack, "tree_content_get" makes a deep copy of its content to scribble on while computing the tree name, which we forgot to free. This leak has been present since the 'ls' command was introduced in v1.7.5-rc0~3^2~33 (fast-import: add 'ls' command, 2010-12-02). Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-03-05fast-import: zero all of 'struct tag' to silence valgrindLibravatar Thomas Rast1-1/+1
When running t9300, valgrind (correctly) complains about an uninitialized value in write_crash_report: ==2971== Use of uninitialised value of size 8 ==2971== at 0x4164F4: sha1_to_hex (hex.c:70) ==2971== by 0x4073E4: die_nicely (fast-import.c:468) ==2971== by 0x43284C: die (usage.c:86) ==2971== by 0x40420D: main (fast-import.c:2731) ==2971== Uninitialised value was created by a heap allocation ==2971== at 0x4C29B3D: malloc (vg_replace_malloc.c:263) ==2971== by 0x433645: xmalloc (wrapper.c:35) ==2971== by 0x405DF5: pool_alloc (fast-import.c:619) ==2971== by 0x407755: pool_calloc.constprop.14 (fast-import.c:634) ==2971== by 0x403F33: main (fast-import.c:3324) Fix this by zeroing all of the 'struct tag'. We would only need to zero out the 'sha1' field, but this way seems more future-proof. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21Appease Sun Studio by renaming "tmpfile"Libravatar Ævar Arnfjörð Bjarmason1-4/+4
On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-19Merge branch 'ab/enable-i18n'Libravatar Junio C Hamano1-0/+2
* ab/enable-i18n: i18n: add infrastructure for translating Git with gettext Conflicts: Makefile
2011-12-16Merge branch 'jc/stream-to-pack'Libravatar Junio C Hamano1-17/+8
* jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h
2011-12-05i18n: add infrastructure for translating Git with gettextLibravatar Ævar Arnfjörð Bjarmason1-0/+2
Change the skeleton implementation of i18n in Git to one that can show localized strings to users for our C, Shell and Perl programs using either GNU libintl or the Solaris gettext implementation. This new internationalization support is enabled by default. If gettext isn't available, or if Git is compiled with NO_GETTEXT=YesPlease, Git falls back on its current behavior of showing interface messages in English. When using the autoconf script we'll auto-detect if the gettext libraries are installed and act appropriately. This change is somewhat large because as well as adding a C, Shell and Perl i18n interface we're adding a lot of tests for them, and for those tests to work we need a skeleton PO file to actually test translations. A minimal Icelandic translation is included for this purpose. Icelandic includes multi-byte characters which makes it easy to test various edge cases, and it's a language I happen to understand. The rest of the commit message goes into detail about various sub-parts of this commit. = Installation Gettext .mo files will be installed and looked for in the standard $(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to override that, but that's only intended to be used to test Git itself. = Perl Perl code that's to be localized should use the new Git::I18n module. It imports a __ function into the caller's package by default. Instead of using the high level Locale::TextDomain interface I've opted to use the low-level (equivalent to the C interface) Locale::Messages module, which Locale::TextDomain itself uses. Locale::TextDomain does a lot of redundant work we don't need, and some of it would potentially introduce bugs. It tries to set the $TEXTDOMAIN based on package of the caller, and has its own hardcoded paths where it'll search for messages. I found it easier just to completely avoid it rather than try to circumvent its behavior. In any case, this is an issue wholly internal Git::I18N. Its guts can be changed later if that's deemed necessary. See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for a further elaboration on this topic. = Shell Shell code that's to be localized should use the git-sh-i18n library. It's basically just a wrapper for the system's gettext.sh. If gettext.sh isn't available we'll fall back on gettext(1) if it's available. The latter is available without the former on Solaris, which has its own non-GNU gettext implementation. We also need to emulate eval_gettext() there. If neither are present we'll use a dumb printf(1) fall-through wrapper. = About libcharset.h and langinfo.h We use libcharset to query the character set of the current locale if it's available. I.e. we'll use it instead of nl_langinfo if HAVE_LIBCHARSET_H is set. The GNU gettext manual recommends using langinfo.h's nl_langinfo(CODESET) to acquire the current character set, but on systems that have libcharset.h's locale_charset() using the latter is either saner, or the only option on those systems. GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either, but MinGW and some others need to use libcharset.h's locale_charset() instead. =Credits This patch is based on work by Jeff Epler <jepler@unpythonic.net> who did the initial Makefile / C work, and a lot of comments from the Git mailing list, including Jonathan Nieder, Jakub Narebski, Johannes Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and others. [jc: squashed a small Makefile fix from Ramsay] Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-30csum-file: introduce sha1file_checkpointLibravatar Junio C Hamano1-17/+8
It is useful to be able to rewind a check-summed file to a certain previous state after writing data into it using sha1write() API. The fast-import command does this after streaming a blob data to the packfile being generated and then noticing that the same blob has already been written, and it does this with a private code truncate_pack() that is commented as "Yes, this is a layering violation". Introduce two API functions, sha1file_checkpoint(), that allows the caller to save a state of a sha1file, and then later revert it to the saved state. Use it to reimplement truncate_pack(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-28fast-import: Fix incorrect fanout level when modifying existing notes refsLibravatar Johan Herland1-3/+25
This fixes the bug uncovered by the tests added in the previous two patches. When an existing notes ref was loaded into the fast-import machinery, the num_notes counter associated with that ref remained == 0, even though the true number of notes in the loaded ref was higher. This caused a fanout level of 0 to be used, although the actual fanout of the tree could be > 0. Manipulating the notes tree at an incorrect fanout level causes removals to silently fail, and modifications of existing notes to instead produce an additional note (leaving the old object in place at a different fanout level). This patch fixes the bug by explicitly counting the number of notes in the notes tree whenever it looks like the num_notes counter could be wrong (when num_notes == 0). There may be false positives (i.e. triggering the counting when the notes tree is truly empty), but in those cases, the counting should not take long. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-13Merge branch 'di/fast-import-empty-tag-note-fix'Libravatar Junio C Hamano1-0/+4
* di/fast-import-empty-tag-note-fix: fast-import: don't allow to note on empty branch fast-import: don't allow to tag empty branch
2011-10-05Change check_ref_format() to take a flags argumentLibravatar Michael Haggerty1-6/+1
Change check_ref_format() to take a flags argument that indicates what is acceptable in the reference name (analogous to "git check-ref-format"'s "--allow-onelevel" and "--refspec-pattern"). This is more convenient for callers and also fixes a failure in the test suite (and likely elsewhere in the code) by enabling "onelevel" and "refspec-pattern" to be allowed independently of each other. Also rename check_ref_format() to check_refname_format() to make it obvious that it deals with refnames rather than references themselves. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22fast-import: don't allow to note on empty branchLibravatar Dmitry Ivankov1-0/+2
'reset' command makes fast-import start a branch from scratch. It's name is kept in lookup table but it's sha1 is null_sha1 (special value). 'notemodify' command can be used to add a note on branch head given it's name. lookup_branch() is used it that case and it doesn't check for null_sha1. So fast-import writes a note for null_sha1 object instead of giving a error. Add a check to deny adding a note on empty branch and add a corresponding test. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-22fast-import: don't allow to tag empty branchLibravatar Dmitry Ivankov1-0/+2
'reset' command makes fast-import start a branch from scratch. It's name is kept in lookup table but it's sha1 is null_sha1 (special value). 'tag' command can be used to tag a branch by it's name. lookup_branch() is used it that case and it doesn't check for null_sha1. So fast-import writes a tag for null_sha1 object instead of giving a error. Add a check to deny tagging an empty branch and add a corresponding test. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-28Merge branch 'di/fast-import-tagging'Libravatar Junio C Hamano1-7/+7
* di/fast-import-tagging: fast-import: allow to tag newly created objects fast-import: add tests for tagging blobs
2011-08-28Merge branch 'di/fast-import-blob-tweak'Libravatar Junio C Hamano1-5/+12
* di/fast-import-blob-tweak: fast-import: treat cat-blob as a delta base hint for next blob fast-import: count and report # of calls to diff_delta in stats
2011-08-28Merge branch 'di/fast-import-deltified-tree'Libravatar Junio C Hamano1-5/+30
* di/fast-import-deltified-tree: fast-import: prevent producing bad delta fast-import: add a test for tree delta base corruption
2011-08-28Merge branch 'di/fast-import-ident'Libravatar Junio C Hamano1-12/+21
* di/fast-import-ident: fsck: improve committer/author check fsck: add a few committer name tests fast-import: check committer name more strictly fast-import: don't fail on omitted committer name fast-import: add input format tests
2011-08-23fast-import: allow to tag newly created objectsLibravatar Dmitry Ivankov1-3/+7
fast-import allows to tag objects by sha1 and to query sha1 of objects being imported. So it should allow to tag these objects, make it do so. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23fast-import: add tests for tagging blobsLibravatar Dmitry Ivankov1-7/+3
fast-import allows to create an annotated tag that annotates a blob, via mark or direct sha1 specification. For mark it works, for sha1 it tries to read the object. It tries to do so via read_sha1_file, and then checks the size to be at least 46. That's weird, let's just allow to (annotated) tag any object referenced by sha1. If the object originates from our packfile, we still fail though. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22fast-import: treat cat-blob as a delta base hint for next blobLibravatar Dmitry Ivankov1-1/+6
Delta base for blobs is chosen as a previously saved blob. If we treat cat-blob's blob as a delta base for the next blob, nothing is likely to become worse. For fast-import stream producer like svn-fe cat-blob is used like following: - svn-fe reads file delta in svn format - to apply it, svn-fe asks cat-blob 'svn delta base' - applies 'svn delta' to the response - produces a blob command to store the result Currently there is no way for svn-fe to give fast-import a hint on object delta base. While what's requested in cat-blob is most of the time a best delta base possible. Of course, it could be not a good delta base, but we don't know any better one anyway. So do treat cat-blob's result as a delta base for next blob. The profit is nice: 2x to 7x reduction in pack size AND 1.2x to 3x time speedup due to diff_delta being faster on good deltas. git gc --aggressive can compress it even more, by 10% to 70%, utilizing more cpu time, real time and 3 cpu cores. Tested on 213M and 2.7G fast-import streams, resulting packs are 22M and 113M, import time is 7s and 60s, both streams are produced by svn-fe, sniffed and then used as raw input for fast-import. For git-fast-export produced streams there is no change as it doesn't use cat-blob and doesn't try to reorder blobs in some smart way to make successive deltas small. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Acked-by: David Barr <davidbarr@google.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22fast-import: count and report # of calls to diff_delta in statsLibravatar Dmitry Ivankov1-4/+6
It's an interesting number, how often do we try to deltify each type of objects and how often do we succeed. So do add it to stats. Success doesn't mean much gain in pack size though. As we allow delta to be as big as (data.len - 20). And delta close to data.len gains nothing compared to no delta at all even after zlib compression (delta is pretty much the same as data, just with few modifications). We should try to make less attempts that result in huge deltas as these consume more cpu than trivial small deltas. Either by choosing a better delta base or reducing delta size upper bound or doing less delta attempts at all. Currently, delta base for blobs is a waste literally. Each blob delta base is chosen as a previously stored blob. Disabling deltas for blobs doesn't increase pack size and reduce import time, or at least doesn't increase time for all fast-import streams I've tried. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Acked-by: David Barr <davidbarr@google.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-14fast-import: prevent producing bad deltaLibravatar Dmitry Ivankov1-5/+30
To produce deltas for tree objects fast-import tracks two versions of tree's entries - base and current one. Base version stands both for a delta base of this tree, and for a entry inside a delta base of a parent tree. So care should be taken to keep it in sync. tree_content_set cuts away a whole subtree and replaces it with a new one (or NULL for lazy load of a tree with known sha1). It keeps a base sha1 for this subtree (needed for parent tree). And here is the problem, 'subtree' tree root doesn't have the implied base version entries. Adjusting the subtree to include them would mean a deep rewrite of subtree. Invalidating the subtree base version would mean recursive invalidation of parents' base versions. So just mark this tree as do-not-delta me. Abuse setuid bit for this purpose. tree_content_replace is the same as tree_content_set except that is is used to replace the root, so just clearing base sha1 here (instead of setting the bit) is fine. [di: log message] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11fast-import: check committer name more strictlyLibravatar Dmitry Ivankov1-12/+17
The documentation declares following identity format: (<name> SP)? LT <email> GT where name is any string without LF and LT characters. But fast-import just accepts any string up to first GT instead of checking the whole format, and moreover just writes it as is to the commit object. git-fsck checks for [^<\n]* <[^<>\n]*> format. Note that the space is mandatory. And the space quirk is already handled via extending the string to the left when needed. Modify fast-import input identity format to a slightly stricter one - deny LF, LT and GT in both <name> and <email>. And check for it. This is stricter then git-fsck as fsck accepts "Name> <email>" currently, but soon fsck check will be adjusted likewise. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>