summaryrefslogtreecommitdiff
path: root/Documentation/git-fast-import.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/git-fast-import.txt')
-rw-r--r--Documentation/git-fast-import.txt246
1 files changed, 188 insertions, 58 deletions
diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 73f980638e..7d9aad2a7e 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -9,7 +9,7 @@ git-fast-import - Backend for fast Git data importers
SYNOPSIS
--------
[verse]
-frontend | 'git fast-import' [options]
+frontend | 'git fast-import' [<options>]
DESCRIPTION
-----------
@@ -40,21 +40,37 @@ OPTIONS
not contain the old commit).
--quiet::
- Disable all non-fatal output, making fast-import silent when it
- is successful. This option disables the output shown by
- \--stats.
+ Disable the output shown by --stats, making fast-import usually
+ be silent when it is successful. However, if the import stream
+ has directives intended to show user output (e.g. `progress`
+ directives), the corresponding messages will still be shown.
--stats::
Display some basic statistics about the objects fast-import has
created, the packfiles they were stored into, and the
memory used by fast-import during this run. Showing this output
- is currently the default, but can be disabled with \--quiet.
+ is currently the default, but can be disabled with --quiet.
+
+--allow-unsafe-features::
+ Many command-line options can be provided as part of the
+ fast-import stream itself by using the `feature` or `option`
+ commands. However, some of these options are unsafe (e.g.,
+ allowing fast-import to access the filesystem outside of the
+ repository). These options are disabled by default, but can be
+ allowed by providing this option on the command line. This
+ currently impacts only the `export-marks`, `import-marks`, and
+ `import-marks-if-exists` feature commands.
++
+ Only enable this option if you trust the program generating the
+ fast-import stream! This option is enabled automatically for
+ remote-helpers that use the `import` capability, as they are
+ already trusted to run their own code.
Options for Frontends
~~~~~~~~~~~~~~~~~~~~~
--cat-blob-fd=<fd>::
- Write responses to `cat-blob` and `ls` queries to the
+ Write responses to `get-mark`, `cat-blob`, and `ls` queries to the
file descriptor <fd> instead of `stdout`. Allows `progress`
output intended for the end-user to be separated from other
output.
@@ -81,12 +97,12 @@ Locations of Marks Files
have been completed, or to save the marks table across
incremental runs. As <file> is only opened and truncated
at checkpoint (or completion) the same path can also be
- safely given to \--import-marks.
+ safely given to --import-marks.
--import-marks=<file>::
Before processing any input, load the marks specified in
<file>. The input file must exist, must be readable, and
- must use the same format as produced by \--export-marks.
+ must use the same format as produced by --export-marks.
Multiple options may be supplied to import more than one
set of marks. If a mark is defined to different values,
the last file wins.
@@ -106,6 +122,26 @@ Locations of Marks Files
Relative and non-relative marks may be combined by interweaving
--(no-)-relative-marks with the --(import|export)-marks= options.
+Submodule Rewriting
+~~~~~~~~~~~~~~~~~~~
+
+--rewrite-submodules-from=<name>:<file>::
+--rewrite-submodules-to=<name>:<file>::
+ Rewrite the object IDs for the submodule specified by <name> from the values
+ used in the from <file> to those used in the to <file>. The from marks should
+ have been created by `git fast-export`, and the to marks should have been
+ created by `git fast-import` when importing that same submodule.
++
+<name> may be any arbitrary string not containing a colon character, but the
+same value must be used with both options when specifying corresponding marks.
+Multiple submodules may be specified with different values for <name>. It is an
+error not to use these options in corresponding pairs.
++
+These options are primarily useful when converting a repository from one hash
+algorithm to another; without them, fast-import will fail if it encounters a
+submodule because it has no way of writing the object ID into the new hash
+algorithm.
+
Performance and Compression Tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -121,7 +157,7 @@ Performance and Compression Tuning
--depth=<n>::
Maximum delta depth, for blob and tree deltification.
- Default is 10.
+ Default is 50.
--export-pack-edges=<file>::
After creating a packfile, print a line of data to
@@ -136,8 +172,10 @@ Performance and Compression Tuning
Maximum size of each output packfile.
The default is unlimited.
+fastimport.unpackLimit::
+ See linkgit:git-config[1]
-Performance
+PERFORMANCE
-----------
The design of fast-import allows it to import large projects in a minimum
amount of memory usage and processing time. Assuming the frontend
@@ -153,7 +191,7 @@ faster if the source data is stored on a different drive than the
destination Git repository (due to less IO contention).
-Development Cost
+DEVELOPMENT COST
----------------
A typical frontend for fast-import tends to weigh in at approximately 200
lines of Perl/Python/Ruby code. Most developers have been able to
@@ -163,7 +201,7 @@ an ideal situation, given that most conversion tools are throw-away
(use once, and never look back).
-Parallel Operation
+PARALLEL OPERATION
------------------
Like 'git push' or 'git fetch', imports handled by fast-import are safe to
run alongside parallel `git repack -a -d` or `git gc` invocations,
@@ -179,12 +217,12 @@ fast-forward update, fast-import will skip updating that ref and instead
prints a warning message. fast-import will always attempt to update all
branch refs, and does not stop on the first failure.
-Branch updates can be forced with \--force, but it's recommended that
-this only be used on an otherwise quiet repository. Using \--force
+Branch updates can be forced with --force, but it's recommended that
+this only be used on an otherwise quiet repository. Using --force
is not necessary for an initial import into an empty repository.
-Technical Discussion
+TECHNICAL DISCUSSION
--------------------
fast-import tracks a set of branches in memory. Any branch can be created
or modified at any point during the import process by sending a
@@ -202,7 +240,7 @@ directory also allows fast-import to run very quickly, as it does not
need to perform any costly file update operations when switching
between branches.
-Input Format
+INPUT FORMAT
------------
With the exception of raw file data (which Git does not interpret)
the fast-import input format is text (ASCII) based. This text based
@@ -231,11 +269,11 @@ Date Formats
~~~~~~~~~~~~
The following date formats are supported. A frontend should select
the format it will use for this import by passing the format name
-in the \--date-format=<fmt> command line option.
+in the --date-format=<fmt> command-line option.
`raw`::
This is the Git native format and is `<time> SP <offutc>`.
- It is also fast-import's default format, if \--date-format was
+ It is also fast-import's default format, if --date-format was
not specified.
+
The time of the event is specified by `<time>` as the number of
@@ -251,11 +289,18 @@ advisement to help formatting routines display the timestamp.
If the local offset is not available in the source material, use
``+0000'', or the most common local offset. For example many
organizations have a CVS repository which has only ever been accessed
-by users who are located in the same location and timezone. In this
+by users who are located in the same location and time zone. In this
case a reasonable offset from UTC could be assumed.
+
Unlike the `rfc2822` format, this format is very strict. Any
-variation in formatting will cause fast-import to reject the value.
+variation in formatting will cause fast-import to reject the value,
+and some sanity checks on the numeric values may also be performed.
+
+`raw-permissive`::
+ This is the same as `raw` except that no sanity checks on
+ the numeric epoch and local offset are performed. This can
+ be useful when trying to filter or import an existing history
+ with e.g. bogus timezone values.
`rfc2822`::
This is the standard email format as described by RFC 2822.
@@ -271,7 +316,7 @@ the malformed string. There are also some types of malformed
strings which Git will parse wrong, and yet consider valid.
Seriously malformed strings will be rejected.
+
-Unlike the `raw` format above, the timezone/UTC offset information
+Unlike the `raw` format above, the time zone/UTC offset information
contained in an RFC 2822 date string is used to adjust the date
value to UTC prior to storage. Therefore it is important that
this information be as accurate as possible.
@@ -287,13 +332,13 @@ format, or its format is easily convertible to it, as there is no
ambiguity in parsing.
`now`::
- Always use the current time and timezone. The literal
+ Always use the current time and time zone. The literal
`now` must always be supplied for `<when>`.
+
-This is a toy format. The current time and timezone of this system
+This is a toy format. The current time and time zone of this system
is always copied into the identity string at the time it is being
created by fast-import. There is no way to specify a different time or
-timezone.
+time zone.
+
This particular format is supplied as it's short to implement and
may be useful to a process that wants to create a new commit
@@ -334,6 +379,13 @@ and control the current import process. More detailed discussion
`commit` command. This command is optional and is not
needed to perform an import.
+`alias`::
+ Record that a mark refers to a given object without first
+ creating any new object. Using --import-marks and referring
+ to missing marks will cause fast-import to fail, so aliases
+ can provide a way to set otherwise pruned commits to a valid
+ value (e.g. the nearest non-pruned ancestor).
+
`checkpoint`::
Forces fast-import to close the current packfile, generate its
unique SHA-1 checksum and index, and start a new packfile.
@@ -348,7 +400,12 @@ and control the current import process. More detailed discussion
`done`::
Marks the end of the stream. This command is optional
unless the `done` feature was requested using the
- `--done` command line option or `feature done` command.
+ `--done` command-line option or `feature done` command.
+
+`get-mark`::
+ Causes fast-import to print the SHA-1 corresponding to a mark
+ to the file descriptor set with `--cat-blob-fd`, or `stdout` if
+ unspecified.
`cat-blob`::
Causes fast-import to print a blob in 'cat-file --batch'
@@ -377,11 +434,13 @@ change to the project.
....
'commit' SP <ref> LF
mark?
+ original-oid?
('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+ ('encoding' SP <encoding>)?
data
('from' SP <commit-ish> LF)?
- ('merge' SP <commit-ish> LF)?
+ ('merge' SP <commit-ish> LF)*
(filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
LF?
....
@@ -413,7 +472,12 @@ However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).
-The `LF` after the command is optional (it used to be required).
+The `LF` after the command is optional (it used to be required). Note
+that for reasons of backward compatibility, if the commit ends with a
+`data` command (i.e. it has no `from`, `merge`, `filemodify`,
+`filedelete`, `filecopy`, `filerename`, `filedeleteall` or
+`notemodify` commands) then two `LF` commands may appear at the end of
+the command instead of just one.
`author`
^^^^^^^^
@@ -437,10 +501,16 @@ the email address from the other fields in the line. Note that
of bytes, except `LT`, `GT` and `LF`. `<name>` is typically UTF-8 encoded.
The time of the change is specified by `<when>` using the date format
-that was selected by the \--date-format=<fmt> command line option.
+that was selected by the --date-format=<fmt> command-line option.
See ``Date Formats'' above for the set of supported formats, and
their syntax.
+`encoding`
+^^^^^^^^^^
+The optional `encoding` command indicates the encoding of the commit
+message. Most commits are UTF-8 and the encoding is omitted, but this
+allows importing commit messages into git without first reencoding them.
+
`from`
^^^^^^
The `from` command is used to specify the commit to initialize
@@ -483,6 +553,9 @@ Marks must be declared (via `mark`) before they can be used.
* Any valid Git SHA-1 expression that resolves to a commit. See
``SPECIFYING REVISIONS'' in linkgit:gitrevisions[7] for details.
+* The special null SHA-1 (40 zeros) specifies that the branch is to be
+ removed.
+
The special case of restarting an incremental import from the
current branch value should be written as:
----
@@ -504,10 +577,6 @@ omitted when creating a new branch, the first `merge` commit will be
the first ancestor of the current commit, and the branch will start
out with no files. An unlimited number of `merge` commands per
commit are permitted by fast-import, thereby establishing an n-way merge.
-However Git's other tools never create commits with more than 15
-additional ancestors (forming a 16-way merge). For this reason
-it is suggested that frontends do not use more than 15 `merge`
-commands per commit; 16, if starting a new, empty branch.
Here `<commit-ish>` is any of the commit specification expressions
also accepted by `from` (see above).
@@ -601,7 +670,7 @@ be removed from the branch.
See `filemodify` above for a detailed description of `<path>`.
`filecopy`
-^^^^^^^^^^^^
+^^^^^^^^^^
Recursively copies an existing file or subdirectory to a different
location within the branch. The existing file or directory must
exist. If the destination exists it will be completely replaced
@@ -734,6 +803,19 @@ New marks are created automatically. Existing marks can be moved
to another object simply by reusing the same `<idnum>` in another
`mark` command.
+`original-oid`
+~~~~~~~~~~~~~~
+Provides the name of the object in the original source control system.
+fast-import will simply ignore this directive, but filter processes
+which operate on and modify the stream before feeding to fast-import
+may have uses for this information
+
+....
+ 'original-oid' SP <object-identifier> LF
+....
+
+where `<object-identifer>` is any string not containing LF.
+
`tag`
~~~~~
Creates an annotated tag referring to a specific commit. To create
@@ -741,7 +823,9 @@ lightweight (non-annotated) tags see the `reset` command below.
....
'tag' SP <name> LF
+ mark?
'from' SP <commit-ish> LF
+ original-oid?
'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
data
....
@@ -816,6 +900,7 @@ assigned mark.
....
'blob' LF
mark?
+ original-oid?
data
....
@@ -878,6 +963,21 @@ a data chunk which does not have an LF as its last byte.
+
The `LF` after `<delim> LF` is optional (it used to be required).
+`alias`
+~~~~~~~
+Record that a mark refers to a given object without first creating any
+new object.
+
+....
+ 'alias' LF
+ mark
+ 'to' SP <commit-ish> LF
+ LF?
+....
+
+For a detailed description of `<commit-ish>` see above under `from`.
+
+
`checkpoint`
~~~~~~~~~~~~
Forces fast-import to close the current packfile, start a new one, and to
@@ -889,7 +989,7 @@ save out all current branch refs, tags and marks.
....
Note that fast-import automatically switches packfiles when the current
-packfile reaches \--max-pack-size, or 4 GiB, whichever limit is
+packfile reaches --max-pack-size, or 4 GiB, whichever limit is
smaller. During an automatic packfile switch fast-import does not update
the branch refs, tags or marks.
@@ -931,6 +1031,21 @@ Placing a `progress` command immediately after a `checkpoint` will
inform the reader when the `checkpoint` has been completed and it
can safely access the refs that fast-import updated.
+`get-mark`
+~~~~~~~~~~
+Causes fast-import to print the SHA-1 corresponding to a mark to
+stdout or to the file descriptor previously arranged with the
+`--cat-blob-fd` argument. The command otherwise has no impact on the
+current import; its purpose is to retrieve SHA-1s that later commits
+might want to refer to in their commit messages.
+
+....
+ 'get-mark' SP ':' <idnum> LF
+....
+
+See ``Responses To Commands'' below for details about how to read
+this output safely.
+
`cat-blob`
~~~~~~~~~~
Causes fast-import to print a blob to a file descriptor previously
@@ -954,9 +1069,10 @@ Output uses the same format as `git cat-file --batch`:
<contents> LF
====
-This command can be used anywhere in the stream that comments are
-accepted. In particular, the `cat-blob` command can be used in the
-middle of a commit but not in the middle of a `data` command.
+This command can be used where a `filemodify` directive can appear,
+allowing it to be used in the middle of a commit. For a `filemodify`
+using an inline directive, it can also appear right before the `data`
+directive.
See ``Responses To Commands'' below for details about how to read
this output safely.
@@ -969,8 +1085,8 @@ printing a blob from the active commit (with `cat-blob`) or copying a
blob or tree from a previous commit for use in the current one (with
`filemodify`).
-The `ls` command can be used anywhere in the stream that comments are
-accepted, including the middle of a commit.
+The `ls` command can also be used where a `filemodify` directive can
+appear, allowing it to be used in the middle of a commit.
Reading from the active commit::
This form can only be used in the middle of a `commit`.
@@ -1001,7 +1117,8 @@ Output uses the same format as `git ls-tree <tree> -- <path>`:
====
The <dataref> represents the blob, tree, or commit object at <path>
-and can be used in later 'cat-blob', 'filemodify', or 'ls' commands.
+and can be used in later 'get-mark', 'cat-blob', 'filemodify', or
+'ls' commands.
If there is no file or subtree at that path, 'git fast-import' will
instead report
@@ -1030,7 +1147,7 @@ relative-marks::
no-relative-marks::
force::
Act as though the corresponding command-line option with
- a leading '--' was passed on the command line
+ a leading `--` was passed on the command line
(see OPTIONS, above).
import-marks::
@@ -1043,9 +1160,11 @@ import-marks-if-exists::
"feature import-marks-if-exists" like a corresponding
command-line option silently skips a nonexistent file.
+get-mark::
cat-blob::
ls::
- Require that the backend support the 'cat-blob' or 'ls' command.
+ Require that the backend support the 'get-mark', 'cat-blob',
+ or 'ls' command respectively.
Versions of fast-import not supporting the specified command
will exit with a message indicating so.
This lets the import error out early with a clear message,
@@ -1079,13 +1198,13 @@ options the user may specify to git fast-import itself.
The `<option>` part of the command may contain any of the options
listed in the OPTIONS section that do not change import semantics,
-without the leading '--' and is treated in the same way.
+without the leading `--` and is treated in the same way.
Option commands must be the first commands on the input (not counting
feature commands), to give an option command after any non-option
command is an error.
-The following commandline options change import semantics and may therefore
+The following command-line options change import semantics and may therefore
not be passed as option:
* date-format
@@ -1099,11 +1218,11 @@ not be passed as option:
If the `done` feature is not in use, treated as if EOF was read.
This can be used to tell fast-import to finish early.
-If the `--done` command line option or `feature done` command is
+If the `--done` command-line option or `feature done` command is
in use, the `done` command is mandatory and marks the end of the
stream.
-Responses To Commands
+RESPONSES TO COMMANDS
---------------------
New objects written by fast-import are not available immediately.
Most fast-import commands have no visible effect until the next
@@ -1125,14 +1244,14 @@ bidirectional pipes:
git fast-import >fast-import-output
====
-A frontend set up this way can use `progress`, `ls`, and `cat-blob`
-commands to read information from the import in progress.
+A frontend set up this way can use `progress`, `get-mark`, `ls`, and
+`cat-blob` commands to read information from the import in progress.
To avoid deadlock, such frontends must completely consume any
-pending output from `progress`, `ls`, and `cat-blob` before
+pending output from `progress`, `ls`, `get-mark`, and `cat-blob` before
performing writes to fast-import that might block.
-Crash Reports
+CRASH REPORTS
-------------
If fast-import is supplied invalid input it will terminate with a
non-zero exit status and create a crash report in the top level of
@@ -1219,7 +1338,7 @@ An example crash:
END OF CRASH REPORT
====
-Tips and Tricks
+TIPS AND TRICKS
---------------
The following tips and tricks have been collected from various
users of fast-import, and are offered here as suggestions.
@@ -1227,7 +1346,7 @@ users of fast-import, and are offered here as suggestions.
Use One Mark Per Commit
~~~~~~~~~~~~~~~~~~~~~~~
When doing a repository conversion, use a unique mark per commit
-(`mark :<n>`) and supply the \--export-marks option on the command
+(`mark :<n>`) and supply the --export-marks option on the command
line. fast-import will dump a file which lists every mark and the Git
object SHA-1 that corresponds to it. If the frontend can tie
the marks back to the source repository, it is easy to verify the
@@ -1292,7 +1411,7 @@ even for considerably large projects (100,000+ commits).
However repacking the repository is necessary to improve data
locality and access performance. It can also take hours on extremely
-large projects (especially if -f and a large \--window parameter is
+large projects (especially if -f and a large --window parameter is
used). Since repacking is safe to run alongside readers and writers,
run the repack in the background and let it finish when it finishes.
There is no reason to wait to explore your new Git project!
@@ -1306,7 +1425,7 @@ Repacking Historical Data
~~~~~~~~~~~~~~~~~~~~~~~~~
If you are repacking very old imported data (e.g. older than the
last year), consider expending some extra CPU time and supplying
-\--window=50 (or higher) when you run 'git repack'.
+--window=50 (or higher) when you run 'git repack'.
This will take longer, but will also produce a smaller packfile.
You only need to expend the effort once, and everyone using your
project will benefit from the smaller repository.
@@ -1321,7 +1440,7 @@ Your users will feel better knowing how much of the data stream
has been processed.
-Packfile Optimization
+PACKFILE OPTIMIZATION
---------------------
When packing a blob fast-import always attempts to deltify against the last
blob written. Unless specifically arranged for by the frontend,
@@ -1351,8 +1470,15 @@ deltas are suboptimal (see above) then also adding the `-f` option
to force recomputation of all deltas can significantly reduce the
final packfile size (30-50% smaller can be quite typical).
+Instead of running `git repack` you can also run `git gc
+--aggressive`, which will also optimize other things after an import
+(e.g. pack loose refs). As noted in the "AGGRESSIVE" section in
+linkgit:git-gc[1] the `--aggressive` option will find new deltas with
+the `-f` option to linkgit:git-repack[1]. For the reasons elaborated
+on above using `--aggressive` after a fast-import is one of the few
+cases where it's known to be worthwhile.
-Memory Utilization
+MEMORY UTILIZATION
------------------
There are a number of factors which affect how much memory fast-import
requires to perform an import. Like critical sections of core
@@ -1408,7 +1534,7 @@ branch, their in-memory storage size can grow to a considerable size
fast-import automatically moves active branches to inactive status based on
a simple least-recently-used algorithm. The LRU chain is updated on
each `commit` command. The maximum number of active branches can be
-increased or decreased on the command line with \--active-branches=.
+increased or decreased on the command line with --active-branches=.
per active tree
~~~~~~~~~~~~~~~
@@ -1430,7 +1556,7 @@ and lazy loading of subtrees, allows fast-import to efficiently import
projects with 2,000+ branches and 45,114+ files in a very limited
memory footprint (less than 2.7 MiB per active branch).
-Signals
+SIGNALS
-------
Sending *SIGUSR1* to the 'git fast-import' process ends the current
packfile early, simulating a `checkpoint` command. The impatient
@@ -1438,6 +1564,10 @@ operator can use this facility to peek at the objects and refs from an
import in progress, at the cost of some added running time and worse
compression.
+SEE ALSO
+--------
+linkgit:git-fast-export[1]
+
GIT
---
Part of the linkgit:git[1] suite