diff options
Diffstat (limited to 'Documentation/git-maintenance.txt')
-rw-r--r-- | Documentation/git-maintenance.txt | 344 |
1 files changed, 344 insertions, 0 deletions
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt new file mode 100644 index 0000000000..1e738ad398 --- /dev/null +++ b/Documentation/git-maintenance.txt @@ -0,0 +1,344 @@ +git-maintenance(1) +================== + +NAME +---- +git-maintenance - Run tasks to optimize Git repository data + + +SYNOPSIS +-------- +[verse] +'git maintenance' run [<options>] + + +DESCRIPTION +----------- +Run tasks to optimize Git repository data, speeding up other Git commands +and reducing storage requirements for the repository. + +Git commands that add repository data, such as `git add` or `git fetch`, +are optimized for a responsive user experience. These commands do not take +time to optimize the Git data, since such optimizations scale with the full +size of the repository while these user commands each perform a relatively +small action. + +The `git maintenance` command provides flexibility for how to optimize the +Git repository. + +SUBCOMMANDS +----------- + +register:: + Initialize Git config values so any scheduled maintenance will + start running on this repository. This adds the repository to the + `maintenance.repo` config variable in the current user's global + config and enables some recommended configuration values for + `maintenance.<task>.schedule`. The tasks that are enabled are safe + for running in the background without disrupting foreground + processes. ++ +The `register` subcommand will also set the `maintenance.strategy` config +value to `incremental`, if this value is not previously set. The +`incremental` strategy uses the following schedule for each maintenance +task: ++ +-- +* `gc`: disabled. +* `commit-graph`: hourly. +* `prefetch`: hourly. +* `loose-objects`: daily. +* `incremental-repack`: daily. +-- ++ +`git maintenance register` will also disable foreground maintenance by +setting `maintenance.auto = false` in the current repository. This config +setting will remain after a `git maintenance unregister` command. + +run:: + Run one or more maintenance tasks. If one or more `--task` options + are specified, then those tasks are run in that order. Otherwise, + the tasks are determined by which `maintenance.<task>.enabled` + config options are true. By default, only `maintenance.gc.enabled` + is true. + +start:: + Start running maintenance on the current repository. This performs + the same config updates as the `register` subcommand, then updates + the background scheduler to run `git maintenance run --scheduled` + on an hourly basis. + +stop:: + Halt the background maintenance schedule. The current repository + is not removed from the list of maintained repositories, in case + the background maintenance is restarted later. + +unregister:: + Remove the current repository from background maintenance. This + only removes the repository from the configured list. It does not + stop the background maintenance processes from running. + +TASKS +----- + +commit-graph:: + The `commit-graph` job updates the `commit-graph` files incrementally, + then verifies that the written data is correct. The incremental + write is safe to run alongside concurrent Git processes since it + will not expire `.graph` files that were in the previous + `commit-graph-chain` file. They will be deleted by a later run based + on the expiration delay. + +prefetch:: + The `prefetch` task updates the object directory with the latest + objects from all registered remotes. For each remote, a `git fetch` + command is run. The configured refspec is modified to place all + requested refs within `refs/prefetch/`. Also, tags are not updated. ++ +This is done to avoid disrupting the remote-tracking branches. The end users +expect these refs to stay unmoved unless they initiate a fetch. With prefetch +task, however, the objects necessary to complete a later real fetch would +already be obtained, so the real fetch would go faster. In the ideal case, +it will just become an update to a bunch of remote-tracking branches without +any object transfer. + +gc:: + Clean up unnecessary files and optimize the local repository. "GC" + stands for "garbage collection," but this task performs many + smaller tasks. This task can be expensive for large repositories, + as it repacks all Git objects into a single pack-file. It can also + be disruptive in some situations, as it deletes stale data. See + linkgit:git-gc[1] for more details on garbage collection in Git. + +loose-objects:: + The `loose-objects` job cleans up loose objects and places them into + pack-files. In order to prevent race conditions with concurrent Git + commands, it follows a two-step process. First, it deletes any loose + objects that already exist in a pack-file; concurrent Git processes + will examine the pack-file for the object data instead of the loose + object. Second, it creates a new pack-file (starting with "loose-") + containing a batch of loose objects. The batch size is limited to 50 + thousand objects to prevent the job from taking too long on a + repository with many loose objects. The `gc` task writes unreachable + objects as loose objects to be cleaned up by a later step only if + they are not re-added to a pack-file; for this reason it is not + advisable to enable both the `loose-objects` and `gc` tasks at the + same time. + +incremental-repack:: + The `incremental-repack` job repacks the object directory + using the `multi-pack-index` feature. In order to prevent race + conditions with concurrent Git commands, it follows a two-step + process. First, it calls `git multi-pack-index expire` to delete + pack-files unreferenced by the `multi-pack-index` file. Second, it + calls `git multi-pack-index repack` to select several small + pack-files and repack them into a bigger one, and then update the + `multi-pack-index` entries that refer to the small pack-files to + refer to the new pack-file. This prepares those small pack-files + for deletion upon the next run of `git multi-pack-index expire`. + The selection of the small pack-files is such that the expected + size of the big pack-file is at least the batch size; see the + `--batch-size` option for the `repack` subcommand in + linkgit:git-multi-pack-index[1]. The default batch-size is zero, + which is a special case that attempts to repack all pack-files + into a single pack-file. + +pack-refs:: + The `pack-refs` task collects the loose reference files and + collects them into a single file. This speeds up operations that + need to iterate across many references. See linkgit:git-pack-refs[1] + for more information. + +OPTIONS +------- +--auto:: + When combined with the `run` subcommand, run maintenance tasks + only if certain thresholds are met. For example, the `gc` task + runs when the number of loose objects exceeds the number stored + in the `gc.auto` config setting, or when the number of pack-files + exceeds the `gc.autoPackLimit` config setting. Not compatible with + the `--schedule` option. + +--schedule:: + When combined with the `run` subcommand, run maintenance tasks + only if certain time conditions are met, as specified by the + `maintenance.<task>.schedule` config value for each `<task>`. + This config value specifies a number of seconds since the last + time that task ran, according to the `maintenance.<task>.lastRun` + config value. The tasks that are tested are those provided by + the `--task=<task>` option(s) or those with + `maintenance.<task>.enabled` set to true. + +--quiet:: + Do not report progress or other information over `stderr`. + +--task=<task>:: + If this option is specified one or more times, then only run the + specified tasks in the specified order. If no `--task=<task>` + arguments are specified, then only the tasks with + `maintenance.<task>.enabled` configured as `true` are considered. + See the 'TASKS' section for the list of accepted `<task>` values. + + +TROUBLESHOOTING +--------------- +The `git maintenance` command is designed to simplify the repository +maintenance patterns while minimizing user wait time during Git commands. +A variety of configuration options are available to allow customizing this +process. The default maintenance options focus on operations that complete +quickly, even on large repositories. + +Users may find some cases where scheduled maintenance tasks do not run as +frequently as intended. Each `git maintenance run` command takes a lock on +the repository's object database, and this prevents other concurrent +`git maintenance run` commands from running on the same repository. Without +this safeguard, competing processes could leave the repository in an +unpredictable state. + +The background maintenance schedule runs `git maintenance run` processes +on an hourly basis. Each run executes the "hourly" tasks. At midnight, +that process also executes the "daily" tasks. At midnight on the first day +of the week, that process also executes the "weekly" tasks. A single +process iterates over each registered repository, performing the scheduled +tasks for that frequency. Depending on the number of registered +repositories and their sizes, this process may take longer than an hour. +In this case, multiple `git maintenance run` commands may run on the same +repository at the same time, colliding on the object database lock. This +results in one of the two tasks not running. + +If you find that some maintenance windows are taking longer than one hour +to complete, then consider reducing the complexity of your maintenance +tasks. For example, the `gc` task is much slower than the +`incremental-repack` task. However, this comes at a cost of a slightly +larger object database. Consider moving more expensive tasks to be run +less frequently. + +Expert users may consider scheduling their own maintenance tasks using a +different schedule than is available through `git maintenance start` and +Git configuration options. These users should be aware of the object +database lock and how concurrent `git maintenance run` commands behave. +Further, the `git gc` command should not be combined with +`git maintenance run` commands. `git gc` modifies the object database +but does not take the lock in the same way as `git maintenance run`. If +possible, use `git maintenance run --task=gc` instead of `git gc`. + +The following sections describe the mechanisms put in place to run +background maintenance by `git maintenance start` and how to customize +them. + +BACKGROUND MAINTENANCE ON POSIX SYSTEMS +--------------------------------------- + +The standard mechanism for scheduling background tasks on POSIX systems +is cron(8). This tool executes commands based on a given schedule. The +current list of user-scheduled tasks can be found by running `crontab -l`. +The schedule written by `git maintenance start` is similar to this: + +----------------------------------------------------------------------- +# BEGIN GIT MAINTENANCE SCHEDULE +# The following schedule was created by Git +# Any edits made in this region might be +# replaced in the future by a Git command. + +0 1-23 * * * "/<path>/git" --exec-path="/<path>" for-each-repo --config=maintenance.repo maintenance run --schedule=hourly +0 0 * * 1-6 "/<path>/git" --exec-path="/<path>" for-each-repo --config=maintenance.repo maintenance run --schedule=daily +0 0 * * 0 "/<path>/git" --exec-path="/<path>" for-each-repo --config=maintenance.repo maintenance run --schedule=weekly + +# END GIT MAINTENANCE SCHEDULE +----------------------------------------------------------------------- + +The comments are used as a region to mark the schedule as written by Git. +Any modifications within this region will be completely deleted by +`git maintenance stop` or overwritten by `git maintenance start`. + +The `crontab` entry specifies the full path of the `git` executable to +ensure that the executed `git` command is the same one with which +`git maintenance start` was issued independent of `PATH`. If the same user +runs `git maintenance start` with multiple Git executables, then only the +latest executable is used. + +These commands use `git for-each-repo --config=maintenance.repo` to run +`git maintenance run --schedule=<frequency>` on each repository listed in +the multi-valued `maintenance.repo` config option. These are typically +loaded from the user-specific global config. The `git maintenance` process +then determines which maintenance tasks are configured to run on each +repository with each `<frequency>` using the `maintenance.<task>.schedule` +config options. These values are loaded from the global or repository +config values. + +If the config values are insufficient to achieve your desired background +maintenance schedule, then you can create your own schedule. If you run +`crontab -e`, then an editor will load with your user-specific `cron` +schedule. In that editor, you can add your own schedule lines. You could +start by adapting the default schedule listed earlier, or you could read +the crontab(5) documentation for advanced scheduling techniques. Please +do use the full path and `--exec-path` techniques from the default +schedule to ensure you are executing the correct binaries in your +schedule. + + +BACKGROUND MAINTENANCE ON MACOS SYSTEMS +--------------------------------------- + +While macOS technically supports `cron`, using `crontab -e` requires +elevated privileges and the executed process does not have a full user +context. Without a full user context, Git and its credential helpers +cannot access stored credentials, so some maintenance tasks are not +functional. + +Instead, `git maintenance start` interacts with the `launchctl` tool, +which is the recommended way to schedule timed jobs in macOS. Scheduling +maintenance through `git maintenance (start|stop)` requires some +`launchctl` features available only in macOS 10.11 or later. + +Your user-specific scheduled tasks are stored as XML-formatted `.plist` +files in `~/Library/LaunchAgents/`. You can see the currently-registered +tasks using the following command: + +----------------------------------------------------------------------- +$ ls ~/Library/LaunchAgents/org.git-scm.git* +org.git-scm.git.daily.plist +org.git-scm.git.hourly.plist +org.git-scm.git.weekly.plist +----------------------------------------------------------------------- + +One task is registered for each `--schedule=<frequency>` option. To +inspect how the XML format describes each schedule, open one of these +`.plist` files in an editor and inspect the `<array>` element following +the `<key>StartCalendarInterval</key>` element. + +`git maintenance start` will overwrite these files and register the +tasks again with `launchctl`, so any customizations should be done by +creating your own `.plist` files with distinct names. Similarly, the +`git maintenance stop` command will unregister the tasks with `launchctl` +and delete the `.plist` files. + +To create more advanced customizations to your background tasks, see +launchctl.plist(5) for more information. + + +BACKGROUND MAINTENANCE ON WINDOWS SYSTEMS +----------------------------------------- + +Windows does not support `cron` and instead has its own system for +scheduling background tasks. The `git maintenance start` command uses +the `schtasks` command to submit tasks to this system. You can inspect +all background tasks using the Task Scheduler application. The tasks +added by Git have names of the form `Git Maintenance (<frequency>)`. +The Task Scheduler GUI has ways to inspect these tasks, but you can also +export the tasks to XML files and view the details there. + +Note that since Git is a console application, these background tasks +create a console window visible to the current user. This can be changed +manually by selecting the "Run whether user is logged in or not" option +in Task Scheduler. This change requires a password input, which is why +`git maintenance start` does not select it by default. + +If you want to customize the background tasks, please rename the tasks +so future calls to `git maintenance (start|stop)` do not overwrite your +custom tasks. + + +GIT +--- +Part of the linkgit:git[1] suite |