Configuration
The configuration file defines the cleanup rules that determine which resources to keep, remove, or ignore. It uses JSON or JSONC (JSON with comments) format.
File Structure
The configuration file has a hierarchical structure with path-based overrides:
{
  "scope": "https://gitlab.example.com/my-group",
  "/": {
    // Root configuration (required)
    // Applied to all projects by default
    "branches": { ... },
    "protectedBranches": { ... },
    "containers": { ... },
    "packages": { ... }
  },
  "/subgroup": {
    // Override for projects in /my-group/subgroup
    // Partial definition is ok / expected -- this inherits from parent groups
    "branches": { ... }
  },
  "/subgroup/specific-project": {
    // Override for a specific project
    // This inherits from both the "/subgroup" and "/" rules
    "packages": { ... }
  }
}
Scope
The scope field must match the --scope command line argument.
This ensures the configuration is applied to the intended namespace.
It also serves as documentation to make it easy to see what the various paths are relative to.
Scoping prevents the cleanup actions from affecting projects outside the configuration file’s defined area of operations. It is necessary because the input data files often have more project data available than one config file would apply to.
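The two checks described above can be pictured with a small Python sketch. The function names `check_scope` and `in_scope`, the tolerance for a trailing slash, and the idea that projects are identified by their full URL are all assumptions made for illustration; the tool's actual checks may differ.

```python
def check_scope(config_scope: str, cli_scope: str) -> None:
    """Raise if the config file's scope differs from the --scope argument."""
    # Assumption: a trailing slash is not significant.
    if config_scope.rstrip("/") != cli_scope.rstrip("/"):
        raise ValueError(
            f"scope mismatch: config has {config_scope!r}, command line has {cli_scope!r}"
        )


def in_scope(project_url: str, scope: str) -> bool:
    """True if the project URL is the scope itself or lies underneath it."""
    scope = scope.rstrip("/")
    # Require a "/" boundary so that "my-group-other" is not treated as inside "my-group".
    return project_url == scope or project_url.startswith(scope + "/")
```

With a scope of `https://gitlab.example.com/my-group`, a project at `https://gitlab.example.com/my-group/subgroup/app` would be in scope, while one at `https://gitlab.example.com/my-group-other/app` would not.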
Path-Based Overrides
Configuration is applied hierarchically based on project paths.
The project’s cleanup configuration starts with the root config, which must exist and must be named exactly "/".
This root configuration must be fully specified — all parameters need an explicit value.
Then, each other configuration that matches the project path is applied. These can be partially specified, and they only alter the parameters supplied. Override configurations are applied based on the project’s group structure, not the specification order in the config file.
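As a rough illustration of the ordering rule, the following Python sketch selects the configuration keys that apply to a project and returns them from least to most specific. The name `matching_paths` and the assumption that the project path is given relative to the scope (for example `/subgroup/specific-project`) are hypothetical and not taken from the tool itself.

```python
def matching_paths(config: dict, project_path: str) -> list[str]:
    """Return the config keys that apply to project_path, root first."""
    segments = [s for s in project_path.split("/") if s]
    # Build "/", "/a", "/a/b", ... following the group structure of the project.
    candidates = ["/"]
    for i in range(1, len(segments) + 1):
        candidates.append("/" + "/".join(segments[:i]))
    # Only paths that are actually present in the config file take part.
    return [path for path in candidates if path in config]


# The key order in the file is deliberately scrambled to show it does not matter.
config = {
    "/subgroup/specific-project": {},
    "scope": "https://gitlab.example.com/my-group",
    "/": {},
    "/subgroup": {},
}
order = matching_paths(config, "/subgroup/specific-project")
```

Here `order` lists the root first, then `/subgroup`, then `/subgroup/specific-project`, regardless of where each key appears in the file.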
Object Overrides
When an override specifies an object, that object is merged into the inherited object rather than replacing it. Keys present in the override are overwritten, while keys it omits keep their inherited values. If the value of a key is itself an object, it is merged recursively using the same algorithm.
Object Override Example
Given a configuration file that looks like this:
{
  "scope": "https://gitlab.example.com/my-group",
  "/": {
    "branches": {
      "merged": "remove",
      "squashMerged": "remove",
      "closed": "remove",
      "noRecentCommits": {
        "threshold": "6 months",
        "action": "remove"
      },
      "patterns": []
    },
    // Other required parameters not shown
  },
  "/subgroup": {
    "branches": {
      "closed": "ignore",
      "noRecentCommits": {
        "threshold": "1 year"
      }
    }
  }
}
The resulting merged branches config for projects inside /subgroup would be:
{
  "merged": "remove",
  "squashMerged": "remove",
  "closed": "ignore",
  "noRecentCommits": {
    "threshold": "1 year",
    "action": "remove"
  },
  "patterns": []
}
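The merge shown in this example can be reproduced with a short recursive function. This is a minimal Python sketch, not the tool's implementation: `merge_objects` is a hypothetical name, and array values are simply overwritten here because array overrides follow their own rules.

```python
import copy


def merge_objects(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Nested objects are merged with the same algorithm.
            result[key] = merge_objects(result[key], value)
        else:
            # Scalars (and anything else in this sketch) are overwritten.
            result[key] = copy.deepcopy(value)
    return result


root_branches = {
    "merged": "remove",
    "squashMerged": "remove",
    "closed": "remove",
    "noRecentCommits": {"threshold": "6 months", "action": "remove"},
    "patterns": [],
}
subgroup_branches = {
    "closed": "ignore",
    "noRecentCommits": {"threshold": "1 year"},
}
merged = merge_objects(root_branches, subgroup_branches)
```

Note that `noRecentCommits.action` survives the merge even though the override only supplies `threshold`.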
Array Overrides
Arrays are handled specially within overrides. The root configuration defines arrays normally — as JSON arrays containing the appropriate objects. However, overrides are represented as special objects themselves rather than arrays. That override object follows the format:
{
  // One or both of these can be specified to extend the base array
  "prepend": [ ... ],
  "append": [ ... ],
  // If 'replace' is used, it must be the only key
  "replace": [ ... ]
}
This allows the array to be overridden by replacement, or by adding extra items to the beginning or end of the array. There is no mechanism to insert items into the middle of the parent array, nor is there a mechanism to override some of the fields of a particular item in the parent array.
Array Override Example
Given a configuration file that looks like this:
{
  "/": {
    "branches": {
      // Other required parameters not shown
      "patterns": [
        { "regex": "temp-.*", "action": "remove" }
      ]
    }
  },
  "/special-project": {
    "branches": {
      "patterns": {
        "prepend": [
          { "regex": "keep-.*", "action": "keep" }
        ]
      }
    }
  }
}
The resulting patterns array for the /special-project would be:
[
  { "regex": "keep-.*", "action": "keep" },
  { "regex": "temp-.*", "action": "remove" }
]
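The array override rules can be sketched in a few lines of Python. The function name `apply_array_override` and the error messages are assumptions for illustration; only the `prepend`, `append`, and `replace` keys come from the format described above.

```python
def apply_array_override(base: list, override: dict) -> list:
    """Apply a prepend/append/replace override object to a parent array."""
    if "replace" in override:
        if len(override) != 1:
            raise ValueError("'replace' must be the only key in an array override")
        return list(override["replace"])
    unknown = set(override) - {"prepend", "append"}
    if unknown:
        raise ValueError(f"unknown array override keys: {sorted(unknown)}")
    # Prepended items go in front of the parent items, appended items behind them.
    return list(override.get("prepend", [])) + list(base) + list(override.get("append", []))


base_patterns = [{"regex": "temp-.*", "action": "remove"}]
override = {"prepend": [{"regex": "keep-.*", "action": "keep"}]}
patterns = apply_array_override(base_patterns, override)
```

Because the first matching pattern decides the action, prepending is the way to give an override priority over inherited patterns, while appending adds fallbacks.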
Pattern Matching
Most of the configuration sections have a patterns condition.
This is an array of regular expressions with an associated action.
The condition is considered met if any of the regular expressions match.
The action applied is that of the first entry that matches.
The regular expressions follow JavaScript conventions, with two exceptions:
Whole Word Matches Only
The entire value must match the pattern.
This is implemented by changing the pattern to ^(?:${PATTERN})$.
A non-capturing group is used so that the anchors aren’t part of any alternations of the pattern, and don’t interfere with any backreferences.
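The effect of the wrapper is easiest to see with an alternation. The sketch below uses Python's `re` module purely for illustration (the tool itself uses JavaScript regular expressions, which behave the same way for these simple patterns); the helper name `anchor` is hypothetical.

```python
import re


def anchor(pattern: str) -> str:
    """Wrap a pattern so that it must match the entire value."""
    return f"^(?:{pattern})$"


# Without the group, the anchors bind to individual alternatives:
# "^temp|tmp$" means "starts with temp" OR "ends with tmp".
naive = re.compile("^temp|tmp$")
anchored = re.compile(anchor("temp|tmp"))

# Because the wrapper is non-capturing, \1 still refers to the user's own group.
repeated = re.compile(anchor(r"(a|b)\1"))
```

With these definitions, `naive` finds a match in `temporary` while `anchored` does not, and `repeated` accepts `aa` and `bb` but rejects `ab`.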
Placeholders
Before parsing the pattern as a regular expression, it is first scanned for various placeholder keywords, which are replaced with values relating to the current item being tested. Those placeholder values are matched literally — that is, the values are escaped for regular expression syntax before being substituted. The replacements are surrounded by a non-capturing group, so that they can be treated as a single token and don’t interfere with any backreferences.
The placeholders are:
${PROJECT_SLUG}
This matches a slugified version of the project’s path. Note that this is only the "path" attribute, which does not include the namespace.
${BRANCH_SLUG}
This matches a slugified version of any branch that exists (and will be kept).
${BRANCH}
This matches the exact name of any branch that exists (and will be kept).
${TAG_SLUG}
This matches a slugified version of any tag that exists.
${REF_SLUG}
This matches any ${BRANCH_SLUG} and ${TAG_SLUG}. This is effectively the same as (?:${BRANCH_SLUG}|${TAG_SLUG}).
Slugification Algorithm
This follows GitLab conventions for creating URL safe slug versions of variables. In short, this is:
- Lowercased
- Strings of non-alphanumeric characters replaced with a single dash
- Leading and trailing dashes removed
- Truncated to 63 characters
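The four steps can be written out as a short Python sketch. It applies the steps in the order listed above and treats "alphanumeric" as ASCII letters and digits, which is an assumption; the tool's own implementation may treat edge cases differently (for example, truncation could leave a trailing dash).

```python
import re


def slugify(value: str, max_length: int = 63) -> str:
    """Slugify a name following the four steps listed above."""
    slug = value.lower()                     # 1. lowercase
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # 2. runs of other characters become one dash
    slug = slug.strip("-")                   # 3. drop leading and trailing dashes
    return slug[:max_length]                 # 4. truncate
```

For example, `slugify("Feature/Add_New--Widget")` gives `feature-add-new-widget`.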
How "Any" Reference Works
The reference based placeholders will match on any branch/tag that is still being kept in the current analysis. It does not necessarily need to be the branch being considered, so you can use patterns that apply to branches based on naming similar to other branches. This is implemented by creating an array of regular expressions for each pattern — where each one has a different branch/tag substituted for the placeholder. If any of them match, then the pattern is determined to have matched.
One reference at a time
To avoid a combinatorial explosion, if multiple reference based placeholders are used in the same pattern, they must all refer to the same reference.
For example, the pattern ${BRANCH}${BRANCH} matches any branch that is a repetition of another branch’s name (like mainmain), but not two different names concatenated (like maindevelop).
This also holds for variations like ${BRANCH}${BRANCH_SLUG}, ${BRANCH}${TAG_SLUG}, or ${TAG_SLUG}${REF_SLUG}.
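The expansion into one regular expression per kept branch, with every placeholder in a candidate filled from the same branch, might look like the following Python sketch. The helpers `slugify`, `expand`, and `matches` are hypothetical names, only the two branch placeholders are handled, and Python's `re` stands in for JavaScript regular expressions.

```python
import re


def slugify(value: str) -> str:
    # Compact version of the slugification steps described earlier.
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")[:63]


def expand(pattern: str, kept_branches: list[str]) -> list[re.Pattern]:
    """Build one anchored regex per kept branch."""
    regexes = []
    for branch in kept_branches:
        # The same branch fills every placeholder in this candidate.
        # Values are escaped so they match literally, and wrapped in a
        # non-capturing group so they behave as a single token.
        text = pattern.replace("${BRANCH_SLUG}", f"(?:{re.escape(slugify(branch))})")
        text = text.replace("${BRANCH}", f"(?:{re.escape(branch)})")
        regexes.append(re.compile(f"^(?:{text})$"))
    return regexes


def matches(pattern: str, value: str, kept_branches: list[str]) -> bool:
    """The pattern matches if any of its per-branch expansions matches."""
    return any(regex.search(value) for regex in expand(pattern, kept_branches))
```

With kept branches `main` and `develop`, the pattern `${BRANCH}${BRANCH}` matches `mainmain` but not `maindevelop`, because no single branch can fill both placeholders to produce the latter.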
Mixing Branches and Tags
In the case that branch and tag placeholders are both used, only one of the two placeholders will be able to match (no reference is both a branch and a tag).
To implement this, all patterns are first checked with each branch substituted in, while ${TAG_SLUG} is set to a null character (which cannot match anything).
Then the reverse is done: branch placeholders are set to a null character while each tag is substituted in.
Patterns using both like this will need alternation (${BRANCH_SLUG}|${TAG_SLUG}) or optional modifiers (${BRANCH_SLUG}?${TAG_SLUG}?).
Kept Branch Recursion
Only the kept branches can be substituted in for the branch based placeholders. If these are used as part of the patterns that determine which branches are kept, the algorithm initially guesses that all branches will be kept, then checks patterns accordingly. If any were marked as removed, then it updates the kept list and re-runs all patterns. If the keep/remove lists change, it runs again, and so on until there’s no longer any changes from run to run.
There are pathological cases where it would never resolve. This is detected coarsely — if it takes over a hundred rounds, it is determined to be an infinite loop and all processing stops.
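The iteration described above is a fixed-point computation, which can be sketched in Python as follows. The `decide` callback is a hypothetical stand-in for "run all patterns and conditions for this branch against the current kept list", and the exact round limit and error type are assumptions.

```python
from typing import Callable

MAX_ROUNDS = 100  # coarse guard against configurations that never settle


def resolve_kept(branches: list[str], decide: Callable[[str, frozenset], bool]) -> frozenset:
    """Repeat the keep/remove decision until the kept set stops changing."""
    kept = frozenset(branches)  # initial guess: every branch is kept
    for _ in range(MAX_ROUNDS):
        new_kept = frozenset(b for b in branches if decide(b, kept))
        if new_kept == kept:
            return kept  # stable from one run to the next
        kept = new_kept
    raise RuntimeError("kept-branch resolution did not settle; assuming an infinite loop")


def example_decide(branch: str, kept: frozenset) -> bool:
    # Hypothetical rules: "trusted-X" survives only while "X" is kept,
    # and "feature" is removed by some other condition.
    if branch.startswith("trusted-"):
        return branch[len("trusted-"):] in kept
    return branch != "feature"


branches = ["main", "feature", "trusted-main", "trusted-feature", "trusted-trusted-feature"]
result = resolve_kept(branches, example_decide)
```

In this example the removal of `feature` takes two further rounds to propagate through `trusted-feature` and `trusted-trusted-feature`, leaving only `main` and `trusted-main`. A rule that keeps a branch only when it is not in the kept list flips on every round and ends in the error instead.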
Why would anybody use these?
The main reason is when items like containers or packages have branch names embedded, and you’d like to delete such items when the branches are gone.
For instance, you may have a "keep" action associated with containers named ${PROJECT_SLUG}-${BRANCH_SLUG}.
This would likely be followed up by a rule like ${PROJECT_SLUG}-.* set to "remove".
Order of these in the patterns array would matter — the keep rule should come first because the first match determines the action.
Another possibility is sibling branches — Git patterns where branches are created in pairs.
For example, you may have trusted-${BRANCH} set to "keep", followed by trusted-.* set to "remove".
This would automatically delete trusted branches once their corresponding main branch has been removed.
Conditions and Actions
Each of the configuration file sections refers to a group of items that can be cleaned, such as branches, containers, packages, etc. These items are checked against a number of different conditions, most of which are hard-coded logic but some of which have configurable options. These conditions are things like "merged", "no recent commits", etc. They are defined in depth in their corresponding sections below.
All conditions are configured to have a single action.
The available actions are "remove", "keep", and "ignore".
Putting these concepts together, each item (branch, container, etc.) will match on a number of conditions. Each of those conditions will have one action, so the item will then have a set of actions applied to it. Those actions are combined into a single action (keep or remove), following the logic:
- Only "remove" / "keep" actions are considered ("ignore" actions are, wait for it, ignored).
- The item is kept if there is at least one "keep" condition, or if there are no "remove" conditions.
- The item is removed if there is at least one "remove" condition and no "keep" conditions.
The "ignore" action is mostly used to disable a particular condition.
Since the configuration file must specify all parameters, any that you don’t want to use should be ignored.
Or, if the condition shouldn’t apply to a specific project, override that condition as ignored.
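The rule for combining the actions of all matched conditions into a single outcome can be stated compactly in code. This is a minimal Python sketch with a hypothetical function name, `combine_actions`; it takes the list of actions from the conditions an item matched.

```python
def combine_actions(actions: list[str]) -> str:
    """Reduce the actions of all matched conditions to "keep" or "remove"."""
    considered = [a for a in actions if a != "ignore"]  # "ignore" has no vote
    if "keep" in considered:
        return "keep"    # a single "keep" outweighs any number of "remove"s
    if "remove" in considered:
        return "remove"  # at least one "remove" and no "keep"
    return "keep"        # nothing asked for removal
```

For example, `combine_actions(["remove", "ignore", "keep"])` gives `"keep"`, while `combine_actions(["remove", "ignore"])` gives `"remove"`.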
Remember that the patterns configuration only applies the action of the first matched regular expression.
This can be used to create custom rules based on the item name, forcibly saving special branches / containers / etc.
The "ignore" action can be used in the patterns array to stop processing, preventing later patterns from matching.
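The first-match behaviour, including the use of "ignore" to stop processing, can be illustrated with a short Python sketch. The function name `first_matching_action` is hypothetical, placeholders are left out, and Python's `re` stands in for JavaScript regular expressions.

```python
import re
from typing import Optional


def first_matching_action(patterns: list[dict], name: str) -> Optional[str]:
    """Return the action of the first pattern matching name, or None if none match."""
    for entry in patterns:
        # Patterns must match the whole value, hence the anchored wrapper.
        if re.search(f"^(?:{entry['regex']})$", name):
            return entry["action"]
    return None  # the patterns condition is not met


patterns = [
    {"regex": "feature/wip-.*", "action": "ignore"},  # shields work in progress
    {"regex": "feature/.*", "action": "remove"},
]
```

Here `feature/wip-login` stops at the first entry and yields `"ignore"`, so the later `"remove"` rule never sees it, while `feature/done` falls through to `"remove"`.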
Branches Section
The branches section is used to delete branches that are no longer needed.
"branches": {
  "merged": "remove",
  "squashMerged": "remove",
  "closed": "remove",
  "noRecentCommits": {
    "threshold": "6 months",
    "action": "remove"
  },
  "patterns": [
    {
      "regex": "feature/.*",
      "action": "ignore"
    }
  ]
}
Branch Conditions
merged
Branches that have already been merged.
This condition occurs if GitLab reports the branch as merged (reachable from the default branch), or if a merged MR is found with a matching branch name / SHA.
These branches are typically safe to delete, because they can be recreated easily.
squashMerged
Branches that were merged using squash merge.
These won’t show up as reachable from the default branch, because a new commit is made during squashing. However, they will still appear in the list of merged MRs as a squash merged MR. Like the merged condition, the branch name must match the MR and the SHA of the branch must match the latest commit on the MR.
These branches are fairly safe to delete, because GitLab stores internal references to commits and can reconstitute the branch if needed. However, local pulls will not have these commits once deleted, so recovery requires going through the GitLab server.
closed
Branches associated with merge requests that were closed without merging.
These branches could be restored using GitLab’s internal references (similar to how squashMerged branches would be restored). However, they would not be restorable locally, and the commits were never merged into the default branch. Deleting these is mostly for cleaning up work that was rejected.
noRecentCommits
Branches with no commits within a specified time period.
This condition is configured with a threshold parameter, which is used to determine if the branch has had recent commits or not. The threshold can be a duration string (such as "6 months", "90 days", "1 year", etc.), or it can be an absolute date in ISO format (such as 2024-01-01T00:00:00Z).
If the threshold is a duration, then the last commit date is compared against the current time whenever the cleanup script runs. If the most recent commit is older than the specified threshold, then the condition matches and the action is applied (in combination with other conditions, of course).
If the threshold is an absolute date, then the last commit date is compared against that time. If the most recent commit is older than that specified date, then the condition matches.
This rule is used to clean up abandoned work. These branches may or may not have corresponding MRs. Note that GitLab will automatically close open MRs that correspond to deleted branches.
patterns
Branches matching particular regular expressions.
This matches the branch names against the specified regular expressions. See the Pattern Matching section for nuances of how the regexes work, including available placeholder variables.
This is a flexible rule that can be used to forcibly save particular protected / important branches, or to remove certain branches that don’t have other reasons to be kept.
Protected Branches Section
The protectedBranches section controls cleanup of protected branch rules.
Note: This removes the protected branch rule, not the branch itself. The branch itself must be removed by matching conditions in the Branches Section.
"protectedBranches": {
  "missingBranch": "remove",
  "wildcardsWithoutMatches": "remove"
}
Protected Branch Conditions
missingBranch
This removes a simple protection rule that no longer matches any kept branch. Note that the list of kept branches is used, so this could remove the protected branch setting in the same cleanup as the branch itself was deleted.
wildcardsWithoutMatches
This removes wildcard protection rules that no longer match any branches. These are treated separately, because the rule often applies to current and future branches, so deleting it just because nothing currently matches would be incorrect.
Containers Section
The containers section controls cleanup of container registry images.
"containers": {
  "patterns": [
    {
      "regex": "temp-.*",
      "action": "remove"
    },
    {
      "regex": "production",
      "action": "keep"
    }
  ]
}
Container Conditions
patterns
This matches containers based on their image name. See the Pattern Matching section for details on placeholder variables.
Note: GitLab already has a container cleanup policy to delete tags within an image. This section is used to delete full images.
Packages Section
The packages section controls cleanup of package registry entries.
"packages": {
  "recentCreation": {
    "threshold": "1 week",
    "action": "keep"
  },
  "recentAccess": {
    "threshold": "1 month",
    "action": "keep"
  },
  "latestBuild": "keep",
  "patterns": [
    {
      "regex": {
        "name": "my-package",
        "version": ".*-SNAPSHOT"
      },
      "action": "remove"
    }
  ]
}
Package Conditions
recentCreation
Packages created within the specified threshold, or after the specified date.
recentAccess
Packages downloaded within the specified threshold, or after the specified date.
latestBuild
Packages that were built as part of the latest pipeline of a surviving branch.
patterns
Packages matching particular regular expressions.
Note that each of these patterns has two regular expressions, one for the package name and one for its version. Both must match for the pattern to apply. See the Pattern Matching section for details on placeholder variables.
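The two-part match for package patterns can be sketched as follows. This is an illustrative Python function with a hypothetical name, `package_pattern_matches`; placeholders are omitted, and both expressions are assumed to be anchored to the whole value as described in the Pattern Matching section.

```python
import re


def package_pattern_matches(regex: dict, name: str, version: str) -> bool:
    """A package pattern applies only if both its name and version regexes match."""
    name_ok = re.search(f"^(?:{regex['name']})$", name) is not None
    version_ok = re.search(f"^(?:{regex['version']})$", version) is not None
    return name_ok and version_ok


snapshot_rule = {"name": "my-package", "version": ".*-SNAPSHOT"}
```

With `snapshot_rule`, the package `my-package` at version `1.2.0-SNAPSHOT` matches, but version `1.2.0` of the same package does not, and neither does any version of `my-package-extra`.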
Thresholds
Similar to noRecentCommits in the Branches Section, the threshold can be a duration string (such as 1 week, 3 days, etc.) or an absolute date in ISO format (such as 2024-01-01T00:00:00Z).
If a duration is used, then "recent" is defined as everything within that duration before the time the cleanup script runs.
If an absolute date is used, then "recent" is defined as everything after that date.
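Both threshold forms reduce to a single cutoff timestamp, which the following Python sketch makes explicit. Several details are assumptions made for illustration: the set of supported units, approximating a month as 30 days and a year as 365 days, treating a timestamp exactly on the cutoff as not recent, and requiring timezone-aware timestamps. The tool's actual duration parsing may differ.

```python
import re
from datetime import datetime, timedelta

# Assumption: months and years are approximated by fixed day counts.
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def threshold_cutoff(threshold: str, now: datetime) -> datetime:
    """Convert a duration string or an ISO date into a cutoff timestamp."""
    m = re.fullmatch(r"(\d+)\s+(day|week|month|year)s?", threshold.strip())
    if m:
        days = int(m.group(1)) * _UNIT_DAYS[m.group(2)]
        return now - timedelta(days=days)
    # Otherwise treat the value as an absolute ISO date. The "Z" suffix is
    # rewritten because older Python versions do not accept it.
    return datetime.fromisoformat(threshold.strip().replace("Z", "+00:00"))


def is_recent(timestamp: datetime, threshold: str, now: datetime) -> bool:
    """True if the timestamp falls after the cutoff implied by the threshold."""
    return timestamp > threshold_cutoff(threshold, now)
```

For a run at 2024-06-01T00:00:00Z, a threshold of `1 week` gives a cutoff of 2024-05-25T00:00:00Z, while `2024-01-01T00:00:00Z` is used as the cutoff directly. The same cutoff could serve noRecentCommits, where the condition matches when the last commit is older than the cutoff.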
Complete Example
{
  "scope": "https://gitlab.example.com/my-org",
  "/": {
    "branches": {
      "merged": "remove",
      "squashMerged": "remove",
      "closed": "remove",
      "noRecentCommits": {
        "threshold": "6 months",
        "action": "remove"
      },
      "patterns": [
        { "regex": "release/.*", "action": "keep" },
        { "regex": "hotfix/.*", "action": "keep" }
      ]
    },
    "protectedBranches": {
      "missingBranch": "remove",
      "wildcardsWithoutMatches": "ignore"
    },
    "containers": {
      "patterns": [
        { "regex": "dev-.*", "action": "remove" }
      ]
    },
    "packages": {
      "recentCreation": {
        "threshold": "6 months",
        "action": "keep"
      },
      "recentAccess": {
        "threshold": "3 months",
        "action": "keep"
      },
      "latestBuild": "keep",
      "patterns": [
        {
          "regex": {
            "name": ".*",
            "version": ".*-SNAPSHOT"
          },
          "action": "remove"
        }
      ]
    }
  },
  "/legacy": {
    "branches": {
      "noRecentCommits": {
        "threshold": "1 year"
      }
    }
  }
}