Script Usage

The project-cleanup.js script is designed to run within a GitLab CI/CD pipeline, using the published Docker container.

Each invocation operates on a single GitLab namespace (its "scope"). The expected setup is that a data collection script runs first, followed by several project-cleanup.js runs, one per scope. Every cleanup run is given the same data files, but uses its own repositories for the config file and log outputs. An example of what this might look like is:

stages:
  - clean

variables:
  CLEANUP_CONTAINER: "community.opengroup.org:5555/divido/project-cleanup/project-cleanup"

.cleanup:
  stage: clean
  variables:
    GIT_SUBMODULE_STRATEGY: recursive
    GIT_DEPTH: 0

  before_script:
    # This config file is one way to supply tokens for GitLab servers, and it is used by
    # community.opengroup.org/divido/gitlab-scripts/core
    # Alternatively, the project-cleanup.js script can have a "-t" option to provide the token directly
    - cp $GITLAB_SCRIPTS_CONFIG ~/.gitlab-scripts.config

  script:
    - git clone $LOGS_REPO logs/
    - git clone $CONFIG_REPO config/

    # This computes the output file for the removed entries
    # Remember that project-cleanup.js will automatically create parent directories as needed
    - REMOVED_OUTPUT=$(date --utc '+removed/%Y/%m-%b/%Y-%m-%d_%H%M.md'); echo $REMOVED_OUTPUT

    # The kept output overwrites the previous run's output. Remove the old file and details directory first
    - rm -rf logs/kept-details/ logs/kept.md

    - >
      project-cleanup.js -n -h community
      --scope $SCOPE
      --projects $PROJECTS_JSON
      --mrs $MRS_JSON
      --config config/config.jsonc
      --max-log-size "1 MB"
      --kept-output logs/kept.md
      --removed-output logs/$REMOVED_OUTPUT

    - cd logs
    - git config user.email " "
    - git config user.name "Automated Commit"
    - git add .
    - git commit -m "Updated Logs"
    - git push

alpha-group:
  extends: .cleanup
  image: $CLEANUP_CONTAINER:v1.0
  variables:
    SCOPE: https://community.opengroup.org/alpha
    LOGS_REPO: $ALPHA_LOGS_REPO
    CONFIG_REPO: $ALPHA_CONFIG_REPO

bravo-group:
  extends: .cleanup
  image: $CLEANUP_CONTAINER:v1.0
  variables:
    SCOPE: https://community.opengroup.org/bravo
    LOGS_REPO: $BRAVO_LOGS_REPO
    CONFIG_REPO: $BRAVO_CONFIG_REPO

This relies on CI variables for the logs and config repositories. Each must be a Git repository URL with an embedded username/password; otherwise, you must also provide a GIT_CREDENTIALS variable to authenticate. The example job sets the commit author to "Automated Commit" with no email, but any other author information can be used.

Many other CI configurations are viable, depending on the specific use cases. This is intended to illustrate the basic structure with PROJECTS_JSON and MRS_JSON being computed in previous (unshown) stages, then shared amongst all cleanup operations.
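To make the REMOVED_OUTPUT line in the job above concrete, here is a small worked example of the same date format. The fixed timestamp and the LC_ALL=C setting are assumptions added only to make the result reproducible; the CI job uses the current time. Like the original command, it relies on GNU date.

```shell
# Same format string as the CI job, but with a fixed (illustrative) timestamp
# so the resulting path is predictable. LC_ALL=C pins the month abbreviation.
REMOVED_OUTPUT=$(LC_ALL=C date --utc -d '2024-03-05 14:07 UTC' '+removed/%Y/%m-%b/%Y-%m-%d_%H%M.md')
echo "$REMOVED_OUTPUT"
# removed/2024/03-Mar/2024-03-05_1407.md
```

Each run therefore writes its removed log to a new file, grouped by year and month, while the kept output is overwritten in place.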

Command Line Options

Input Options

All input options are required.

-s, --scope <url>: The GitLab namespace URL that defines the boundary for cleanup operations. This prevents any cleanup of projects outside of the scope. It must match the scope value in the configuration file, or the script will fail.
-p, --projects <file>: Path to the JSON file containing project data. See Input Files for the expected format.
-m, --mrs <file>: Path to the JSON file containing merge request data. See Input Files for the expected format.
-c, --config <file>: Path to the cleanup configuration file in JSON or JSONC format. See Configuration for details.

Output Options

Output options are optional, but at least one is usually provided to create a lasting log of the operations. Different output kinds can be combined (for example, separate kept and removed outputs), but the same output kind cannot be specified more than once.

Parent directories are automatically created if necessary when outputting files. Existing files are overwritten.

-o, --combined-output <file>: This option generates a Markdown report showing all resources (branches, containers, etc.), whether they were kept or removed, and why. This output kind is best for small groups, since it outputs all entries together.
-r, --removed-output <file>: This option generates a Markdown report that only shows removed entries. Projects that have no removed entries are omitted, keeping the resulting output file to a minimal file size.
-k, --kept-output <file>: This option generates a Markdown report that only shows kept entries. It is typically not necessary to maintain a log of every run’s kept output, but having access to the most recent one can help determine why a particular resource wasn’t deleted.

Additional Options

-d, --dry-run: This option will prevent the deletions from actually occurring. The output files will still refer to "Removed" items, but those items will have a status of "Planned" rather than the normal success / failure. This is useful when first establishing a configuration file on a project, before the removal policies have been fully reviewed.
--max-log-size <size>: This specifies the maximum output file size before splitting into multiple files. Accepts human-readable sizes like 1 MB or 500 KB. When the output would exceed this value, the output file instead contains a brief summary of the projects with links to subpages. The subpages are written to a "details" folder, which is created in the same directory as the output file itself. Each project gets a separate page in the details folder.

In cases where a single project’s details exceed the maximum log size, the script will output a warning but write the too-large output file anyway. There is no mechanism to further divide a details page into smaller files.

--clean: If a details folder already exists, delete the contents first. This is useful when overwriting the same output file (typically --kept-output), so that extraneous details pages are removed.
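As a sketch of how the options in this section combine, a first trial run against a new configuration might look like the following. It uses only flags documented here; the preview_cleanup function name and the preview/ output paths are hypothetical, and SCOPE, PROJECTS_JSON, and MRS_JSON are assumed to be set as in the CI example.

```shell
# Hypothetical wrapper for a trial run: nothing is deleted because of --dry-run,
# and removed entries are reported with a "Planned" status.
preview_cleanup() {
  project-cleanup.js -n -h community \
    --scope "$SCOPE" \
    --projects "$PROJECTS_JSON" \
    --mrs "$MRS_JSON" \
    --config config/config.jsonc \
    --dry-run \
    --max-log-size "1 MB" \
    --clean \
    --kept-output preview/kept.md \
    --removed-output preview/removed.md
}
```

Calling preview_cleanup repeatedly while tuning the configuration overwrites the same two reports, and --clean is included so that details pages from an earlier attempt do not linger.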

Common Connectivity Options

These options are inherited from the core gitlab-scripts module. They conflict with common conventions, but are kept because of legacy use. In particular, -h does not mean "help", and -n does not mean "dry run".

-h: This sets the hostname of the GitLab server to take actions on. If a .gitlab-scripts.config file is used, this can be the simple moniker from the config file. Alternatively, it can be the base URL of the GitLab server (for example, https://community.opengroup.org).
-t: This can be used to explicitly provide the authentication token for running the cleanup. It should be a personal access token that corresponds to a user with enough permissions to remove items from the scope.
-n: This indicates the script execution should be non-interactive. This prevents the core GitLab Scripts initialization logic from prompting the user to create or update a .gitlab-scripts.config file, and is recommended when used outside of an interactive terminal.
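For setups that do not use a .gitlab-scripts.config file, the connectivity options above can be supplied directly. The sketch below assumes the token is available in an environment variable named GITLAB_TOKEN (for example, a masked CI variable); that name and the run_cleanup wrapper are illustrative, not part of the script.

```shell
# Hypothetical wrapper that passes the server URL and token explicitly.
# GITLAB_TOKEN is an assumed variable name; the wrapper aborts if it is unset.
run_cleanup() {
  : "${GITLAB_TOKEN:?GITLAB_TOKEN must be set}"
  project-cleanup.js -n \
    -h "https://community.opengroup.org" \
    -t "$GITLAB_TOKEN" \
    "$@"
}
```

The remaining options are passed through unchanged, for example: run_cleanup --scope "$SCOPE" --projects "$PROJECTS_JSON" --mrs "$MRS_JSON" --config config/config.jsonc.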