# GitHub Backup
**Automatically back up your GitHub repositories to your local machine.**

This tool automatically pulls the list of GitHub repositories from one or more GitHub organizations and clones (or fetches) them to your local machine. It is designed to be run as part of a scheduled backup process, with the ultimate goal of ensuring that you have a local copy of all of your GitHub repositories should the unthinkable happen.

## Installation

Install with [Homebrew](https://brew.sh):

```sh
brew install sierrasoftworks/tap/github-backup
```

## Features

- **Back Up Multiple Organizations**, automatically gathering the full list of repositories for each organization through the GitHub API.
- **Back Up Starred Repos**, automatically gathering the full list of your starred repositories.
- **Repo Allowlists/Denylists** to provide fine-grained control over which repositories are backed up and which are not.
- **GitHub Enterprise Support** for those of you running your own GitHub instances and not relying on GitHub.com.

## Example

```bash
# Run the tool directly
./github-backup --config config.yaml

# Or run it in a container
# (image names must be lowercase, and the quotes protect paths containing spaces)
docker run \
  -v "$(pwd)/config.yaml:/config.yaml" \
  -v "$(pwd)/backups:/backups" \
  ghcr.io/sierrasoftworks/github-backup:latest \
  --config /config.yaml
```

### Configuration

```yaml
# Run a backup every hour (will use `git fetch` for existing copies)
# You can also omit this if you want to run a one-shot backup
schedule: "0 * * * *"

backups:
  - kind: github/repo
    from: user # The user associated with the provided credentials
    to: /backups/personal
    credentials: !UsernamePassword { username: "", password: "" }
    properties:
      query: "affiliation=owner" # Additional query parameters to pass to GitHub when fetching repositories
  - kind: github/repo
    from: "users/another-user"
    to: /backups/friend
    credentials: !Token "your_github_token"
  - kind: github/repo
    from: "orgs/my-org"
    to: /backups/work
    filter: '!repo.fork && repo.name contains "awesome"'
  - kind: github/release
    from: "orgs/my-org"
    to: /backups/releases
    filter: '!release.prerelease && !asset.source-code'

  # You can also back up single repositories directly if you wish
  - kind: github/repo
    from: "repos/my-org/repo"
    to: /backups/work

  # This is particularly useful for backing up release artifacts for
  # specific projects.
  - kind: github/release
    from: "repos/my-org/repo"
    to: /backups/releases
    filter: '!release.prerelease'

  # Back up all repositories starred by the currently authenticated user
  - kind: github/repo
    from: "starred"
    to: /backups/starred/repos
    credentials: !Token "your_github_pat"

  # Back up all GitHub Gists for your authenticated user
  - kind: github/gist
    from: "user"
    to: /backups/gists/user
    credentials: !Token "your_github_token"

  # Back up all Gists starred by the currently authenticated user
  - kind: github/gist
    from: "starred"
    to: /backups/starred/gists
    credentials: !Token "your_github_pat"

  # Back up the public GitHub Gists of another user
  - kind: github/gist
    from: "users/another-user"
    to: /backups/gists/another-user
```

#### Backing up to a Forgejo instance

In addition to writing backups to the local filesystem, the `to` field can describe a remote [Forgejo](https://forgejo.org/) instance. Repositories are mirrored using Forgejo's repository migration API, while release artifacts are uploaded as release attachments.

```yaml
backups:
  # Mirror a repository to a Forgejo instance
  - kind: github/repo
    from: repos/SierraSoftworks/github-backup
    to:
      kind: forgejo/repo
      address: https://forgejo.example.com
      owner: backups
      credentials: !Token "your_forgejo_access_token"

  # Upload release artifacts to a Forgejo instance
  - kind: github/release
    from: repos/SierraSoftworks/github-backup
    to:
      kind: forgejo/release
      address: https://forgejo.example.com
      owner: backups
      credentials: !Token "your_forgejo_access_token"
    filter: '!release.prerelease'
```

The `owner` field selects the Forgejo user or organization which should own the mirrored repositories (and host the releases). The `credentials` field accepts the same `!Token` and `!UsernamePassword` forms as GitHub credentials.

#### Backing up to multiple destinations

The `to` field also accepts a **list** of targets, allowing a single policy to mirror its source to several destinations at once.
The source (for example the GitHub API) is queried only once, and each resulting repository or release is written to every configured target. You can freely mix filesystem paths and remote targets within the same list.

```yaml
backups:
  # Back up a repository to the local filesystem *and* a Forgejo instance
  - kind: github/repo
    from: repos/SierraSoftworks/github-backup
    to:
      - /backups/github
      - kind: forgejo/repo
        address: https://forgejo.example.com
        owner: backups
        credentials: !Token "your_forgejo_access_token"
```

When `to` is omitted it defaults to a single `./backups` filesystem target, and a single target (a path string or a remote map) continues to work exactly as before.

### OpenTelemetry Reporting

In addition to the standard logging output, this tool also supports reporting metrics to an OpenTelemetry-compatible backend. This can be useful for tracking the performance of the tool over time and for configuring monitoring in case backups start to fail.

Reporting is configured through environment variables:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otel-collector:4317
OTEL_EXPORTER_OTLP_HEADERS=X-API-KEY=your-api-key
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=1.0
```

### Cron Monitoring

If you run this tool on a schedule, you'll often want to be alerted when a backup run fails to start or complete. To support this, GitHub Backup can report the state of each scheduled run to an HTTP-based cron monitoring service such as [Sentry Cron Monitors](https://docs.sentry.io/product/crons/) or [healthchecks.io](https://healthchecks.io/).

Monitoring is configured under the top-level `ping` key, where you can provide a separate URL for each state you care about. Each URL is fetched with a simple HTTP `GET` request when the corresponding state is reached, and any state you omit is simply not reported.

```yaml
ping:
  # Fetched when a backup run starts.
  start: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=in_progress
  # Fetched when a backup run completes successfully.
  success: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=ok
  # Fetched when a backup run completes with one or more errors.
  failure: https://sentry.io/api/0/organizations/your-org/monitors/github-backup/checkins/?status=error
```

A run is reported as a `failure` if any policy reports one or more errors, and as a `success` otherwise. Reporting is best-effort: if the monitoring service can't be reached, a warning is logged but the backup run itself is unaffected.

## Filters

This tool allows you to configure filters to control which GitHub repositories are backed up and which are not. Filters are used within the `backups` section of your configuration file and can be specified on a per-user or per-organization basis.

When writing a filter, the goal is to write a logical expression which evaluates to `true` when you wish to include a repository and `false` when you wish to exclude it. The filter language supports several operators and properties which can be used to control this process.

### Available filters

For `kind: github/repo` and `kind: github/star`:

| Field                 | Type      | Description (_Example_)                                                                          |
|-----------------------|-----------|--------------------------------------------------------------------------------------------------|
| `repo.name`           | `string`  | The name of the repository (_Hello-World_)                                                       |
| `repo.fullname`       | `string`  | The full name of the repository (_octocat/Hello-World_)                                          |
| `repo.private`        | `boolean` | Whether the repository is private                                                                |
| `repo.public`         | `boolean` | Whether the repository is public                                                                 |
| `repo.fork`           | `boolean` | Whether the repository is a fork                                                                 |
| `repo.size`           | `integer` | The size of the repository, in kilobytes (_1024_)                                                |
| `repo.archived`       | `boolean` | Whether the repository is archived                                                               |
| `repo.disabled`       | `boolean` | Whether the repository is disabled                                                               |
| `repo.default_branch` | `string`  | The default branch of the repository (_main_)                                                    |
| `repo.empty`          | `boolean` | Whether the repository is empty (when a repository is initially created, `repo.empty` is `true`) |
| `repo.template`       | `boolean` | Whether the repository acts as a template that can be used to generate new repositories          |
| `repo.forks`          | `integer` | The number of times the repository has been forked                                               |
| `repo.stargazers`     | `integer` | The number of people who have starred the repository                                             |

For `kind: github/release`:

| Field                | Type      | Description (_Example_)                                                    |
|----------------------|-----------|----------------------------------------------------------------------------|
| `release.tag`        | `string`  | The name of the tag (_v1.0.0_)                                             |
| `release.name`       | `string`  | The name of the release (_v1.0.0_)                                         |
| `release.draft`      | `boolean` | Whether the release is a draft (unpublished) release                       |
| `release.prerelease` | `boolean` | Whether the release is marked as a prerelease rather than a full release   |
| `release.published`  | `boolean` | Whether the release is a published (not a draft) release                   |
| `asset.name`         | `string`  | The file name of the asset (_github-backup-darwin-arm64_)                  |
| `asset.size`         | `integer` | The size of the asset, in kilobytes (_1024_)                               |
| `asset.downloaded`   | `boolean` | Whether the asset has been downloaded at least once from the GitHub Release |

For `kind: github/gist`:

| Field                   | Type      | Description                                    |
|-------------------------|-----------|------------------------------------------------|
| `gist.public`           | `boolean` | Whether the gist is public                     |
| `gist.private`          | `boolean` | Whether the gist is private                    |
| `gist.comments_enabled` | `boolean` | Whether comments are enabled for the gist      |
| `gist.comments`         | `integer` | Number of comments on the gist                 |
| `gist.files`            | `integer` | Number of files in the gist                    |
| `gist.file_names`       | `array`   | List of file names in the gist                 |
| `gist.languages`        | `array`   | List of programming languages used in the gist |
| `gist.type`             | `string`  | Type of content in the gist                    |

### Examples

Here are some examples of filters you might choose to use:

- `!repo.fork && !repo.archived && !repo.empty` - Do not include repositories which are forks, archived, or empty.
- `repo.private` - Only include private repositories in your list.
- `repo.public && !repo.fork` - Only include public repositories which are not forks.
- `repo.name contains "awesome"` - Only include repositories which have "awesome" in their name.
- `(repo.name contains "awesome" || repo.name contains "cool") && !repo.fork` - Only include repositories which have "awesome" or "cool" in their name and are not forks.
- `!release.prerelease && !asset.source-code` - Only include release artifacts which are not marked as pre-releases and are not source code archives.
- `repo.name in ["git-tool", "grey"]` - Only include repositories with the names "git-tool" or "grey".
- `repo.stargazers >= 5` - Only include repositories with at least 5 stars.
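
As a rough sketch of how these expressions slot into a configuration file, the policies below reuse the `kind`, `from`, `to`, and `filter` keys shown in the configuration example earlier in this document. The organization name `my-org` and the target paths are placeholders chosen for illustration, not values the tool requires.

```yaml
backups:
  # Only back up public repositories which are not forks
  - kind: github/repo
    from: "orgs/my-org"          # placeholder organization
    to: /backups/work            # placeholder target path
    filter: 'repo.public && !repo.fork'

  # Only back up stable release artifacts, skipping source code archives
  - kind: github/release
    from: "orgs/my-org"          # placeholder organization
    to: /backups/releases        # placeholder target path
    filter: '!release.prerelease && !asset.source-code'
```

Wrapping each filter in single quotes is a deliberate choice here: it stops YAML from interpreting a leading `!` as a tag, and it leaves double quotes free for string literals inside the expression, as in `repo.name contains "awesome"`.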