After backing up all my gists and cloning all my starred repositories, there is one more thing I want to accomplish: back up my GitHub repositories, and by that I really mean the ones I manage and have commit rights to. I could do this by cloning and periodically pulling (as we discussed here), but you might have noticed that I explicitly exclude my own repositories in that script by checking for repo.owner.login. The reason is that I want to mirror them into Gitea.

[Figure: a mirrored repository in Gitea]

Why Gitea? Unusually for me, I’d like a Web UI onto these repositories in addition to the files in the file system. It could have been GitLab, but I think Gitea is probably the option with the lowest resource requirements.

When I add a repository to Gitea and specify I want it to be mirrored, Gitea will take charge of periodically querying the source repository and pulling changes in it. I’ve mentioned Gitea previously, and I find it’s improving as it matures. I’ve been doing this with version 1.7.5.

After setting up Gitea and creating a user, I create an API token in Gitea with which I can create repositories programmatically. The following program obtains a list of all the GitHub repositories I have, skips those I’ve forked from elsewhere, and then creates each remaining repository in Gitea as a mirror.

#!/usr/bin/env python -B

from github import Github		# https://github.com/PyGithub/PyGithub
import requests
import json
import sys
import os

gitea_url = "http://127.0.0.1:3000/api/v1"
gitea_token = open(os.path.expanduser("~/.gitea-api")).read().strip()

session = requests.Session()        # Gitea
session.headers.update({
    "Content-type"  : "application/json",
    "Authorization" : "token {0}".format(gitea_token),
})

r = session.get("{0}/user".format(gitea_url))
if r.status_code != 200:
    print("Cannot get user details", file=sys.stderr)
    sys.exit(1)

gitea_uid = r.json()["id"]      # numeric id of the Gitea user who will own the mirrors

github_username = "jpmens"
github_token = open(os.path.expanduser("~/.github-token")).read().strip()
gh = Github(github_token)

for repo in gh.get_user().get_repos():
    # Mirror to Gitea if I haven't forked this repository from elsewhere
    if not repo.fork:
        m = {
            "repo_name"         : repo.full_name.replace("/", "-"),
            "description"       : repo.description or "not really known",
            "clone_addr"        : repo.clone_url,
            "mirror"            : True,
            "private"           : repo.private,
            "uid"               : gitea_uid,
        }

        if repo.private:
            # Only private repos get credentials: Gitea stores them and shows them in the UI
            m["auth_username"]  = github_username
            m["auth_password"]  = github_token

        jsonstring = json.dumps(m)

        r = session.post("{0}/repos/migrate".format(gitea_url), data=jsonstring)
        if r.status_code != 201:            # if not CREATED
            if r.status_code == 409:        # repository exists
                continue
            print(r.status_code, r.text, jsonstring)

You’ll notice that I handle private GitHub repositories specifically, in that I add the username and GitHub token to the Gitea mirror request. While I could do that as a matter of course, the username/token pair is stored in Gitea and is, unfortunately, displayed in the Clone from URL field when you view the mirror properties in the UI. For this reason, I specify GitHub credentials only for repositories which actually require them.

Gitea stores clones of the repositories it mirrors in a directory I specify when setting it up (the ROOT key in the [repository] section of app.ini), so I could access the repositories from that if something goes wrong with Gitea:

$ git clone http://localhost:3000/jpm/jpmens-jo.git

...

$ tree -d /path/to/gitea-repositories/jpm/jpmens-jo.git/
gitea-repositories/jpm/jpmens-jo.git/
├── hooks
├── info
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

$ git clone /path/to/gitea-repositories/jpm/jpmens-jo.git/
Cloning into 'jpmens-jo'...
done.

I can configure Gitea’s cron schedule with an entry in app.ini:

[cron]
; Enable running cron tasks periodically.
ENABLED = true
; Run cron tasks when Gitea starts.
RUN_AT_START = true

; Update mirrors
[cron.update_mirrors]
SCHEDULE = @every 10m

[mirror]
; Default interval as a duration between each check
DEFAULT_INTERVAL = 8h
; Min interval as a duration must be > 1m
MIN_INTERVAL = 10m

The DEFAULT_INTERVAL is the default which is copied into the repository-specific mirror settings when creating the mirror. I can modify the interval in the UI, and MIN_INTERVAL is a setting which forbids users (i.e. myself) from entering shorter intervals:

[Figure: repository-specific mirror settings]

If I’m impatient or want to prod Gitea into mirroring a particular repository on demand, I can POST a request to its API:

curl -s -XPOST http://localhost:3000/api/v1/repos/jpm/jpmens-jo/mirror-sync \
     -H "accept: application/json" \
     -H "Authorization: token xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
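The same request can be sketched in Python using only the standard library. This is a minimal sketch, not part of the mirror program above: the function names build_mirror_sync_request and sync_mirror are made up for illustration, and the base URL and token are the same placeholders as in the curl call.

```python
import urllib.request

# Assumption: same Gitea base URL as in the curl example
GITEA_API = "http://localhost:3000/api/v1"


def build_mirror_sync_request(owner, repo, token, api=GITEA_API):
    """Build (but do not send) the POST request which asks Gitea to sync one mirror."""
    url = "{0}/repos/{1}/{2}/mirror-sync".format(api, owner, repo)
    return urllib.request.Request(
        url,
        data=b"",                   # empty body; the endpoint needs no payload here
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": "token {0}".format(token),
        },
    )


def sync_mirror(owner, repo, token):
    """Send the request and return the HTTP status code.

    urlopen() raises urllib.error.HTTPError on an error response.
    """
    req = build_mirror_sync_request(owner, repo, token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling sync_mirror("jpm", "jpmens-jo", token) would then prod the same repository as the curl command. Building the request separately from sending it makes it easy to inspect the URL and headers without touching the network.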

To verify that mirroring is actually happening, I will periodically obtain the SHA of the last commit to the master branch on GitHub (that’s the best I can come up with in terms of “last updated”, as there really isn’t a “last SHA” independent of a particular branch) and check whether that particular commit exists on Gitea’s side. If Gitea doesn’t carry it, I yell.
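One way such a check could look is sketched below. It is an assumption-laden sketch, not the monitoring I actually run: instead of the GitHub and Gitea APIs it uses plain git commands, namely git ls-remote to ask GitHub for the SHA at the tip of master, and git cat-file -e against the bare repository Gitea keeps on disk. The function names are hypothetical, and the script needs read access to Gitea's repository directory.

```python
import subprocess


def parse_ls_remote(output, ref="refs/heads/master"):
    """Return the SHA listed for ref in `git ls-remote` output, or None if absent."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def remote_head_sha(clone_url, ref="refs/heads/master"):
    """Ask the source repository (e.g. on GitHub) for the SHA at the tip of ref."""
    result = subprocess.run(
        ["git", "ls-remote", clone_url, ref],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout, ref)


def mirror_has_commit(git_dir, sha):
    """True if the bare repository at git_dir contains the commit sha."""
    result = subprocess.run(
        ["git", "--git-dir={0}".format(git_dir), "cat-file", "-e", sha + "^{commit}"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def mirror_is_current(clone_url, git_dir):
    """True if the mirror already carries the source's latest master commit."""
    sha = remote_head_sha(clone_url)
    return sha is not None and mirror_has_commit(git_dir, sha)
```

For example, mirror_is_current("https://github.com/jpmens/jo.git", "/path/to/gitea-repositories/jpm/jpmens-jo.git") would return False when Gitea has not yet pulled the newest commit, which is the cue to yell. For private repositories git ls-remote needs credentials, which this sketch does not handle.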

So, where importing is a one-time thing, mirroring causes Gitea to periodically check whether the source repo has changed, and if so, it pulls changes in. Mirroring doesn’t pull in issues or pull requests from GitHub, which is a bit of a shame, but I understand it’s not trivial to do. If you want a utility which does that, gitea-github-migrator is a one-shot program which does what it says on the tin. What Gitea does bring across is a repository’s Wiki, and it does so by creating a *.wiki.git repository next to the actual repo, visible in the file system; within the UI it’s where you’d expect it to be and not separately listed.

If you want to set up your own self-hosted Gitea, it’s not difficult, and it doesn’t have to be public: mine is not Internet-accessible, but it has Internet access in order to be able to mirror repositories from GitHub.

I am not migrating away from GitHub because I see no reason to: the platform is very useful to me, and I’d not like to lose it. What I’m trying to accomplish is a fail-safe in case something happens to GitHub which would make me lose access, be that voluntarily or involuntarily.

Updates

On 2020-02-07 Stefan sends me an updated version of the mirror program and writes:

I made it so that you can specify a map that links remote repository names to local Gitea organizations so that one could group remotely mirrored repos into, well, Gitea organizations. If a repo isn’t found in the map it will be created in the account of the user specified in the script.
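Stefan's program isn't reproduced here, but the idea he describes can be sketched roughly as follows. Everything in this sketch is an assumption for illustration: the names org_map, org_uids and owner_uid_for are made up, and it presumes that the numeric ids of the Gitea organizations were looked up beforehand and that the uid field of the migrate request accepts an organization's id as well as a user's.

```python
def owner_uid_for(full_name, org_map, org_uids, default_uid):
    """Choose the Gitea owner id under which a mirrored repository is created.

    full_name   -- remote repository name, e.g. "jpmens/jo"
    org_map     -- hypothetical map: remote repository name -> Gitea organization name
    org_uids    -- hypothetical map: Gitea organization name -> numeric id in Gitea
    default_uid -- id of the Gitea user, used when the repo is not in org_map
    """
    org = org_map.get(full_name)
    if org is None:
        return default_uid          # not mapped: create in the user's own account
    return org_uids[org]            # mapped: create inside that organization


# Example data (made up): one repository goes into a "tools" organization
org_map = {"jpmens/jo": "tools"}
org_uids = {"tools": 7}

uid_for_jo = owner_uid_for("jpmens/jo", org_map, org_uids, default_uid=1)
uid_for_other = owner_uid_for("jpmens/other", org_map, org_uids, default_uid=1)
```

In the mirror program above, the returned value would take the place of gitea_uid in the "uid" field of the request.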

Very nice.

git, gitea, and github :: 15 Apr 2019