a3nm's blog

Migrating from cgit to stagit

I serve my git repositories over HTTP for people who want to browse them without having to clone them. I used to do this with cgit, which is a server-side dynamic solution written in C. It worked nicely, but lately some bots have been busy crawling these git repositories, and I regularly ran into trouble where the cgit.cgi processes ended up in a busy loop, eating 100% of CPU for unclear reasons. More generally, I had always been anxious about using a dynamic solution to serve these repositories: all the rest of my website is static, which I think is more elegant and more reassuring in terms of security.

The natural approach would be to turn cgit into a static solution by precompiling all pages whenever a git repository is updated. However, this is not reasonable: cgit allows you, e.g., to see the status of every file at every commit, or to diff any pair of commits, which would be too expensive to precompute. These features are not very useful, so I was considering to do it but tweak cgit's output to suppress the useless parts; but this would have been tedious.

Fortunately, there is a better way: the stagit tool is a minimalistic variant of cgit, also written in C, which is designed to be static. So I have just removed cgit from my server and installed stagit instead. Obviously it's too early for me to say whether stagit is a perfect solution, but I'm happy with what I have seen so far. Here are some quick and messy notes about how I did it and what surprised me, in case you are considering doing the same.

Stagit is not packaged for Debian yet but it's easy to compile and install (and the source code is rather short if you want to hack it). You will need libgit2-dev, which is packaged by Debian. I edited a bit the source to suit my needs; cf my local fork: I changed a bit the HTML, fixed the CSS to work better on mobile displays, renamed some files, etc. It's a bit ugly to have HTML boilerplate hardcoded in the C code, but it works, and if it starts misbehaving it will be easier for me to investigate.

Stagit provides one command stagit to generate the HTML for a repository, and one command stagit-index to generate an index of the various repositories. The README is rather clear (you can also look at the manpages in the repo). Of course, you need to re-run stagit whenever a git repository is updated, so you'll need a post-receive hook like the one they provide, which I adapted to my needs. One concern is that running stagit is synchronous, i.e., when doing a git push, you must wait for stagit to complete. However, it seems to run instantly on my repositories, so that's no big deal.

To get a nice index of the repositories, you need to change your git repositories to edit description with a description and url with the clone URL. There is also support for a owner field, but I removed this from the generated HTML as I'm the owner of all the repos I host. As the setup of a new git repository had become a bit tedious, I wrote a script for that, too.

About the url: you should know that stagit does not take care of allowing people to clone your repository. One solution is to run a git server for that (which the official stagit repository seems to do), but I didn't want it because it's not static. Instead, I intend people to clone my repositories using the dumb HTTP protocol: it only requires you to serve your git repositories with your Web server, and to run git update-server-info, as can be done easily using the post-update.sample hook. So for each repository you will have the stagit version and the bare repository. However, this will mean that the git clone URL will be different from the stagit URL, which is a bit jarring. So I cheated using some lighttpd mod_rewrite rules to transparently do the redirection. (Note that git clone will still point out the existence of this redirect when doing the cloning, so it's not completely transparent.) Here are the rules, following this page thanks to immae for suggesting an improvement:

  "^/git/([^/.]*)/HEAD$" => "/git/$1.git/HEAD",
  "^/git/([^/.]*)/info/(.*)$" => "/git/$1.git/info/$2",
  "^/git/([^/.]*)/objects/(.*)$" => "/git/$1.git/objects/$2",
  "^/git/([^/.]*)/git-upload-pack$" => "/git/$1.git/git-upload-pack",
  "^/git/([^/.]*)/git-receive-pack$" => "/git/$1.git/git-receive-pack",

One last thing about the migration to stagit is that I didn't want to break all the cgit URLs that used to work before. Of course, not all cgit pages have a stagit counterpart, but most of the important ones do, however their names are a bit different. Again, not very robust, but here goes:

  "^/git/([^/.]*)/commit/\?id=(.*)$" => "/git/$1/commit/$2.html",
  "^/git/([^/.]*)/about(/.*)?$" => "/git/$1/file/README.html",
  "^/git/([^/.]*)/log(/.*)?$" => "/git/$1/index.html",
  "^/git/([^/.]*)/refs(/.*)?$" => "/git/$1/refs.html",
  "^/git/([^/.]*)/tree/?(\?.*)?$" => "/git/$1/files.html",
  "^/git/([^/.]*)/tree/([^?]*)(\?.*)?$" => "/git/$1/file/$2.html",
  "^/git/([^/.]*)/plain/([^?]*)(\?.*)?$" => "/git/$1/file/$2.html",
  "^/git/([^.?]*)\?.*$" => "/git/$1",
  "^/git/([^/.]*)/([^?]*)\?.*$" => "/git/$1",

So there you have it: a completely static web version of my git repositories that can also be used to clone them with the dumb HTTP transport, a hook to update the web version, a script to create a new repository, and no more problems or possible security vulnerabilities with cgit!

comments welcome at a3nm<REMOVETHIS>@a3nm.net