Migrating from cgit to stagit
I serve my git repositories over HTTP for people who want to browse them without having to clone them. I used to do this with cgit, which is a server-side dynamic solution written in C. It worked nicely, but lately some bots have been busy crawling these git repositories, and I regularly ran into trouble where the cgit.cgi processes ended up in a busy loop, eating 100% of CPU for unclear reasons. More generally, I had always been anxious about using a dynamic solution to serve these repositories: all the rest of my website is static, which I think is more elegant and more reassuring in terms of security.
The natural approach would be to turn cgit into a static solution by precompiling all pages whenever a git repository is updated. However, this is not reasonable: cgit allows you, e.g., to see the status of every file at every commit, or to diff any pair of commits, which would be too expensive to precompute. These features are not very useful to me, so I considered precompiling anyway and tweaking cgit's output to suppress the useless parts; but this would have been tedious.
Fortunately, there is a better way: the stagit tool is a minimalistic alternative to cgit, also written in C, which is designed to be static. So I have just removed cgit from my server and installed stagit instead. Obviously it's too early for me to say whether stagit is a perfect solution, but I'm happy with what I have seen so far. Here are some quick and messy notes about how I did it and what surprised me, in case you are considering doing the same. As of 2022, stagit works fine and I'm still using it.
Stagit is not packaged for Debian yet, but it's easy to compile and install (and
the source code is rather short if you want to hack it). You will need
libgit2-dev, which is packaged by Debian. I edited the source a bit to suit my
needs (see my local fork): I changed the HTML a bit, fixed the CSS to work
better on mobile displays, renamed some files, etc. It's a bit ugly to have
HTML boilerplate hardcoded in the C code, but it works, and if it starts
misbehaving it will be easier for me to investigate.
Stagit provides one command, stagit, to generate the HTML for a repository, and
another, stagit-index, to generate an index of the various repositories. The
README is rather clear (you can also look at the manpages in the repo). Of
course, you need to re-run stagit whenever a git repository is updated, so
you'll need a post-receive hook like the one they provide, which I adapted to
my needs. One concern is that the hook runs stagit synchronously, i.e., when
doing a git push, you must wait for stagit to complete. However, it seems to
run instantly on my repositories, so that's no big deal.
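To make the regeneration step concrete, here is a minimal sketch in Python. It is not the hook from the stagit repository: the layout (bare repositories named *.git under one root, one output directory per repository under the web root) and the function names are assumptions made for illustration. It relies on two documented behaviours: stagit writes its pages into the current directory, and stagit-index prints the index page on standard output.

```python
import os
import subprocess
from pathlib import Path


def stagit_job(repo_dir, web_root):
    """Return (output directory, command) for one bare repository."""
    # Use an absolute path, because stagit will run from another directory.
    repo = Path(os.path.abspath(repo_dir))
    # Assumed naming scheme: foo.git is published in a directory named foo.
    name = repo.name[:-4] if repo.name.endswith(".git") else repo.name
    return Path(web_root) / name, ["stagit", str(repo)]


def regenerate(repo_dir, web_root, repos_root):
    """Rebuild the pages of one repository, then the global index."""
    out_dir, command = stagit_job(repo_dir, web_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    # stagit writes its HTML files into the current working directory.
    subprocess.run(command, cwd=out_dir, check=True)
    # stagit-index prints the index page on stdout, so redirect it to a file.
    repos = sorted(str(p) for p in Path(repos_root).glob("*.git"))
    with open(Path(web_root) / "index.html", "w") as index:
        subprocess.run(["stagit-index", *repos], stdout=index, check=True)
```

Git runs the hooks of a bare repository from the repository directory itself, so a post-receive hook could call something like regenerate(os.getcwd(), "/var/www/git", "/srv/git"), where both paths are hypothetical.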
To get a nice index of the repositories, you need to edit two files in each git
repository: description, which holds a short description, and url, which holds
the clone URL. There is also support for an owner field, but I removed it from
the generated HTML as I'm the owner of all the repos I host. As the setup of a
new git repository had become a bit tedious, I wrote a script for that, too.
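The core of such a setup script can be sketched as follows. This is an illustration in Python, not the script mentioned above: the function names are made up, and the only facts it relies on are that git init --bare creates a bare repository and that stagit reads the description and url files from the repository directory.

```python
import subprocess
from pathlib import Path


def write_stagit_metadata(repo_dir, description, clone_url):
    """Write the 'description' and 'url' files that stagit reads."""
    repo = Path(repo_dir)
    (repo / "description").write_text(description.strip() + "\n")
    (repo / "url").write_text(clone_url.strip() + "\n")


def create_repository(repo_dir, description, clone_url):
    """Create a bare repository and fill in its stagit metadata."""
    subprocess.run(["git", "init", "--bare", str(repo_dir)], check=True)
    # git init writes a placeholder description; overwrite it.
    write_stagit_metadata(repo_dir, description, clone_url)
```

A real script would also install the hooks in the new repository; that part is left out here because it depends on the local setup.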
About the url file: you should know that stagit does not take care of allowing
people to clone your repository. One solution is to run a git server for that
(which the official stagit repository seems to do), but I didn't want to
because it's not static. Instead, I intend people to clone my repositories
using the dumb HTTP protocol: it only requires you to serve your git
repositories with your Web server and to run git update-server-info after each
update, as can be done easily using the post-update.sample hook.
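A quick way to check that a repository is ready for dumb HTTP clones is to look for the auxiliary files that git update-server-info maintains, namely info/refs and objects/info/packs. The helper below is a hypothetical sketch in Python (its name is made up); it only inspects the file system and does not run git.

```python
from pathlib import Path

# Auxiliary files maintained by `git update-server-info` in a bare repository.
AUX_FILES = ("info/refs", "objects/info/packs")


def missing_dumb_http_files(repo_dir):
    """List the auxiliary files that a dumb HTTP client would not find."""
    repo = Path(repo_dir)
    return [name for name in AUX_FILES if not (repo / name).is_file()]
```

An empty list means both files exist. It does not prove that they are up to date, which is why the hook has to run after every update.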
So for each repository you will have the stagit version and the bare
repository. However, this means that the git clone URL will be different from
the stagit URL, which is a bit jarring. So I cheated, using some lighttpd
mod_rewrite rules to do the redirection transparently. (Note that git clone
will still point out the existence of this redirect when cloning, so it's not
completely transparent.) Here are the rules, following this page (thanks to
immae for suggesting an improvement):
"^/git/([^/.]*)/HEAD$" => "/git/$1.git/HEAD",
"^/git/([^/.]*)/info/(.*)$" => "/git/$1.git/info/$2",
"^/git/([^/.]*)/objects/(.*)$" => "/git/$1.git/objects/$2",
"^/git/([^/.]*)/git-upload-pack$" => "/git/$1.git/git-upload-pack",
"^/git/([^/.]*)/git-receive-pack$" => "/git/$1.git/git-receive-pack",
One last thing about the migration to stagit is that I didn't want to break all the cgit URLs that used to work before. Of course, not all cgit pages have a stagit counterpart, but most of the important ones do, although their names are a bit different. Again, the following rules are not very robust, but here goes:
"^/git/([^/.]*)/commit/\?id=(.*)$" => "/git/$1/commit/$2.html",
"^/git/([^/.]*)/about(/.*)?$" => "/git/$1/file/README.html",
"^/git/([^/.]*)/log(/.*)?$" => "/git/$1/index.html",
"^/git/([^/.]*)/refs(/.*)?$" => "/git/$1/refs.html",
"^/git/([^/.]*)/tree/?(\?.*)?$" => "/git/$1/files.html",
"^/git/([^/.]*)/tree/([^?]*)(\?.*)?$" => "/git/$1/file/$2.html",
"^/git/([^/.]*)/plain/([^?]*)(\?.*)?$" => "/git/$1/file/$2.html",
"^/git/([^.?]*)\?.*$" => "/git/$1",
"^/git/([^/.]*)/([^?]*)\?.*$" => "/git/$1",
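Since these rules are order-sensitive (the first match wins), it helps to try a few old cgit URLs against them. The sketch below replays the rules with Python's re module, which is only an approximation of what lighttpd does; LEGACY_RULES and map_legacy_url are hypothetical names, and the repository name foo is an example.

```python
import re

# The legacy rules above, in the same order, as (pattern, target) pairs.
LEGACY_RULES = [
    (r"^/git/([^/.]*)/commit/\?id=(.*)$", "/git/$1/commit/$2.html"),
    (r"^/git/([^/.]*)/about(/.*)?$", "/git/$1/file/README.html"),
    (r"^/git/([^/.]*)/log(/.*)?$", "/git/$1/index.html"),
    (r"^/git/([^/.]*)/refs(/.*)?$", "/git/$1/refs.html"),
    (r"^/git/([^/.]*)/tree/?(\?.*)?$", "/git/$1/files.html"),
    (r"^/git/([^/.]*)/tree/([^?]*)(\?.*)?$", "/git/$1/file/$2.html"),
    (r"^/git/([^/.]*)/plain/([^?]*)(\?.*)?$", "/git/$1/file/$2.html"),
    (r"^/git/([^.?]*)\?.*$", "/git/$1"),
    (r"^/git/([^/.]*)/([^?]*)\?.*$", "/git/$1"),
]


def map_legacy_url(url):
    """Return the stagit URL for an old cgit URL, or None if no rule matches."""
    for pattern, target in LEGACY_RULES:
        match = re.search(pattern, url)
        if match:
            # Expand $1, $2, ...; unmatched optional groups become empty.
            return re.sub(
                r"\$(\d)",
                lambda ref: match.group(int(ref.group(1))) or "",
                target,
            )
    return None


print(map_legacy_url("/git/foo/commit/?id=abc123"))         # /git/foo/commit/abc123.html
print(map_legacy_url("/git/foo/tree/"))                     # /git/foo/files.html
print(map_legacy_url("/git/foo/tree/src/main.c?h=master"))  # /git/foo/file/src/main.c.html
print(map_legacy_url("/git/foo/commit/abc123.html"))        # None
```

The two tree rules illustrate the ordering: the bare tree URL is caught by the first one, and only URLs with a path after tree/ fall through to the second. The last call checks that a URL already in stagit form is not redirected again.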
So there you have it: a completely static web version of my git repositories that can also be used to clone them with the dumb HTTP transport, a hook to update the web version, a script to create a new repository, and no more problems or possible security vulnerabilities with cgit!