Indexing encrypted email with notmuch

Since version 0.26, notmuch, the mail indexing tool that I use, makes it easy to index encrypted mail.

The original behavior was that notmuch did not index the contents of encrypted emails, as they were encrypted and it couldn't access them. This meant that you couldn't search inside encrypted emails (except for headers, e.g., the subject, recipient, etc.).

Now, notmuch is able to use gpg (and gpg-agent) to read and index the cleartext of encrypted emails. Of course, this means that notmuch's index can now be used to reconstruct encrypted emails; in particular, as notmuch stores the session keys for messages in its index, this means that any attacker who can access the index can decrypt the messages¹. For my use case, I think that this security risk is acceptable: I essentially see GPG as a tool to ensure that messages are not altered between the sender and recipient, my notmuch index is stored on an encrypted partition anyway, and my GPG passphrase is usually cached by gpg-agent so an attacker who has control over my machine would be able to access the plaintext of encrypted messages quite easily.

So, if you also use notmuch, if you also have your passphrase cached by gpg-agent at least part of the time, and if you want notmuch to index the cleartext of your encrypted emails, here is what you should do. First, you should make sure that you have notmuch 0.26 or a more recent version. Second, you should tell notmuch that you want it to index the cleartext of encrypted email:

notmuch config set index.decrypt true

Beware, this configuration flag lives only in the database, not in the config file; hence, e.g., it will not be synchronized across multiple machines if you synchronize your config files from one machine to another.

Then you should reindex all encrypted email that notmuch knows about but hasn't indexed yet (this took around 15 mins in my case):

notmuch reindex tag:encrypted and not property:index.decryption=success

Of course, you will be prompted for your GPG passphrase if it isn't cached (and also possibly for the passphrase of other keys that you have used in the past). Once this has completed, you should check the encrypted messages that notmuch was still unable to index:

notmuch search tag:encrypted and not property:index.decryption=success

In my case, there were only a few that couldn't be indexed -- and usually it was because they hadn't been encrypted for my key because the sender had made some mistake.
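The review step above can also be scripted. Here is a minimal sketch, assuming that notmuch is on the PATH and that the JSON output of notmuch search carries "thread" and "subject" fields for each result; the function names are made up for illustration.

```python
import json
import subprocess

# Same query as the notmuch search command above.
FAILED_QUERY = "tag:encrypted and not property:index.decryption=success"


def summarize(search_json):
    """Turn `notmuch search --format=json` output into (thread, subject) pairs."""
    return [(item["thread"], item["subject"]) for item in json.loads(search_json)]


def list_failures(query=FAILED_QUERY):
    """Ask notmuch for the encrypted threads whose cleartext is not indexed yet."""
    output = subprocess.check_output(["notmuch", "search", "--format=json", query])
    return summarize(output.decode("utf-8"))
```

Calling list_failures() and printing each pair gives a short report of what is still waiting to be decrypted.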

From now on, notmuch new should automatically index the cleartext of incoming messages when your GPG passphrase is cached by gpg-agent. The last step is the following: if your passphrase is not cached all the time, then you should arrange for the notmuch reindex command above to be executed regularly, so that encrypted messages will eventually be indexed.

The setup described in this post led to unpleasant side effects where GPG invocations would hang, probably because notmuch tried to ask for a passphrase. To avoid this, I had to ensure that the notmuch reindex command, when run regularly, never tried to ask for a passphrase if it wasn't currently cached by the agent. I did this by setting PINENTRY_USER_DATA=none and modifying my custom pinentry script to handle this value properly. (Of course, this means that encrypted messages will not be correctly indexed when the GPG agent hasn't cached the passphrase, but the hope is that they will eventually be indexed.)
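For illustration, here is a minimal sketch of what such a pinentry wrapper could look like. It is not the actual script: the name of the real pinentry binary (pinentry-gtk-2) and the choice to simply exit with an error are assumptions.

```python
import os
import sys

# Assumption: the real pinentry program to delegate to; adjust for your system.
REAL_PINENTRY = "pinentry-gtk-2"


def should_refuse(env):
    """True when the caller asked never to be prompted for a passphrase."""
    return env.get("PINENTRY_USER_DATA") == "none"


def main():
    if should_refuse(os.environ):
        # Exit without opening any dialog, so that the GPG operation
        # fails quickly instead of waiting for input.
        sys.exit(1)
    # Otherwise, replace this process with the real pinentry.
    os.execvp(REAL_PINENTRY, [REAL_PINENTRY] + sys.argv[1:])
```

To use it, the file would end with a call to main() and be declared as pinentry-program in gpg-agent.conf.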

Another problem that I had is that notmuch reindex would waste CPU time by retrying, on each run, the emails where it had previously failed. To avoid this, I manually reviewed the mails that couldn't be indexed, tagged them with a special tag, and then excluded mails with that tag from the notmuch reindex command. I also added a crontab entry to periodically review the emails where indexing failed, so that I can tag them appropriately if the failure is expected. A more elaborate idea would be to exclude emails that are too old from the notmuch reindex command; or maybe to script things so that, when all GPG keys are available in the agent but notmuch still cannot index a message, the message gets tagged so as not to be tried again.
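The two workarounds above can be combined in a small script run from cron. This is only a sketch: the tag name "undecryptable" is hypothetical, and it assumes that the pinentry script refuses to prompt when PINENTRY_USER_DATA is set to none.

```python
import os
import subprocess

# Hypothetical tag for mails already reviewed as impossible to decrypt.
SKIP_TAG = "undecryptable"


def build_query(skip_tag=SKIP_TAG):
    """Encrypted mails not yet indexed in cleartext, minus the reviewed ones."""
    return ("tag:encrypted and not property:index.decryption=success "
            "and not tag:" + skip_tag)


def build_environment(base_env):
    """Copy the environment and ask pinentry never to prompt."""
    env = dict(base_env)
    env["PINENTRY_USER_DATA"] = "none"
    return env


def run_reindex():
    """Run notmuch reindex non-interactively and return its exit status."""
    return subprocess.call(["notmuch", "reindex", build_query()],
                           env=build_environment(os.environ))
```

The script run by cron would simply call run_reindex(); messages whose key is not cached are left for a later run.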

I'm still having problems with pinentry misbehaving: either it no longer shows up because a pinentry-curses is waiting for input somewhere, or pinentry-gtk pops up uninvited. I can live with it for now but at some point I should investigate this and tidy it up.

¹ In fact, the historical workaround to index encrypted email with notmuch was simply to arrange for it to be decrypted when it arrives. I would also be OK with the security implications of this, but I have never set it up, because it's complicated to do right, especially because my GPG passphrase isn't always available in gpg-agent's cache. Besides, I prefer to keep an original copy of the email that I receive, so I think it's cleaner to keep the encrypted messages as-is and have notmuch store in its index the additional information that it needs.

Migrating from cgit to stagit

I serve my git repositories over HTTP for people who want to browse them without having to clone them. I used to do this with cgit, which is a server-side dynamic solution written in C. It worked nicely, but lately some bots have been busy crawling these git repositories, and I regularly ran into trouble where the cgit.cgi processes ended up in a busy loop, eating 100% of CPU for unclear reasons. More generally, I had always been anxious about using a dynamic solution to serve these repositories: all the rest of my website is static, which I think is more elegant and more reassuring in terms of security.

The natural approach would be to turn cgit into a static solution by precompiling all pages whenever a git repository is updated. However, this is not reasonable: cgit allows you, e.g., to see the status of every file at every commit, or to diff any pair of commits, which would be too expensive to precompute. These features are not very useful, so I considered doing it anyway and tweaking cgit's output to suppress the useless parts; but this would have been tedious.

Fortunately, there is a better way: the stagit tool is a minimalistic variant of cgit, also written in C, which is designed to be static. So I have just removed cgit from my server and installed stagit instead. Obviously it's too early for me to say whether stagit is a perfect solution, but I'm happy with what I have seen so far. Here are some quick and messy notes about how I did it and what surprised me, in case you are considering doing the same. As of 2022, stagit works fine and I'm still using it.

Stagit is not packaged for Debian yet but it's easy to compile and install (and the source code is rather short if you want to hack it). You will need libgit2-dev, which is packaged by Debian. I edited the source a bit to suit my needs (cf. my local fork): I changed the HTML a bit, fixed the CSS to work better on mobile displays, renamed some files, etc. It's a bit ugly to have HTML boilerplate hardcoded in the C code, but it works, and if it starts misbehaving it will be easier for me to investigate.

Stagit provides one command stagit to generate the HTML for a repository, and one command stagit-index to generate an index of the various repositories. The README is rather clear (you can also look at the manpages in the repo). Of course, you need to re-run stagit whenever a git repository is updated, so you'll need a post-receive hook like the one they provide, which I adapted to my needs. One concern is that running stagit is synchronous, i.e., when doing a git push, you must wait for stagit to complete. However, it seems to run instantly on my repositories, so that's no big deal.
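As an illustration of what such a hook has to do, here is a minimal sketch. It relies on the fact that stagit writes its pages into the current directory and that stagit-index writes the index to standard output; the paths and function names are hypothetical, and repo_dir should be an absolute path since the working directory changes.

```python
import os
import subprocess


def stagit_commands(repo_dir, web_root, all_repos):
    """Return the output directory and the two command lines to run."""
    name = os.path.basename(repo_dir.rstrip("/"))
    if name.endswith(".git"):
        name = name[:-4]
    out_dir = os.path.join(web_root, name)
    return out_dir, ["stagit", repo_dir], ["stagit-index"] + list(all_repos)


def regenerate(repo_dir, web_root, all_repos):
    """Rebuild the pages of one repository, then the global index."""
    out_dir, page_cmd, index_cmd = stagit_commands(repo_dir, web_root, all_repos)
    os.makedirs(out_dir, exist_ok=True)
    # stagit writes its HTML files into the current working directory.
    subprocess.check_call(page_cmd, cwd=out_dir)
    # stagit-index prints the index page on standard output.
    with open(os.path.join(web_root, "index.html"), "wb") as index_file:
        subprocess.check_call(index_cmd, stdout=index_file)
```

A post-receive hook would call regenerate() with the path of the repository that was just pushed to.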

To get a nice index of the repositories, you need to edit two files in each git repository: description, which should contain a description, and url, which should contain the clone URL. There is also support for an owner field, but I removed this from the generated HTML as I'm the owner of all the repos I host. As the setup of a new git repository had become a bit tedious, I wrote a script for that, too.
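The following is a minimal sketch of the kind of steps such a setup script can perform on a bare repository; it is not the actual script, and the function names are made up. The second function anticipates the dumb HTTP cloning discussed in the next paragraph, by enabling the sample post-update hook that git ships.

```python
import os
import shutil
import stat


def write_stagit_metadata(repo_dir, description, clone_url):
    """Write the description and url files that stagit reads."""
    for filename, text in (("description", description), ("url", clone_url)):
        path = os.path.join(repo_dir, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text.rstrip("\n") + "\n")


def enable_update_server_info(repo_dir):
    """Copy hooks/post-update.sample to hooks/post-update and make it executable."""
    hooks = os.path.join(repo_dir, "hooks")
    hook = os.path.join(hooks, "post-update")
    shutil.copyfile(os.path.join(hooks, "post-update.sample"), hook)
    mode = os.stat(hook).st_mode
    os.chmod(hook, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

The hook only runs on later pushes, so running git update-server-info once by hand in the repository is still needed after the initial setup.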

About the url: you should know that stagit does not take care of allowing people to clone your repository. One solution is to run a git server for that (which the official stagit repository seems to do), but I didn't want this because it's not static. Instead, I intend people to clone my repositories using the dumb HTTP protocol: it only requires you to serve your git repositories with your Web server, and to run git update-server-info, as can be done easily using the post-update.sample hook. So for each repository you will have the stagit version and the bare repository. However, this means that the git clone URL will be different from the stagit URL, which is a bit jarring. So I cheated using some lighttpd mod_rewrite rules to transparently do the redirection. (Note that git clone will still point out the existence of this redirect when doing the cloning, so it's not completely transparent.) Here are the rules, following this page (thanks to immae for suggesting an improvement):

  "^/git/([^/.]*)/HEAD$" => "/git/$1.git/HEAD",
  "^/git/([^/.]*)/info/(.*)$" => "/git/$1.git/info/$2",
  "^/git/([^/.]*)/objects/(.*)$" => "/git/$1.git/objects/$2",
  "^/git/([^/.]*)/git-upload-pack$" => "/git/$1.git/git-upload-pack",
  "^/git/([^/.]*)/git-receive-pack$" => "/git/$1.git/git-receive-pack",

One last thing about the migration to stagit is that I didn't want to break all the cgit URLs that used to work before. Of course, not all cgit pages have a stagit counterpart, but most of the important ones do, although their names are a bit different. Again, not very robust, but here goes:

  "^/git/([^/.]*)/commit/\?id=(.*)$" => "/git/$1/commit/$2.html",
  "^/git/([^/.]*)/about(/.*)?$" => "/git/$1/file/README.html",
  "^/git/([^/.]*)/log(/.*)?$" => "/git/$1/index.html",
  "^/git/([^/.]*)/refs(/.*)?$" => "/git/$1/refs.html",
  "^/git/([^/.]*)/tree/?(\?.*)?$" => "/git/$1/files.html",
  "^/git/([^/.]*)/tree/([^?]*)(\?.*)?$" => "/git/$1/file/$2.html",
  "^/git/([^/.]*)/plain/([^?]*)(\?.*)?$" => "/git/$1/file/$2.html",
  "^/git/([^.?]*)\?.*$" => "/git/$1",
  "^/git/([^/.]*)/([^?]*)\?.*$" => "/git/$1",

So there you have it: a completely static web version of my git repositories that can also be used to clone them with the dumb HTTP transport, a hook to update the web version, a script to create a new repository, and no more problems or possible security vulnerabilities with cgit!
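Since the rewrite rules listed earlier are admittedly not very robust, it can help to check them offline before deploying them. Here is a minimal sketch that replays both sets of rules with Python's re module, assuming first-match-wins semantics and $1, $2 as group references; the helper rewrite() is made up for this illustration, and Python's regex flavor is only an approximation of the one used by lighttpd.

```python
import re

# Rules for cloning over dumb HTTP, copied from above.
CLONE_RULES = [
    (r"^/git/([^/.]*)/HEAD$", "/git/$1.git/HEAD"),
    (r"^/git/([^/.]*)/info/(.*)$", "/git/$1.git/info/$2"),
    (r"^/git/([^/.]*)/objects/(.*)$", "/git/$1.git/objects/$2"),
    (r"^/git/([^/.]*)/git-upload-pack$", "/git/$1.git/git-upload-pack"),
    (r"^/git/([^/.]*)/git-receive-pack$", "/git/$1.git/git-receive-pack"),
]

# Rules mapping old cgit URLs to stagit pages, copied from above.
LEGACY_RULES = [
    (r"^/git/([^/.]*)/commit/\?id=(.*)$", "/git/$1/commit/$2.html"),
    (r"^/git/([^/.]*)/about(/.*)?$", "/git/$1/file/README.html"),
    (r"^/git/([^/.]*)/log(/.*)?$", "/git/$1/index.html"),
    (r"^/git/([^/.]*)/refs(/.*)?$", "/git/$1/refs.html"),
    (r"^/git/([^/.]*)/tree/?(\?.*)?$", "/git/$1/files.html"),
    (r"^/git/([^/.]*)/tree/([^?]*)(\?.*)?$", "/git/$1/file/$2.html"),
    (r"^/git/([^/.]*)/plain/([^?]*)(\?.*)?$", "/git/$1/file/$2.html"),
    (r"^/git/([^.?]*)\?.*$", "/git/$1"),
    (r"^/git/([^/.]*)/([^?]*)\?.*$", "/git/$1"),
]


def rewrite(path, rules):
    """Apply the first matching rule to path; return None if none matches."""
    for pattern, target in rules:
        match = re.match(pattern, path)
        if match:
            # Turn $1 into \g<1>, the group syntax expected by Match.expand.
            template = re.sub(r"\$(\d)", r"\\g<\1>", target)
            return match.expand(template)
    return None
```

For instance, rewrite("/git/foo/HEAD", CLONE_RULES) gives "/git/foo.git/HEAD", whereas a path that already contains ".git" is left alone because the repository name may not contain a dot.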

An update on CalDAV and CardDAV with Radicale

This is a quick update to a previous post where I explained how to self-host your calendar and contacts using the Radicale CalDAV and CardDAV server, and how to access them on Android devices with DAVdroid.

Three years later, I am still using this setup. I only use my Android phone to access the calendar and contacts, so the Radicale server is essentially a way to back the contacts and calendars up, although I have also tried accessing them, e.g., with Evolution. Over these three years, DAVdroid has evolved and gotten a bit more user-friendly and stable, though I have had a few problems (e.g., duplicated calendar events). Radicale has evolved too: I'm currently at version 1.1.1, which is the one provided by Debian even though it is really outdated. (Also, as of this writing, Radicale is not available in the Debian testing repos, see here, but it can be installed from Debian stable.)

The main change that I made is on the server. In the old guide, I explained how to set up Radicale so that it listens on port 5232, manages authentication and encryption, and DAVdroid connects to it directly. I have changed this setup so that DAVdroid now connects to Apache2, which manages authentication and encryption, and talks to Radicale using WSGI. This has a number of advantages:

You can encrypt the connection with SSL managed by Apache, e.g., using Let's Encrypt, without self-signed certificates or other ad-hoc setup; and you don't need to trust Radicale to do the encryption correctly.
The server listens on the standard HTTPS port (443) rather than the custom Radicale port (5232) so the connections aren't blocked on unfriendly networks.
You can use vhosts, e.g., to host it on a subdomain.
Authentication is managed by Apache, not Radicale. This is somewhat reassuring: even if Radicale has a massive security flaw, only users that correctly authenticated with Apache can talk to it at all.
The most important point: with the old setup, Radicale would inexplicably hang every now and then, presumably when the phone disconnected messily from it. (I think it is this bug). With the new setup, this does not happen. (Maybe the bug has been fixed in more recent Radicale versions anyway, I don't know.)

Of course, the downside of this new setup is that you need Apache just to route requests to Radicale. As I needed Apache for other purposes, though, I didn't mind.

The setup

I didn't document this setup while I was doing it, so here is a hopefully complete description of what I currently have.

You need to install Apache, and enable the SSL, WSGI, and auth_basic modules (as root, run a2enmod ssl, a2enmod wsgi, and a2enmod auth_basic, then service apache2 restart). Of course, basic HTTP authentication may sound insecure, but we will only be doing it over HTTPS.

You should set up Let's Encrypt certificates (e.g., with certbot), something I mentioned in this previous guide.

Of course you need to install radicale. We are going to put all radicale-related stuff in /srv/radicale, but of course this can be changed. The files in this directory should be readable and writable by the Web server.

You then need to create a file in /etc/apache2/sites-enabled whose contents look as follows:

<IfModule mod_ssl.c>
<VirtualHost *:443>
        ServerName dav.example.com

        ServerAdmin youremail@example.com
        DocumentRoot /var/www/html/

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

        WSGIDaemonProcess radicale user=www-data group=www-data threads=1
        WSGIScriptAlias / /srv/radicale/radicale.wsgi

        <Directory /srv/radicale/>
            WSGIProcessGroup radicale
            WSGIApplicationGroup %{GLOBAL}
            AllowOverride None
            AuthType basic
            AuthName "dav.example.com"
            AuthUserFile /srv/radicale/passwd
            Require user youruser
            SSLRequireSSL
        </Directory>

        SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
        Include /etc/letsencrypt/options-ssl-apache.conf
</VirtualHost>
</IfModule>

The file /srv/radicale/passwd contains the usernames and passwords of the users who can access the server, managed as usual with the htpasswd utility. The file /srv/radicale/radicale.wsgi contains the invocation to run Radicale and points to the config file, as follows:

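import radicale
# Load the Radicale settings from our own config file.
configuration = radicale.config.read(["/srv/radicale/config"])
# Start logging as configured in the [logging] section.
radicale.log.start()
# mod_wsgi looks for a callable named "application" in this file.
application = radicale.Application()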

To create the config file, you can, e.g., write the following in /srv/radicale/config:

[encoding]
request = utf-8
stock = utf-8

[rights]
type = owner_only

[storage]
type = filesystem
filesystem_folder = /srv/radicale/collections

[logging]
config = /srv/radicale/logging

In this file, /srv/radicale/collections contains the Radicale collections as in the old guide. The file /srv/radicale/logging contains the radicale logging configuration. Here is mine:

# inspired by https://github.com/Kozea/Radicale/issues/266#issuecomment-121170414
[loggers]
keys = root

[handlers]
keys = file

[formatters]
keys = full

[logger_root]
level = DEBUG
handlers = file

[handler_file]
args = ('/srv/radicale/logs/radicale.log','a',32768,3)
level = INFO
class = handlers.RotatingFileHandler
formatter = full

[formatter_full]
format = %(asctime)s - %(levelname)s: %(message)s

In the above, /srv/radicale/logs is where you want radicale to write its log files. You probably need to specify it manually, because radicale is run by the Web server, which may not have the right to log, e.g., in /var/log/radicale as the default configuration would do.
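The logging file above follows the format of Python's standard logging.config.fileConfig, so it can be tested without starting Radicale or Apache. The sketch below loads the same configuration, but with the log path pointed to a directory of your choice, and checks that a message reaches the file; the function name and the throwaway directory are assumptions made for this test.

```python
import logging
import logging.config
import os

# Same configuration as above, with the log path left as a parameter.
CONFIG_TEMPLATE = """\
[loggers]
keys = root

[handlers]
keys = file

[formatters]
keys = full

[logger_root]
level = DEBUG
handlers = file

[handler_file]
args = ({log_path!r}, 'a', 32768, 3)
level = INFO
class = handlers.RotatingFileHandler
formatter = full

[formatter_full]
format = %(asctime)s - %(levelname)s: %(message)s
"""


def check_logging_config(directory):
    """Write the config into directory, load it, log once, return the log text."""
    log_path = os.path.join(directory, "radicale.log")
    config_path = os.path.join(directory, "logging")
    with open(config_path, "w") as handle:
        handle.write(CONFIG_TEMPLATE.format(log_path=log_path))
    logging.config.fileConfig(config_path)
    logging.getLogger().info("logging configuration loaded")
    with open(log_path) as handle:
        return handle.read()
```

Running this as the Web server's user (www-data in the Apache configuration above) on the real log directory is a quick way to detect permission problems.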

SWERC 2017 and 2018

I just realized I hadn't mentioned here something that had kept me busy over the autumn months. With my university, Télécom ParisTech, and with my colleagues Bertrand Meyer and Pierre Senellart, we organized the SWERC programming contest in November 2017, and will do so again in December 2018. SWERC is the South-Western Europe Regional Contest for ACM ICPC, which is the most famous competitive programming competition for university students. You can read more about the contest here. We welcomed 76 teams of three contestants each, from 48 institutions in France, Israel, Italy, Portugal, Spain, and Switzerland. The top-3 teams in the rankings are from ENS Paris, ETH Zürich, and SNS Pisa: they will compete in the ICPC world finals in Beijing.

The Télécom student association Comète has made a very nice video covering SWERC'17, which went out recently, and gives a good idea of what the contest was like. You can watch it on YouTube, or download it directly if you prefer.

If you like competitive programming, you can have a look at the SWERC'17 problems on our website, or on UVa Online Judge or ACM-ICPC Live Archive. And if you are from a university in South-Western Europe and are eligible to participate, then we'd be glad to see you compete at SWERC'18! Registrations will open here in early September 2018.

Modern blockbusters: a dining metaphor

This is just a text I wrote to explain how I felt about most blockbuster movies. I didn't know what to do with it, so here it is.

There's this new restaurant in town that has posters and ads everywhere. Everyone's talking about it, and they all seem to have a pretty strong opinion, so you go with some friends to see what it's like.

The first impression is outstanding. The restaurant is lavishly decorated. The room, furniture, atmosphere, music, are all spectacular and have obviously been painstakingly designed for your enjoyment. The waiters have fancy, colorful, creative outfits, and they usher you to a comfortable seat at a magnificent table adorned with the finest dining ware.

You spend some time admiring the setting: the paintings on the wall, the patterns of the wallpaper, the carefully engineered lighting, and the complex ballet of the waiters. Soon enough, the first dish is served. It consists of various kinds of canapés, neatly arranged on a splendid plate. They look wonderful even if not particularly original. You have a bite, and the taste is good, not exceptional compared to your expectations, but certainly not bad either; just not especially remarkable.

You finish the plate, and a different waiter comes to the table after a while, with a new plate of other kinds of canapés. How formal, you think, how unbelievably fancy to have two rounds of appetizers before the meal has even started! The ingredients are different, but your opinion is essentially the same: excellent visual impression, classical recipes, enjoyable yet somewhat unsurprising taste.

A third plate of canapés comes in, and now you start to suspect that something is off. Why are they only serving such cocktail food? Worse, you can't figure out any logic in the contents of the plates: now some of the bites are sweet, but on the next plate everything is savory again. And the meal continues like this, with a series of plates of different kinds of hors d'oeuvres brought by various waiters.

It's not that the experience is really unpleasant. You can appreciate the setting, the lighting, and the subtle changes in atmosphere and music throughout the evening. You can also wonder about the seemingly random assortment of tastes, plates, and waiters, that comes to the table every now and then. You can also enjoy the food, which is acceptable even if not strikingly good. But after one hour and a half of this, your expectations have been building up to something more. Surely all of this has been leading to a proper dish of some kind? Alas, no: the series of appetizers continues for one more hour, you progressively realize that it's getting too late for your hopes to materialize, and then the check comes and confirms what you had feared. You feel somewhat queasy as you get up and leave the table, like when you have too many snacks in a row: you're no longer hungry, but you don't feel like you had a proper meal either. In fact, it's a bit as if you had been robbed of the opportunity of having one.

As it turns out, your friends are all thrilled about this incredible dinner experience, but it seems that you haven't been paying attention to the same things as them. For one thing, they really enjoyed the beauty of the setting, the music, and how everything was pleasing to the eye and ears. You readily concede that all of this was perfect, but you try to bring the discussion back to the food. "But wasn't the food pretty too?" they ask. "Didn't it perfectly match the plates, the table and the decoration of the room?"

Your friends also loved that the restaurant staff was so varied. This is something that you had essentially missed, although you do remember that the plates were brought by many different waiters, with interesting costumes and ties and hairstyles. To your friends, the main point of the various canapés was the story that they were telling about the lives of the waiters and the relationships between them. They can discuss it for ages: "Did you understand why the short bearded guy brought the foie gras plate, although the chunks of duck magret had all been delivered by the tall blonde waitress until then?" "Oh, my interpretation is that the bearded guy has a secret duck side in him, but he's conflicted about his relationship with the bald guy who brought the veal liver."

You ask: "But why the hell did they serve chocolate mousse verrines between the foie gras and veal liver?" Of course, they reply, the reason why the skinny old waitress brought the chocolate mousse was to appease the tension between bearded guy and bald guy. "But what good did it do to the meal?" you ask. And they answer: "It brings forward the side of the old waitress's character that feels guilty for the bearded guy's struggle."

You try to explain how you would have liked the meal to have a certain structure, with recognizable dishes arranged in a consistent order. Your friends pounce on this, and question you: why are you so attached to this traditional structure of a formal meal? Why should a good meal necessarily consist of a starter, a main course, and a dessert? "But the point is not the specific structure," you reply, "so much as having any kind of understandable connection between the successive dishes." Some of your friends then ask: "But don't you see how subversive it is to have served an anchovy paste toast just after a chocolate parfait? Don't you like this sort of strong political statement?" You still fail to see the radical appeal of this, given that the setting was otherwise rather consensual, and the food consisted of perfectly standard Western fare. To you, the meal didn't look like a satire of anything in particular, except maybe itself.

They ask, "but didn't you like how the meal was surprising and unpredictable?" And indeed, you have to agree that you couldn't anticipate anything, given that it appeared to be completely random. You explain how the lack of structure makes it impossible for you to summarize, or indeed to remember, the sequence of foods that you had. They disagree: to them, the meal was rich and complex, and anyway the main questions to examine are character-related, e.g., how the blonde waitress's disappearance in the middle of the meal could be linked to the increasingly important role of the bald guy in connection to the sweet and especially fruit-flavored foods.

Your friends are all eager to return to this place when they start serving their new menu next year. To them, this meal has been building up to the great surprises that the next dinner will surely bring. "Think of all the new kinds of food that we will discover! And in particular I wonder whether we will see the blonde waitress again? I wonder whether she might bring us some scallops on a green plate, because remember that the only seafood so far had been brought by the old waitress, also on a green plate, so this could be some hint of a family relationship between them?" And when you express your lack of enthusiasm, they don't understand you: if you complained so much about the food, why aren't you hungry for more?