a3nm's blog

A fundamental problem with OpenID

In this post I describe what I think is a fundamental problem with OpenID, and how I think a decentralized authentication scheme should work to avoid this problem. I assume that you are familiar with OpenID, DNS, asymmetric crypto and X.509 client certificates. As a refresher on the terminology: with OpenID, you log in at a relying party by specifying a URL, the relying party queries this URL to find out who the provider is, and the provider takes care of identification.

This post is not about the practical problems of getting people to understand and adopt OpenID. I do not intend to complain about the many websites where you can only log in with a closed list of proprietary providers and not with the OpenID URL of your choice even though it probably wouldn't be harder to support. I do not intend to complain about the fact that people don't get OpenID, or discuss whether the cause is OpenID's complexity or the general public's cluelessness. My complaint is about the core design of OpenID. It is the following:

OpenID uses URLs to identify people, and URLs rely on DNS.

Why is that bad?

DNS is centralized
If you are your own OpenID provider, you need to get a domain name from a DNS registrar. This costs money, and depends on DNS, which is centralized.
URLs can change
Do you want to change your OpenID URL? Well, do it at every relying party if they have an option for that. Want to do it globally? Well, you can't. So, you might get stuck paying for that old domain name forever after all.
Domain names are reassigned
If you accidentally let your domain name expire, then you are locked out of your accounts at relying parties. Worse, anyone can buy it, at which point they get full control of your OpenID URL, and you have no way to stop them. Whoops.
DNS is insecure
Securing DNS is an afterthought, so it's a bad idea to assume it's secure unless there's no better way.

It is true that the last two points can be mitigated if you instruct the relying party to use HTTPS (by indicating "https://" explicitly in your OpenID URL), and if you have an HTTPS certificate that relying parties will trust, and if an attacker doesn't (obviously, if a relying party accepts your server's self-signed HTTPS certificate, then it would gladly accept an attacker's).

To summarize: OpenID is decentralized but relies on a system (DNS) which is centralized, not very secure, and in which being your own provider requires you to commit to a domain name which you basically have to renew and pay for indefinitely. Hmm. Can we do better?

The fundamental principle of OpenID is that you provide the identifier of a resource (the URL) and prove that you control this resource. You actually exercise your control over the resource by having it point to an OpenID provider whom you trust to identify you in a suitable way, but the basic idea remains: to log in, you name something that you control.

It turns out that there are other resources that you can name and prove that you control, and that don't need to be registered in a centralized system like DNS. I'm thinking about public keys. You could provide your public key (or its fingerprint under a secure hash function) to relying parties, which would associate you with this public key (a URI of sorts) rather than associating you with a URL. You would then demonstrate ownership of the private part of this key when logging in. Problem solved. This is how ssh public-key authentication works, but this has also been applied to the web: X.509 client certificates work precisely like this. Sadly, no one uses them, because people do not want to generate keys and configure their browsers and such.

All hope is not lost, though. What about the following scheme:

  1. You give a URL to the relying party.
  2. This URL points to a document indicating both a public key and an OpenID provider which controls the associated private key.
  3. The provider demonstrates to the relying party that it owns the private key associated with the public key.
  4. You authenticate with the provider like in vanilla OpenID.
  5. The relying party does not associate your account with the URL you provided, but with the public key.
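
Step 3 can be sketched with the openssl command-line tool. This is only an illustration of proving control of a private key by signing a fresh challenge, not a protocol specification: the function names and file names are hypothetical, and a real scheme would also need to bind the challenge to the relying party and to the login session.

```shell
# Hypothetical sketch: prove ownership of a private key by signing a challenge.

# Provider side: create an RSA key pair (file names are chosen by the caller).
make_keypair() {  # usage: make_keypair PRIVATE_OUT PUBLIC_OUT
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$1" 2>/dev/null &&
    openssl pkey -in "$1" -pubout -out "$2"
}

# Relying party side: pick a fresh random challenge for this login attempt.
make_challenge() {  # usage: make_challenge CHALLENGE_OUT
  openssl rand -hex 32 > "$1"
}

# Provider side: sign the challenge with the private key.
sign_challenge() {  # usage: sign_challenge PRIVATE CHALLENGE SIGNATURE_OUT
  openssl dgst -sha256 -sign "$1" -out "$3" "$2"
}

# Relying party side: returns 0 only if the signature was made with the
# private key matching the given public key.
verify_challenge() {  # usage: verify_challenge PUBLIC CHALLENGE SIGNATURE
  openssl dgst -sha256 -verify "$1" -signature "$3" "$2" >/dev/null 2>&1
}
```

In this sketch the relying party would fetch the public key from the document at your URL, send the output of make_challenge to the provider, and accept the login only if verify_challenge succeeds on the signature that comes back.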

This indirection would change nothing for casual users who would just register with any OpenID provider and log in with their OpenID URL without caring at all about "their" key which would be entirely managed by their provider. The behavior of relying parties, and the interaction between relying parties and providers, would be different, but none of this would be visible to the user.

However, power users would own their key without being tied to a specific URL, and could configure an endpoint with this key at any domain of their choice. Benefits:

  1. If you want to change your domain name, just put the key at the new domain and identify with a URL at the new domain. The URL is just a pointer, the underlying key is still the same, so you're still the same person to the relying party even though the URL changed.
  2. If your domain name expires, no big deal, someone who buys it will not get your key, and you can just host the key elsewhere.
  3. If you want to stop being your own provider, just entrust your private key to an existing provider and you can start logging in with a URL pointing to this provider.
  4. If you are paranoid and do not trust DNS, it would be easy to extend the protocol a bit so that, when you give your URL, you can also optionally specify your public key to the relying party. In this case, the relying party would be required to check that the public key used by the provider matches the one you specified at login.
  5. If you don't want to have anything to do with DNS, there's nothing stopping you from using the IP of your server as the URL. Hell, if you're not browsing from behind a firewall and your machine can accept incoming HTTP connections, you can have your key on your machine, and just give the IP of your machine -- you don't even need a server.
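
To illustrate why the key is a better identifier than the URL, here is a minimal sketch of how a relying party could derive an account identifier from a public key and perform the check of benefit 4. The function names are hypothetical and the choice of SHA-256 over the DER encoding is an assumption of this sketch, not something the scheme prescribes.

```shell
# Hypothetical sketch: use the fingerprint of a public key as the account
# identifier, independently of the URL at which the key was found.

# Print the SHA-256 fingerprint (hex) of a PEM public key. Hashing the DER
# encoding makes the result independent of PEM formatting details.
key_fingerprint() {  # usage: key_fingerprint PUBLIC_KEY_PEM
  openssl pkey -pubin -in "$1" -outform DER |
    openssl dgst -sha256 -r | cut -d' ' -f1
}

# Benefit 4: compare the key served at the URL with the fingerprint that
# the user supplied at login. Returns 0 if they match.
key_matches() {  # usage: key_matches PUBLIC_KEY_PEM EXPECTED_FINGERPRINT
  [ "$(key_fingerprint "$1")" = "$2" ]
}
```

Because the fingerprint only depends on the key, moving the same key to a new domain yields the same identifier, while someone who buys your expired domain and publishes their own key gets a different one.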

Hopefully I managed to convince you that identifying users (as opposed to locating their provider) using URLs is a bad idea. Of course, this point is not specifically targeted against OpenID. Mozilla Persona (a.k.a. BrowserID) is using email addresses as identities, which also depend on DNS. I'm still looking for a decentralized authentication scheme which understands that you should just use URLs as pointers and use public keys as identifiers.

Addendum: It turns out that OpenID 2.0 is supposed to support XRIs which (according to the last paragraph of this section) can be used to mitigate the problems I'm talking about. However, after spending some time trying to understand what XRIs are and whether anyone is using them, I'm not convinced that they are really an elegant and practical way to solve this problem, so I think the point still stands.

Shortcomings of the real world

Here is a list of fundamental differences between reality and idealized models of the world. It can provide guidelines when designing virtual worlds, or serve as a checklist when trying to reason about the real world:

Irreversibility.
Some things are more easily done than undone (destroying something versus building it, making something dirty versus cleaning it up, etc.), and some cannot be undone at all (killing people, losing information, wasting resources, etc.). This means that a small number of wrongdoers can have a disproportionate impact because undoing their mess takes so much time, and this implies that preventive measures are needed to limit the occurrence of irreversible bad things. This is in contrast to virtual places like Wikipedia where reverting edits isn't substantially harder than making them, and where you can benefit from the fact that vandals are a small minority.
Low dimensionality.
The world has a small number of spatial dimensions: only two are really usable, the third one is harder to use because of gravity. Because of this, the possibility of interaction is limited: you cannot have a high number of things acceptably close to each other. This holds both for groups of people (large groups of people cannot interact meaningfully in real life, which is an obstacle to large-scale collaboration) and for cities (to have everything close to everything, you need absurdly high density).
Imperfect coordination.
Even with arbitrarily good communication technology, large groups of people are harder to coordinate than small groups, because of cognitive limits. For this reason, whenever two groups have contrary interests and must hold one against the other, the larger group will be disadvantaged and have a much higher risk of defection. This is a factor explaining why the masses have a hard time coordinating, even though they are numerous by definition.
Non-autonomy of children.
While the harm principle dictates that consenting adults in isolation can be simplified out of the moral equation, this does not work with children: adults in isolation can have children, and those children will not be able to legally consent to everything their parents might do to them. For this reason, society has to keep an eye on how parents raise their children, and find some compromise between the parents' rights and the child's.
Necessary infrastructure.
Long-distance communication is not a given but depends on artificial infrastructure which is not free, can fail or can be controlled by malicious parties. You cannot assume that everyone has access to the Internet in the same way that everyone has access to air.
Unbounded vital needs.
If the vital needs of people could be bounded, there would be some hope of managing to satisfy the needs of everyone and assuming that the survival of every human being is ensured. Sadly, people can have arbitrarily complex health problems and could need arbitrarily involved and expensive treatment. This can be dealt with through an insurance system, but complicates things because some people who have simple needs will want to opt out of such a system, making it unsustainable.
Critical mind.
To achieve the independent thought and critical spirit required to be a free, autonomous agent, education is required. People who are not given this education cannot be considered as individuals and it might not make sense to consider that they are responsible for their actions. Yet, they need to be dealt with in some way or other.
Inheritance.
Assume that money represents some measure of social utility, and that people who earned money should be allowed to use it as they like. In this setting, it is a major problem that most people will want to give their money to their offspring, because the money that the offspring will thus inherit is not linked to their social value. The problem is that the individual interests of the donor ("benefit my offspring") are at odds with the interests of society ("allocate money to people who produce value"). There are no solutions except restricting the freedom of people to use their money or increasing inequality at birth because of the parents' wealth.
Physical encounters.
It is not possible to assume that people live autonomously in isolation from each other and only communicate by exchanging information. People desire friendships, close relationships, and physical relationships. For this reason, they have to meet in real life.
Public-private continuum.
You cannot divide the world into public places and private places and say that there should be no expectation of privacy in public places, because you need to go through public places to travel from one private place to another. Besides, private conversations will often take place in public space with some expectation of privacy between the speakers. I tried to think more about this point.
Repetitive work.
In reality, repetitive tasks have to be carried out. If you want to do the same thing multiple times, you will have to do so by hand, and it will usually be complicated to build a robot to perform the task for you. This is in contrast to the virtual universe where things are usually much easier to formalize and automate, and where the effort required by a task is much closer to its Kolmogorov complexity.
No records.
Even if there is no expectation of privacy somewhere, there is usually no complete perpetual record of what took place there. Hence, there cannot always be an objective assessment of the truth of factual statements involving public data. Note that the problem is not that records are not reliable and can be tampered with, but the fact that they are not complete or numerous enough: the higher the number of independent records, the harder it gets to engineer consistent fabrications. This is in contrast to virtual space where there is usually abundant evidence available because recording something is often easier than not recording it.

[I just wrote this list quickly to dump some ideas I had in the back of my head, it might not make much sense.]

Recording all your terminal sessions

I love to log as much information as I can about what I do on my computer. (Of course, I never send those logs to third-party services.) I log all of my keystrokes, I religiously keep all of my command history, all of my email and IRC logs, and so on.

However, something that I hadn't logged so far is what appears in my terminals. This was a shame: since terminals display text, you would expect that you could log everything which appears on them without using up much space. Logging this information could be useful to reconstruct what you were doing at a particular point in time, to understand how you ended up making a certain mistake or doing a certain thing, to show someone how to do something, to recover the output of any particular command of your history, and so on.

There is a tool called ttyrec which can be used to log what happens in your terminals (including timing information), but I hadn't used it systematically so far because of one simple issue: if you run cat large_file, then ttyrec will happily put all the content of large_file in its log, even though you probably didn't care about it. Just a few accidents like this and your log files can become huge.

The point of this post is to advertise ttyrex, a slight modification of ttyrec which adds an option to cap the quantity of data logged every second. That way, when you run cat large_file, only a small quantity of the file is logged every second and the rest is skipped, so you get a reasonable approximation of what you saw on the terminal without using up too much space.
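
The capping idea can be illustrated with a toy filter. This is not how ttyrex is implemented (ttyrex works on the live terminal stream): it is a sketch that assumes the input has already been turned into lines of the form SECOND, tab, DATA, and the function name cap_per_second is made up for the example.

```shell
# Toy illustration of a per-second cap, not the actual ttyrex code.
# Input: lines of the form "SECOND<TAB>DATA" (DATA must not contain tabs).
# Output: the same lines, keeping at most LIMIT characters of DATA per SECOND.
cap_per_second() {  # usage: cap_per_second LIMIT
  awk -F'\t' -v limit="$1" '
    NR == 1 || $1 != current { current = $1; used = 0 }  # new second: reset the budget
    {
      room = limit - used
      if (room <= 0) next            # budget exhausted: skip the rest of this second
      chunk = substr($2, 1, room)    # keep only what still fits in the budget
      used += length(chunk)
      print $1 "\t" chunk
    }'
}
```

For instance, with a limit of 8, ten characters arriving within the same second are cut down to the first eight, and the budget is available again in the next second.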

I have been starting ttyrex systematically with urxvt for some time now, compressing logs that are older than two weeks (this saves a tremendous amount of space), and the last two weeks' worth of logs uses up a quantity of disk space which I think is reasonable by today's standards (less than 1 GB). I have also tweaked zsh to store the start time and stop time of the recorded sessions, the start time of ongoing sessions, and the command history of each session. I have then written two commands: one to replay what happened at any point in time (i.e., open one replay terminal for each terminal that was open at some timestamp, and jump to the correct position in each of the replays), and one to take a line of the command history and open a replay of the right session at the right time to see when the command was entered and which results it gave.
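
Compressing old recordings can be done with a small helper like the one below. It is a sketch rather than my exact setup: the function name is made up, and it assumes that all recordings live as regular files under a single directory.

```shell
# Sketch: gzip every recording under DIRECTORY that is older than DAYS days
# (default: 14) and is not compressed already.
compress_old_logs() {  # usage: compress_old_logs DIRECTORY [DAYS]
  # -mtime +N matches files last modified more than N days ago
  # (find counts age in whole days, rounding down).
  find "$1" -type f ! -name '*.gz' -mtime +"${2:-14}" -exec gzip -- {} +
}
```

Running this from a daily cron job, for example as compress_old_logs on the directory where the recordings are stored, keeps recent sessions directly replayable and shrinks the older ones.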

I haven't found any use for all of this yet except playing around, but it's pretty fun (having terminals which replay what I did in the past feels a lot like time travel).

Installing CyanogenMod on a Galaxy Nexus (GSM)

I just installed CyanogenMod on my Galaxy Nexus phone. There is an official guide; here is my summary of what you need to do.

Back up your data
The process will reset your phone, so you need to back up all your data. A useful open-source program to take care of (part of) this is Slight backup.
Retrieve fastboot
Follow these instructions to install fastboot. I did this long ago, so I don't remember whether it was entirely straightforward. fastboot is now packaged for Debian so it is much simpler to install: apt-get install android-tools-fastboot.
Unlock the bootloader
Power down the device, and press the power, volume up and volume down buttons simultaneously for a few seconds. You will thus reach the bootloader. Connect the USB cable and run fastboot oem unlock, and confirm. This will reset the device and unlock the bootloader.
Retrieve and run ClockworkMod
I trust the official guide to have an up-to-date ClockworkMod download link. This being said, you don't need to keep ClockworkMod on your device: you can just boot it as needed. To do so, get to the bootloader like in the previous section, and run fastboot boot CLOCKWORK where CLOCKWORK is the ClockworkMod image file.
Perform a backup
Use ClockworkMod to back up the device before installing anything else, and use adb to retrieve the backup to your computer.
Format all partitions
This is an important step missing from the official guide: you should format /cache, /system and /data before installing CyanogenMod. Otherwise, in my case, CyanogenMod was stuck at the boot animation and adb logcat seemed to suggest that it had to do with a NullPointerException while reading the existing settings.
Retrieve CyanogenMod
Download a CyanogenMod image from this page. I first thought I'd go with a stable version, but I picked the latest nightly as of this writing (cm-10-20120923-NIGHTLY-maguro.zip) and have had no problems with it so far.
Install CyanogenMod
Use adb to push the downloaded image to the sdcard folder on the device, and install the image using ClockworkMod.
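
The command-line part of these steps can be summarized as follows. This is a sketch, not a tested installer: the recovery image name is a placeholder, the helper names are made up, and the commands are only printed unless you set DRY_RUN=0. The backup, formatting and installation steps happen in ClockworkMod's own menus and are not covered.

```shell
# Sketch of the commands used above; file names are placeholders.
RECOVERY_IMG=${RECOVERY_IMG:-recovery-clockwork.img}   # hypothetical ClockworkMod image name
ROM_ZIP=${ROM_ZIP:-cm-10-20120923-NIGHTLY-maguro.zip}  # the nightly mentioned above

# Print each command, and only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# With the device in the bootloader: unlock it (this resets the device).
unlock_bootloader() { run fastboot oem unlock; }

# With the device in the bootloader: boot ClockworkMod without installing it.
boot_recovery() { run fastboot boot "$RECOVERY_IMG"; }

# With ClockworkMod running: copy the CyanogenMod image to the device.
push_rom() { run adb push "$ROM_ZIP" /sdcard/; }
```

Calling the three functions in order with the default settings just prints the commands, which is a cheap way to check file names before running anything against the phone.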

irctk -- an IRC toolkit

There are a lot of language-specific libraries to interact with IRC and write bots, but if you want to do this from the shell, the only option I know of is to use ii. Sadly, ii is based on the idea of setting a connection up and interacting with filesystem objects, which is inconvenient if you just want to hack something together in one line like you do with netcat for TCP connections. This post presents irctk, a C program I wrote using libircclient. irctk connects to a server specified on the CLI and uses its standard input and output to read what it should say and write what it just heard. With irctk, you can do stuff like:

# output your server log events on irc
ssh server tail -f logfile.log | irctk example.com '#dashboard'
# timestamp and log irc messages to a file
irctk example.com '#chan' | awk '{ print strftime("%s"), $0; fflush() }' >file

You can also write programs which interact with their standard input and output and then just lift them to IRC with irctk. An example of this is wikifirc, a tool to filter irc.wikimedia.org on specific pages and users. The general scheme is just:

mkfifo fifo
cat fifo | irctk example.com '#chan' | program > fifo

Or you can just write simple programs directly in bash. As a convenience, irctk has options to filter incoming messages and only keep those which are specifically addressed to it, and it can reply automatically to the person who addressed it, on the channel where it was addressed. For example, if you address the following bot like "fingerbot: foobar" or "/msg fingerbot foobar", it will reply with information about user foobar found with the finger command:

cat fifo | irctk -Fr fingerbot@example.com '#chat' |
  # read -r keeps backslashes in the incoming message intact
  while read -r; do
    finger -s -- "$REPLY" 2>&1 | tail -n 1
  done >fifo

Here is a funnier (bash-specific) example: a bot to roll dice like "dmbot: 3d42" (thanks, p4bl0!). Update: the code was fixed to avoid modulo bias.

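cat fifo | irctk -Fr dmbot@example.com '#chat' |
while read -r line; do
  # accept NdM with N of at most two digits and M between 1 and 999
  if grep -Eq '^[0-9]{1,2}d[1-9][0-9]{0,2}$' <<<"$line"; then
    D=(${line/d/ })  # split "3d42" into the array (3 42)
    # 10# forces base 10, so that "08d6" is not rejected as invalid octal
    for ((i = 0; i < 10#${D[0]}; i++)); do
      # shuf draws uniformly from 1..M, which avoids modulo bias
      shuf -i1-${D[1]} -n1 | tr '\n' ' '
    done
    echo
  else
    echo "format error: must be NdM with N<100 and M<1000"
  fi
done >fifo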

irctk has a few other features, like support for commands like "/join", "/nick", "/part", etc., to be able to script actions. Installing irctk should be as easy as installing libircclient (by hand, since the version packaged in e.g. Debian is not recent enough as of this writing), and then typing:

git clone 'https://a3nm.net/git/irctk'
cd irctk
make

You can check the README for more information. Comments, suggestions, bug reports and feature requests are welcome at <a3nmNOSPAM@a3nm.net>.