Curses continuity

Having played the Continuity Game, I didn't like to have to use Flash. So I rewrote a similar game in Python and curses. It's an old, quick job in less than 500 lines of code, and doesn't really include a lot of levels, but it's playable and hackable.

You can get continuity on its repository. You can also directly download a source archive which is current as of this writing. Public domain.

Playing a note with the PC speaker on Linux

At some point, I needed to design a command that you could use to play a note (identified by its scientific pitch notation) on the PC speaker. For instance, I would like to play an A-sharp, 3rd octave on the speaker with:

play_note 'a#3'

For this, we need the beep program which allows us to beep the PC speaker at a given frequency. We then need something to go from "a#3" to the frequency of that particular note (233.081879 Hz in this case). The main point of this post is actually to advertise a2freq, a small tool that I wrote to do precisely this, ie. convert scientific pitch notation to a frequency. It also supports MIDI note numbers (which it uses as an intermediate representation), if you ever need to convert a MIDI file to a beep sequence... For instance:

$ a2freq a4
440.000000

Hopefully the next person who needs to do this can land here and save half an hour by using this code rather than writing it.

To put the pieces together, we can now define play_note as:

beep ${@:1:$(($#-1))} -f `a2freq ${!#}`

The complex crap that you see here is used to make it possible to pass additional arguments to beep, so that you can play a 100-millisecond A with:

play_note -l 100 a

The last parameter is taken to be a note, the rest is passed to beep. (Another decent solution would have been to play multiple arguments as a note sequence, rather than just playing one note and passing all previous arguments to beep.)

One caveat of this: sometimes frequencies get rounded in a bad way, meaning that the notes aren't right. This isn't a2freq's fault; it might be beep's, the kernel's, or the hardware's. On my machine, a#5 and b5 sound the same because of that. So don't assume that this will always work.

Which Creative Commons license?

Why can't people just stop writing "under a Creative Commons license" and actually say which one they have in mind? "Under a CC license" only indicates that attribution is required and that verbatim non-commercial reproduction is allowed, but does nothing to identify where you fall between the almost all-permissive CC-BY and the much more restrictive CC-BY-NC-ND.

Deduplication attacks

The Dropship affair is, to my knowledge, the first example of an interesting attack on deduplication mechanisms which I thought I'd present it here for people who might not want to read about the whole story. Some of the ideas in this post come from comments on Hackernews, especially from this comment by limmeau and replies by alex1 and 0x0.

Data deduplication is a natural and well-known idea: when you're storing redundant data, you want to implement something at the filesystem level to store multiple copies of the same data at the same place and save disk space. This is especially true for personal hosting services like Dropbox, where some big files will be uploaded multiple times by a lot of users (redundancy) and seldom modified (better efficiency because you have to actually copy the data to create the modified version when this happens). [I know that deduplication is done on blocks rather than files, but let's simplify.]

Now, a natural extension of the scheme for personal hosting services is to do the deduplication on the client side. When the user wants to upload something, you ask them to give you a hash of the content first, and check if you already have a file with that hash. If you do, you assume it's the same file (collisions would be both unlikely and impossible to force if you use the right sort of hash functions, and we will indeed assume that the hash used is secure). This scheme both saves space on the server (you only keep one copy of the file) and saves bandwidth (because you don't even have to transfer the file). Sounds great, doesn't it?

Well, the slight problem is that, given a hash, this allows the client to know if any user on the server has a file with this hash, and to get a copy of it if there is one. That is, assuming that you know how the client and server communicate, but unless you rely on security by obscurity, this should be assumed anyway since the client runs on untrusted hardware. (As a side note, the Dropbox client is actually closed source, but this didn't protect them, though I guess Dropbox has better things to do than trying to obfuscate things.)

Hence, we can imagine at least two attacks:

File sharing: Instead of sharing files, you can just share hashes and get the file from Dropbox, which is more convenient and might make Dropbox liable if you use this for illegal purposes such as copyright infringement.
Spying: You can try to guess if someone is hosting a file on Dropbox by testing if the hash is known. In particular, if you have a redacted document for which you can generate possible values for the redacted content and for which you think an unredacted version is on Dropbox, you can test all values and see which one Dropbox knows.

These are not unique attacks, because public file sharing sites like Megaupload will let you exchange files by just transmitting URLs anyway, and because you shouldn't put sensitive data on Dropbox to start with (yes, they encrypt your files, but I think it's done on the server, and the client is proprietary anyway). However, I find this quite elegant.

As a purely useless exercise, I will now concentrate on the file sharing attack, forgetting that public file sharing services exist. limmeau's comment suggests the following protection against the attack: the server can ask the client for bytes of the file at random offsets of its choosing along with requesting the hash. It seems to me that, if the client only has constant size information about the file, the probability of the client managing to pass the challenge decreases as the file size grows (of course, usual disclaimers apply). What's more, the overhead in bandwidth and computation for this scheme (both on client and server) is quite small.

A possible refinement of the attack (alex1's comment) would be to have a network where attackers can forward the server challenges and get the responses from people who actually have the file. We have added the constraint that the exchange must take place when the server challenge is known, but the file can still be retrieved from the server with the attacker exchanging a constant size amount of information with a file-holding accomplice. Can we prevent that?

The answer is yes, if the server and client share some amount of information unknown to the accomplices. For instance, the server could require the client to keep an entropy pool generated from the client's uploads to the server (and bounded by a size C, because the hosting service isn't very useful otherwise). The server could then build challenges requiring knowledge of the entropy pool, like requesting the checksum of the file using the pool as salt, which would require the attacker to exchange more information with the accomplice: what we would hope to achieve if the pool is full would be to force the exchange to require min(C, N) with N the size of the file.

There are still a lot of questions we could ask about that, though. What happens if we take into account the fact that there are multiple transactions? Also, can we do this in a way which is more efficient for the server? Checking random bytes of the file was easy; computing a checksum over it using the entropy pool as salt is hard. Keeping a copy of the pool takes up some space too, though it should be possible to store it efficiently since it is computed from uploads which actually reside on the server.

Representing fuel efficiency

In Europe, fuel efficiency is usually expressed as fuel consumption, which is the number of liters you use for every 100 kilometers you travel (in French, "litres au 100").

The funny thing is that, from the point of view of physics, "litres au 100" are homogeneous to a surface (because liters are a unit for volume and kilometers a unit for distance). For example, if you take a typical consumption of 10 liters per 100 km, this means a consumption of 10^-2 cubic meters for 10⁶ meters, ie 10^-8 square meters (or 0.01 square millimeters, ie. a square 0.1 mm by 0.1 mm).

The most incredible thing, however, is that this way of imagining things lends itself to a very clear visual interpretation which makes it possible to see the quantity of fuel used when travelling a given distance. Imagine that the surface representing the consumption is some kind of antenna over the car. When the car is running, the quantity of petrol used is the volume traversed by the surface.

In the US, however, you don't use fuel consumption that much; you talk about fuel economy, expressed as miles per gallon. Not only do Americans use of imperial units rather than SI units, but they're also thinking the other way round: whereas the French indicate the number of liters to use for a given distance, Americans indicate how far you can go with a given quantity of petrol. This means that the unit of fuel economy is the inverse of a surface. Can we get a visual interpretation out of that too?

The answer is yes, though the interpretation isn't quite as straightforward as in the previous case. Assuming that we have a given volume of fuel and that we know the value of the fuel economy, we would like to see the distance which we can travel. The idea is to see the petrol tank as a cylinder with a base surface S and a height h, and imagine your fuel efficiency of x (in m^-2) as indicating that you get x "efficiency points" per square meter. (This is quite logical: a frequency of 42 Hz, ie. 42 s^-1, means that you encounter 42 "beats" per second.)

Now, imagine a plane which sweeps through the tank while staying parallel to the base. The intersection between the tank and the plane contains a constant number of efficiency points (because it is always of surface S), and you can see them as travelling along with the plane. It happens that the total distance travelled by all of them (ie. h times the number of efficiency points in surface S) is the distance which you can travel.

Of course, if the number of points is fractional, just count the fractional part as appearing only for that fraction of the sweeping, ie. if you have 2.5 points, then count 2.5 times h. Another remark: in fact, you could arrange your fuel in any shape you want, and integrate the number of efficiency points over the (changing) size of the intersection of the sweeping plane with the tank. The assumption that it is arranged in a cylindrical volume is only there for simplicity's sake.

It feels a bit annoying that this second visualization seems more complicated than the first one although it seems to be more or less the same idea. I don't know if this could count as a proof that European way to do things is better than the American one? :-P

Thanks to Claude Perdigou for help in finding the second visualization.

Some more explanations on xkcd What If? and Physics.SE.