a3nm's blog

Russian roulette: a mathematical analysis

— updated

A pompous title, because the underlying math isn't that complicated, but still, here we go.

The game of Russian roulette is played as follows:

We have a revolver with a p-round cylinder loaded with only one round. The cylinder is spun so that the round lies in a position which is assumed to be random. We have n players numbered 0, 1, 2, ..., n-1 standing in a circle. Each player, starting at player 0, will try to shoot himself in the head. If the player dies, the game stops. If the player misses, they pass the revolver to the next player (player n-1 passes to player 0).

There are two main variants: either players don't spin the cylinder when they give it to the next player, or they spin it (and we assume that it puts the bullet in a new random position chosen uniformly at random and independently from other draws).

Players spin

In this version, the cylinder is spun before passing the revolver. This is equivalent to each player rolling a q-sided die, or drawing from an urn with replacement. In this section, we will write f = 1/p for brevity.

Here is my take at solving this one. Let us denote by qi the probability that player i is the one to die. Notice that this game is memoryless in the sense that if a certain number k of rounds have been played without anyone dying, then the situation is exactly the same as in the beginning of the game (except the revolver might be in someone else's hands). For this reason, we have qi = (1 - fqi-1 for all i: a player's odds of dying are that of the previous player times the probability of the revolver not firing. It isn't very hard to see that this is true for the first round, and the observation about the game being memoryless explains why this continues to be true.

Now, we use the fact that the probabilities sum to one, ie. q0 + q1 + ⋯ + qn-1 = 1. This means that q0 + q0(1 - f) + ⋯ + q0(1 - f)n-1, which is q0((1 - f)0 + (1 - f)1 + ⋯ + (1 - f)n-1), is equal to 1. Using the geometric series formulae, we can write that q0(1 - (1 - f)n) = 1 - (1 - f) = f. Now that we know q0 and can express qi as a function of q0, we have the general solution:

qi = f(1 - f)i / (1 - (1 - f)n)

Notice that for an (countable) infinite number of players, the denominator becomes 1, and the formula qi = f(1 - f)i. This makes sense: for an infinite number of players, each players gets the revolver at most once, and their odds of dying is the odds that each of the previous players survived times the odds that they die. Also note that, obviously, the game is not fair, and that for i < j, qi > qj. Of course, the game is more unfair for larger values of f (because the penalty for being among the first to play is higher if the probability of dying is higher and the number of expected rounds is lower).

Players don't spin

This would be the version where players draw from an urn without replacement.

At first glance, this might seem more complicated, because the game isn't memoryless anymore. You might be tempted to think of it as a tree: either player 1 dies with probability 1/p, or player 2 dies with probability (1 - 1/p) · (1/(p-1)) = 1/p (interesting!), or player 3 dies with probability (1 - 1/p) · (1 - 1/(p-1)) · (1/(p-2)) = 1/p (very interesting!), and a pattern seems to emerge...

Well, this is not at all the right way to think about the problem, and I am ashamed to admit that I did not see why before someone explained to me. The right way to see things is that when you're spinning the cylinder at the beginning, you are effectively choosing which player will die, uniformly at random. If pn, then it is now obvious that the first p players have probability 1/p of dying and the others have probability 0. If p > n, then it's harder to write but not much more complicated: writing p = u n + v (where u is the quotient and v the remainder), we have qi = (u + 1)/p for i < v and qi = u/p for iv.

As a consequence, this variant is fair if (and only if) p is a multiple of n.

Efficient email management

— updated

I love email. I really love email. It is federated, which means I can run it on my own server. It is flexible, which means I can bend it to unusual uses. It is pretty simple, which means I can automate stuff with it or even write my own client. It has been around forever, which means it'll probably stay around for some more time. And so, as for everything which I spend a lot of time dealing with (and don't feel guilty about), I find myself trying to manage email the right way.

This post is not about the technical details of my current email setup (which is not perfect yet), but a general description of what I consider to be desirable features for an email setup. I might blog about the technical side once I get it to work exactly like I'd like it to work. Also, this post isn't about what you shouldn't do when sending email to someone, it's about dealing with incoming mail.

Archiving

Most people seem to keep all their email in their inbox (with occasional mass-archiving of all the inbox in a "sort later" folder), and either manage incoming email immediately or hope that they will somehow think about it later. It amazes me that people find this usable and usually remember important things. I use my inbox for messages which require some action on my part (like reading it carefully, reading something they refer to, doing something, answering, etc., or maybe just archiving the message). Whatever is in the inbox has to be moved out at some point. My inbox is a todo list.

The reason why people never archive their mail is because they usually have complicated schemes to archive messages. For instance, I've seen people archive messages in folders labeled by sender (hey, I also used to do that back in the days), which is quite pointless because this information is in the message anyway. My solution is different: one big archive folder. I'll sort later if I need to. Most stuff which gets there is never needed again. Archives like that should be optimized for writing.

Actually, I'll probably never need to sort anything because I can just search my mail (using indexing to get decent speed). True, it can be useful to keep some tags in order to add information which is not easy to find in the message (say I want to find quickly all email about some project...), but I'll only do it if I think I could probably need those messages later. If this isn't the case, just archive, and search later if need be.

Speed

The system should be keyboard-driven, and fast. "keyboard-driven" isn't just about having keyboard shortcuts, it's about having an interface designed with the keyboard in mind. "fast" isn't just about optimizing the hell out of some operations, but about having no useless noticeable delays, which is surprisingly rare.

Integrity

Unless you're working under very tight space constraints or you're being mailbombed, I don't see why you would delete a message, ever. Seriously. Archive it, and have a tag for things that you don't want to see unless you explicitly search for them. There is no reason for this to be hard (just bind it to the key you would ordinarily use to delete stuff).

This section is called "integrity", because I feel that you should distinguish the pristine data that you received from the stuff you add on top of it. Deleting a message violates this rule; modifying a message also does. Mainstream email clients usually don't do that (except to toggle flags), but I've seen mutt plugins silently change the message headers when you did some thing or the other. There should be a pristine copy of the mail which arrives, augmented by a datastore of what you did to the mail, probably supplemented by a cache to speed up things like search. The point is that you should distinguish the base data (email), the user data (labels, annotations, whatever) and the cache (which is just there for speed and can be regenerated).

Threading

Email threading is an absolute must. I cannot imagine what my inbox would look like without threading. I remember I was very surprised to learn that threading isn't guesswork based on subject and citations but a standardized, elegant and simple mechanism (with the headers Message-ID, In-Reply-To and References) which is unexpectedly followed by almost all email clients, even the ones which do not use them for display.

By email threading, I mean regular threads, ie. trees. The Gmail linear approximation (put messages in a thread, but don't show the tree, order messages by date) is better than nothing, but not satisfactory.

Priority inbox and spam filtering

People seem to rely on spam filtering to get rid of unwanted messages and, for Gmail users, rely on Priority inbox to cherry-pick important messages. Those two tools are useful, but it is unacceptable to delegate the task of choosing what to read and what to ignore to some software outside your control which tries to be intelligent.

As it turns out, I don't need those two tools, because I don't get much spam and I am able to deal with individual messages manually. If I needed to, though, I would insist on rolling my own there, more than anywhere else, to be sure that the filtering is sensible and that no important message gets lost because someone (or rather something) thought they knew what was interesting to me.

Mailing-lists

Email is not necessarily designed for messages that you want to read. It's also a convenient way to receive stuff you might want to catch up on later. Think of it as a neat way to keep a mirror of a mailing-list archive and browse it at some point if needed. It's okay to subscribe to tons of mailing-lists, as long as you filter them and redirect them in a folder which is separated from your real mail.

There is very little motive to unsubscribe from anything except technical limitations. If a list used to interest you, it might interest you again later--just redirect it directly to the archive, and tag it so it won't pollute your search results unless you want to.

Tags

In addition to the "tags to help future searches" idea mentioned above, there is also the use case "tags to manage your workflow". Everyone probably has their optimal setup, but there are a few common traps:

  • A message can be "unread" because you didn't read it completely and satisfactorily, or because it just arrived and you didn't triage it. The second one is far more urgent than the first--you want to eliminate useless stuff and see important stuff at once, but having a long non-urgent message identified as such sitting around "unread" is acceptable. It might make sense to open an "unread" message in the second sense, and mark it as "unread" in the first sense (aka. "todo: read").
  • The "sent" messages fall broadly into three categories, you will probably have to distinguish them somehow:
    1. The messages you have to send but don't care about, like answering a request from someone else. Once you have sent the message, the thread can be archived, because you don't care about it unless the other guy replies.
    2. The messages you send and care about, like requesting something. Once you have sent the message, you have to keep it somewhere and make sure that you renew your request after some time until you get an answer or lose interest.
    3. Personal messages where you care about the other person, ie. giving news and receiving news every now and then. The right way to manage this would probably be a counter indicating the last time you sent a message to someone, to give news regularly. A decent approximation, though, is to manage the thread like for case 2.
  • There are "blocked" messages for which you have to do something, but cannot do it right now. Examples are: you need some information which isn't available online, you need to ask someone IRL, you have to wait for someone else to do something, etc. You need to keep track of those messages, but not keep them in your inbox and spend your time bumping into them and remembering that you can't do anything about them yet. It might be a good idea to annotate such messages to remember what you are waiting for before you move them back to the inbox. The problem is that you still have to check those every now and then to see if what you were waiting for has taken place; it would perhaps make sense to indicate when you want to be reminded about the message's existence.

Choices

If the inbox is a todo list, then there is a tradeoff to find between guidance ("next step is: do X") and flexibility ("actually, I feel motivated about Y right now"). Presenting the list of messages in the inbox and asking the user to pick one is very flexible, but the high number of possible choices can be daunting. It's interesting to imagine what the opposite approach would feel like: the system would display threads one after the other, and requires you to either do what is to be done, or add a note explaining what you're waiting for...

Scanning ISBN bar codes

— updated

I have a book collection which I would to have an index of. Fortunately, most recent books have a barcode on their cover which indicates their ISBN, an identifier which is unique for the book and can be used (at least in theory) to retrieve details about the book (title, author, cover image sometimes...). Thus, to keep a list of what is in my bookshelves, I just have to scan the barcode of each book and save the list somewhere.

To do this, I need a tool which takes a video stream as input and outputs the list of recognized EAN-13 codes, one per line. I do not need a fancy app with a slick captive user interface which would try to be smart and store a database in some mysterious place while making it harder for me to script things. Do one thing, and do it well.

If someone else intends to do this, here are two possible ways, and ideas to use the ISBNs.

Barcode Scanner on Android

This is actually a pretty natural thing to do with an Android phone, except you have to find the right barcode scanning app. (Most turn out to be riddled with ads, or ridiculous limitations, or don't have the required feature.)

The ZXing library has an Android port called Barcode Scanner which was able to read almost all tested barcodes with the webcam of my HTC Desire. Activate the Bulk scan mode option in Settings, scan items, go to History, select Send history (beware, Clear history is right next to it and has no confirmation step), and send an email to yourself with the csv attachment. It is free as in freedom (Apache License 2.0).

zbarcam

The zbarcam command (packaged for Debian) is exactly what I needed, but my computer's webcam was too crappy to scan barcodes. This might work for you if you have a better webcam. For the record, before learning about Barcode Scanner, I tried to use my HTC Desire as a webcam over USB using AndroidUsbCamera. Sadly, it does not work on recent kernels because it requires V4L1 support which was dropped in kernel 2.6.38.

What to do with the ISBNs

I thought that an ISBN lookup program would be packaged for Debian, but apparently not. The simplest API to use seems to be that of Open Library (a nice project, by the way, which provides dumps which are apparently in the public domain): example for ISBN 9780517223628. 56 out of my 115 scanned books were found.

For better options, a good starting point seems to be this StackOverflow thread. It seems that the Google Books API is the best one, but interfacing with Google APIs seems to require API keys, registration, etc., so I didn't bother yet.

List Wikipedia categories on stdout

— updated

This is not rocket science, but maybe it can be useful to someone, or it can serve as a simple example of how you can query the Mediawiki API (or any JSON API) using Python 3. The two arguments are the name of the wiki (like "en.wikipedia.org") and the name of the category, and the output is the contents of the category, one item per line. Request continuations are handled, the script will pause for two seconds every request (ie. every 500 items).

For the code (public domain), see here: wikicateg

Here's an example:

$ lswikicateg.py en.wikipedia.org Experiments
Animal testing
Carryover effect
Checking whether a coin is fair
Control variable
Cranfield Experiments
Generality (psychology)
Inflatable Antenna Experiment
Jerry Klein's 2006 radio experiment
LOCAD
Long-term experiment
Lost in the mall technique
Natural experiment
Scientific control
The Third Wave
Category:Experimental archaeology
Category:Experimental economics
Category:Experimental film
Category:Experimental mathematics
Category:Experimental mechanics
Category:Experimental medical treatments
Category:Experimental methods of birth control
Category:Experimental missiles
Category:Experimental music
Category:Experimental physics
Category:Experimental psychology
Category:Experimental rockets
Category:Science experiments
Category:Experimental television stations
Category:Thought experiments
Category:Experimental vehicles

Unbuffering stdin and stdout

— updated

As part of my musings with irctk (which is not yet usable as of this writing), I have been looking for ways to run commands without bufferisation. This is annoyingly hard, because what should make you save time (one-line IRC bots) is actually a mess of cases where things fail to work because buffering happens somewhere in some program in the pipe.

If the program you're trying to pipe is one that you wrote, there are often (more or less obscure) ways to disable buffering or flush buffers by hand. Think about the "-u" option in Python 2 or the (non-POSIX) "fflush" command in awk. If you didn't write the program, maybe it already has an option to do what you want (GNU grep has "--lines-buffered") or maybe not (I'm not aware of such a trick for cut, for instance).

In the last case, you have to use an external tool to change the behaviour of the program. In the relevant StackOverflow question, the suggested tools are a simple but limited program called unbuffer:

unbuffer myprogram

or a complicated socat call:

socat EXEC:myprogram,pty,ctty,echo=0 STDIO

It turns out that there is a third option: the stdbuf program in (recent versions of) the GNU Coreutils does the job both powerfully and with a pleasant syntax. Here is my current recipe to get rid of buffer woes:

stdbuf -i0 -o0 -e0 myprogram

If you want to really know how all this mess works, here is a nice explanation. If you don't care, just remember that (1) buffering can cause data to get stuck in pipes and that (2) stdbuf is the program you want to use in this case.