Possible improvements for text entry on keypads

People seem to be entering lots of text using crippled telephone keypads these days. I personally hate these things, because they're so much slower than a regular keyboard, but I have to admit that carrying a regular keyboard in your pocket is slightly impractical.

The layout used on keypads is great for new users: since the letters are in alphabetical order, you don't have to hunt much for the key you want. However, a lot of phone users use their keypads so much that they know the layout by heart. I'm not one of them, but when I see them, I wonder: "Wouldn't it be possible to create a layout which is less intuitive but more efficient?"

Optimizing for frequency (without T9)

Of course, the answer is yes. If we arranged the letters on the keys in a way which ensures that the most frequent letters only require one keypress to reach, things would probably be a lot better.

Let's do the math. I assume that we will only be remapping the '2', '3', '4', '5', '6', '7', '8' and '9' keys. The layout used by mobile phones today is:

2: ABC
3: DEF
4: GHI
5: JKL
6: MNO
7: PQRS
8: TUV
9: WXYZ

Using letter frequency data for the English language, here is a possible layout optimized for letter frequency:

2: ERG
3: TDY
4: ALP
5: OCB
6: IUV
7: NMKQ
8: SWJ
9: HFXZ
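
A layout like this one can be built greedily: sort the letters by decreasing frequency and deal them out to the keys one round at a time, so that the eight most frequent letters land in first position, the next eight in second position, and so on. Here is a minimal Python sketch of that idea. The frequency ordering in BY_FREQUENCY is an assumption (a commonly cited ordering for English), and the names greedy_layout and CAPACITY are only for illustration.

```python
# Letters from most to least frequent in English (assumed ordering).
BY_FREQUENCY = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

# Number of letters each key holds, as on the standard keypad.
CAPACITY = {"2": 3, "3": 3, "4": 3, "5": 3, "6": 3, "7": 4, "8": 3, "9": 4}


def greedy_layout(letters=BY_FREQUENCY, capacity=CAPACITY):
    """Deal letters to keys round by round, most frequent letters first."""
    layout = {key: "" for key in capacity}
    remaining = iter(letters)
    for position in range(max(capacity.values())):
        for key, size in capacity.items():
            if position < size:  # skip keys that are already full
                letter = next(remaining, None)
                if letter is None:  # ran out of letters
                    return layout
                layout[key] += letter
    return layout


for key, letters in greedy_layout().items():
    print(f"{key}: {letters}")
```

With the ordering assumed here, the result is the layout listed above.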

Still using the letter frequency data, it's easy to compute the average number of keypresses per letter:

2.149 keypresses per letter for the default layout
1.462 keypresses per letter for the alternative layout

The bottom line is that this simple alternative layout requires 32% fewer keypresses than the default one.
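
For the curious, the computation is just a weighted average: a letter in position k on its key costs k keypresses, weighted by how often the letter occurs. The sketch below assumes a commonly cited table of English letter frequencies (in percent); the exact figures depend on which table you pick, so take FREQ as an assumption rather than as the data behind the numbers above.

```python
# Assumed English letter frequencies, in percent (a commonly cited table).
FREQ = {
    "E": 12.702, "T": 9.056, "A": 8.167, "O": 7.507, "I": 6.966, "N": 6.749,
    "S": 6.327, "H": 6.094, "R": 5.987, "D": 4.253, "L": 4.025, "C": 2.782,
    "U": 2.758, "M": 2.406, "W": 2.360, "F": 2.228, "G": 2.015, "Y": 1.974,
    "P": 1.929, "B": 1.492, "V": 0.978, "K": 0.772, "J": 0.153, "X": 0.150,
    "Q": 0.095, "Z": 0.074,
}

DEFAULT = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
           "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
ALTERNATIVE = {"2": "ERG", "3": "TDY", "4": "ALP", "5": "OCB",
               "6": "IUV", "7": "NMKQ", "8": "SWJ", "9": "HFXZ"}


def average_keypresses(layout, freq=FREQ):
    """Expected keypresses per letter: position on the key, weighted by frequency."""
    total = sum(freq.values())
    presses = sum(
        freq[letter] * (index + 1)
        for letters in layout.values()
        for index, letter in enumerate(letters)
    )
    return presses / total


default_avg = average_keypresses(DEFAULT)
alternative_avg = average_keypresses(ALTERNATIVE)
saving = 1 - alternative_avg / default_avg
print(f"default: {default_avg:.3f}, alternative: {alternative_avg:.3f}, saving: {saving:.0%}")
```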

Optimizing for key repetition (still without T9)

The previous analysis fails to take into account the fact that, whenever you want to type in succession two characters which are on the same key, you have to wait, because the phone can't otherwise guess where one character ends and the next begins. Some phone users prefer to type a dummy character and then delete it, which removes the time penalty at the cost of two keypresses, or, more efficiently, to press the right-arrow key (only one keypress).

Still, it might be possible to optimize the layout by ensuring that the letters of the most frequent digraphs are on different keys. We can try to do this without losing the frequency optimization we did earlier, just by exchanging letters which require the same number of keypresses. (I don't say that it's the most efficient way to do it, I just say that it's a simple way to improve things which preserves the optimization we did above.)

I could compute the optimal permutation using English digraph frequency data, but this would take some work. I might do it in a future blog entry. In any case, the default layout clearly hasn't been optimized for this.
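
In the meantime, here is a rough way to measure the problem for a given layout: count how many adjacent letter pairs in a sample of text fall on the same key. This is only a sketch, not the optimization itself; same_key_digraphs is a name I made up, and you would have to feed it a reasonably large English text to get meaningful numbers.

```python
DEFAULT = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
           "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
ALTERNATIVE = {"2": "ERG", "3": "TDY", "4": "ALP", "5": "OCB",
               "6": "IUV", "7": "NMKQ", "8": "SWJ", "9": "HFXZ"}


def same_key_digraphs(text, layout):
    """Count adjacent letter pairs that sit on the same key (each one forces a pause)."""
    key_of = {letter: key for key, letters in layout.items() for letter in letters}
    count = 0
    previous = None
    for char in text.upper():
        key = key_of.get(char)  # None for spaces, digits and punctuation
        if key is not None and key == previous:
            count += 1
        previous = key  # a non-letter resets the comparison
    return count


sample = "hi there, hello"
print(same_key_digraphs(sample, DEFAULT), same_key_digraphs(sample, ALTERNATIVE))
```

Doubled letters (the "ll" of "hello") count as collisions whatever the layout, so no permutation can remove those.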

Distinguishing ambiguities (with T9)

The T9 system requires other kinds of optimizations in order to distinguish similar-looking words.

A first idea would be to ensure that the keys are as equiprobable as possible (where the probability of a key is the sum of the probabilities of its letters). This is an idea from information theory: if a key has a very low total probability and the user almost never presses it, the information given by the user for each keypress is reduced, and more keypresses will be required.

The frequency-optimized layout I presented earlier is a "good" solution to the problem of making all keys equiprobable, though it's not the optimal one. Of course, the default layout isn't optimal at all. Once again, the efficiency difference could be computed using the English letter frequency tables.
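
One way to put a number on "equiprobable" is the Shannon entropy of the key distribution: with eight keys, a perfectly balanced layout would reach 3 bits per keypress. The sketch below computes it for both layouts, again assuming a commonly cited table of English letter frequencies (FREQ is my assumption, in percent).

```python
import math

# Assumed English letter frequencies, in percent (a commonly cited table).
FREQ = {
    "E": 12.702, "T": 9.056, "A": 8.167, "O": 7.507, "I": 6.966, "N": 6.749,
    "S": 6.327, "H": 6.094, "R": 5.987, "D": 4.253, "L": 4.025, "C": 2.782,
    "U": 2.758, "M": 2.406, "W": 2.360, "F": 2.228, "G": 2.015, "Y": 1.974,
    "P": 1.929, "B": 1.492, "V": 0.978, "K": 0.772, "J": 0.153, "X": 0.150,
    "Q": 0.095, "Z": 0.074,
}

DEFAULT = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
           "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
ALTERNATIVE = {"2": "ERG", "3": "TDY", "4": "ALP", "5": "OCB",
               "6": "IUV", "7": "NMKQ", "8": "SWJ", "9": "HFXZ"}


def key_entropy(layout, freq=FREQ):
    """Entropy (in bits) of the key pressed, treating letters as independent draws."""
    total = sum(freq.values())
    entropy = 0.0
    for letters in layout.values():
        p = sum(freq[letter] for letter in letters) / total  # probability of this key
        entropy -= p * math.log2(p)
    return entropy


default_bits = key_entropy(DEFAULT)
alternative_bits = key_entropy(ALTERNATIVE)
print(f"default: {default_bits:.3f} bits, alternative: {alternative_bits:.3f} bits (max 3)")
```

With these frequencies the alternative layout comes out closer to 3 bits than the default one, mostly because the default layout has two keys (5 and 9) that are hardly ever pressed.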

However, since the user doesn't enter random strings of letters following the frequency distribution of the English language, but real English words, it would be possible (though perhaps computationally intensive), using a list of the English words with their frequency, to find the layout which minimizes ambiguities. Once again, doing this is left as an exercise to the reader.
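
The scoring half of that exercise is at least easy to sketch: map every word to its key sequence and see which words collide. The word list below is a toy one I picked for illustration; a real evaluation would need a proper word list with frequencies, and the search over layouts is the hard part that this sketch does not attempt.

```python
from collections import defaultdict

DEFAULT = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
           "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
ALTERNATIVE = {"2": "ERG", "3": "TDY", "4": "ALP", "5": "OCB",
               "6": "IUV", "7": "NMKQ", "8": "SWJ", "9": "HFXZ"}


def key_sequence(word, layout):
    """Digits a T9 user would press for this word (letters only)."""
    key_of = {letter: key for key, letters in layout.items() for letter in letters}
    return "".join(key_of[letter] for letter in word.upper())


def ambiguous_groups(words, layout):
    """Group words by key sequence and keep only the sequences shared by several words."""
    groups = defaultdict(list)
    for word in words:
        groups[key_sequence(word, layout)].append(word)
    return {seq: group for seq, group in groups.items() if len(group) > 1}


words = ["good", "home", "gone", "cat"]  # toy word list
print(ambiguous_groups(words, DEFAULT))
print(ambiguous_groups(words, ALTERNATIVE))
```

On this tiny list, the default layout puts "good", "home" and "gone" on the same sequence (4663), while the frequency-optimized layout happens to separate them. That proves nothing about English as a whole, of course.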

Using grammar (still with T9)

When you use T9, you see that it often suggests words which match what you typed but make no grammatical sense in the current context (suggesting "the" after "a", for instance). Wouldn't it be great if T9 only suggested words which "make sense" (or, rather, used the grammatical plausibility of its suggestions as a way to order them)?

This might sound unrealistic, because it seems that it would require your phone to understand (i.e. correctly parse) the sentences you're writing. In fact, it doesn't. We could simply use an n-gram model to tag the words entered by the user with their probable grammatical role in the sentence as they are being typed. This would require a bit more memory on the phone (to store the likely tags for each word, and the likely n-grams of tags), but it is computationally feasible: it is no harder than standard predictive text, just done on tags instead of letters.
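
To make this concrete, here is a toy sketch with a bigram model on tags. Everything in it is made up for the example: the tags, the word frequencies and the tag-pair probabilities are invented numbers, and giving each word a single tag is a gross simplification. The candidates are three words that share the key sequence 843 on the default layout.

```python
# All data below is invented for illustration; a real system would learn it from a corpus.
# One tag per word is a simplification; real words can play several grammatical roles.
TAG = {"a": "DET", "the": "DET", "tie": "NOUN", "vie": "VERB"}
WORD_FREQ = {"the": 1000, "tie": 10, "vie": 1}
TAG_BIGRAM = {
    ("START", "DET"): 0.3, ("START", "NOUN"): 0.2, ("START", "VERB"): 0.1,
    ("DET", "DET"): 0.001, ("DET", "NOUN"): 0.6, ("DET", "VERB"): 0.01,
}


def rank(candidates, previous_word=None):
    """Order candidates by word frequency times the plausibility of the tag pair."""
    # Unknown or missing previous words are treated like the start of a sentence.
    previous_tag = TAG.get(previous_word, "START")

    def score(word):
        return WORD_FREQ[word] * TAG_BIGRAM.get((previous_tag, TAG[word]), 0.0)

    return sorted(candidates, key=score, reverse=True)


candidates = ["the", "tie", "vie"]  # all typed as 843 on the default layout
print(rank(candidates))       # no context: plain frequency order wins
print(rank(candidates, "a"))  # after "a": the noun moves ahead of "the"
```

The point of multiplying the two numbers is that a very frequent word can still be pushed down the list when its tag is implausible after the previous one, which is exactly the "the" after "a" case.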

Conclusion

My point was not to develop the optimal keypad, just to show that there are a lot of extremely simple ideas that could make keypads more efficient but which, to my knowledge, haven't been explored. Sadly, this is the kind of thing which should have been done right the first time because, once widespread, such layouts are well-nigh impossible to change. If we didn't manage to get rid of QWERTY as a default keyboard layout and to replace it with Dvorak or any of its more efficient variants, then what are the odds of displacing the inefficient alphabetical keypad layout?