a3nm's blog

Mobile phones and privacy

There are multiple independent reasons to oppose mobile phones on privacy grounds, and they should be carefully distinguished. In this post, I attempt to sketch an exhaustive list.

Location tracking
It is easy for your mobile phone provider to know where your phone is located by looking at which cell sites it is connected to. As the provider usually knows your identity for billing reasons, and as your mobile phone is usually located on your person, this means that your provider usually knows where you are.
Avoiding this problem is not trivial. One solution is to use prepaid SIMs or other such systems where the billing is not performed directly by the cell phone operator (although in some countries, for instance France [1], it is a legal requirement to provide proof of your identity when buying a prepaid SIM). Alternatively, you could also consider that your current location is not private information, because of all the other trends that tend to make it public (CCTV, etc.).
Internet interference
Internet traffic on mobile phones is often subjected to more invasive analysis than Internet traffic on computers through regular access providers. One possible reason is the widespread policy of accounting for the volume of data transferred on mobile phone networks, which is not so common (at least in France) for landline Internet connections. Another is the wish of some mobile phone operators to restrict the services that they allow in order to bill different Internet services separately (because they are used to having complete control over the phone that the subscriber uses to access the Internet through the connection they provide). Because of such violations of net neutrality, the Internet access provided on mobile phones seems less trustworthy than a regular broadband connection.
The problem can be circumvented by using a different medium to access the Internet on your mobile, such as Wi-Fi. Otherwise, there is little objective reason to believe that the undesirable behavior of your mobile Internet access provider could not be replicated, at least in principle, by your landline Internet access provider.
Transfer security
Even without assuming interference from the phone provider, one can reasonably doubt the security of the encryption used between the phone and the cell site, leaving the transferred data potentially available to nearby attackers: weaknesses of A5/1, possible spoofing of cell sites, etc.
Of course, this is not worse than using, say, an open Wi-Fi network. You just have to put your own encryption on top of the link layer encryption. This may be harder, however, for standard phone calls and texts.
Phone wiretapping
There is a long history of police forces and other governmental services using wiretapping to access an individual's phone calls. This precedent is what motivates intrusive access to mobile phone calls, text messages, and to some metadata (e.g., who calls or texts whom and when) which is specific to telephones in general and mobile phones in particular.
Such wiretapping can be avoided by using encrypted Internet-based alternatives to standard telephony or text, though this is usually inconvenient because of sparser connectivity, more expensive billing, high bandwidth requirements, and reduced battery life.
Non-federated protocols
One can dislike the standard telephone network because it is less federated than the Internet. Of course, one could also criticize the Internet because it is not exactly federated, but it is certainly undesirable to use this additional single purpose network for voice and text messages with its strange historical billing policies.
This is not really a privacy problem, however, except for the reason that poorly federated protocols may promote bad security and privacy violations.
Proprietary software
Mobile phones ship with software that may be proprietary. This is, of course, a danger to privacy, as such software may misuse your data or incorporate deliberate backdoors or involuntary security flaws (conceivably opening your phone's microphone to a third party...) without any possibility of reviewing what is going on. On current smartphones, for instance, Apple iOS is proprietary, and Google's Android is mostly open source but all Google-branded applications (Google Maps, Google Play Store, etc.) are proprietary and some critical low-level components are also proprietary. Furthermore, if you obtained your phone from your phone operator (rather than buying the naked phone with a stock Android install), the carrier may have added its own applications which are probably proprietary and maybe pretty treacherous. Much more worryingly, the radio firmware of mobile phones is essentially always proprietary.
This issue can be mitigated by using free software on your phone. On Android phones, a first easy step, which does not however eliminate all proprietary dependencies, is to use a community-maintained ROM such as CyanogenMod without installing the proprietary Google applications. More radically, you can use Replicant to eliminate proprietary dependencies altogether (except for radio). It is also interesting to investigate existing or upcoming options such as the Maemo-based Nokia N900, the Openmoko-based Neo Freerunner, the Firefox-OS-based GeeksPhone Keon, the Ubuntu Touch OS, etc. As for the radio firmware, my knowledge of this is somewhat limited, but it seems like there is one (only one) open source alternative, namely OsmocomBB, which you can use on some very specific phones (for GSM, not 3G). So the issue of the radio firmware can also be solved, at least in principle.
Undesirable integrated services
Even if you trust the software on your phone to serve your needs, mobile phone operating systems today are usually configured to be very tightly integrated with third party services that you may not trust. For instance, Android phones with Google applications will encourage you to hand over your email, calendar, location (with Google Maps), searches (with Google Search), nearby Wi-Fi networks, etc., to Google.
Of course, to solve this, you just have to avoid the default services recommended by your phone software, in favor of trusted, privacy-aware or (ideally) self-hosted alternatives. This can be easier said than done, however. Most Android software is only distributed through the Google Play Store, meaning that you will be forced to use this service if you want to use such software. As another example, consider the task of maintaining a calendar on your Android phone and synchronizing with the outside world, without using Google Calendar or the protocol of the proprietary Microsoft Exchange: to my knowledge, the only way to perform this using free software has appeared only fairly recently.

My point in making this list is just to make people aware of what I hope is the complete privacy case against mobile phones, so that they can distinguish the various possible dangers and know where they stand relative to each of them. From there, the decision of what to do is a personal choice. For instance, my own choice is to give up on privacy for my physical location; to mistrust the Internet connection as I would mistrust an open Wi-Fi network, using a VPN and/or SSL connections; to use standard calls and texts while keeping in mind that they are insecure; to use Cyanogen (without Google applications, but with some proprietary blobs and proprietary radio firmware); and to avoid third-party services in favor of self-hosted ones.

Of course, this list does not cover other reasons to oppose mobile phones, such as boycotting them because of how they are produced, boycotting mobile phone plans, avoiding them on unclear health grounds, refusing to be constantly available and taking time to disconnect, etc. (this latter list is certainly not complete).

[1] I tried for about two hours to figure out the exact law which imposes this, but I couldn't find it. To my knowledge, all French mobile phone operators have a suspiciously similar activation procedure for prepaid SIMs requiring you to provide some proof of ID to use (or continue to use) the SIM you bought; however, it is never explained exactly why this activation procedure exists. The most explicit reference in the TOS is "conformément à une demande ministérielle intervenue dans le cadre de la loi 91-646 du 10 juillet 1991 et à l’article L34-1-1 du code des postes et communications électroniques" (that is, "in accordance with a ministerial request made under law 91-646 of 10 July 1991 and article L34-1-1 of the code of posts and electronic communications"), which is pretty vague: I couldn't find the exact nature or text of this ministerial demand anywhere, and I find this vaguely worrying.

Ambiguous verbal forms in French: a larger list

In a previous post, I gave a list of 44 French verbal forms that are ambiguous (in the sense that they can correspond to different verbs), and hoped that it was exhaustive. How wrong I was. Following Erik McDonald's suggestion that entries were missing from the list, I used Verbiste to compute a new list, and then added 59 more forms found using the awesome Lefff's extensional lexicon. The result contains 529 verbal forms (!) and subsumes the previous list. I will not dare to hope that this list is complete, but it is certainly more complete than the previous one.
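To give an idea of how such a list can be computed, here is a minimal Python sketch. It is not the script actually used for this post: the `conjugations` dictionary is hypothetical toy data written by hand, whereas a real run would fill it from the output of a conjugator such as Verbiste. The idea is simply to index every form by the infinitives that produce it, and to keep the forms that have at least two infinitives.

```python
from collections import defaultdict


def ambiguous_forms(conjugations):
    """Map each form shared by two or more verbs to the sorted list of those verbs.

    `conjugations` maps an infinitive to the set of its conjugated forms.
    """
    verbs_by_form = defaultdict(set)
    for infinitive, forms in conjugations.items():
        for form in forms:
            verbs_by_form[form].add(infinitive)
    return {
        form: sorted(verbs)
        for form, verbs in verbs_by_form.items()
        if len(verbs) > 1
    }


# Hypothetical toy input: only the present indicative of three verbs.
conjugations = {
    "être": {"suis", "es", "est", "sommes", "êtes", "sont"},
    "suivre": {"suis", "suit", "suivons", "suivez", "suivent"},
    "sommer": {"somme", "sommes", "sommons", "sommez", "somment"},
}

for form, verbs in sorted(ambiguous_forms(conjugations).items()):
    print(form, "<-", " and ".join(verbs))
```

On this toy input, the only ambiguous forms are "suis" and "sommes", which are two of the entries of the "être" case further down.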

At such a scale, I elect to divide the list into chunks.

Case 1: the -ss-

A frequent situation is that two verbs clash because one can be formed from the other by adding "-ss-" (often to give a pejorative connotation), while "-ss-" also appears in the imperfect subjunctive of the shorter verb. In the following verbs from the first conjugation group, this is exactly what happens.

baver and bavasser
bavasse, bavassent, bavasses, bavassiez, bavassions
brouiller and brouillasser
brouillasse, brouillassent, brouillasses, brouillassiez, brouillassions
cailler and caillasser
caillasse, caillassent, caillasses, caillassiez, caillassions
crever and crevasser
crevasse, crevassent, crevasses, crevassiez, crevassions
damer and damasser
damasse, damassent, damasses, damassiez, damassions
débarrer and débarrasser
débarrasse, débarrassent, débarrasses, débarrassiez, débarrassions
dégueuler and dégueulasser
dégueulasse, dégueulassent, dégueulasses, dégueulassiez, dégueulassions
embarrer and embarrasser
embarrasse, embarrassent, embarrasses, embarrassiez, embarrassions
encrer and encrasser
encrasse, encrassent, encrasses, encrassiez, encrassions
enlier and enliasser
enliasse, enliassent, enliasses, enliassiez, enliassions
enter and entasser
entasse, entassent, entasses, entassiez, entassions
grogner and grognasser
grognasse, grognassent, grognasses, grognassiez, grognassions
ramer and ramasser
ramassent, ramasse, ramasses, ramassiez, ramassions
rêver and rêvasser
rêvassent, rêvasse, rêvasses, rêvassiez, rêvassions
terrer and terrasser
terrassent, terrasses, terrasse, terrassiez, terrassions
tourner and tournasser
tournassent, tournasses, tournasse, tournassiez, tournassions
traîner and traînasser
traînassent, traînasses, traînasse, traînassiez, traînassions
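To see why exactly five forms clash for each pair above, here is a small Python sketch. It assumes plain regular first-group endings and ignores any spelling adjustment (cedillas, accent changes, and so on), which is enough for a pair like "baver" and "bavasser"; the function names are made up for this illustration.

```python
def imperfect_subjunctive(infinitive):
    """Imperfect subjunctive forms of a regular first-group (-er) verb."""
    stem = infinitive[:-2]
    endings = ("asse", "asses", "ât", "assions", "assiez", "assent")
    return {stem + ending for ending in endings}


def present_and_imperfect(infinitive):
    """Present indicative, imperfect indicative and present subjunctive
    forms of a regular first-group (-er) verb (other tenses are left out)."""
    stem = infinitive[:-2]
    endings = ("e", "es", "ons", "ez", "ent",          # present indicative
               "ais", "ait", "ions", "iez", "aient")   # imperfect / subjunctive
    return {stem + ending for ending in endings}


def ss_clash(short_verb, long_verb):
    """Forms that are both an imperfect subjunctive of `short_verb`
    and a present or imperfect form of `long_verb`."""
    return sorted(imperfect_subjunctive(short_verb) & present_and_imperfect(long_verb))


print(ss_clash("baver", "bavasser"))
# ['bavasse', 'bavassent', 'bavasses', 'bavassiez', 'bavassions']
```

The third person singular ("bavât") is the only imperfect subjunctive form of "baver" that escapes the clash.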

The same situation, but involving two impersonal verbs:

brumer and brumasser
brumasse
frimer and frimasser
frimasse
mouiller and mouillasser
mouillasse

The same situation sometimes occurs between first group and second group verbs:

pâtir and pâtisser
pâtissaient, pâtissais, pâtissait, pâtissant, pâtissent, pâtisse, pâtisses, pâtissez, pâtissiez, pâtissions, pâtissons
tapir and tapisser
tapissaient, tapissais, tapissait, tapissant, tapissent, tapisses, tapisse, tapissez, tapissiez, tapissions, tapissons
vernir and vernisser
vernissaient, vernissais, vernissait, vernissant, vernissent, vernisses, vernisse, vernissez, vernissiez, vernissions, vernissons

It may also occur between first group and third group verbs, as in the case of "voir" and derivatives:

voir and visser
vissent, visses, visse, vissiez, vissions
revoir and revisser
revissent, revisse, revisses, revissiez, revissions

Or in the more complex case of "bruir" and "bruire" (two verbs with different meanings):

bruir and bruisser
bruissais, bruisses, bruissez, bruissiez, bruissions, bruissons
bruire and bruir and bruisser
bruissaient, bruissait, bruissant, bruisse, bruissent
bruire and bruir
bruit

Case 2: the -i-

Adding an "-i-" to the infinitive can give a different verb, but the conjugations will once again overlap. The boundary is often blurry here: I judged that "tarifer" and "tarifier" were alternative spellings of the same verb and did not include them in the list, but there is no doubt that "parer" and "parier" are entirely different verbs:

affiler and affilier
affiliez, affilions
aller and allier
alliez, allions
colorer and colorier
coloriez, colorions
déparer and déparier
dépariez, déparions
distancer and distancier
distanciez, distancions
parer and parier
pariez, parions
rader and radier
radiez, radions
raller and rallier
ralliez, rallions
référencer and référencier
référenciez, référencions
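The overlap in this case comes from the "-ions" and "-iez" endings of the imperfect (and present subjunctive) of the shorter verb, which collide with the plain "-ons" and "-ez" present endings of the verb whose stem already ends in "i". A minimal Python sketch, assuming regular first-group endings only (the helper names are hypothetical):

```python
ENDINGS = {
    "present": ("e", "es", "e", "ons", "ez", "ent"),
    "imperfect": ("ais", "ais", "ait", "ions", "iez", "aient"),
}


def conjugate_er(infinitive, tense):
    """Conjugate a regular -er verb in the given tense (no spelling adjustments)."""
    if not infinitive.endswith("er"):
        raise ValueError("expected a first-group infinitive ending in -er")
    stem = infinitive[:-2]
    return [stem + ending for ending in ENDINGS[tense]]


def i_clash(short_verb, long_verb):
    """Forms shared by the imperfect of `short_verb` and the present of `long_verb`."""
    return sorted(set(conjugate_er(short_verb, "imperfect"))
                  & set(conjugate_er(long_verb, "present")))


print(i_clash("parer", "parier"))
# ['pariez', 'parions']
```

This also works for "aller" and "allier" because only the imperfect of the shorter verb is used, and the imperfect of "aller" is regular.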

Case 3: other cases with the first group

I removed several entries from the list which were forms ambiguous between two alternative spellings of the same first group infinitive, like "interpeler" and "interpeller". Sometimes it was a closer call ("rengrener" and "rengréner"). However, there seems to be no doubt that "taveler" and "taveller" are very different verbs that just happen to share a large part of their conjugation:

taveler and taveller
tavellent, tavelleraient, tavellerais, tavellerai, tavellerait, tavelleras, tavellera, tavellerez, tavelleriez, tavellerions, tavellerons, tavelleront, tavelles, tavelle

Then there is the case of verbs where the third person plural of the simple past tense indicative matches another verb with an additional -er-:

galérer and galer
galèrent
maniérer and manier
manièrent
lacérer and lacer
lacèrent

There are two more cases within the first group, after which every case will include at least one verb of the second or third group. One of them involves the funny verb "raller" (which, in the sense of "to go again", is putatively conjugated like the very irregular "aller"); we will meet it again later:

capéer and caper
capée, capées
raller and railler
raille, railles, raillent

Case 4: non-homophones

These verbs are exceptional because the forms are ambiguous in writing but are pronounced differently, so they also appear in my list of French non-homophonous homographs.

obvenir and obvier
obvient
convenir and convier
convient
pressentir and presser
pressent
surfaire and surfer
surfais, surfait, surfassent, surfasses, surfasse, surfassiez, surfassions, surferaient, surferais, surferai, surferait, surferas, surfera, surferez, surferiez, surferions, surferons, surferont

Case 5: être

The unique conjugation of "être" has three forms which overlap with other verbs. Too bad.

être and suivre
suis
être and sommer
sommes
être and étayer
étaient

Case 6: large overlaps

In some cases a third group verb's conjugation is very irregular but looks like that of a regular first group verb. For example, "peindre" often looks like "peigner":

peindre and peigner
peignaient, peignais, peignait, peignant, peignent, peigne, peignes, peignez, peigniez, peignions, peignons
dépeindre and dépeigner
dépeignaient, dépeignais, dépeignait, dépeignant, dépeigne, dépeignent, dépeignes, dépeignez, dépeigniez, dépeignions, dépeignons
repeindre and repeigner
repeignaient, repeignais, repeignait, repeignant, repeignent, repeigne, repeignes, repeignez, repeigniez, repeignions, repeignons

Or "raire" and "rayer":

raire and rayer
raient, raie, raies, rayaient, rayais, rayait, rayant, rayez, rayiez, rayions, rayons
braire and brayer
braie, braient, braies, brayaient, brayais, brayait, brayant, brayez, brayiez, brayions, brayons

Or "ouvrir" and "ouvrer":

ouvrir and ouvrer
ouvraient, ouvrais, ouvrait, ouvrant, ouvrent, ouvre, ouvres, ouvrez, ouvriez, ouvrions, ouvrons
recouvrir and recouvrer
recouvraient, recouvrais, recouvrait, recouvrant, recouvrent, recouvre, recouvres, recouvrez, recouvriez, recouvrions, recouvrons

Or "faillir" and "failler", "saillir" and "sailler" (but pay attention to the fact that those two cases are slightly different because "faillir" and "saillir" do not follow the same pattern):

faillir and failler
faillaient, faillais, faillait, faillant, faillent, faillez, failliez, faillions, faillons
saillir and sailler
saillaient, saillait, saillant, saillent, sailleraient, saillerait, saillera, sailleront, saille

Or, well, a bunch of other cases:

fondre and fonder
fondaient, fondais, fondait, fondant, fonde, fondent, fondes, fondez, fondiez, fondions, fondons
refondre and refonder
refondaient, refondais, refondait, refondant, refonde, refondent, refondes, refondez, refondiez, refondions, refondons
moudre and mouler
moulaient, moulais, moulait, moulant, moule, moulent, moules, moulez, mouliez, moulions, moulons
remoudre and remouler
remoulaient, remoulais, remoulait, remoulant, remoule, remoulent, remoules, remoulez, remouliez, remoulions, remoulons
vermoudre and vermouler
vermoulaient, vermoulais, vermoulait, vermoulant, vermoule, vermoulent, vermoules, vermoulez, vermouliez, vermoulions, vermoulons
mouvoir and mouver
mouvaient, mouvais, mouvait, mouvant, mouvez, mouviez, mouvions, mouvons
venir and vener
venaient, venais, venait, venant, venez, veniez, venions, venons
matir and mater
mataient, matais, matait, matant, mate, matent, mates, matez, matiez, mations, matons
mouvoir and musser
musse, mussent, musses, mussiez, mussions
choir and cherrer
cherra, cherrai, cherraient, cherrais, cherrait, cherras, cherrez, cherriez, cherrions, cherrons
savoir and saurer
sauraient, saurai, saurais, saurait, saura, sauras, saurez, sauriez, saurions, saurons
médire and médiser
médisaient, médisais, médisait, médisant, médise, médisent, médises, médisez, médisiez, médisions, médisons
choir and choyer
choient, choyant, choyez, choyons

Case 7: slight overlaps

In some cases, the overlap is on just one form (here, the first and second person singular present indicative of "paraître" versus the first and second person singular imperfect indicative of "parer"; the third person does not clash because of the "î"):

paraître and parer
parais
comparaître and comparer
comparais

Or a first group verb with a misplaced -r- ends up sharing its third person plural present indicative with the third person plural simple past indicative of an irregular verb:

mettre and mirer
mirent
admettre and admirer
admirent
voir and virer
virent
revoir and revirer
revirent
moudre and moulurer
moulurent
devoir and durer
durent
mouvoir and murer
murent

Or the third group verb's feminine past participle has a tempting "-e" ending which makes it look like the present indicative tense of a perfectly regular verb. Do not miss the triple ambiguity case between "paître" and "pouvoir" in addition to "puer", the only case of triple ambiguity in the whole list along with "bruir"/"bruire"/"bruisser":

prendre and priser
prise, prises
déprendre and dépriser
déprise, déprises
méprendre and mépriser
méprise, méprises
reprendre and repriser
reprise, reprises
cuire and cuiter
cuite, cuites
médire and méditer
médite, médites
feindre and feinter
feinte, feintes
teindre and teinter
teinte, teintes
mettre and miser
mise, mises
remettre and remiser
remise, remises
remplir and remplier
remplie, remplies
décroître and décruer
décrue, décrues
mouvoir and muer
mue, mues
paître and pouvoir and puer
pue, pues
savoir and suer
sue, sues
joindre and jointer
jointe, jointes
traire and traiter
traite, traites
taire and tuer
tue, tues

We mentioned "faillir" and "failler" above, but the list would be incomplete without:

faillir and falloir
faut
failler and falloir
faille

And we still have a few more:

rentraire and rentrer
rentraient, rentrais, rentrait
ailler and aller
aille, aillent, ailles

Case 8: second and third group

Those last cases only involve second and third group verbs, and thus differ from all of the preceding ones (except the "bruir"/"bruire" case handled with "bruisser" above, the "être"/"suivre" case above, and the "paître"/"pouvoir" subcase).

vivre and voir
vis, vit
revivre and revoir
revis, revit
croire and croître
crois, cru, crue, crues, crûmes, crurent, crus, crusse, crussent, crusses, crussiez, crussions, crut, crût, crûtes
rasseoir and rassir
rassîmes, rassirent, rassis, rassise, rassises, rassissent, rassisse, rassisses, rassissiez, rassissions, rassîtes, rassit, rassît
plaire and pleuvoir
plue, plues, plu, plus, plut, plût, plurent
paître and pouvoir
pu, pus
raller and rire
rira, rirai, riraient, rirais, rirait, riras, rirez, ririez, ririons, rirons, riront

The complete list

Here is the complete list, in case you want to process it automatically:

admirent
affiliez
affilions
aille
aillent
ailles
alliez
allions
bavasse
bavassent
bavasses
bavassiez
bavassions
braie
braient
braies
brayaient
brayais
brayait
brayant
brayez
brayiez
brayions
brayons
brouillasse
brouillassent
brouillasses
brouillassiez
brouillassions
bruissaient
bruissais
bruissait
bruissant
bruisse
bruissent
bruisses
bruissez
bruissiez
bruissions
bruissons
bruit
brumasse
caillasse
caillassent
caillasses
caillassiez
caillassions
capée
capées
cherra
cherrai
cherraient
cherrais
cherrait
cherras
cherrez
cherriez
cherrions
cherrons
choient
choyant
choyez
choyons
coloriez
colorions
comparais
convient
crevasse
crevassent
crevasses
crevassiez
crevassions
crois
cru
crue
crues
crûmes
crurent
crus
crusse
crussent
crusses
crussiez
crussions
crut
crût
crûtes
cuite
cuites
damasse
damassent
damasses
damassiez
damassions
débarrasse
débarrassent
débarrasses
débarrassiez
débarrassions
décrue
décrues
dégueulasse
dégueulassent
dégueulasses
dégueulassiez
dégueulassions
dépariez
déparions
dépeignaient
dépeignais
dépeignait
dépeignant
dépeigne
dépeignent
dépeignes
dépeignez
dépeigniez
dépeignions
dépeignons
déprise
déprises
distanciez
distancions
durent
embarrasse
embarrassent
embarrasses
embarrassiez
embarrassions
encrasse
encrassent
encrasses
encrassiez
encrassions
enliasse
enliassent
enliasses
enliassiez
enliassions
entasse
entassent
entasses
entassiez
entassions
étaient
faillaient
faillais
faillait
faillant
faille
faillent
faillez
failliez
faillions
faillons
faut
feinte
feintes
fondaient
fondais
fondait
fondant
fonde
fondent
fondes
fondez
fondiez
fondions
fondons
refondons
frimasse
galèrent
grognasse
grognassent
grognasses
grognassiez
grognassions
jointe
jointes
lacèrent
manièrent
mataient
matais
matait
matant
mate
matent
mates
matez
matiez
mations
matons
médisaient
médisais
médisait
médisant
médise
médisent
médises
médisez
médisiez
médisions
médisons
médite
médites
méprise
méprises
mirent
mise
mises
mouillasse
moulaient
moulais
moulait
moulant
moule
moulent
moules
moulez
mouliez
moulions
moulons
moulurent
mouvaient
mouvais
mouvait
mouvant
mouvez
mouviez
mouvions
mouvons
mue
mues
murent
musse
mussent
musses
mussiez
mussions
obvient
ouvraient
ouvrais
ouvrait
ouvrant
ouvre
ouvrent
ouvres
ouvrez
ouvriez
ouvrions
ouvrons
parais
pariez
parions
pâtissaient
pâtissais
pâtissait
pâtissant
pâtisse
pâtissent
pâtisses
pâtissez
pâtissiez
pâtissions
pâtissons
peignaient
peignais
peignait
peignant
peigne
peignent
peignes
peignez
peigniez
peignions
peignons
plu
plue
plues
plurent
plus
plut
plût
pressent
prise
prises
pu
pue
pues
pus
radiez
radions
raie
raient
raies
raille
raillent
railles
ralliez
rallions
ramasse
ramassent
ramasses
ramassiez
ramassions
rassîmes
rassirent
rassis
rassise
rassises
rassisse
rassissent
rassisses
rassissiez
rassissions
rassit
rassît
rassîtes
rayaient
rayais
rayait
rayant
rayez
rayiez
rayions
rayons
recouvraient
recouvrais
recouvrait
recouvrant
recouvre
recouvrent
recouvres
recouvrez
recouvriez
recouvrions
recouvrons
référenciez
référencions
refondaient
refondais
refondait
refondant
refonde
refondent
refondes
refondez
refondiez
refondions
remise
remises
remoulaient
remoulais
remoulait
remoulant
remoule
remoulent
remoules
remoulez
remouliez
remoulions
remoulons
remplie
remplies
rentraient
rentrais
rentrait
repeignaient
repeignais
repeignait
repeignant
repeigne
repeignent
repeignes
repeignez
repeigniez
repeignions
repeignons
reprise
reprises
rêvasse
rêvassent
rêvasses
rêvassiez
rêvassions
revirent
revis
revisse
revissent
revisses
revissiez
revissions
revit
rira
rirai
riraient
rirais
rirait
riras
rirez
ririez
ririons
rirons
riront
saillaient
saillait
saillant
saille
saillent
saillera
sailleraient
saillerait
sailleront
saura
saurai
sauraient
saurais
saurait
sauras
saurez
sauriez
saurions
saurons
sommes
sue
sues
suis
surfais
surfait
surfasse
surfassent
surfasses
surfassiez
surfassions
surfera
surferai
surferaient
surferais
surferait
surferas
surferez
surferiez
surferions
surferons
surferont
tapissaient
tapissais
tapissait
tapissant
tapisse
tapissent
tapisses
tapissez
tapissiez
tapissions
tapissons
tavelle
tavellent
tavellera
tavellerai
tavelleraient
tavellerais
tavellerait
tavelleras
tavellerez
tavelleriez
tavellerions
tavellerons
tavelleront
tavelles
teinte
teintes
terrasse
terrassent
terrasses
terrassiez
terrassions
tournasse
tournassent
tournasses
tournassiez
tournassions
traînasse
traînassent
traînasses
traînassiez
traînassions
traite
traites
tue
tues
venaient
venais
venait
venant
venez
veniez
venions
venons
vermoulaient
vermoulais
vermoulait
vermoulant
vermoule
vermoulent
vermoules
vermoulez
vermouliez
vermoulions
vermoulons
vernissaient
vernissais
vernissait
vernissant
vernisse
vernissent
vernisses
vernissez
vernissiez
vernissions
vernissons
virent
vis
visse
vissent
visses
vissiez
vissions
vit

Even more Kobo hacking

I broke my Kobo Touch (the screen was damaged, probably because the device was crushed against something in a bag; it is good to know that you ought to be careful with it) and bought a Kobo Glo (model N613) to replace it. Here is some info about the hacks I've done.

Old stuff

You can check my original post for details about what needs to be done at first; I'm just going to allude to it here. You need to fake activation in the usual way, though you might need some more clever choices to make it look plausible for the Kobo (caution though: the last column mentioned by those instructions did not exist in my sqlite file). Install the latest firmware, prepare a fake update to activate a telnet daemon, and get root. Install dropbear and edit /etc/hosts. Judging from the contents of /mnt/onboard/.kobo/Kobo/Kobo eReader.conf, I feel safer adding the following to my previous list:

0.0.0.0 www.kobobooks.com webstore.kobobooks.com webstore2.kobobooks.com
0.0.0.0 secure.kobobooks.com ecimages.kobobooks.com social.kobobooks.com
0.0.0.0 partner.kobobooks.com mobilepartner.kobobooks.com

There is no home button anymore, but a factory reset can be performed by booting while pressing the light button (the LED will turn red). The reset button is still there. Interestingly, you need to press the reset button to reboot when nickel is dead: a long press on the power switch is not enough, unlike what I think used to be the case on the Touch. (Remember that nickel is the Kobo's proprietary frontend software.) So... have a paperclip ready whenever you kill nickel, or be sure to always use busybox reboot from the shell (and not to drop the connection, of course...).

Connecting to the device via USB

A useful trick from here. Just add the following at the end of /etc/init.d/rcS:

busybox insmod /drivers/ntx508/usb/gadget/arcotg_udc.ko
busybox insmod /drivers/ntx508/usb/gadget/g_ether.ko

Add the following at the end of /usr/local/Kobo/udev/ac and /usr/local/Kobo/udev/plug:

/sbin/ifconfig usb0 192.168.2.2

You should now connect the device to your computer, issue ifconfig usb0 192.168.2.1 on the computer, and connect to 192.168.2.2. Depending on your network connection manager and the phase of the moon, it might help to rerun this command "occasionally" (I did it every 2 seconds or so).

Interestingly, this trick does not interfere with the proper workings of nickel, though it will prevent you from mounting /mnt/onboard as UMS.

Putting an offline copy of Wikipedia on the device

I find it pretty cool to have a copy of the entire Wikipedia on my device. I managed to do so using Kiwix, which is comparatively easy, but then some effort is needed to use the built-in browser in offline mode.

Retrieve the ZIM file corresponding to the Wikipedia that you want from this page. For the English Wikipedia without images, the onboard storage of the device will not be sufficient, and you will need a MicroSD card. If the ZIM file is over 4 GB, you will not be able to put it on a FAT32 filesystem. This is not a problem for the Linux kernel running on the device, of course, but by default the device will complain unless the first partition of the SD card is a FAT partition.

Fortunately, this isn't managed by nickel and we can do things properly. The file to edit is /usr/local/Kobo/udev: for instance, you can add mount /dev/mmcblk1p2 /mnt/wikipedia before the dosfsck command and umount -l /mnt/wikipedia after the umount command. This assumes that your Wikipedia SD card has a first FAT partition and a second partition containing Wikipedia, and will mount the Wikipedia partition on /mnt/wikipedia (or fail silently if you insert a card with no suitable second partition). You can tune this to your liking. Once you're done, reboot the device and check that your Wikipedia ZIM file is indeed visible at the expected location at boot.

We now need a tool to browse the ZIM file. Fortunately, the Kiwix project has a very nice tool called kiwix-serve which runs as an HTTP server to serve the content of the dump (unlike lots of other offline Wikipedia tools which insist on serving the content with their own crappy user interface that we couldn't use here even if we wanted to). What's even more fortunate, there are ARM binaries of the Kiwix tools available, so we won't need to cross-compile. Retrieve an ARM build of Kiwix from this page. Transfer it to the device (say in /root), and add the following at the end of /etc/init.d/rcS to run the HTTP server:

(sleep 10; /root/kiwix-serve --port=80 /mnt/wikipedia/wikipedia_en_all_nopic_01_2012.zim) &

For convenience, add 127.0.0.1 a to /etc/hosts to make access to localhost easier. It seems that everything's been taken care of and that we just have to access "a" (i.e. localhost) from nickel's built-in web browser in Settings -> Extras... except that, as you will be pleased to notice, this won't work because nickel will require you to connect to a Wifi network to use the browser, even if what you want to do is just access localhost. Damn. Damn!

It seems that the only way around this extremely annoying misfeature is to reverse-engineer and patch nickel. What follows is not my own work (although it's not available online elsewhere to the best of my knowledge): I am extremely grateful to Glyn from Oxford Hackspace who managed to achieve this while I generously volunteered subtly misleading information to make his job a bit harder.

The relevant file to edit is /usr/local/Kobo/libnickel.so. If you have firmware version 2.5.2 (i.e., the SHA1 sum of your copy of this file is 4c3d7d8cdce4927cbffbde8d3d4c6b7bd35de5c1) and you are in a hurry, you can just grab this file and apply it with bspatch to your libnickel.so file (keep a backup copy of the original file to restore it if things go wrong!), and reboot your Kobo, and hopefully things should work. If you're not in a hurry or don't have the same version, I will go into some detail about how this patch was prepared, so that the process can still be applied to different versions of the firmware (assuming that this part of the code doesn't change too much between versions).
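Before applying the binary patch, it is worth checking that your copy of the library really has the expected SHA1 sum. Here is a minimal Python sketch, to be run on your computer against a copy of libnickel.so that you pulled from the device; the function names are made up for this example, and the only value taken from the text above is the SHA1 sum for firmware 2.5.2.

```python
import hashlib

# SHA1 sum of libnickel.so for firmware 2.5.2, as quoted above.
EXPECTED_SHA1 = "4c3d7d8cdce4927cbffbde8d3d4c6b7bd35de5c1"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Tell whether the file at `path` has the expected SHA1 sum."""
    return sha1_of_file(path) == expected.lower()
```

For instance, matches_expected("libnickel.so") should only return True if the file is the one that the ready-made patch was prepared against; if it returns False, do not apply that patch.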

We will need to patch at two places. First, in the function _ZN23WirelessWorkflowManager11openBrowserERK4QUrl that is invoked when opening the browser from settings, we need to work around an attempt to connect to a Wifi network. On firmware 2.5.2, the objdump output looks like this:

007e52a4 <_ZN23WirelessWorkflowManager11openBrowserERK4QUrl>:
  7e52a4:       e92d4070        push    {r4, r5, r6, lr}
  7e52a8:       e1a04000        mov     r4, r0
  7e52ac:       e24dd008        sub     sp, sp, #8
  7e52b0:       e1a06001        mov     r6, r1
  7e52b4:       e59f5048        ldr     r5, [pc, #72]   ; 7e5304 <_ZN23WirelessWorkflowManager11openBrowserERK4QUrl+0x60>
  7e52b8:       ebf24996        bl      477918 <_init+0x256b8>
  7e52bc:       e1a01006        mov     r1, r6
  7e52c0:       e2840010        add     r0, r4, #16
  7e52c4:       ebf262a1        bl      47dd50 <_init+0x2baf0>
  7e52c8:       e59f1038        ldr     r1, [pc, #56]   ; 7e5308 <_ZN23WirelessWorkflowManager11openBrowserERK4QUrl+0x64>
  7e52cc:       e59f3038        ldr     r3, [pc, #56]   ; 7e530c <_ZN23WirelessWorkflowManager11openBrowserERK4QUrl+0x68>
  7e52d0:       e08f5005        add     r5, pc, r5
  7e52d4:       e0851001        add     r1, r5, r1
  7e52d8:       e3a0c080        mov     ip, #128        ; 0x80
  7e52dc:       e0853003        add     r3, r5, r3
  7e52e0:       e1a00004        mov     r0, r4
  7e52e4:       e1a02004        mov     r2, r4
  7e52e8:       e58dc000        str     ip, [sp]
  7e52ec:       ebf1ebab        bl      4601a0 <_init+0xdf40>
  7e52f0:       e1a00004        mov     r0, r4
  7e52f4:       e3a01001        mov     r1, #1
  7e52f8:       e28dd008        add     sp, sp, #8
  7e52fc:       e8bd4070        pop     {r4, r5, r6, lr}
  7e5300:       eaf252c3        b       479e14 <_init+0x27bb4>
  7e5304:       00e7eae8        rsceq   lr, r7, r8, ror #21
  7e5308:       ffce57e8                        ; <UNDEFINED> instruction: 0xffce57e8
  7e530c:       ffd28838                        ; <UNDEFINED> instruction: 0xffd28838

That last jump at 0x7e5300 needs to be changed to jump instead to a function called _ZN23WirelessWorkflowManager25openBrowserAfterConnectedEv that is located at offset 0x7e0e88 in version 2.5.2. So, the binary must be patched to change the four bytes at position 0x7e5300 to branch instead to the previously mentioned function. This should involve leaving the fourth byte at 0xea (the opcode for an unconditional jump without link) and changing the first three bytes to the correct offset.
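To make the offset computation concrete, here is a small Python sketch that encodes an ARM-mode branch. It relies on the standard encoding (the 24-bit field is a signed word offset counted from the address of the branch plus 8); the function name is mine, and I check it against an instruction from the listing above before trusting it on the new target.

```python
def encode_arm_branch(source: int, target: int, link: bool = False) -> int:
    """Encode an unconditional ARM-mode B (or BL) located at `source` jumping to `target`."""
    # The offset is relative to the address of the branch plus 8 (pipeline prefetch).
    delta = target - (source + 8)
    if delta % 4 != 0:
        raise ValueError("branch target must be word-aligned")
    imm24 = delta >> 2  # arithmetic shift, so negative offsets stay negative
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("branch target out of range")
    opcode = 0xEB if link else 0xEA  # 0xea: always, no link; 0xeb: always, with link
    return (opcode << 24) | (imm24 & 0xFFFFFF)


# Sanity check against the existing instruction at 0x7e5300 (b 479e14):
assert encode_arm_branch(0x7E5300, 0x479E14) == 0xEAF252C3

# New instruction jumping to openBrowserAfterConnected at 0x7e0e88:
word = encode_arm_branch(0x7E5300, 0x7E0E88)
print(hex(word))                        # 0xeaffeee0
print(word.to_bytes(4, "little").hex())  # e0eeffea, the byte order in the file
```

For other firmware versions, the two addresses must of course be read from your own objdump output.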

Second, even once the browser has been started, there will be regular checks for an Internet connection. The incriminated function is _ZN19N3BrowserController15checkConnectionEv which looks like this:

00a80914 <_ZN19N3BrowserController15checkConnectionEv>:
  a80914:       e92d4070        push    {r4, r5, r6, lr}
  a80918:       e1a04000        mov     r4, r0
  a8091c:       ebe75f20        bl      4585a4 <_init+0x6344>
  a80920:       e5901000        ldr     r1, [r0]
  a80924:       e5913030        ldr     r3, [r1, #48]   ; 0x30
  a80928:       e12fff33        blx     r3
  a8092c:       e3500000        cmp     r0, #0
  a80930:       18bd8070        popne   {r4, r5, r6, pc}
  a80934:       ebe75f1a        bl      4585a4 <_init+0x6344>
  a80938:       ebe7ecbe        bl      47bc38 <_init+0x299d8>
  a8093c:       e3500000        cmp     r0, #0
  a80940:       1a000004        bne     a80958 <_ZN19N3BrowserController15checkConnectionEv+0x44>
  a80944:       e594002c        ldr     r0, [r4, #44]   ; 0x2c
  a80948:       e3500000        cmp     r0, #0
  a8094c:       0a000005        beq     a80968 <_ZN19N3BrowserController15checkConnectionEv+0x54>
  a80950:       e8bd4070        pop     {r4, r5, r6, lr}
  a80954:       eae787ce        b       462894 <_init+0x10634>
  a80958:       ebe75f11        bl      4585a4 <_init+0x6344>
  a8095c:       e3a01001        mov     r1, #1
  a80960:       e8bd4070        pop     {r4, r5, r6, lr}
  a80964:       eae7e52a        b       479e14 <_init+0x27bb4>
  a80968:       e3a00014        mov     r0, #20
  a8096c:       ebe77415        bl      45d9c8 <_init+0xb768>
  a80970:       e1a01004        mov     r1, r4
  a80974:       e1a05000        mov     r5, r0
  a80978:       ebe771d8        bl      45d0e0 <_init+0xae80>
  a8097c:       e594202c        ldr     r2, [r4, #44]   ; 0x2c
  a80980:       e1520005        cmp     r2, r5
  a80984:       01a00005        moveq   r0, r5
  a80988:       0afffff0        beq     a80950 <_ZN19N3BrowserController15checkConnectionEv+0x3c>
  a8098c:       e284002c        add     r0, r4, #44     ; 0x2c
  a80990:       e1a01005        mov     r1, r5
  a80994:       ebe7b2ba        bl      46d484 <_init+0x1b224>
  a80998:       e594002c        ldr     r0, [r4, #44]   ; 0x2c
  a8099c:       eaffffeb        b       a80950 <_ZN19N3BrowserController15checkConnectionEv+0x3c>
  a809a0:       e1a04000        mov     r4, r0
  a809a4:       e1a00005        mov     r0, r5
  a809a8:       ebe77124        bl      45ce40 <_init+0xabe0>
  a809ac:       e1a00004        mov     r0, r4
  a809b0:       ebe79be5        bl      46794c <_init+0x156ec>

The first blx instruction is calling something else to check if a WiFi connection exists, and the cmp is checking its return value. We need to alter the next instruction so that the result of this comparison is ignored, so that the pop proceeds unconditionally. So, the binary must be patched to change the four bytes starting at 0xa80930 to make the pop unconditional. This amounts to changing the fourth byte from 0x18 to 0xe8.
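Both edits can be applied with a few lines of Python instead of a hex editor. This is only a sketch: the helper name is mine, and it assumes that the addresses printed by objdump are also the byte offsets in the file (as the text above suggests). It refuses to write anything if the bytes found are not the expected ones, which protects you if your firmware version differs.

```python
def apply_patches(image: bytearray, patches) -> None:
    """Apply (offset, old_bytes, new_bytes) patches in place, checking the old bytes first."""
    # First pass: verify everything, so that we never leave the file half-patched.
    for offset, old, new in patches:
        found = bytes(image[offset:offset + len(old)])
        if found != old or len(old) != len(new):
            raise ValueError(f"unexpected bytes at {offset:#x}: {found.hex()}")
    # Second pass: actually write the new bytes.
    for offset, old, new in patches:
        image[offset:offset + len(new)] = new


# Patches for firmware 2.5.2, in file (little-endian) byte order.
PATCHES_2_5_2 = [
    # b 479e14 -> b 7e0e88; the word 0xeaffeee0 encodes (0x7e0e88 - 0x7e5308) / 4
    (0x7E5300, bytes.fromhex("c352f2ea"), bytes.fromhex("e0eeffea")),
    # popne {r4, r5, r6, pc} -> pop {r4, r5, r6, pc}: fourth byte 0x18 -> 0xe8
    (0xA80930, bytes.fromhex("7080bd18"), bytes.fromhex("7080bde8")),
]

# Hypothetical usage, on a backup copy of the library:
#   image = bytearray(open("libnickel.so", "rb").read())
#   apply_patches(image, PATCHES_2_5_2)
#   open("libnickel.so.patched", "wb").write(image)
```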

Now that this has been taken care of, the setup is pretty usable. Just connect to "a" and you should be all set. A word of caution, though: if the browser can't connect to the HTTP server (e.g. if kiwix-serve isn't running properly), it will fail silently. The static Wikipedia copy that you can browse in this way looks like what you would expect from running kiwix-serve on your own machine. I did not try to generate a full-text index, but maybe it is possible to use one (see kiwix-index), though it does seem to take up a lot of space. Oh, another caveat: the article names in the search box need to be typed with an initial capital, because kiwix-serve is too dumb to figure things out otherwise.

Debian chroot

You can install Debian into an image and chroot there, which makes it easier to install software thanks to Debian's package management system. What's more, you can even get an X server to run on the device, including touchscreen management. I won't go into the details of this because it's been covered elsewhere. I will just mention that you might have some commands fail because you need to tweak the PATH and LD_LIBRARY_PATH, i.e. something like:

export PATH="$PATH:/usr/local/sbin:/usr/local/bin:/sbin:/usr/bin:/usr/sbin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/lib/arm-linux-gnueabi"

To get the X server running correctly, you should also replace the binaries in the /fb_update folder of the Debian chroot with those from this post. A very fun thing to do is to install openssh-server (don't forget to change its port to something other than 22 if you have the dropbear ssh server running outside the chroot!), install x2x, and, from your PC, assuming that the sshd port is 2222 and the Kobo's IP is 192.168.2.2, run:

ssh -XC -p 2222 root@192.168.2.2 x2x -north -to :0

This allows you to control the Kobo's X server using your computer's keyboard and mouse. You could conceivably use it as some sort of additional screen (keeping in mind that this is only about input events, you couldn't "move" programs from your computer to the Kobo and back).

This shows that there is basically no limit to how much you could tweak your Kobo's interface. For instance, people on the Mobileread forums have gotten alternative PDF readers to run, e.g. to get reflow functionality.

From now on, I'll assume that you have a Debian chroot, so that you can easily install Debian programs using apt-get and run them in the chroot. If you don't, you can still download the armel packages from packages.debian.org, extract them (using ar) and install their contents manually.

Wifi AP

Something that I find funny is the idea of using your Kobo's WiFi to serve an access point. The ability to do so depends on the exact WiFi hardware used, but on my hardware at least it works.

To enable an open WiFi AP, ensure that you are running on the USB connection as described above so as not to kill your connection, and issue:

busybox insmod /drivers/ntx508/wifi/dhd.ko
busybox insmod /drivers/ntx508/wifi/sdio_wifi_pwr.ko
pkill wpa_supplicant
wlarm_le up
wlarm_le ap 1
wlarm_le ssid your_ssid

You can also use wlarm_le cur_etheraddr to change your MAC address. While we're at it, I also advise you to run wlarm_le | less and marvel at all those options. I'm not sure they are all operational. For instance I didn't manage to set it in monitor mode for iwconfig's purposes, even though it seems that, when enabling promiscuous and monitor mode and running tcpdump -i eth0, you can receive traffic on the channel that is not addressed to you -- but it looks like this confuses the hell out of tcpdump because the MAC addresses and ethertype appear garbled. I'm a bit curious about wlarm_le PM as it seems the power saving mode isn't being enabled by default for some reason.

The AP will not be very useful unless you install something to serve DHCP leases. You can apt-get install dnsmasq as described here, adjusting the addresses so as not to conflict with that of the USB network connection. However, the sad truth is that the Kobo's kernel has no support for iptables, which severely restricts what you can do to intercept network connections (e.g. to redirect people in a user-friendly way to the web services running on the Kobo), or relay them (e.g. bridging the wireless and USB interfaces to share your computer's Ethernet connection by creating a WiFi access point with your Kobo would be very nice...). I have tried to cross-compile iptables as a module for the Kobo's kernel, but so far I have mostly failed. I'll do a followup if I manage to achieve something there.

How to compile wl

You don't need to do that -- I just didn't realize at once that a functional wl was provided on the device. I'm just providing this for reference.

Retrieve the source archive from the Kobo github repository. We will need the wl tool from this repository to enable SoftAP mode. Sadly, the precompiled version segfaults, so we will need to cross-compile our own.

I will assume you are running Debian. Install the emdebian-archive-keyring package. Add the following to /etc/apt/sources.list, adapting for your version:

deb http://www.emdebian.org/debian wheezy main

Run apt-get update, and install g++-4.7-arm-linux-gnueabi and xapt. Extract the archive that you downloaded above, go in src/wl/exe, and cross-compile:

make -f GNUmakefile CC=arm-linux-gnueabi-gcc-4.7

Learning the gender of French nouns

— updated

The gender of French nouns is a pain for foreigners and even occasionally for native speakers. Learners of French usually rely (besides rote learning) on rules that classify words as masculine or feminine depending on their ending. In this post, I present what happens if you try to derive the minimal set of rules to determine the gender of a French noun from its ending. (In brief: it doesn't give a very compact set of rules because there are too many exceptions.)

The problem that we will study is: given a French noun, determine its gender. Let us start by taking the database from Lexique and keep the words that:

  • are not "derived" forms (e.g., plurals);
  • are nouns;
  • are either masculine or feminine (there's not much I can say about nouns that can be both, except that most of them are words like "journaliste" or "enfant" that are used to refer to people so the gender to choose is usually clear depending on the person you're referring to);
  • do not contain spaces or hyphens (because the gender of such words is usually determined from the component words, so a strategy that looks at their ending will not work well)
  • do not contain dots (remove pesky abbreviations)

Here is the code I use (see lexique.org to obtain lexique, and note that I use a custom version with some errors fixed by hand so your result may differ slightly).

cut -f1,4,5,14 lexique |
  grep '1$' |
  cut -f1,2,3 |
  grep NOM |
  grep  '[mf]$' |
  cut -f1,3 |
  grep -v ' ' |
  grep -v -- '[-\.]' > nouns.txt

Now, we need to find rules to predict the label ('m' or 'f') of nouns in the input list, in a manner that is as concise as possible. To do so, we said that we would try to determine the label by reading the noun starting from its ending. I will describe what we want to do with an example. Suppose we get an unknown noun and start reading it. The last letter is 'e'. At this point we don't know the gender, so we continue. The two last letters are 've'. We must continue still. The three last letters are 'uve'. This narrows down the set of possible nouns, but it can still be either masculine or feminine (think "fauve" vs "guimauve"). The last four letters are "luve". At this point, we know that all nouns ending in "luve" are masculine (there is only "effluve"), so we answer 'm'.

According to this example, we want a set of rules that says, for every possible suffix read so far, whether we can decide 'm' or 'f' or must continue reading. The set of rules should be minimal, which means that it should decide 'm' or 'f' as soon as possible (i.e., as soon as all nouns ending with this suffix have the same gender). Such classification strategies look a lot like deterministic finite automata, except they are acyclic. A more standard term is trie. With such strategies, you can determine the gender of all nouns of the list, and (hopefully) do a reasonable job for unknown nouns by answering based on the longest suffix that they share with a known noun.
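To make the notion of minimal rule set concrete, here is a naive Python sketch that computes the deciding suffixes directly from a noun list. It is not the haspirater code, just an illustration on a toy word list of my choosing; the leading space marks the beginning of the word.

```python
from collections import defaultdict


def deciding_suffixes(nouns):
    """Map each minimal deciding suffix to its gender.

    `nouns` maps each noun to 'm' or 'f'. A suffix is kept if all nouns
    ending with it have the same gender while the suffix that is one
    letter shorter is still ambiguous.
    """
    genders = defaultdict(set)
    for noun, gender in nouns.items():
        marked = " " + noun  # the initial space stands for "beginning of word"
        for i in range(len(marked) + 1):
            genders[marked[i:]].add(gender)
    genders = dict(genders)
    leaves = {}
    for suffix, seen in genders.items():
        shorter_is_ambiguous = suffix == "" or len(genders[suffix[1:]]) > 1
        if len(seen) == 1 and shorter_is_ambiguous:
            leaves[suffix] = next(iter(seen))
    return leaves


toy = {"fauve": "m", "guimauve": "f", "effluve": "m",
       "rive": "f", "drive": "m", "dérive": "f"}
leaves = deciding_suffixes(toy)
print(leaves["luve"])   # m
print(leaves[" rive"])  # f
print("uve" in leaves)  # False
```

This enumerates every suffix of every noun, which is wasteful compared to a real trie but is good enough to experiment with.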

Now, it turns out I already wrote some code to generate tries from examples, for my project about determining if an initial 'h' in a French word is aspirated or not. Let us reuse that.

So, let us reverse those nouns, and pass them to programs from the haspirater suite to compile the trie and obtain the leaves of the trie, namely, the suffixes at which a decision is taken (and sort them nicely).

rev nouns.txt |
  buildtrie.py |
  compresstrie.py |
  leavestrie.py -1 |
  rev |
  LC_ALL=C sort -k1,1 |
  rev > leaves.txt

Following our previous example, observe that the leaves.txt file contains a line for "luve" (line 4,417). This means that "luve" is a suffix at which we decide 'm', but all shorter suffixes ("uve", "ve", "e", "") were still ambiguous. An initial space in a word indicates "beginning of word" (when we read "rive" we don't know yet between, say, "dérive" and "drive", but if the full word is "rive" then we should decide 'f'). To determine the gender of a noun using this list, look at the line containing the longest suffix of the noun, and the first field of the line should be its gender. Note that the longest leaves in this file are "patriarche" and "matriarche", for which reading "atriarche" is still insufficient to decide (that illustrates that sometimes the relevant info isn't at the end of words...).
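The lookup procedure described above can be sketched in a few lines of Python. The table below is a tiny hand-written excerpt for illustration, not the real leaves.txt, and the function name is mine.

```python
def guess_gender(noun, leaves):
    """Return the gender attached to the longest suffix of `noun` found in `leaves`."""
    marked = " " + noun  # the initial space stands for "beginning of word"
    for i in range(len(marked) + 1):  # from the longest suffix to the empty one
        suffix = marked[i:]
        if suffix in leaves:
            return leaves[suffix]
    return None  # no rule applies


# Hypothetical excerpt of the rule table (suffix -> gender).
leaves = {"luve": "m", " rive": "f", "drive": "m", "érive": "f"}

print(guess_gender("effluve", leaves))  # m
print(guess_gender("rive", leaves))     # f
print(guess_gender("drive", leaves))    # m
print(guess_gender("arrive", leaves))   # None
```

With the full table, a noun from the list always matches exactly one leaf; for an unknown word there may be no matching leaf at all, hence the None case.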

The leaves file has 7,032 lines, to be compared to the 24,839 initial nouns in the example list. Thus, the strategy of looking at word endings gives classification rules that are shorter than the full example list, but not by much. In a way, this result illustrates that rules telling you "words in -tion are feminine" and such will always lead to mistakes, unless you have a large number of them.

To see how bad this is, I tested a strategy which reads words from the beginning instead of from the end, which seems to be a worse idea: it has 20,607 leaves, so reading from the end is definitely a better idea than reading from the beginning. Maybe different rules would be more helpful to classify (maybe using general decision trees without restricting the order of choices by saying "read from the end" or "read from the beginning"), but it doesn't seem that obvious to me.

If you ever learnt this list by heart (for instance using a spaced repetition system), you would know the gender of every French noun (except the ones with hyphens, except the ones missing from Lexique, except the ones in which both genders are possible depending on meaning, and accounting for possible errors in Lexique). I wouldn't recommend it, though, because of those caveats, and also because it still seems too long so there has to be a better way than what I did. If you still wanted to do it, though, it might be more convenient to use this file, in which I replaced the suffixes by one noun that matches this suffix (the one with the highest registered frequency in Lexique). So, if you know this last file by heart, your intuition for gender will be flawless, modulo the caveats and modulo the big assumption that your intuition proceeds by matching the longest suffix of the unknown word with a word that you know.

[Further work: looking at pronunciation instead of spelling (or in addition to it), give weights to the rules and rank them by weight, have a richer rule language (e.g., allow to give a fixed list of exceptions for each rule, which would seriously cut down the impact of pesky words like "cation")...]

Portrait of a hacker

— updated

French version of this article on F.'s blog. Translations in other languages much welcome!

In this post, I present the world view, philosophy and thought system of a hypothetical person, John Hacker. Of course, what I am going to say about John Hacker also applies to me to some extent, as well as to a certain type of people who are familiar with computers in a certain way; however, because I did not want to give the impression that all of those people share all of John Hacker's beliefs, and because I am not sure anymore to what extent I myself agree with him, I will ascribe this system to this fictitious persona.

Given Internet access and sufficient time, you can learn any intellectual skill.

John Hacker is aware of the mind-boggling quantity of information available online, and knows how to find what he is looking for. Because he is largely self-educated, he is confident in his intellectual capacities and in the plasticity of his brain. He does not think that missing skills like drawing, math, music, are forever out of his reach; he could learn them if he wanted to.

John Hacker is sometimes too enthusiastic; he underestimates the wealth of information that is not on the Web (e.g. the vast majority of books) and forgets that the Web isn't the best reference about all topics yet. He sometimes fails to remember that learning from a real teacher in the real world can be more efficient: you can ask questions whenever you want, get spontaneous feedback, and influence your memory more because going to a class, as opposed to sitting in front of your computer, means moving to a specific place with specific people in front of a specific teacher. He has to remember that he cannot necessarily become the best at any skill... because a lot of other people are usually trying as hard as him and the Copernican principle means some of them will probably beat him.

Computers and the Internet are an extension of your brain.

John Hacker takes it for granted that he will have access to his computer and to the Internet. He knows that his memory is limited and unreliable. For this reason, he sees little value in memorizing things rather than storing them on a computer and being able to look them up when needed. He stores photos, emails, IRC transcripts as a way to archive memories about events, discussions, relationships. He treats his brain memory as a cache for the things he needs to think about, and serializes the results of his thoughts in writeups so that he can forget them and load them back as needed. He specializes in remembering where information is located, and how to access it, rather than memorizing the information itself. For this reason, John Hacker may seem mentally crippled whenever he has no access to a computer or when Wikipedia is down, but may seem uncannily smart whenever he interacts with someone through the Internet because no one sees him look up words and search for documents and sift through archives.

Use abstractions to tame complexity and work around your mind's shortcomings.

An abstraction is something that you can use without having to understand how it works. John Hacker is intelligent, but he is lazy, and values abstractions because they can be used to forget about unnecessary complexity. He is thus eager to abstract things away, as long as the abstraction is not treacherous, is not too leaky, and can be broken down and unfolded if the need arises. Conversely, John Hacker will be eager to peer into things that try to hide how they work (such as proprietary software or hobbyist-unfriendly electronics). John Hacker has no trouble building hierarchies of abstractions. In the real world, he will spontaneously build models of how complicated things (such as human beings) work, and will be confused whenever his abstractions leak or break down.

A computer is almost all you need.

Many people are stuck with a world view in which the technical means to get your work published, or to manufacture a product, or to have access to arcane knowledge, were only available to a tiny minority. Now, these powers are still restricted to the minority of people that have a computer and Internet access and sufficient free time, but that's a much larger group of people than before. Anyone with a computer has the tools necessary to write and publish an opinion piece that will change the world. Anyone with a computer has the material possibility to design a program that will make them rich and famous and change the life of millions of people. The cost of entry that must be paid to build and host something cool is ridiculously low (counted in thousands of dollars), and the material headstart that a Google engineer would have over a teenager in his parents' basement is not really discriminating. This feeling can make John Hacker very unhappy whenever he is unable to be productive: he could create the next big thing, so he'd rather die trying.

Good ideas aren't hard to come by.

People have this stereotype that the great inventors of the past were acting out of some great Vision that had been Imparted to them, and that the ideas of today that will define the computing of tomorrow are highly confidential trade secrets in underground vaults at Google. John Hacker, however, knows that great ideas and good ideas are hard to tell apart before the fact, and also knows that good ideas aren't a scarce resource: he has lots of them stored in todo files, ideas of things that ought to be built, that would certainly be cool, and that, maybe, could even be successful — who knows? He also has dozens of side projects inspired by such ideas, but most of them are unfinished, and those that are finished would appeal to no one except (maybe) fellow hackers. John Hacker thus understands that true limiting resources aren't the good ideas themselves, but:

  • Time: sometimes you would like to work on a cool idea but you are kept busy by other things.
  • Motivation: sometimes you have time to work on a cool idea but you can't manage to do it and just waste your time instead for no apparent reason.
  • Quality execution: building something that doesn't suck is hard.
  • Marketing: even when you have built something, you need to polish it so that people will want to use it, and you may even need to go out there and promote it to potential users.
  • Brand appeal: no matter the intrinsic quality of what you do, a lot of people will take it more seriously if you are a big IT company, so it may be a bit harder to get things to take off on your own.
  • Luck: lots of technologies were worse than their competitors when they came out, were neither marketed in a clever way nor backed by the right people, but just stuck for entirely random reasons.

On the Internet, what matters is what you do, not who or what you are.

Though most people on the Internet go by their real name, John Hacker knows that it is possible to have a pseudonymous identity or multiple identities. If you do something great, its quality can be appreciated without anyone knowing who you are. Your sex, age, race, nationality, are not relevant. It doesn't even matter if you're a robot. On the Internet, nobody knows you're a dog: that's a feature, not a bug. John Hacker knows the extent to which interpersonal relations in the real world are affected by the physical traits of the people who are interacting; he is more secure with pure disembodied information exchange.

Failure and success should preferably have an unambiguous definition.

When you work with a computer, you will often use it as a neutral and unambiguous judge of whether you achieved a certain goal or not. It is true that some goals, like "writing a bug-free program", cannot be checked by a computer, but others, like "getting a program to compile" or "getting a unit test to pass", are. For this reason, John Hacker often relies on having a clear definition of success and failure when he tries to do something. He will therefore prefer creative activities that require him to follow (mostly) unambiguous constraints (e.g. writing verse, or writing lipograms) rather than those that rely on a subjective or social feeling of beauty or quality.

It is not intrinsically bad to use things in unexpected ways.

Most things that you buy are marketed as having a certain purpose. Most people will not think of using them for some other purpose, or will assume it's a bad idea to try to do so. However, John Hacker is familiar with agnostic tools such as programming languages that can be bent to do things that were never envisioned by their designers. For this reason, when he tries to do something in the real world, he will see objects for what they are rather than for what they were designed to do, so he may take them apart (and void the warranty) or maybe just find new creative ways to use them. When he does so, he follows his own judgement, without the safety net of the designer's invisible hand. Sometimes, of course, he is wrong, and he will break things or his contraptions will miserably fail. Still, John Hacker thinks that it's always better to think out of the box in this way, and believes that submitting exclusively to a designer's impression of how their object should be used is nothing short of intellectual slavery.

Decentralized systems are more robust than centralized systems.

John Hacker designs computer systems, so he can identify possible points of failure — the minimal set of things that can bring the whole system down if they break. He knows a centralized system has one single point of failure (its center), whereas a decentralized system is more robust because it achieves better redundancy. John Hacker is aware of the history of technology and knows that centralized technologies (that depend on a single manufacturer) disappear as time passes and companies go bankrupt and the markets change, but that decentralized technologies (the Web, Internet, email) tend to stick around as long as there are sufficiently many people that use them, because there is no single entity that has the power to kill them and make everyone switch to something else. Of course, decentralization also implies disadvantages (like increased complexity, or trust management issues), but decentralized systems that managed to get moderately successful are usually there to stay. For this reason, John Hacker is suspicious towards organizations that have only one leader, towards political structures whose central offices have the power to bring the whole system down through incompetence or (more rarely) malice, and towards anything that depends too much on a small designated group of people or institutions. He is worried when too many things depend on single companies that get too big to fail, and would insist that crucial organizations such as the State should be as decentralized as possible.

Distinguish between the formal and the informal.

John Hacker spends his time formalizing processes, that is, explaining them in a language so basic and unambiguous that a mechanical machine will be able to perform them. For this reason, John Hacker is very good at spotting informality — those parts of a process that appeal to human intuition and judgement and that will be impossible to automatize. For instance, sorting forms in drawers depending on the first letter of their last name is a formal process, but whenever a secretary decides that someone inverted their first and last name and sorts one file according to the first name, then the process suddenly turns into an informal one. John Hacker believes that appeals to human intuition are best avoided, or should at least be clearly labeled so that you're aware whenever you need to use this black box.

Repetitive processes should be automatized.

Computers are machines that can perform repetitive tasks if you formalize them. For this reason, when you use a computer, the difficulty of accomplishing something is usually the difficulty of describing it rather than performing it. For instance, when asking a computer to count from 0 to 999, the difficulty is to describe what "to count" means, not the fact that 999 is a large number. In the real world, where repetitive things are often much harder to automatize, John Hacker will sometimes be frustrated and try to automatize them anyway. When he builds a machine to, say, fold T-shirts automatically, it's because of the conviction that a repetitive task like T-shirt folding should be automatized once and for all so that you can then fold as many T-shirts as you want with no effort, even though, in the real world, it is much harder to make a machine fold T-shirts in a reproducible fashion, and the long-term benefits of such a machine are not worth the investment.

Society would work better if people were technologically competent.

Bureaucracies usually work with paper, and many secretaries spend tremendous amounts of time doing repetitive things that could be automatized. If people had sufficient computer skills to automatize the bulk of their work and only leave to a human those parts which require intelligence and good judgement, then organizations could be made much more efficient while employing fewer people. John Hacker feels physical pain for people that spend their working day doing things that he could automatize in one week so that they could be done by a computer in a split second. Of course, he knows that you cannot just train everyone to be a computer expert, but he would say that society is still lacking in basic technological literacy.

John Hacker is annoyed whenever his government's administration reminds him (usually by requiring him to perform paperwork on actual paper, or talk directly to humans through a phone, or move physically in meatspace) that they do not understand computers and that it will take decades for them to get it.

Information should be preserved.

In the real world, you cannot keep everything, because it would take up too much room. Old buildings must be destroyed to make way for new ones: though you would want to preserve some important ones, it is clear that a compromise has to be found and that you cannot preserve everything. With computers, though, you can store information at absurdly high and continuously increasing densities. For this reason, the default strategy is to keep, and whenever you run out of space, you only delete the very largest things ... or buy a new hard drive. John Hacker thinks it a crime to deliberately remove potentially useful information from a computer, or to let valuable information decay. He experiences the same awe towards full hard disk drives or datacenters that normal people do in front of full bookshelves. There are no logs of what happens to him in meatspace, so he compensates by logging everything he does in cyberspace, and preserving and redundantly archiving his valuable records, in a way that may seem futile to outside observers. As for real life possessions, John Hacker will sometimes be a packrat and keep everything because of the memories attached to objects, or, alternatively, throw them away whenever the need arises... after having taken pictures of them so that the information survives somewhere.

Information should be free.

John Hacker knows that, with access to the Internet and with cryptography, you can communicate in a secure manner. Hence, any attempts to censor information will seem misguided to him, because he knows that such measures can always be circumvented in a demonstrably safe way. To John Hacker, information is neutral, and sharing information should never be a crime; the law should punish action, not communication. Sharing data may infringe on someone's copyright, but enforcing copyright is certainly less important than having the freedom to share. Data may have been private, but once it's been leaked, there's no point in trying to put the genie back in the bottle. Data may be obscene, but no one forces you to look at it, and it is no one's business to impose their moral standards on other people. Data may be wrong, or subtly misleading, but the solution is not to censor it: let people read it and form their own opinion. Of course, all of this only applies to data that has been made public (or that should be public, like most State records); John Hacker would never give anyone access to his private data, or store it unencrypted on untrusted third-party servers.

Information should be dematerialized.

Meatspace has a lot of annoying defects, and cyberspace has a lot of pleasant properties, so information should belong to cyberspace, not meatspace. Of course, in the end, information is always stored on physical media such as hard drives, but John Hacker abstracts this away (in much the same way as you abstract away the working of your organs when you think about your body). In particular, he sees single-purpose sub-optimal physical media as a heresy. He dislikes books (replace them with ebooks), CDs and DVDs (it's stupid to load and unload the media by hand, just digitize them and store them all on a hard drive or on a remote machine or something), paper mail and postcards (emails are a much more efficient way to move information around), paper forms and certificates and authentic documents (you should send digital versions with a cryptographic signature instead), etc.

Human interaction is limited to information transfer through language.

Computers communicate to transmit information on well-defined channels following specific languages carefully engineered to suit this purpose. John Hacker expects human interaction to serve the same goals and work according to the same rules. He dislikes side-channel non-verbal information transfer such as tone or body language. He might insist on saying only things which are literally true: this means that he will not lie and will expect people not to lie, but that he will indulge in cheap fun by saying things that are logically correct but useless or intuitively misleading (e.g. "Do you want tea or coffee?" "Yes."). He might dislike communication such as small talk, whose point lies not in the information transferred but in the act of communication itself. He may insist on consistency in language use, may take an extremely prescriptivist approach to language and grammar, and will not hesitate to use language constructions that are grammatically correct but hard to parse for the human brain (long sentences, nested clauses, etc.). He will insist on proper use of quotation marks ("bananas" is a funny word, bananas are delicious fruits), and will be tempted to model human communication as formal languages that are simpler to parse for a computer.

Of course, John Hacker's model of people as fully autonomous systems that exchange information isn't just inaccurate because of his simplistic picture of communication itself. First, it does not account for the basic fact that the vast majority of people cannot survive in isolation and just need some form of human contact (maybe this even applies to him and he hasn't realized it yet). Second, it neglects other important aspects of human interpersonal relationships, such as helping others out, influencing others, caring for others, letting oneself be influenced by others, sometimes unhealthily so, and, of course, loving and being loved. John Hacker is usually puzzled and uncomfortable about all this complicated mess; for these reasons, he may sometimes seem blunt, asocial, careless, or cold towards people, or avoid human interaction.

Rationality is the only acceptable framework.

John Hacker thinks that in the real world, as in cyberspace, events have causes and obey general laws. He is aware of the uncanny complexity and chaos-like behavior that can arise from the interaction of extremely rich rules and from the use of randomness, but knows that, in principle, everything could be explained. He rejects the paranormal. He is not superstitious or religious. He rejects non-falsifiable theories, because they do not make predictions that are practically useful. He avoids discussing or thinking about the unknowable, such as the existence of God or the nature of death, because he knows that no definite answer can be given about them and that they have no influence on his life, so they are a waste of time. He sometimes speculates about the future or about philosophical questions, but that's mostly because of the influence of science fiction and he doesn't take this too seriously. He tries to be consistent and to steer clear of contradictions. He tries never to indulge in wishful thinking. He tries to remain critical towards his own beliefs, and to stay in control of the influence of others on them. When person X asserts fact Y, he remembers that the implied fact is not "Y" but "X asserts Y", no matter how much he trusts X or would like to believe Y.

Reality is imperfect, cyberspace is more important.

John Hacker is used to working in cyberspace, so whenever he operates in meatspace he sees what is missing. Why isn't there an undo or a save/restore function? Why isn't there an archive of what happened or what was said? Why do you need to move to go somewhere? Why doesn't this dead tree book have a search function? In some cases, because he is used to building his own virtual universe that behaves like he wants it to, John Hacker will try to adapt reality to his liking. In other cases, he will just give up, neglect his body, clothing, and dwelling, and remain mostly indifferent towards money or physical belongings. His focus on productivity in cyberspace and his disdain for low-level happiness in meatspace could almost be confused with religious moral standards by normal people.

Your brain is a primitive version of a computer.

John Hacker sees his brain as a device that carries a lot of evolution-mandated low-level support for some specific tasks (e.g. distinguishing faces) but has extremely primitive support for abstract tasks (e.g. checking if the string of parentheses "(((())()())()))" is correctly balanced, or computing 269 times 42, or reasoning about geometry in higher dimensions), only carries limited support for introspection (most thought processes, like deciding to remember or forget something, or to focus on something, cannot really be controlled consciously) and suffers from documented bugs. He knows the brain was designed through evolution and only achieved a primitive form of higher-level intelligence as a byproduct of trying to satisfy the evolutionary drive of getting genes to reproduce. John Hacker hopes that the human race will eventually achieve true intelligence from this bootstrap (by improving the interface between brain and cyberspace, or maybe hacking the brain itself) and transcend the evolutionary drive (going from genetics to memetics). He believes that his conscious goals (such as the search for beauty, truth, or interestingness) are an attempt to reach for this notion of "true intelligence".
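
To make the contrast concrete, here is a minimal Python sketch of the first two abstract tasks mentioned above. The function name `is_balanced` and the variable `example` are mine and purely illustrative; the string and the two numbers are the ones quoted in the paragraph. A computer dispatches both tasks with a single counter and one multiplication, whereas most people have to slow down and count carefully.

```python
def is_balanced(s: str) -> bool:
    """Return True if every ')' closes an earlier '(' and none is left open."""
    depth = 0  # number of currently open parentheses
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis with nothing left to close.
                return False
    return depth == 0


# The string quoted in the paragraph above.
example = "(((())()())()))"

print(is_balanced(example))  # False: 7 opening against 8 closing parentheses
print(269 * 42)              # 11298
```

A single integer counter is enough here because there is only one kind of bracket; with several kinds one would presumably use a stack instead.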

John Hacker feels that this "true intelligence" is a universal property of nature. He believes that if we met intelligent aliens, they should be studying essentially the same mathematics as us: that notions such as integers or prime numbers are universally fundamental, not that they were "invented" by humans or are only interesting to human minds. This belief is grounded in John Hacker's day-to-day interaction with non-human intelligence, namely computers: though they were designed by humans, they seem to reason about abstract things like we do (or would like to do). This belief is also grounded in John Hacker's experience of having his mind hijacked ("nerdsniped") by problems that are natural, beautiful and deep — John Hacker cannot believe that his profound fascination for such problems could have anything to do with his human nature.

This vision can make John Hacker forget about his human nature, because he does not see it as a crucial part of himself. John Hacker sees himself as a proto-intelligent being first and foremost; belonging to the human species is a distant second, and being a man or woman an even more distant third. This can lead him to wishful thinking, namely, believing that his human nature has no hold on his thought processes because he would like things to be so. He will do his best to try to get closer to this ideal state through self-control, but he sometimes fails to realize that it doesn't work completely. Whenever he forgets about primitive urges such as the need for food, sleep, physical exercise, human contact and love, he discovers that those unsatisfied urges can be an obstacle to the proper functioning of his intelligence.

The soul is information.

Because he is so used to working with information, John Hacker considers that the soul of a person is the information contained in their brain and that death is the destruction of this information. He considers that if you could transplant someone's brain into a different (possibly artificial) body then it would remain the same person; that a brain in a vat would be a full person; that the seat of consciousness and personhood is the brain and that the rest of the body is simply a supporting device for the brain (and that there is no immaterial "soul" or token of a person's existence); that cryonics is probably a good idea on paper, flawed though its current implementations may be; that if you could retrieve the information contained in the brain and store it in a computer then the person would still be alive. He is conscious of the numerous paradoxes left unanswered by such views, but can see no better position. He does not think that mankind has an ethical duty to stay true to its biological nature instead of trying to improve it; quite the contrary. He regards the technological singularity as interesting speculation, though he does not hope to see something of this kind occur within his lifetime.

Related reading: the Jargon File has an appendix called "A Portrait of J. Random Hacker". It is more about how (the file's notion of) hackers live, rather than what they think. Thanks to Pablo Rauzy for pointing it out to me.