Android from the command-line

Something I hate about Android is that there are almost no bridges between the usual *nix command-line world and the brave new world of Java. If things had been done properly, for instance, you would be able to ssh to your phone and run a trivial bash one-liner to save the current GPS coordinates to the SD card every 30 seconds, instead of having to write a verbose Java application to do that (or install a third-party app which will come with all sorts of bells and whistles). Also, if things had been done properly, there wouldn't be obscure terminal glitches and bizarre and gratuitous deviations from the FHS...

Now, my Android phone fell, the screen broke, and I am left with a perfectly functional ARM-based computer with a lot of usable hardware, such as:

Wifi with master mode support
Speakers and headphone plug
Microphone and camera
GPS
Vibrator and notification LED
GSM
Bluetooth

I have root access to the device and I can get a shell through adb, so there are a lot of cool things that I could do with it, if only using Android from the command line weren't so much of a pain. So here are a few notes I took as I explored all of this, in case someone finds them useful. The phone is an HTC Desire, the ROM is Cyanogen 7.1.0 if I remember correctly (not that it matters), I am root, and NAND protection is disabled (a.k.a. S-OFF).

Connecting to WiFi (managed mode)

It is not that hard to connect to WiFi (using wpa_supplicant and dhcpcd) except for a few quirks. I assume that you are familiar with wpa_supplicant configuration files and with wpa_cli's rudimentary interface (just a tip if you didn't know: you just need to type non-ambiguous prefixes for commands).

# this will fail unless you are S-OFF
mount -o remount,rw /
mount -o remount,rw /system
# took some time to find out
insmod /system/lib/modules/bcm4329.ko 
# yes, the wifi interface is called "eth0"...
mkdir /eth0
cd /
# edit /etc/wifi/wpa_supplicant.conf to define your APs
wpa_supplicant -B -Dwext -ieth0 -c/etc/wifi/wpa_supplicant.conf
# use wpa_cli to select the AP to associate with
wpa_cli -p /eth0/
pkill dhcpcd
# remove dhcp leases
rm -Rf /data/misc/dhcp/*
dhcpcd eth0
# beware: DNS is slow and not very reliable, if pinging a domain doesn't work
# try pinging an IP before declaring failure
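
If you would rather generate the network entries of /etc/wifi/wpa_supplicant.conf than write them by hand, here is a minimal sketch in Python (run from a Debian chroot like the one described below, assuming Python 3 is installed there). The function name render_network is my own invention, and the sketch only covers open networks and WPA passphrases:

```python
def render_network(ssid, psk=None, priority=0):
    """Return one wpa_supplicant.conf network block as text."""

    def quote(value):
        # wpa_supplicant reads plain strings in double quotes; this sketch
        # simply refuses values that would need escaping.
        if '"' in value or "\n" in value:
            raise ValueError("unsupported character in %r" % value)
        return '"%s"' % value

    lines = ["network={", "    ssid=" + quote(ssid)]
    if psk is None:
        # open network: no passphrase
        lines.append("    key_mgmt=NONE")
    else:
        if not 8 <= len(psk) <= 63:
            raise ValueError("WPA passphrases are 8 to 63 characters long")
        lines.append("    psk=" + quote(psk))
    if priority:
        # higher priority networks are preferred when several are in range
        lines.append("    priority=%d" % priority)
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Calling render_network("home", "correct horse") returns a block that you can append to the configuration file before starting wpa_supplicant.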

Setting the time

The time is reset each time you remove the battery, and of course you must set it from the command line. To do so, if you have an Internet connection, an easy way is: ntpd -p 64.90.182.55. (Alternatively, substitute the IP of your favorite public NTP server.) Don't forget to do this, or you can get weird errors (for instance, openvpn will complain about certificates not being valid yet).

Getting Debian

Since Android does not come with any sort of package manager, the standard way to install stuff on Android is by chrooting into a Debian install. This process is pretty well documented here and here so I won't explain it all over again. As Daniel Wallenborn pointed out, recent open source applications such as Lil' Debi or Debian Kit can now be used to make this process even simpler. Notice however that the first guide forgets to say that you should export TERM=xterm and export PATH="/bin:/usr/bin:/usr/sbin:$PATH" just after chrooting. If you follow the first guide, you will get a VNC server, so you can VNC to your phone and interact with graphical programs. (Graphical programs from the chrooted Debian, of course, not from Android.) Not extremely useful, but pretty neat.

Creating a WiFi access point (master mode)

This is much more difficult. First, you must load the WiFi module with an alternative firmware. (Obviously, rmmod it first if it is already loaded. Or, to be sure, reboot the phone...)

insmod /system/lib/modules/bcm4329.ko firmware_path=/system/vendor/firmware/fw_bcm4329_apsta.bin

Now, using iwconfig directly doesn't seem to work. The only reproducible way I found to reliably create an AP is to use the mysterious binary program res/raw/ultra_bcm_config from android-wifi-tether. Check out the code, send this file to the phone (using adb push or anything else) and run (where AP is the name of the access point you would like to create and CHAN is the channel):

ultra_bcm_config eth0 softap_htc AP none "" CHAN

I wonder what this program does. Next, perform:

iwconfig eth0 essid AP
ifconfig eth0 192.168.2.1

Now, you need to serve DHCP leases. Android uses dnsmasq to do this.

mkdir -p /var/run
cat > /data/dnsmasq.conf <<EOF
dhcp-authoritative 
interface=eth0
dhcp-range=192.168.2.100,192.168.2.105,12h
user=root
no-negcache
EOF
dnsmasq -C /data/dnsmasq.conf

You should now be able to connect to the phone and get a DHCP lease. Hurray!
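
If you change the addresses often, it is easy to end up with a DHCP pool that doesn't match the address of the phone. Here is a small sketch in Python (from the Debian chroot, assuming Python 3) which generates the same configuration as above and checks the pool first. The function name is made up, and the /24 netmask is my assumption, since nothing above sets one explicitly:

```python
import ipaddress


def render_dnsmasq_conf(interface, gateway, first, last, lease="12h"):
    """Return dnsmasq configuration text for a small DHCP pool."""
    # Assumption: the access point uses a /24 network around its own address.
    network = ipaddress.ip_network(gateway + "/24", strict=False)
    low = ipaddress.ip_address(first)
    high = ipaddress.ip_address(last)
    for addr in (low, high):
        if addr not in network:
            raise ValueError("%s is outside %s" % (addr, network))
    if low > high:
        raise ValueError("empty DHCP range")
    if low <= ipaddress.ip_address(gateway) <= high:
        raise ValueError("the pool must not contain the phone's own address")
    return "\n".join([
        "dhcp-authoritative",
        "interface=%s" % interface,
        "dhcp-range=%s,%s,%s" % (first, last, lease),
        "user=root",
        "no-negcache",
    ]) + "\n"
```

Write the result of render_dnsmasq_conf("eth0", "192.168.2.1", "192.168.2.100", "192.168.2.105") to /data/dnsmasq.conf and start dnsmasq as before.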

Changing your MAC

The usual command ip link set eth0 address 00:11:22:33:44:55 will sometimes work. If it doesn't, reboot and try again.

Creating a captive portal

I didn't even try to get a data connection over GSM operational from the command line (I would probably need the screen to enter the SIM PIN anyway), and I haven't yet tried to "reverse-tether" and create an AP to share an ethernet connection (going through a laptop, i.e. the laptop has an ethernet connection, the phone is plugged into the laptop and shares the connection over WiFi). Hence, the phone cannot have Internet access when it creates an access point. Yet, there are a lot of neat things to do with a battery-powered WiFi AP that you can carry in your pocket. A WiDrop? Access to a static Wikipedia mirror?

Yet, to do that, you need to present these services to the user. As far as I know, there is no standard describing how this should be done, and the best you can do is intercept all HTTP traffic and hope that the user is running a web browser... Here is a simple solution which does not require you to install real fat captive portal software. Warning, just in case you're skim-reading through this, the setup described here only tries to intercept HTTP connections, not every kind of traffic--don't do this if the phone has an Internet connection somehow and you want to be sure the clients are blocked for real.

First, we need to ensure that clients can get DNS service. Since usual DNS servers are not reachable (because the phone has no Internet connection), we need to (a.) run a DNS server on the device and (b.) intercept all DNS requests to third party hosts and answer them ourselves. Pretty easy to do, since dnsmasq can serve as a lightweight DNS server:

cat >> /data/dnsmasq.conf <<EOF
address=/#/192.168.2.1 
EOF
iptables -t nat -A PREROUTING -p udp --dport 53 -j REDIRECT

Second, we need to redirect all HTTP connections towards ourselves, just in case someone tries an HTTP connection without using DNS first (you never know, especially if the client has a DNS cache or something like that).

iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.2.1

Third, we need to run an HTTP server on the device. The easiest way to do that is to run it within the Debian chroot. It just works.

Fourth, we need the HTTP server to answer even if a weird path is requested. The quick and dirty way to do this is to set up the 404 page as a redirection to the index page. For instance, with lighttpd:

server.error-handler-404 = "/index.html"

Of course, this messes up the HTTP status code, though. Finding a better solution is left as an exercise. (For my purposes, I don't really care.)
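
One possible answer to the exercise, if you are willing to swap lighttpd for Python's built-in http.server module (assuming Python 3 in the Debian chroot): answer the index page with a 200 and everything else with a 302 redirect to it, so the status codes stay honest. This is only a sketch, and PortalHandler, PORTAL_PAGE and make_server are names I made up:

```python
import http.server

PORTAL_PAGE = b"<html><body><h1>Welcome</h1></body></html>"


class PortalHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in ("/", "/index.html"):
            # the portal page itself
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PORTAL_PAGE)))
            self.end_headers()
            self.wfile.write(PORTAL_PAGE)
        else:
            # any other path: temporary redirect to the index page
            self.send_response(302)
            self.send_header("Location", "/")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        pass  # don't log every request to stderr


def make_server(port=80, address=""):
    """Create the portal server (binding to port 80 requires root)."""
    return http.server.HTTPServer((address, port), PortalHandler)
```

Run make_server().serve_forever() to start it. The redirect is relative, so a client that asked for some hijacked hostname is sent to the root of that same hostname, which the DNS and iptables rules above route back to the phone.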

Knowing the battery status

Don't even think about using acpi! I hunted for relevant stuff in /sys but couldn't find anything, so the only approach I found is:

dmesg | grep batt: | tail -1

Lars Rasmusson pointed out that relevant files are /sys/devices/platform/msm-battery/power_supply/battery/voltage_*. This might depend on your phone model, so it is probably a good idea to run find /sys -name power_supply.
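
To poll these files from a script, here is a minimal sketch in Python (assuming Python 3 in the Debian chroot, with /sys visible from it). The default directory /sys/class/power_supply and the attribute names status, capacity and voltage_now are the usual Linux ones, but I have not checked which of them the HTC Desire exposes, so treat them as assumptions and point root at whatever find turned up:

```python
import os


def read_power_supplies(root="/sys/class/power_supply"):
    """Map each power supply name to the attributes that could be read."""
    wanted = ("status", "capacity", "voltage_now")
    result = {}
    if not os.path.isdir(root):
        return result
    for name in sorted(os.listdir(root)):
        values = {}
        for attr in wanted:
            try:
                with open(os.path.join(root, name, attr)) as handle:
                    values[attr] = handle.read().strip()
            except OSError:
                continue  # attribute absent or unreadable on this device
        if values:
            result[name] = values
    return result
```

The function returns an empty dictionary when nothing is found, in which case the dmesg approach above remains the fallback.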

Activating the vibrator

The vibrator is a useful way to get notified about what happens.

# the bigger the value, the longer the vibration will last
echo 100 > /sys/devices/virtual/timed_output/vibrator/enable

If you want to trigger the vibrator from a program which isn't run as root (e.g. the HTTP server), don't forget to change the permissions of the file.
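
For instance, to signal different events with different vibration patterns, something like the following Python sketch should do (assuming Python 3 in the Debian chroot). The function names are mine, and I am assuming that the value written is a duration in milliseconds, which I have not verified beyond "bigger means longer":

```python
import time

VIBRATOR = "/sys/devices/virtual/timed_output/vibrator/enable"


def vibrate(duration_ms, path=VIBRATOR):
    """Ask the vibrator to run for duration_ms (assumed milliseconds)."""
    with open(path, "w") as handle:
        handle.write("%d\n" % duration_ms)


def vibrate_pattern(pulses_ms, gap_ms=200, path=VIBRATOR, sleep=time.sleep):
    """Play several pulses in a row, pausing gap_ms after each one."""
    for duration in pulses_ms:
        vibrate(duration, path)
        # wait for the pulse to finish, plus the silent gap
        sleep((duration + gap_ms) / 1000.0)
```

For example, vibrate_pattern([100, 100, 500]) gives two short pulses followed by a long one.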

Activating the flashlight

# values can be between 1 and 255, though the led seems to power down by itself
# for values over 128, maybe because the hardware cannot sustain them for long
echo 1 > /sys/devices/platform/flashlight.0/leds/flashlight/brightness

Playing music

An extremely pleasant surprise: there is a command to play a sound file!

stagefright -a -o file.mp3

An extremely unpleasant surprise: the command isn't documented and is extremely limited. Probably a debugging tool that someone forgot to remove.

Webcam acquisition

Don't get your hopes up, I didn't manage to get this to work. If you're only interested in stuff that works, you can stop reading this post here.

Ogurets pointed out that webcam acquisition is possible on some chipsets, for instance the VC088x SoC, using ad-hoc code that invokes ioctl. Ogurets writes: "This stuff is very hardware-dependent, you could quickly check your hardware by listing /dev: if you have "cif", "video", "venc", "v8-tv" devs, then you have a VC088x SoC and the code will work for you. Otherwise, you could download kernel source code for your device (the manufacturer has to publish it due to GPL) and find proper ioctls for your camera, then write a similar program for yourself." I have not tested this code on the HTC Desire, however. For completeness, I reproduce below my original steps to try to get the webcam to work, even though they seem much less promising.

Now, here are some notes about what I tried. There is no /dev/video0 device and there doesn't seem to be any comparable device anywhere, so my guess is that there is no V4L support. I ran mplayer and vlc from the Debian chroot in the hope that things would automagically work, but to no avail.

Apparently, the Android piece of software which handles the webcam is called Stagefright, and its only documentation is a blog post by a Google intern and the source code itself (think there must be better docs? think again). The job is split between Java (which provides the bindings that regular app developers use) and C++ (the native code you're not supposed to be writing), which performs the actual work, including stuff like hardware acceleration of some codecs.

Apparently, someone did something clever: a Stagefright interface for libav. Sadly, the only documentation available, this time, is a YouTube video. I managed to compile it after days of stupid mistakes (if you want to try it yourself, make sure you check out the right branch, use tools/build_libstagefright with the NDK variable set to the path to the Android NDK (which contains the cross-compiler), and use make V=1 to troubleshoot issues). Sadly, it either hangs or fails with unhelpful error messages at runtime. Maybe it's because the author doesn't have the same phone model. The next step would be to use gdb or something like that, but I'm a bit discouraged... please tell me if you managed to make this work.

Forkability of community projects

A community project is an interaction between a community of users who create a resource and a host which stores and serves this resource. Extremely useful and valuable resources such as Wikipedia have been created in this way, and it is easy to feel compelled to contribute to such projects to "give back" to the community. However, in some cases, you could end up benefiting the host more than the community, because the terms of the relationship between community and host are unfair.

Here is an example of this. CDDB was an early collaborative effort to create a database of audio CDs. It started as a one-man effort to which anyone could contribute by email. As time passed, it was incorporated, then bought, then renamed, and access to the database became burdened with restrictions to serve the commercial interests of the host. The users who contributed to the project had actually helped someone to create their product, and that someone ended up using the product against the community's interests.

What went wrong here? Does this mean that the Wikimedia Foundation could start to act unethically towards the community? Fortunately not: there is a difference in forkability between Wikipedia and CDDB. I say that a community project is forkable if anyone in the community can take a copy of the content and host it somewhere else. Forkability ensures that the host cannot take the content away from the community. Furthermore, it is a strong incentive for the host to provide a good hosting service, because it ensures that anyone can start to compete with the host.

There are two facets to forkability, which are:

Legal forkability: Do you have the right to fork? This is satisfied if the resource is under a free license; it is not satisfied if users keep their copyright but grant the current hosting service a right to host the content, or (worse) if they assign their copyright to the host. To publish their content, users should waive the rights that stand in the way, not privilege the current host in any way.
Practical forkability: Do you have the capacity to fork? This is satisfied if dumps of the resource are provided by the hosting service in an open format (i.e. not requiring specific proprietary software); it can still be satisfied if the hosting service allows users to crawl the resource. It is not satisfied if the hosting service tries to prevent crawling or forbids it in their TOS.

Some community projects today are forkable:

Wikimedia projects are under the free CC-BY-SA license and dumps are available. (Incidentally, these dumps aren't just an abstract guarantee against wrongdoing; they are extremely useful resources for researchers or for people who need a local mirror of Wikipedia.)
StackExchange projects such as StackOverflow are under CC-BY-SA too and dumps are available (though, sometimes, Stack Exchange thinks they can decide how their users' content should be attributed -- a host should specify a suggested mode of attribution but should not assume that users will not be more lenient).
MusicBrainz (a modern alternative to CDDB) is available under a combination of public domain and CC-BY-NC-SA and provides dumps.
OpenStreetMap provides dumps, and though its legal situation isn't clear (it seems that you have to assign copyright to the OpenStreetMap Foundation, which guarantees that the content will always be available under a free license), I'm pretty sure that this is fine in practice.
Project Gutenberg has a convoluted license for its ebooks, but you can strip it from the public domain books and get public domain content, and you are welcome to mirror it.

Sadly, examples of non-forkable projects today are also numerous:

Google Maps welcomes people to contribute, but it is not forkable.
Most review websites are not forkable. For instance, the Yelp TOS require you to grant them a license to use your content, and prohibit any practical attempts to crawl the content. Reviews on websites such as Amazon are also examples of collaboration to create non-forkable content.
reCAPTCHA is not really a community project but is worth mentioning because you get the same sort of enthusiasm ("awesome! I can help digitize books by completing captchas") before you realize that reCAPTCHA is really Google, that Google never guarantees that you will be able to fork the content that you helped to digitize, and that they are using reCAPTCHA to improve Street View, which is definitely not forkable.

I don't know of any forkable alternative to Yelp or reCAPTCHA, though I can't see any good reason why such alternatives couldn't exist and thrive (except that they are hard to bootstrap).

So, before you contribute to a community project, make sure that the resource doesn't just belong to the host, but really belongs to the community (and just happens to be hosted in a centralized place). Forkable community projects are, in my opinion, the only ethical alternatives to federated projects; they are a bit worse because they require one master host to exist and because there will always be some degree of inertia before people start to fork, but they are the best that we can do whenever centralization is a technical requirement.

plint -- a French poetry validator

English version

Version française ci-dessous.

I'm constantly annoyed by French poetry which sort-of rhymes but does not respect metric constraints (and don't get me started on rhyme genres), so I wrote a tool to validate French poetry against metric, rhyme and rhyme genre constraints, which is called plint for lack of a better name.

The code is available so you can run it on your server (or use a CLI or an experimental IRC interface). It uses haspirater as well as frhyme, a small tool to infer the end phonemes of a word (itself based on Lexique).

Of course, formal constraints are not the only important thing in poetry and you're welcome to deviate from them if you do so willingly. This tool is mostly designed for people who think they're following the rules.

Version française

English version above.

Je suis souvent confronté à de la poésie qui rime plus ou moins mais qui ne respecte aucune contrainte métrique (et ne parlons pas des rimes féminines et masculines), donc j'ai écrit un outil pour vérifier automatiquement les poèmes (pour la métrique, la rime et le genre des rimes). Il s'appelle plint.

Le code est disponible, donc vous pouvez le faire tourner sur votre propre serveur, ou utiliser une interface en ligne de commande ou une interface IRC expérimentale. Il utilise haspirater ainsi qu'un petit outil pour inférer les quelques derniers phonèmes d'un vers qui s'appelle frhyme (utilisant lui-même Lexique).

Bien sûr, la poésie, ce n'est pas que le respect des contraintes formelles, et il peut être raisonnable de s'en écarter à condition de le faire délibérément. Cet outil s'adresse surtout aux gens qui pensent suivre les règles classiques.

RATP, informatique et libertés

Summary in English

This post isn't of much interest unless you live in France, so I originally wrote it in French. The gist is that a French law allows you to ask for a copy of the personal data that a company holds about you, that I contacted the Paris public transportation agency to request mine, and that I got an actual reply.

The story, translated from French

In RATP trams and buses, you can see a poster which ends with the following paragraphs:

Various topics mentioned on this poster are subject to automated processing. (In accordance with law 78-17 of 6 January 1978 on information technology, data files and civil liberties, any person may obtain a copy of the personal data concerning them and, where applicable, exercise their right of rectification.)

The right of access may be exercised with the RATP's data protection correspondent (correspondant Informatique et Libertés), either by email at cil-ratpNOSPAM@ratp.net, or by post at the following address: RATP - Service de la Direction Générale LAC JV27 - 13, rue Jules Vallès - 75547 Paris

Any request must be accompanied by a copy of an identity document.

This legal notice is common, but the possibility of exercising the right of access by email is less so. So, curious to know whether it really worked, I sent the following email on the off chance to the address above, on the evening of Friday, January 27, with a scan of my passport attached:

Subject: Access to personal data, Imagin'R xxxxxxxx

Hello,

Pursuant to law 78-17 of 6 January 1978 on information technology, data files
and civil liberties, I would like to exercise my right of access to the
personal data concerning me. I therefore ask you to send me a copy of all the
personal information recorded for my Imagin'R card number xxxxxxxx (in
particular the record of the validations made). A copy of an identity document
is attached to this message.

Best regards,

-- 
Antoine Amarilli

To my surprise, I received a reply from the RATP on January 30:

In response to your request, please find attached:

- the customer record with the information about your NAVIGO pass and the
Imagine'R student contract associated with it

- the validation totals for this pass, recorded on entry to or exit from our
rail networks during this month (January 2012) and the previous month
(December 2011). Daily totals are kept for this maximum of 2 months, for the
sole purpose of checking the reliability of NAVIGO passes.
We keep neither the times nor the places of passage.

The validation totals are a PDF document created from Microsoft Word. It is indeed limited to the number of passages for each date in December and January. The customer record is a slightly more interesting PDF document containing all sorts of information, including:

The Hexaclé code of the postal address given by the customer. I didn't even know that this code existed, but it seems fairly hard to obtain your own otherwise, since the Hexaclé database is not free...
Flags indicating whether or not you accept advertising: "Opt-in source", "Opt-in console", "Stop pub source" and "Stop pub console", as well as the more enigmatic "Demande de non regroupement" (request not to be grouped) and "Déduplication à tort" (wrongly deduplicated).
All sorts of information about your passes: batch number, version number, etc.
Events about the reloading of Navigo passes. The history goes back quite far, but I only see one event per pass, so some must be missing, and these may just be the creation of the passes.

I am still rather surprised to have actually obtained this data from the RATP. A few closing remarks:

Little information: The RATP keeps (or claims to keep) only little information. I was hoping to get the record of the dates, times and places where my Navigo pass was validated, over an arbitrarily long period. Unless I was lied to, I am rather pleasantly surprised to see that the RATP actually takes the trouble to erase the information.
Regular use: Obviously, rather than a one-off disclosure, I would prefer the RATP to send me this information as it accumulates (so that I can keep the list of all my Navigo validations), and if possible in a format that is easy to process (CSV, SQL...). I replied to their message to ask about this, and I was told that the PDF format satisfies the obligations of law 78-17, that there were no plans to transmit this data on a regular basis, and that a request every two months could be considered abusive because of its repetitive nature (which is indeed consistent with the law in question, article 39, paragraph II).
Security: The RATP authenticates requests solely on the basis of a copy of an identity document. This confuses the identification of a person (designating an individual unambiguously) with the authentication of a request (guaranteeing that the person concerned is really the one making the request). A copy of an identity document, if it is not forged, identifies the person, but does not certify that they authorized the request: some third parties have a scan of my passport and could obtain my personal information from the RATP (including my phone number, my address, and aggregated information which nevertheless gives an idea of whether or not I am in the Paris region...) by pretending to be me. In my opinion, it would already be better for the RATP to check that it sends the information to the email address listed in the customer record (which was not done in my case, since I made the request from another address). I sent these suggestions to the RATP, which did not reply to them.

htmlrebase -- relative link resolution in HTML according to a given base URL

I found this code lying around, so I'm dumping it here in case someone needs it. It takes an HTML file on standard input and a URL as a command-line argument and produces the HTML file on standard output where all relative links have been resolved according to the given base URL.

#!/usr/bin/env python

"""Resolve relative links in an HTML blob according to a base"""

from BeautifulSoup import BeautifulSoup
import sys
import urlparse

# source: http://stackoverflow.com/q/2725156/414272
# TODO: "These aren't necessarily simple URLs ..."
targets = [
    ('a', 'href'), ('applet', 'codebase'), ('area', 'href'), ('base', 'href'),
    ('blockquote', 'cite'), ('body', 'background'), ('del', 'cite'),
    ('form', 'action'), ('frame', 'longdesc'), ('frame', 'src'),
    ('head', 'profile'), ('iframe', 'longdesc'), ('iframe', 'src'),
    ('img', 'longdesc'), ('img', 'src'), ('img', 'usemap'), ('input', 'src'),
    ('input', 'usemap'), ('ins', 'cite'), ('link', 'href'),
    ('object', 'classid'), ('object', 'codebase'), ('object', 'data'),
    ('object', 'usemap'), ('q', 'cite'), ('script', 'src'), ('audio', 'src'),
    ('button', 'formaction'), ('command', 'icon'), ('embed', 'src'),
    ('html', 'manifest'), ('input', 'formaction'), ('source', 'src'),
    ('video', 'poster'), ('video', 'src'),
]

def rebase_one(base, url):
    """Rebase one url according to base"""

    parsed = urlparse.urlparse(url)
    if parsed.scheme == parsed.netloc == '':
        return urlparse.urljoin(base, url)
    else:
        return url

def rebase(base, data):
    """Rebase the HTML blob data according to base"""

    soup = BeautifulSoup(data)

    for (tag, attr) in targets:
        for link in soup.findAll(tag):
            try:
                url = link[attr]
            except KeyError:
                pass
            else:
                link[attr] = rebase_one(base, url)
    return unicode(soup)


if __name__ == '__main__':
    try:
        base = sys.argv[1]
    except IndexError:
        print >> sys.stderr, "Usage: %s BASEURL" % sys.argv[0]
        sys.exit(1)

    data = sys.stdin.read()
    print rebase(base, data)
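
The script above is written for Python 2 and BeautifulSoup 3. In case it helps, here is a sketch of how the core function could look in Python 3, where urlparse and urljoin live in urllib.parse. I have only ported rebase_one here, not the HTML traversal:

```python
from urllib.parse import urljoin, urlparse


def rebase_one(base, url):
    """Rebase one url according to base (Python 3 port of the function above)."""
    parsed = urlparse(url)
    # Only touch links with neither a scheme nor a host: absolute URLs,
    # mailto: links and protocol-relative //host/path links are left alone.
    if parsed.scheme == "" and parsed.netloc == "":
        return urljoin(base, url)
    return url
```

For example, with the base http://example.com/a/b.html, the link c.html becomes http://example.com/a/c.html and /top becomes http://example.com/top, while mailto:me@example.com is returned unchanged.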