a3nm's blog

Mobile apps I use

Smartphones nowadays seem to have a "social" aspect, where people talk about which hardware and accessories they use, and which apps and services they rely on. I don't care so much about this as about my computer setup, but it seems that for most other people the opposite is true. Hence, this post describes the software part of my smartphone setup.

For reasons related to privacy, ethics, and ideology, I try to avoid proprietary software to the extent possible. Hence, my phone runs mostly free software (with exceptions, see below). This implies that my choice of apps to run on my phone is limited, which is part of the reason why my phone is a comparatively unimportant part of my general computing setup, and why this list of apps is short and easy to write. Other reasons why my phone is not important are:

  • personal preference: I'd much rather use a computer with a real keyboard than a phone, and I often have a computer at hand.
  • ethics: the phone ecosystem is much less open-source-friendly and much less privacy-friendly than the computer one (at least when you run Linux); on phones it takes way more effort to evade proprietary software, anti-features, tight integration with cloud services that I don't want, etc.;
  • laziness: it's nevertheless clear that I should be doing more with this Internet-connected computer I always carry with me, even if it means I have to write my own software, but this would be a lot of work.

CyanogenMod

I have never tried non-Android smartphones. Apple and Microsoft smartphones are obviously a no-go in terms of freedom and privacy, but Firefox OS and Ubuntu phone look like decent options that I would need to investigate. For now, however, I use Android.

The precise operating system I use is CyanogenMod. There are multiple reasons. I prefer using a system which is (in principle) community-managed, and I prefer using one I installed myself rather than the default one (which I feel is more likely to come with anti-features). Further, CyanogenMod's slight power-user focus means that it sometimes includes some additional options and/or is less eager on hiding advanced settings to protect users against their own foolishness.1 The most important reason, however, is that I know no easier way to have a mostly open-source Android system without the proprietary Google applications. I do have a Google account for various reasons,2 but nothing I'd like to associate with my phone. Using CyanogenMod allows me to have an Android-system which is as Google-free as possible.3

A slight downside of this choice is that you cannot use some of the proprietary Google applications that have nice features even though they shouldn't in principle require a Google Account, e.g., the Google Maps app. Instead, I use the Web version, even though it is inferior. The major drawback, however, is that you cannot use the Google Play Store client. As it is not possible otherwise to download the APK packages of applications (even the freeware ones), it means you cannot install any of the Android apps which are only available on Play Store, i.e., a vast majority of them. No Google apps also means no Google Play Services, and even some apps available outside of the Play Store, e.g., TextSecure, cannot be used then because they require them (see, e.g., TextSecure's FAQ entry about this).

For many real-world services (banks, restaurants, cinemas, etc.), as "mobile apps" means "Android and iPhone", and as "Android" means "Google Play", this means that I cannot use their apps. This would not be a problem if such services had decent Web interfaces, but many of them spend most of their efforts polishing their closed apps rather than their websites, so it is sometimes annoying.

Another downside is that CyanogenMod still contains proprietary software and drivers, so it would be better to switch, e.g., to Replicant, but its support of my phone is not sufficient for me to consider it yet.

F-Droid

To install apps and keep them up-to-date, instead of the Google Play Store, I use F-Droid, a repository of apps which only includes free software.

The main drawbacks of F-Droid is that there are comparatively little applications available (1563 as of this writing, against 1.4 M on the Play Store), and that the interface is a bit primitive in some respects (e.g., apps are not downloaded in the background, apps must be manually updated one by one, etc.). The advantage is that there is only free software, and that the developers are serious about identifying anti-features, building apps without tracking or advertising libraries, etc. This saves me the effort of figuring out whether all those apps are ethical.

OsmAnd

To display maps and compute routes when moving my body in the real world, I use OsmAnd, which relies on OpenStreetMap (OSM) data. OSM is a free map database which is the main open-source competitor to Google Maps. OsmAnd uses Android's feature to figure out the phone's location (using GPS, for instance), and it can either draw maps retrieved directly using an Internet connection, or it can draw them from offline OSM data downloaded beforehand, which requires no data connection. The offline feature is extremely useful abroad, where data is unreliable and prohibitively priced, and it is something that Google Maps does not always allow (for licensing reasons that don't exist with OSM).

Except the nicety of being usable offline, OsmAnd has a lot of drawbacks relative to the Google Maps app. The interface is ridiculously counter-intuitive. It is sometimes sluggish (especially when drawing the offline maps); this is quite legitimate as it's harder to decompress and draw maps than to just fetch them, but it seems like it could be improved (for instance, it doesn't seem to cache a lot of what is drawn). Offline routing is quite brittle. There is no public transportation routing, and of course no Street View. Also, addresses need to be input in a structured format, there is no good unstructured search.4

As a proud OSM editor, I use OsmAnd's editing features to fix OSM or leave notes whenever it does not match the real world.

WeeChat

I use IRC to chat, mostly on a self-hosted server, and I use XMPP via a bitlbee gateway. On my server, I use weechat. To support this on my phone, I use the Weechat-Relay Android client.

The reason why I use weechat is mostly because of this relay feature, to replace a former setup with irssi and the bip proxy which was unmaintained and had bugs. Still the result doesn't really work that well, and I can't really trust weechat to reconnect reliably whenever my data connection disappears and reappears. (In contrast with previous setups, however, this one has a crucial feature which isn't necessarily a given: when the Android client displays a message that I sent, then it has got the confirmation that the message was actually sent; it won't display my messages before sending them and then silently realize it cannot send them because it disconnected.)

ConnectBot

I use ConnectBot as an SSH client to my various machines. I use it whenever I need to check something on the machines, be it searching mail archives with notmuch,5 administering mailing-lists with listadmin, anything really...

I also use ConnectBot to create SOCKS proxies, for various purposes: whenever I need to evade network restrictions; whenever I have the feeling that my data connection lags when opening new TCP connections but is usable otherwise; when I want to use Tor on my phone through one of my machines. Port forwards and SOCKS proxies can be configured by long-tapping hosts in ConnectBot.

DAVdroid

I use DAVdroid to synchronize my calendar with my server: more details in my blogpost about this. I use the default AOSP Calendar application to manage my calendar.

K-9 mail

I use K-9 mail to manage mail on my phone, accessing my account via IMAP. I use IMAP IDLE to get my mail when it arrives without having to refresh periodically. I haven't yet invested the time to support reading OpenPGP-encrypted mail on my phone.

When I'm abroad and data is expensive, I use a procmail filter to send myself an SMS for each incoming mail using my mobile provider Free's API. I pipe the mail through this script and then through an invocation of this script. This gives me the sender, subject, and the first few lines of the message. That way, I can at least read email in real-time for free, and punctually enable data when I want to get the full email, or reply to it.

Aard Dict

I query Wikipedia very often, and I want to be able to do so when I have no Internet access, and to do it fast when data is too slow. For this, I use Aard Dict, with the French Wikipedia. (I would prefer the English one for most purposes, but it is too large, given that I also want to store music on my phone.) The dump is almost two years old, but it doesn't matter so much for most of what I need it for. Beyond their use when fact-checking, Wikipedia pages make for fairly interesting reading when bored.

I don't use yet version 2 of Aard Dict, because it hasn't been packaged for F-Droid yet. I'm not in a hurry, I'm quite satisfied with version 1.

On my computers, where I have more space, I use Kiwix with more dumps, see the blogpost about this. This is very useful on planes or in other places with no or crappy Wi-Fi.

Firefox Mobile and default browser

To browse the Internet, I use either the default browser or Firefox for Android. I'm not especially fond of either, especially their sluggishness, and their inexplicable habit of flushing pages after they have been downloaded, so that if there is no longer a data connection they can't be displayed anymore.

I need Firefox Mobile whenever I need to use a SOCKS proxy, because the default browser doesn't seem to support it: configuring Firefox Mobile to use a SOCKS proxy must be done via the about:config page, with the following settings: network.proxy.socks, network.proxy.socks_port, network.proxy.socks_remote_dns, and network.proxy.type. Conversely, I rely on the default browser's tolerably functional feature to save pages for offline viewing.

Anki

I haven't used it in a while, but Anki is a very good flashcard application, and spaced repetition, that this app implements, is a nice way to optimize writes to your brain.


  1. Not that I'm not foolish, but with software that takes it for granted that you are stupid, you usually have little opportunity to become wiser. 

  2. The main Google services I use are Webmaster Tools, Alerts, Code Jam, Hash Code, and the occasional Google Calc shared document. 

  3. CyanogenMod, however, is certainly not perfect in terms of Google-proofing, but it's better than other Android alternatives I know of.  

  4. Google Map's search is extraordinarily good, with a brilliant ability to disambiguate between place names, addresses, search queries, and good tradeoffs between popular places and impopular but nearby places (depending on where the user is currently looking). I have the impression that this must be a major engineering achievement precisely because no one notices how hard it must have been to pull off. 

  5. One day I'll write more about my email setup... 

Cruising at -41.8 million feet: Antipodal airports

In the original version of my previous entry about the most distant airports by minimal number of connections, I made an offhand remark about the alternative problem of finding the two most geographically distant airports. I originally answered it by copying a random answer to that question from the Web without thinking much about it, but after stumbling upon a different answer, I started having doubts. After more investigation, it appears that the Web has no clue what It is doing, and that these answers were erroneous.

Having recomputed everything myself from the OpenFlights dataset, I believe that the two most geographically distant airports in the world are the following, with a distance of 20002 km:

Dataset and preparation

OpenFlights gives you a (slightly noisy) dataset with airport codes, latitudes, longitudes, and altitudes. As I am just interested in the one most distant airport pair, and not in the complete rankings, I just had to clean up the one bad offender that I saw, namely, the Budapest Keleti station and its mysterious antipode double: it is probably a dataset bug, and anyway this is a train station so it has limited aerial value. See the whole script for details about the preprocessing.

The silly method: distances on cylinders, and equirectangular projections

The latitude is a decimal value between -90 and 90 that describes on which parallel a point is located: 0 is the equator, 90 is the Geographic South Pole, and -90 is the Geographic North Pole. The longitude is a decimal value between -180 and 180 that describes on which meridian a point is located: 0 is the IERS Reference Meridian, positive values go eastwards, and negative values go westwards.

What's the dumbest way to find out, given a bunch of such latitude/longitude coordinates, which points seem to be exact opposites on Earth? Well, just compute the latitude and longitude of the antipodal point, namely, the opposite point on Earth relative to the center of the Earth. This is fairly easy:

  • given a latitude \(\phi\), the antipode has latitude \(-\phi\) (it is opposite with respect to the equator);
  • given a longitude \(\lambda\), the antipode has longitude \(180+\lambda\) (i.e.,you go around the globe for half a turn) except you have to bring it back to \([-180, 180]\) by adding or subtracting 360.

Now, you estimate the distance of the second point to the antipode of the first point in the crudest possible fashion: just pretend that the latitude and longitude are two-dimensional coordinates and use the Euclidean distance, where the square brackets \([\cdot]_{-180, 180}\) denote the operation of going back to \([-180, 180]\).

\(d_{\text{e}}((\phi_1, \lambda_1), (\phi_2, \lambda_2)) = \sqrt{(\phi_1 + \phi_2)^2 + [\lambda_1 - \lambda_2 - 180]_{-180, 180}^2}\)

Of course, this distance estimation is wildly inaccurate. In geographical terms, it amounts to computing distances directly on the equirectangular projection of Earth. This is inaccurate, because the Earth is not flat, and the distortion in distances depends on latitudes and longitudes. In particular, distances near the poles are wildly overestimated. Yet, my friend Mc, when he suggested this crude approach to me, claimed that this would probably not matter much, because the distances in consideration for nearly antipodal airports are small and there aren't many airports near the poles. As we will see, he was right.

To perform this computation, I simply evaluated it on all pairs with a simple C++ program, that completes in seconds. To get a distance estimation from this, I subtract the distance between the antipode and second point to the distance of going around the Earth obtained from the Earth's mean radius. The best pairs are here, sorted with more distant pairs at the bottom (i.e., the most antipodal, so the most interesting). Hence, according to this method, the best pair is the following:

I did not check that the other results corresponded to anything sensible (and not, e.g., train stations that don't really exist), so take them with a grain of salt. For NVA-PLM, however, you can check from online sources that these airports indeed exist at the given coordinates.

The sensible method: distances on spheres, and haversines

Can we do better? Well, let us notice that the Earth is not a cylinder but a sphere, with mean radius of 6,371.009 kilometers according to the IUGG. Given two airports, what interests us is the length of the shortest route from one to the other on the sphere, which is known as the great-circle distance: the name is because the route from one point to the other will follow a great circle, a circle on the sphere whose center is that of the sphere. The standard way to do so is to use the Haversine formula. So I just plugged it in the previous code.

The results are here, and we can see that Mc was right: they don't change much. In particular, the top pair is still the same, though our distance estimate changed by 3 kilometers, i.e., quite a lot compared to the distance to the second best pair. The first difference is at position 6, where the 6th best pair and the 8th best pair are swapped. The result file contains all 3435 pairs estimated to be at a distance of at least 19800 km by the Haversine formula, with the computed distances differing from up to 107 km compared to the results of the previous section on these pairs.

The serious method: distances on ellipsoids, and WGS-84

Do you recall anything from school about the Earth not being a perfect sphere, but being a bit rounder at the Equator? How bad is this? Well, the radius varies by about 33 km; it's small compared to the overall radius, but huge relative to the distance differences between our top airport pairs. So we have to be more precise to get a more definite answer.

Fortunately, serious people already know how to deal with this problem, and model the Earth, not as a sphere, but as an ellipsoid, more precisely an oblate spheroid. An ellipsoid is less scary than it sounds: it is just what you get by rotating an ellipse around the North-South pole axis, rather than a circle (in which case you get a sphere). The most common such model seems to be the reference ellipsoid of WGS-84, the World Geodetic System used by GPS. I wouldn't want to implement geodesic calculations on ellipsoids, but fortunately other people have done it before: I used GeographicLib, CLI bindings of which are packaged for Debian as geographiclib-tools. For instance, the GeodSolve -i tool eats latitude/longitude pairs as input and produces as output two azimuths (that we don't care about) and more importantly the distance of the shortest path from one point to the other.

I wasn't sure about the feasibility of feeding all 32 million airport pairs to this tool to perform complicated computations, so I restricted the study to the 3435 top pairs given in the previous output. This is more than sufficient to be sure not to miss anything: the difference in radii is < 33 km and the margin to the best is > 200 km, which is clearly more than the maximal error we could have made. The computation on these 3435 pairs is essentially instantaneous: the results are here.

Observe that the top pair changed to the one given at the beginning of this post, with its longest distance estimate decreased by 8 km. Overall, the error between the Haversine and ellipsoid estimates is at most 23 km on these 3435 pairs. Of course, discriminating the first and second best pair, with only about 700 m difference, is a bit problematic, as the "position" of an airport is hard to define (and 700 m is less than the length of a runway...). So I picked the first one as a winner, but we start hitting the limits of the definition of our problem.

The surreal method: accounting for altitude

Having reach the limit, let us go even further! The Earth is not flat, is not a sphere, but it is not a perfect ellipsoid either. Notably, it has local variations in radius, a.k.a. mountains. What if the altitude of the airports changed the ranking? Two airports with a large altitude difference could be more distant than a more antipodal pair where both airports are at sea level, because of the need to cover the altitude difference. Further, even if two airports have the same altitude, the distances between both are greater if the altitude is high, because you will travel further from the ellipsoid center. This is not entirely negligible as there are airports with an elevation of 4-5 km, which is larger than our 700 m margin between the top pair and the second one.

Of course, thinking about it will make you realize that we cannot conceivably account for this. If you start taking precise altitude into account, the shortest path between two points may be twisted because of the need to avoid mountains. Further, this question is totally disconnected from reality: commercial flights going from a point to another first ascend to a cruise altitude of about 11-12 km, and then descend to the target, and of course this contributes to the distance travelled, and means that accounting for airport altitude would not be doable.

As a purely theoretical exercise, though, let me continue to model the Earth as the WGS-84 ellipsoid, with no mountains, and let us position the airports at their latitude/longitude and with an altitude that elevates them away from the spheroid's surface. Now, we must bound the length of the shortest path between two given airports. Clearly, the shortest path with altitude is at least the length of the path at sea level, which we computed before. Further, it is at most the sum of both altitudes plus the sea level path, because one possible path is to go down to sea level, travel, and go up again, so this path must be longer than the shortest path. I computed this dumb upper bound, and it seems like, for our 3435 pairs, the bound suffices to show that the path with altitude is never longer that that of the top pair. So taking altitude into account cannot imply that other pairs will beat the PGK-LMC pair.

The subterranean method (bonus)

In all this post I have made the implicit assumption that we are going from an airport to the other while remaining above sea level. However, the literally minded will note that when I talked about the "most distant" airport pair, I never said anything about having to go around the big spherical, er, ellipsoidal, obstacle that bars the way. So, neglecting the existence of planet Earth, what are the two most distant airports by straight line distance in three-dimensional space, going straight from one to the other?

To answer this pressing question, we can again rely on GeographicLib, whose CartConvert program takes latitude, longitude (and even altitude!) and produces Geocentric coordinates, i.e., coordinates in a three-dimensional Cartesian coordinate system. The straight line distance is then computed with the usual Euclidean distance formula:

\(d_{\text{e2}}((x_1, y_1, z_1), (x_2, y_2, z_2)) = \sqrt{(x_1 - x_2)^2 + (y_1 + y_2)^2 + (z_1 - z_2)^2}\)

Computing this only for the restricted pair subset of the previous sections, we obtain this. Amusingly, the most distant pair by this criterion is NVA-PLM, the same one as with our initial method (which also used Euclidean distance but in two-dimensional space). As far as I can tell, this is entirely coincidental. Note that the distance of 12758 km is slightly more than the average diameter of Earth (though it is less than the maximal diameter, of course): this is explained by the fact that these airports are close to the equator.

Confidence in the result

I now conclude by giving more information about whether my results can be trusted to be accurate.

The GPS coordinates for LMC and PGK in OpenFlights are essentially the same as those given in Wikipedia and are both, according to OpenStreetMap, at some position on each of these airport's main runway (LMC, PGK), which is further confirmed by the satellite imagery of Google Maps (LMC, PGK).

The fact that the distance between these two points is indeed 20001571.135 meters was computed directly by GeographicLib. However, here I must point out that this result seems to disagree with many websites that offer such calculations. This is probably caused by the fact that nearly antipodal points are corner cases for geodesic computations on ellipsoids.

It is conceivable that the sources that overestimate the distance (except Great Circle Mapper, where the difference is too high, and otherwise explained) are using a geoid model, to account for variations in radius beyond what the WGS-84 ellipsoid accounts for. This is beyond the scope of what I did.

That this pair is indeed the most distant depends on whether positions for other airports in OpenFlights are accurate, and whether this source is exhaustive, i.e., contains all airports. I did not really try to check any of these two points.

It's a big world after all: Computing the diameter of the flights graph

What are the two most distant airports on Earth? A natural definition is by geographical distance: see my followup post for this definition. However, an alternative question asks what are the two most distant airports in terms of the number of different flights required to go from one to the other. For the most geographically distant pair, LMC and PGK, the answer appears to be 5 flights, because you must go to BOG, then fly through e.g. FRA and BKK to CGK, and then to PGK. Are there other pairs where most flights are needed?

The only relevant reference I found is this thread. Quora has the awful habit of blocking the content to unregistred users, so I will summarize it: at the time of writing, the only answer estimates that there should be no airport pair requiring more than 5 flights, as you should always be able to fly via a regional hub, a global hub, a global hub, and a regional hub. This post tries to give a more definite answer; as we will show, that intuition is incorrect, and up to eight flights can be needed.

Dataset

While any form of precise timing or pricing information about flights is hard to obtain, one would think that a public database of flights with flight number, company identifier, origin airport, and destination airport, should not be hard to come by. This is not the case: the only public structured database I could find is the OpenFlights route database, which is apparently derived from the one distributed with the Airline Route Mapper, a mysterious proprietary Windows program with no contact information other than a FlyerTalk thread.1 The dataset does not include the flight number, which we will see is extremely problematic.

We can start cleaning up the routes by removing codeshare flights and flights by airlines that the OpenFlights airline database lists as defunct. For flights with an airline unknown to OpenFlights, we generate fresh airline identifiers instead. See the exact code.

Computation

Now we just compute the diameter of this graph, i.e., the maximum over all airport pairs of the length of the shortest path from one airport to the other, using a straightforward C++ program that implements Floyd-Warshall with path reconstruction. This runs in cubic time in the number of airports (around 3100), taking about 20 seconds on my machine.

The problem is that OpenFlights is not perfect, also because of the fuzzy definition of what a flight is. Indeed, the naive result gives the following paths of length 15:

15: XEQ -> YPO: XEQ QUV QFN JNN JJU UAK GOH KEF YYZ YQT YTS YMO YFA ZKE YAT YPO 
15: YPO -> QFN: YPO YAT ZKE YFA YMO YTS YTZ IAD KEF GOH UAK JJU JNN XEQ QUV QFN

However, investigating on the website of Air Greenland2, Tasiusaq Heliport and Nanortalik Heliport are connected by a direct flight (GL9402), with the intermediate stops listed as "technical stops". It a bit problematic to call those changes of flights, as you can probably just remain in the aircraft. This could be accounted for if only we had the flight number in the dataset, but sadly we do not have it. Likewise, Thunder Airlines seems to be selling the flight from Timmins to Peawanauck as one single flight (flight 105).

So I tried to fix these problems until I found an example that I liked, which required an surprising amount of manual intervention. I added a bunch of missing flights by considering stopovers with the same flight number as being free. I had to merge a few airports because some complicated routes were jumping through hoops to get from the domestic airport of a city to its international airport: for instance, to go from Sde Dov to Ben Gurion by plane only, you need to fly to Eilat and back. Last, I had to eliminate many airports for which no flights could actually be bought online from mainstream providers. I only retained flights featured either on Google Flights3 or on Skyscanner. This extremely tedious iterative process took me several hours and I don't know why I did it, but I did, hopefully without introducing grave errors. All paths from length 15 down to length 9 turned out to be spurious according to my criteria.

An eight-flight result

Here is the result for eight flights, after sufficiently refining the dataset to rule out all longer paths as they were erroneous. Now, the vast majority of these paths are probably crap, but here is an example of one that seems correct to me:

8: AUY -> IOQ: AUY TAH VLI SYD LHR KEF KUS CNP OBY

Indeed, Anatom Airport in Vanuatu seems to be served only by a flight to Tanna, from which it is only useful to connect to Port Vila; Air Vanuatu sells them as separate flights (NF253 and NF239) with different planes, and you can buy the tickets from any online source (beware, however, it seems that the flights only run on Saturdays and Tuesdays). From Port Vila you can fly only to Oceanian hubs, for instance Sydney, so you need three hops in total to get there.

Conversely, to go to Ittoqqortoormiit Heliport in Greenland, the only way is from Nerlerit Inaat Airport (and only on Wednesdays). To go there, no matter what the Wikipedia page says, it seems that there are only two reasonable ways: with Air Greenland from Kulusuk on Wednesday early afternoon (in time for your connection), which you can reach with Air Iceland (in the earlier morning, with a tighter connection) from the domestic Reykjavík Airport; or directly with Air Iceland (and on Wednesdays, but too late to connect) but only from Akureyi in Iceland, which you can then reach from Reykjavík on the same day (the two legs are flown by the exact same plane, TF-JMT, but they are sold with different flight numbers, NY-112 and NY-511). So that's three hops altogether from Reykjavík in either case. (Alternate options are to arrive from Nuuk, which is e.g. reachable from Copenhagen, but they are not competitive.) The path via Kulusuk can be bought easily from many online sources (but not Google Flights), though you may have to buy separately the Air Iceland and Air Greenland parts.

Now, to go from Sydney to Reykjavík, I claim that you need at least one stop. This is clear4 because a direct flight would beat, by far, the current longest scheduled flights. However, some flights from Australia to Europe are marketed as a single flight even though they include a refueling stop. This is particular the case of flights BA16 (via SIN) and QF1 (via DXB). Fortunately, however, no such flight goes directly to Reykjavík, so even though you can go from Sydney to London in one "flight" with this trick, you need an additional flight to reach Reykjavík.5 Hence, following our choice of sticking with flight numbers, that's two flights.6

So we have a total of eight flights. Of course, while many websites offer the individual legs, none of them will manage to sell them simultaneously. Still, eight flights is quite a far cry from the five flights initially conjectured. Of course, the fuzziness of the problem means that this numbers does not mean that much: if you include flights that can only be bought from their own companies, charter flights, count differently stops that involve no change of aircraft, etc., your mileage may vary. In particular, the "flights" from SYD to LHR probably require you to leave the plane, whereas I'm not sure that the "change" in Akureyi would require doing this, so this illustrates a bit the absurdity of the problem if you look too closely.


  1. This leads to an awful dilution in responsibility: you cannot report errors in the data, because OpenFlights says (e.g., in this bug report) that the data comes from a third party, but that third party has no apparent way to report bugs... As of this writing it looks like the discussion is happening on FlyerTalk and there is no way yet to edit the dataset collaboratively... 

  2. By the way, how ironic is it that the website of Air Greenland is one of the neatest and fastest airline websites I ever interacted with? 

  3. I did not keep Google Flights options for which the price had to be checked on the airline website: the redirection usually points to the website's landing page and not to a specific flight, suggesting that Google probably has as little information about the flight as I do. 

  4. A previous version of this post claimed that a direct flight from SYD to KEF would have been impossible because of fuel requirements. This is incorrect, as there have been longer exceptional flights. Thanks to Mc for pointing this out. 

  5. What's slightly more surprising is that using such flights is the only solution, e.g., were it not for that marketing trick, three flights would be required to go from Sydney to Reykjavík, not two. I guess that this is because KEF is not connected to anything sufficiently close to Australia or Asia that could play the role of the single stop. 

  6. If you plan to schedule a trip, the better option is then probably to leave AUY with the Saturday flights, to reach KEF on Monday evening, and then take the Wednesday flights and hopefully reach OBY on Wednesday evening. I don't know if anyone ever tried to fly from AUY to OBY, but somehow it looks unlikely. In terms of cost, if you agree to give up on the absurdly expensive "direct" flights from SYD to LHR, the price would be around 2500 EUR for a one-way trip, with around 34:30 hours spent in airplanes. 

Arbitrary-length unambiguous buffalo sentences

Consider the well-known sentence:

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

As Wikipedia explains, this is a grammatically correct sentence using the ambiguity of Buffalo (the town) and buffalo (either a noun, the animal, or a verb, "to bully"). Let me illustrate the grammatical structure of the sentence, using words from the corresponding parts of speech, and using square brackets to help the parsing:

Boston deer [(that) Seattle doves annoy] pester Austin kittens.

The Wikipedia article points out that buffalo sentences can be created with any number of repetitions of the word buffalo. For instance:

Buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo buffalo buffalo.

One can easily interpret this as, e.g.,

Puppies tickle Chicago unicorns [(that) Dallas ponies [(that) squirrels mock] nag].

In other words, considering squirrels that mock Dallas ponies that nag Chicago unicorns, then there are puppies tickling such unicorns.1 For legibility we can also write this in the following way, using Nouns, Verbs, and Cities:

N V (C N (C N (N V) V))

The problem is that this sentence can equivalently be understood as:

C N (C N (C N V) V) V

In my silly rephrasing language:

Houston birds [(that) Phoenix bunnies [(that) Memphis butterflies bother] peeve] displease.

Note that I'm allowing here an intransitive use of buffalo (or "displease" in the gloss) as a verb, which I guess is OK. (Indeed, Wikipedia claims that Buffalo! is a valid imperative sentence.)

Hence, the above buffalo sentence has the unpleasant aspect that it can be parsed in many different ways. By contrast, the original one seems to have only one reasonable parse, as long as we can rely on case to distinguish Buffalo and buffalo. Hence the simple and tantalizing question: can we construct arbitrary-length buffalo sentences that are unambiguous, i.e., have a single grammatical interpretation?2

In the rest of this post I show the following, for a certain choice of grammar which I hope covers all reasonable interpretations: for any length greater than one, the (unique) case-insensitive buffalo sentence is ambiguous; by contrast, if we are case sensitive, we can build unambiguous buffalo sentences of any length, even if we have case ambiguity on the first word.

Grammar

I will use the following grammar, in simili-BNF:

S    ->   V NPO | NP V NPO
NPO  ->   NP    | C        | ε
NP   ->   NPS   | NPS PP
NPS  ->   N     | C N
PP   ->   NP V
C    ->   Buffalo
V    ->   buffalo
N    ->   buffalo

In other words:

  • A sentence is either an imperative sentence or a declarative sentence, both of which have an object noun phrase.
  • An object noun phrase can be a noun phrase, the empty sequence (corresponding to an intransitive use of the main verb) or a single Buffalo. The last case handles the case of Buffalo buffalo Buffalo understood as "Dragonflies tease New York" (i.e., its inhabitants). I only allow Buffalo to be an object, because my understanding of grammar suggests that sentence's main verb should be "buffaloes" if Buffalo were the subject.
  • A noun phrase is a simple noun phrase optionally followed by a relative proposition: we will be able to nest relatives but importantly we cannot juxtapose them.
  • A simple noun phrase is either buffalo (the noun) or Buffalo buffalo (the city followed by the noun).
  • A relative proposition has a subject noun phrase and a verb, with the object of the verb being the antecedent of the proposition.

This grammar can uniquely parse the example sentence,3 and includes a few additional features not exemplified by that sentence, which I already commented about: buffalo as an imperative and/or intransitive verb, and Buffalo used as an object.

Case-insensitive case

In the fully case-insensitive case, we write everything as buffalo. As we can show, all sentences of length greater than 1 are ambiguous.

The reason is that S -> V NP and S -> NP V are two different possible parses, and it is not hard to see that we can construct NPs of arbitrary length: we can choose any number \(n\) of nested PP in the derivation, and choose either N or C N at each GN, of which there are \(n+1\). So to realize a length of \(k\), choose to create \(n = \lfloor (k-1) / 2 \rfloor\) PP, and create \(k - n\) words in the GN (which is between \(n+1\) and \(2(n+1)\)).

As an example, for buffalo buffalo buffalo buffalo buffalo, parse buffalo buffalo buffalo buffalo as NP, e.g., as, N (C N V), and parse the sentence either as V NP or NP V.

Case-sensitive case

We now assume that we can distinguish Buffalo and buffalo, except maybe for the first word, intuitively because it must carry a capital because it starts the sentence. Let us consider small lengths as examples.

  • For length 1, the sentence Buffalo can only be interpreted as V, namely, the imperative with no complement. My grammar forbids the declarative N, a nominal sentence declaring the existence of buffaloes, but of course this is debatable.
  • For length 2, Buffalo Buffalo has exactly one interpretation because the last Buffalo must clearly be a C, so the unique parse is V C. Again, we are prohibiting the nominal sentence asserting the existence of buffalo in Buffalo.
  • For length 3, considering Buffalo Buffalo buffalo, the first Buffalo cannot be a C as otherwise the next symbol should be buffalo, and it cannot be an N as otherwise the next Buffalo buffalo must start a PP that cannot finish. Hence the first Buffalo is a verb and we have a unique interpretation: V (C N).

For larger lengths, we will first observe that if any sequence of buffalo and Buffalo can be parsed as an PP, then there is only exactly one way to do so. To see why, observe that Buffalo must always be followed by buffalo in which case we always parse both symbols as an NPS. So write b for buffalo and b2 for Buffalo buffalo, and rewrite our sentence (uniquely) as a sequence of b's and b2's. We then see that each b or b2 at the left must pop a b at the right, so that parse fails unless the number of b and b2's is even and the rightmost half contains only b's; conversely, if this condition is satisfied, there is a unique parse. So we have proven our initial observation.

Second, we claim that for any (possibly empty) sequence S, there is at most one way to parse buffalo S or Buffalo buffalo S as an NP. This is clear as we must parse buffalo or Buffalo buffalo as an NPS and the rest as an PP, and we can then use the previous reasoning. It is not so hard to see then that this claim extends to the situation where the first symbol has ambiguous case. Indeed, for any sequence S, at most one of S and buffalo S has a successful parse as a PP (because, rewriting them using b's and b2's, only one rewriting can have an even number of symbols). Hence, for Buffalo buffalo S, only one option succeeds between parsing the initial Buffalo as an N (using case ambiguity) and buffalo S as a PP, or parsing Buffalo buffalo as an NPS and S as a PP. Likewise, for buffalo S, either we parse buffalo as an N and S as a PP, or we parse buffalo as a C (using case ambiguity): in this second case the first symbol of S cannot be Buffalo, so that, writing S as buffalo S', we parse S' as a PP. By the previous reasoning only one of these two options can succeed, and if it does then it succeeds in one unique way. So we have proven our initial claim.

We now show that for any non-empty sequence S, the only way to parse S buffalo Buffalo as a sentence is to parse S as an NP. This is because a final Buffalo is necessarily an NPO. Now, as S buffalo contains at least two symbols, it cannot be V, so it must be NP V, so the buffalo is V and we must parse S as an NP. Hence, combining this with our previous claim, for any sequence S, there is at most one way to parse the following sentences:

  • Buffalo S buffalo Buffalo
  • Buffalo buffalo S buffalo Buffalo

It is now easy to define recursively the language of sequences S that have a successful parse as a PP (or as the empty sequence \(\epsilon\)), abbreviating buffalo and Buffalo as b and B. By our initial observation, such parses are then unique.

  • \(L_0 = \epsilon\)
  • \(L_n = \bigcup_{w \in L_{n-1}} \{b \, w \, b, B \, b \, w \, b\}\)

So using \(L_0\) we can construct buffalo sentences with unique parses for length 3 and 4; and using the first rule in the definition of \(L_n\) we can then reach \(k+2\) for any \(k\) that we can reach. Hence, as 3 is odd, 4 is even, and 1 and 2 were previously covered, we have shown our claim: we can construct uniquely parsable buffalo sentences of any length, even with ambiguity on the case of the first word.

To illustrate, is an example of a sentence of length 42 which we have shown to have a unique parse.4

Buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo Buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo Buffalo buffalo Buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo Buffalo.


  1. For all animal species in the sentence, it seems semantically unclear whether the individuals are existentially or universally quantified. For instance, we do not know whether the sentence applies to all such unicorns or only some of them. 

  2. By "grammatical interpretation" I mean parsing, not meaning; I am not interested in possible doubts about whether the verb buffalo means "intimidate" or "hunting buffaloes", nor do I care about which buffalo species or Buffalo city the speaker is referring to. 

  3. If the initial Buffalo is V, then buffalo is an NP, we must parse a PP, Buffalo buffalo is an NP, we must parse a PP, then either we start unstacking the V and we are left with Buffalo buffalo which we cannot parse, or we further parse buffalo as an NP and then we can never unstack enough V. If the initial Buffalo is N, then for a similar reason we cannot parse what follows as a PP (if we unstack early then we are left with the second Buffalo where the verb should be), so the second buffalo is V, Buffalo buffalo is then NP, and once again we cannot parse a PP. So the initial Buffalo is C, and buffalo is then N, and Buffalo buffalo must then be a PP. We must unstack immediately, we must parse buffalo as the verb, and we have no more choices. 

  4. Of course, this is just an example, and in general unique parse sentences are not themselves unique. We can in fact build unambiguous sequences following entirely different schemes; e.g., we can also show that Buffalo Buffalo buffalo w has a unique parse (with the initial Buffalo being necessarily a V), for any w in the language inductively defined by \(L_1 = \{B \, b \, b\}\) and \(L_n = \bigcup_{w \in F_{n-1}} \{B \, b \, B \, b \, b \, w \, b \, b \, b, B \, b \, w \, b\}\)

Configuring screens with xrandr

There are various GUI tools on Linux to change the screen resolution and configure the various screens of your machine, which is useful to do, e.g., when plugging in a video projector to give talks. From my experience, however, the right tool, the one that does not hide anything from you and allows you to get the job done, is the CLI utility xrandr.

This post is a quick guide to perform common tasks with xrandr. I'm certainly not an X guru, but maybe this can serve as an introduction about how to perform common display configuration tasks with it.

The starting point is always to see the current state with:

xrandr -q

This will print out information about all screens and outputs known to the system. For instance:

Screen 0: minimum 320 x 200, current 3360 x 1080, maximum 16384 x 16384
LVDS connected 1920x1080+1440+0 (normal left inverted right x axis y axis) 382mm x 215mm
   1920x1080     60.01*+
   1680x1050     59.95  
   1400x1050     59.98  
   1280x1024     59.89  
   1440x900      59.89  
   1280x960      59.94  
   1280x854      59.89  
   1280x800      59.81  
   1280x720      59.86  
   1152x768      59.78  
   1024x768      59.92  
   800x600       59.86  
   848x480       59.66  
   720x480       59.71  
   640x480       59.38  
DisplayPort-0 disconnected (normal left inverted right x axis y axis)
DisplayPort-1 disconnected (normal left inverted right x axis y axis)
DisplayPort-2 disconnected (normal left inverted right x axis y axis)
VGA-0 connected 1440x900+0+0 (normal left inverted right x axis y axis) 408mm x 255mm
   1440x900      59.89*+  74.98  
   1280x1024     75.02    60.02  
   1280x800      59.81  
   1152x864      75.00  
   1024x768      75.08    70.07    60.00  
   832x624       74.55  
   800x600       72.19    75.00    60.32    56.25  
   640x480       75.00    72.81    66.67    60.00  
   720x400       70.08

The first thing is to figure out which output corresponds to which of the screens you have in front of you. You can usually determine it from the output names: VGA, HDMI, and DVI usually correspond to the connector type on the motherboard. On the example output above from my work computer, VGA-0 is indeed an external screen plugged in my computer's VGA port. LVDS is usually a laptop's internal screen, which is the case in the example. Sometimes the names are off, e.g., on my home computer, a DVI port is mistakenly labeled as an HDMI port. Alternatively, you can identify screens by their modes if you know which resolutions they support. For instance, above, I know that my main screen supports 1920x1080, so it is necessarily the LVDS one. (If you don't know what a screen resolution is, read on, I'll explain it.)

If you're trying to configure an external screen or video projector, and it doesn't show up in the output of xrandr -q, then you have a problem that's deeper than just persuading X to use the output properly. You should probably investigate whether the kernel or X server can see anything (in case your output is not listed) or check the cable (in case the output is listed but indicated as disconnected). When debugging things, know that all external screen standards that I know of (VGA, DVI, HDMI) are perfectly plug-and-play, meaning that it is safe to play with the cables. However, in some cases (e.g., laptops), it could be the case that an output will not work unless the system has booted with a usable screen on that output, or unless some magic key combination is pressed to enable the output.

If all outputs that you want to use seem to be listed and have a screen connected to them, congrats! You can now configure them. The two simplest commands are the following. The first one enables one output that has a connected screen and sets it to its preferred resolution, which is usually what you want to do. The second one disables one output (any screen connected to it will usually go in power-save mode).

# enable output VGA1 and set its screen to its preferred mode
xrandr --output VGA1 --auto
# disable output VGA1
xrandr --output VGA1 --off

Now, how to decide what to put on each screen? The first possibility is to tile the display across multiple screens. This makes sense when you have a laptop and an external screen, or multiple screens on one machine, and want to show different windows on them. If you are trying to use a video projector, this is reasonable if you are using a fancy presentation tool like pdfpc that can show the slides on one screen and use your laptop screen to show the next slide, notes, etc. Here is an example to illustrate how this is configured1:

# my home setup is as follows, from left to right:
# one VGA screen, one DVI screen, one HDMI screen
xrandr --output VGA1 --left-of DVI1
xrandr --output DVI1 --left-of HDMI1

The second thing that you could want to do is to have two screens that show the same thing. For instance, for presentations using a non-fancy PDF reader, you usually want both your screen and the video projector to show your slides. This is achieved very simply:

# set VGA1 (the projector) to show the same thing as LVDS1 (laptop screen)
xrandr --output VGA1 --same-as LVDS1

However, if your projector is not set to the same resolution as your screen, your presentation will be clipped or mis-centered on one of the displays. Let me explain a bit about this. What a screen displays is made of atomic dots, aka pixels2, and the resolution of a screen is the number of pixels it displays, given as the width times the height, for instance 1024x768 (with the letter 'x' commonly used for the times symbol instead of the Unicode character '×'). Screen resolutions also have names, such as "VGA" (for 640x480), or "FHD" (for Full HD, that is3, 1920x1080). Screen resolution tells you nothing about the actual size of the display, which depends on the size of the physical pixels on the screen, aka pixel density4.

Screens can support multiple resolutions. In the vast majority of cases, the screen is designed to be used at its highest possible resolution, and lower resolutions look a bit ugly because they crudely scale up the lower resolution to the screen's physical resolution (the number of physical pixels it has). Screens can be set at a lower resolution as a poor man's version of zooming, although it is usually far prettier to do the resizing at the software level, e.g., use larger fonts, so that fonts are large but remain crisp. However, lower resolutions are also useful to show the same thing (e.g., your presentation) on two screens with different characteristics.

Doing xrandr -q indicates which modes are supported by the screen connected to each output, with a '+' next to the preferred mode (the one the screen is "designed" to be used in) and a '*' next to the currently used mode. Each mode consists of a screen resolution, given at the left, and a refresh rate5.

Thus, to fix the problem with --same-as, you should determine from the list what is the highest resolution supported by both outputs. (With some luck, it will be the projector's preferred resolution, so only your laptop screen will look a bit blurry; but this is not always the case, especially when aspects ratios do not match.) You should then set the outputs to that resolution; for instance:

xrandr --output VGA1 --mode 1024x768
xrandr --output LVDS1 --mode 1024x768

Once you are done giving your presentation, you will probably want to revert your laptop's screen to its preferred resolution using --auto, as previously explained.

You can do a bunch of other things with xrandr, such as rotating displays, applying transformation matrices to them, software brightness and gamma correction, zooming, etc., see man xrandr to know more. Playing with these options should be safe (although it's true that, in ancient days, there were stories about software ways to destroy CRT screens...). A real risk, however, is that if you make all your screens unusable (e.g., turn them off with --off, make them display garbage with bogus scales, flip them, etc.,), xrandr has no way to allow you to revert to sane settings automatically after a time, as GUI tools often do. You will have to undo your mistake in the blind. If you can't do this, one possible option is to switch to a TTY with Ctrl-Alt-F1, and use xrandr from there; you will probably have to pass it an environment variable so that it can talk to your X server, for instance:

DISPLAY=":0" xrandr --output LDVS1 --auto

Worst comes, of course, you can always restart the machine, as these settings do not persist across reboots.


  1. Experts will note that you can combine this in one invocation to the xrandr command, but for simplicity I will not address this here. 

  2. Pixel is the abbreviation of "picture element" and is supposed to be the smallest displayable unit. However, in fact, on LCD screens, individual pixels are further split in red, green, and blue subpixels, as you can easily see with a magnifying glass (or, e.g., a drop of rain on your smartphone's screen). This fact is commonly used to improve the quality of the display to something finer than what the resolution would suggest, especially for text; this is known as subpixel rendering

  3. It is typically quite annoying to see screens advertised as "Full HD" as if it were some magical statement about the quality of the screen, whereas it is nothing more than a technical statement about the number of pixels it can display. 

  4. In particular, what Apple brands as "retina displays" are displays with a certain high pixel density, which supposedly exceeds the human eye's own density for a certain typical viewing distance. Nothing magical here either. 

  5. The refresh rate is the number of times per second the screen should update, in Hertz. You can select it using --rate. I will not talk about it further, because it usually does not matter.