a3nm's blog

It's a big world after all: Computing the diameter of the flights graph

— updated

What are the two most distant airports on Earth? A natural definition is by geographical distance: see my followup post for this definition. However, an alternative question asks what are the two most distant airports in terms of the number of different flights required to go from one to the other. For the most geographically distant pair, LMC and PGK, the answer appears to be 5 flights, because you must go to BOG, then fly through e.g. FRA and BKK to CGK, and then to PGK. Are there other pairs where more flights are needed? Note: this article was written in 2015, so some of the info here may have gone out of date. Proceed at your own risk.

The only relevant reference I found is this thread. Quora has the awful habit of blocking the content to unregistred users, so I will summarize it: at the time of writing, the only answer estimates that there should be no airport pair requiring more than 5 flights, as you should always be able to fly via a regional hub, a global hub, a global hub, and a regional hub. There is also this presentation which claims that the diameter of the flight graph is more than 20, but no examples are given. This post tries to give a more definite answer; as we will show, up to 8 flights can be needed.

Dataset

While any form of precise timing or pricing information about flights is hard to obtain, one would think that a public database of flights with flight number, company identifier, origin airport, and destination airport, should not be hard to come by. This is not the case: the only public structured database I could find is the OpenFlights route database, which is apparently derived from the one distributed with the Airline Route Mapper, a mysterious proprietary Windows program with no contact information other than a FlyerTalk thread.1 The dataset does not include the flight number, which we will see is extremely problematic.

We can start cleaning up the routes by removing codeshare flights and flights by airlines that the OpenFlights airline database lists as defunct. For flights with an airline unknown to OpenFlights, we generate fresh airline identifiers instead. See the exact code.

Computation

Now we just compute the diameter of this graph, i.e., the maximum over all airport pairs of the length of the shortest path from one airport to the other, using a straightforward C++ program that implements Floyd-Warshall with path reconstruction. This runs in cubic time in the number of airports (around 3100), taking about 20 seconds on my machine.

The problem is that OpenFlights is not perfect, also because of the fuzzy definition of what a flight is. Indeed, the naive result gives the following paths of length 15:

15: XEQ -> YPO: XEQ QUV QFN JNN JJU UAK GOH KEF YYZ YQT YTS YMO YFA ZKE YAT YPO 
15: YPO -> QFN: YPO YAT ZKE YFA YMO YTS YTZ IAD KEF GOH UAK JJU JNN XEQ QUV QFN

However, investigating on the website of Air Greenland2, Tasiusaq Heliport and Nanortalik Heliport are connected by a direct flight (GL9402), with the intermediate stops listed as "technical stops". It a bit problematic to call those changes of flights, as you can probably just remain in the aircraft. This could be accounted for if only we had the flight number in the dataset, but sadly we do not have it. Likewise, Thunder Airlines seems to be selling the flight from Timmins to Peawanauck as one single flight (flight 105).

So I tried to fix these problems until I found an example that I liked, which required an surprising amount of manual intervention. I added a bunch of missing flights by considering stopovers with the same flight number as being free. I had to merge a few airports because some complicated routes were jumping through hoops to get from the domestic airport of a city to its international airport: for instance, to go from Sde Dov to Ben Gurion by plane only, you need to fly to Eilat and back. Last, I had to eliminate many airports for which no flights could actually be bought online from mainstream providers. I only retained flights featured either on Google Flights3 or on Skyscanner. This extremely tedious iterative process took me several hours and I don't know why I did it, but I did, hopefully without introducing grave errors. All paths from length 15 down to length 9 turned out to be spurious according to my criteria.

An eight-flight result

Here is the result for eight flights, after sufficiently refining the dataset to rule out all longer paths as they were erroneous. Now, the vast majority of these paths are probably crap, but here is an example of one that seems correct to me:

8: AUY -> IOQ: AUY TAH VLI SYD LHR KEF KUS CNP OBY

Indeed, Anatom Airport in Vanuatu seems to be served only by a flight to Tanna, from which it is only useful to connect to Port Vila; Air Vanuatu sells them as separate flights (NF253 and NF239) with different planes, and you can buy the tickets from any online source (beware, however, it seems that the flights only run on Saturdays and Tuesdays). From Port Vila you can fly only to Oceanian hubs, for instance Sydney, so you need three hops in total to get there.

Conversely, to go to Ittoqqortoormiit Heliport in Greenland, the only way is from Nerlerit Inaat Airport (and only on Wednesdays). To go there, no matter what the Wikipedia page says, it seems that there are only two reasonable ways: with Air Greenland from Kulusuk on Wednesday early afternoon (in time for your connection), which you can reach with Air Iceland (in the earlier morning, with a tighter connection) from the domestic Reykjavík Airport; or directly with Air Iceland (and on Wednesdays, but too late to connect) but only from Akureyi in Iceland, which you can then reach from Reykjavík on the same day (the two legs are flown by the exact same plane, TF-JMT, but they are sold with different flight numbers, NY-112 and NY-511). So that's three hops altogether from Reykjavík in either case. (Alternate options are to arrive from Nuuk, which is e.g. reachable from Copenhagen, but they are not competitive.) The path via Kulusuk can be bought easily from many online sources (but not Google Flights), though you may have to buy separately the Air Iceland and Air Greenland parts.

Now, to go from Sydney to Reykjavík, I claim that you need at least one stop. This is clear4 because a direct flight would beat, by far, the current longest scheduled flights. However, some flights from Australia to Europe are marketed as a single flight even though they include a refueling stop. This is particular the case of flights BA16 (via SIN) and QF1 (via DXB). Fortunately, however, no such flight goes directly to Reykjavík, so even though you can go from Sydney to London in one "flight" with this trick, you need an additional flight to reach Reykjavík.5 Hence, following our choice of sticking with flight numbers, that's two flights.6

So we have a total of eight flights. Of course, while many websites offer the individual legs, none of them will manage to sell them simultaneously. Still, eight flights is quite a far cry from the five flights initially conjectured. Of course, the fuzziness of the problem means that this numbers does not mean that much: if you include flights that can only be bought from their own companies, charter flights, count differently stops that involve no change of aircraft, etc., your mileage may vary. In particular, the "flights" from SYD to LHR probably require you to leave the plane, whereas I'm not sure that the "change" in Akureyi would require doing this, so this illustrates a bit the absurdity of the problem if you look too closely.

There is a related question on travel.SE.


  1. This leads to an awful dilution in responsibility: you cannot report errors in the data, because OpenFlights says (e.g., in this bug report) that the data comes from a third party, but that third party has no apparent way to report bugs... As of this writing it looks like the discussion is happening on FlyerTalk and there is no way yet to edit the dataset collaboratively... 

  2. By the way, how ironic is it that the website of Air Greenland is one of the neatest and fastest airline websites I ever interacted with? 

  3. I did not keep Google Flights options for which the price had to be checked on the airline website: the redirection usually points to the website's landing page and not to a specific flight, suggesting that Google probably has as little information about the flight as I do. 

  4. A previous version of this post claimed that a direct flight from SYD to KEF would have been impossible because of fuel requirements. This is incorrect, as there have been longer exceptional flights. Thanks to Mc for pointing this out. Also note that, from 2018, [there is now](https://en.wikipedia.org/wiki/Longest_flights#Non-stop_flights_(top_30,_by_great_circle_distance) a direct Quantas flight between London and Perth. Again, all data in this post may have gone out of date. 

  5. What's slightly more surprising is that using such flights is the only solution, e.g., were it not for that marketing trick, three flights would be required to go from Sydney to Reykjavík, not two. I guess that this is because KEF is not connected to anything sufficiently close to Australia or Asia that could play the role of the single stop. 

  6. If you plan to schedule a trip, the better option is then probably to leave AUY with the Saturday flights, to reach KEF on Monday evening, and then take the Wednesday flights and hopefully reach OBY on Wednesday evening. I don't know if anyone ever tried to fly from AUY to OBY, but somehow it looks unlikely. In terms of cost, if you agree to give up on the absurdly expensive "direct" flights from SYD to LHR, the price would be around 2500 EUR for a one-way trip, with around 34:30 hours spent in airplanes. 

comments welcome at a3nm<REMOVETHIS>@a3nm.net