A French translation of this post is
available. — Une traduction de ce billet en
français est disponible.
In April 2014,
Google Paris
organized an onsite programming contest, the
Google Hash Code.
The task to solve was quite fun. This post describes the problem, and some of
the strategies to solve it that were used by the ENS teams.
The reason why this gets published now is because the
2015 edition
is currently being organized, which for some reason gave us the impetus to
document our work.
ENS scored
pretty well
in the contest
(ENS students
were in teams ranked 1st, 2nd, 3rd, 4th, 7th, 10th; mine was "ENS Ulm 1" which
ranked 7th), and substantially improved on their solutions afterwards during the
Extended Contest,
which lasted for some time after the contest.
Problem description
The full task assignment by Google is available.
We are given a
map of Paris
(from OpenStreetMap data)
that includes all intersections and streets.
As in real life, some of the streets are one-way.
The segments are labeled by their length in meters and the time it takes to traverse them.
We have 8 Street View cars
that start from the Google office and want to photograph as many street meters as possible
in a limited amount of time (15 hours).
You can traverse the same street as many times as you want, but it only counts
once towards your total meters.
The goal is to choose a route for the 8 cars which allows you to traverse as
many unique meters of the streets of Paris as possible, within the alloted time.
The brute-force method
Of course, in principle we can solve the problem by enumerating all of the
possible routes for the 8 cars, and picking the best one. There are finitely
many routes, so this approach is guaranteed to find the optimal solution. This
is good enough for a mathematician, but not for a computer scientist:
the paths are too numerous and you will never
be able to enumerate them in practice.
Imagine there is a single car that reaches every minute an intersection with at
least two possible streets, during 10 hours. The number of paths is 210×60, which is about
10180. Even if every atom in the Solar System were an impossibly powerful
supercomputer, you wouldn't be able to reach this number before the Sun goes
bust.
The greedy method
At the opposite end of the spectrum, the simplest possible approach (beyond
purely random approaches) is to choose every street to take, one after the
other, based on some greedy optimization criterion.
Basic greedy
For instance, plan a route for each car, one after the other. Whenever the car
reaches an intersection that allows it to reach streets that haven't been
traversed yet, choose one of them. Maybe pick the one with most meters, or the
one with most meters per second. Otherwise, if all reachable streets have
been visited, choose one of them at random. Repeat while there is time, then do
the same for all cars.
The greedy strategy will yield reasonable results. There are, of course, tweaks
to make, for instance planning the route for all cars simultaneously instead of
sequentially... Or sending the cars at the beginning to hardcoded start positions which are far
apart, to increase coverage.
However, the key drawback of greedy solutions is that they
cannot plan things in advance. When choosing among already visited streets, for
instance, you are not trying to get closer to unvisited streets. If
visiting a new street will trap you in a maze of already visited streets, you
will still fall in the trap.
Greedy with lookahead
A simple solution to this problem is to improve the greedy strategy with
brute-force that looks ahead for a few steps. Whenever you need to take a decision,
don't take it based on what it will bring you now. Rather, consider all possible
paths of length, say, 10, that you can do from this point, and see which one is
bringing you the best results. Choose your next move to be the first move of the
best path. Do the routing for cars in parallel following this idea.
The path length that you use is tweakable, and in general using higher path
lengths will improve the result: I think we used something like 17 in practice.
This simple idea is easy to implement and yields pretty good results.
My team
used
it
with a few tweaks, and to my knowledge many of the top-ranking teams
also did.
The clever method
Of course, the point of this post is to describe the clever solution, the one
that was implemented after the contest, for the extended round.
Rephrasing the problem as Eulerian paths
The key insight is to notice the similarity between our problem and the
Eulerian path
problem: computing a path in a graph which visits each edge exactly once.
The map of Paris is seen as a graph, where the intersections are the vertices,
and the streets are the edges.
If we could find an Eulerian path in this graph, then it would be the ideal
solution, going through all segments without wasting any time in already visited
segments.
Of course, you need to squint a bit to see the reduction. First, we have 8
cars; but for now we will assume that we have only one, and will split the
path between the various cars later. Second, we have a time limit, and edges
have a value in terms of their number of meters; but we will disregard this. We
will pretend that we will be able to visit all edges in the alloted time
(which isn't so unrealistic, as we will see ;)), so that we ignore the time limit
and the lengths of the streets in meters (we only care about the time needed to
traverse them).
Third, and more importantly, our graph is a bit weird, because some edges are
undirected, and others (the one-way streets) are directed.
Because of the one-way streets, we cannot rely on undirected Eulerian paths, and
will be forced to reduce to directed Eulerian path. We will thus need to choose
an orientation for the undirected edges.
Fourth: well, obviously, we won't be able to find an Eulerian path. There is a
well-known characterization of directed graphs that have Eulerian paths (I call
them Eulerian graphs), which is not hard to prove: the in-degree of each vertex
(the number of edges that go towards it) must be equal to its
out-degree. Does the input graph satisfy
this condition? Not a chance. However, in our case it isn't really forbidden
to traverse edges multiple times; we just want to avoid doing it as it is
wasteful.
Hence, there will be two steps to our process to make the graph Eulerian.
First, we will choose a direction for all undirected edges, in a way that makes the
graph "more Eulerian". Second, we will add edges to make it really Eulerian,
which materialize the (wasteful) choice of going through edges which are
otherwise visited (that is, that have been visited or will be visited at another
point in the solution).
Orienting undirected edges
In order to reduce to a directed Eulerian path, we have to assign a direction to
all undirected edges. To make the Eulerian path problem easier to solve on the
resulting graph, we will try to minimize the sum, across all vertices, of the
difference between in-degree and the out-degree: intuitively, the smaller this
quantity, the less edges we have to add to make the graph Eulerian.
To introduce some terminology, I will say that a vertex is overloaded (in the
original graph, or in a directed graph obtained from it by edge orientation) if
its in-degree (counting only directed edges) is more than its out-degree; it is
underloaded if it is less, and balanced if both are equal. The load of the
vertex is the difference between in-degree and out-degree (which is >0 for
overloaded vertices and <0 for underloaded vertices).
How can we choose the best orientation of the edges? Our choice of terminology
gives some insight: we would like edge orientations that allow originally
overloaded vertices to "transfer" some of their load to the underloaded vertices,
following directed paths, which have to be disjoint to avoid double-counting the
intermediate edges. It turns out that this can actually be phrased neatly as
a flow problem.
Overloaded and underloaded vertices are respectively represented as sources and
sinks, with capacity equal to their load. Each undirected edge is represented as
a pair of edges going in both directions, with capacity 1. Edges that are
already oriented are ignored.
We solve this flow problem using any flow algorithm. By the
integral flow theorem,
the flow along each edge in the result is either 0 or 1. Hence, the result
describes a way to orient some (but not all) of the undirected edges, depending
on the direction of the flow going through them.
We still have the problem that some edges in the result are still undirected,
namely those which were not used at all in the optimal flow. To take care of
this, we simply orient all of the remaining edges at random. Of course,
this may make the final sum worse, and to alleviate this we enforce manually the
property that the flow algorithm had enforced for us before: for any path
of our newly oriented edges, if reversing the direction of
the path makes the sum better (because the path origin was underloaded and the
path target was overloaded), we do the reversal, which increases the load of the
origin by one, decreases the load of the target by one, and does not change the
other degrees.
To perform this last step efficiently, observe that it suffices to consider all
pairs of vertices. Whenever we find a pair of an underloaded vertex and an overloaded
vertex, we can search for a path from one to the other, and reverse it if it
exists.
Hence, at the end of this process, we have a directed graph where we have tried
to minimize the sum of differences between in-degree and out-degree, to make it
as Eulerian as possible. We now make it Eulerian for real, by adding edges so
that all vertices are balanced.
Adding edges
Intuitively, the edges that we add to make the graph Eulerian correspond to the
possibility of adding already traversed edges to the path.
In other words, if we add an edge from vertex i to vertex j, it means that
we will be going from i to j through a path of edges that will be visited
anyway. All the time that we spend doing such traversals is wasted, and does not
contribute towards our objective. Hence, the cost of adding an edge from i to
j should be the time needed to go from i to j following any edge path. In
fact, we may follow a path that violates our chosen orientation for the edges
that used to be undirected, and it does not matter; so we are really talking
about following a shortest path in the original graph.
Hence, we start by precomputing
the shortest path time between all pairs of vertices in the input graph, using
the
Floyd-Warshall algorithm.
Now, we model the problem of adding more edges as a
minimum weight matching in a weighted bipartite graph.
On one part of the bipartite graph, we put all vertices that are overloaded in
the result of the previous phase, duplicated as many times as their load.
We analogously create the other part
of the graph with the underloaded vertices.
We add edges in the bipartite graph between any two vertices in the two parts,
weighted by the time of the shortest path from one to the other.
Now, a
perfect matching
in this graph represents a way to add the right number of edges to ensure that
all vertices are balanced,
and if its weight is minimal then it wastes little
total time as possible traversing edges that are otherwise visited. We use the
Hungarian algorithm to
solve this problem, and to complete the oriented graph.
Constructing the Eulerian path
The resulting graph is now directed, and each vertex is balanced, so it has an
Eulerian path. To compute it, it is well-known that a
simple greedy approach
works. (Maybe there
would be possible improvements with respect to the later steps, but we didn't
look into this.)
From the Eulerian path, expanding the edges that were added in the previous
phase to the shortest paths to which they correspond, we obtain an itinerary,
which is a fairly good solution for a single car to traverse all the
streets of Paris in as little time as possible.
Splitting the path among cars
We now remember that we have 8 cars rather than a single one, so that our route
must be split among them.
Here is our approach: we take the first car, and have it run the beginning of
the route for one quarter of the alloted time. Then, we look at the rest of the
route until the end of the alloted time, and search for the traversed vertex
that is the easiest to reach (in terms of shortest path time) from the starting
vertex (the Google office). We continue the route of the first car until it
reaches this vertex, at which point we end it. Then, the next
car starts at the starting vertex, goes to the cut vertex following the shortest
path, follows the rest of the route again for one quarter of the alloted time,
and continues until the closest point to the starting vertex. We repeat until
the last car, which takes care of all the rest of the route.
Of course, the reason why we cut when each car goes close to the starting vertex
is to minimize the effort spent by the next car to reach the position where they
can resume the route.
The solution will still give too much work to the last car, but we will see later
how to balance the duration of the routes of all cars. The reason why we ensured
that each car ran for at least one quarter of the time is to make this balancing
operation easier, by ensuring that the car paths are sufficiently long to
intersect many times.
We use this as our initial solution, and now optimize it further.
Optimizations
We now iterate a bunch of optimizations to tweak the paths and make them better,
until we get something which we judge is good enough.
Of course, not all optimizations yield the same savings; and the
order in which they are applied can make a large difference. That said, I don't
remember in which order, or how many times, they were applied;
there probably wasn't much logic to it. :)
Pruning
For any subpath of the path of any car that consists only of otherwise visited
edges, we replace it by a shortest path between its endpoints (courtesy of the precomputed
Floyd-Warshall).
Loop swapping
Assume some car's path includes a move from vertex i to vertex j, then a
cycle C, then a move from vertex j to vertex i; and that the edge from i
to j is otherwise visited. If we can find any other
car that passes through an intersection of C, we can have that other car take
care of the edges of C on the way, so that the first car doesn't need to do
C and doesn't need to do the steps from i to j and back (so we save twice
the time of the i—j edge).
Of course, this can lead to a situation where some cars have a longer path than
others, but this will be dealt with later.
Trimming
If the last edge of a car's path is already visited by someone else, just remove it.
Brute-force
For every subpath of every car's path, up to certain length, if the path can be
replaced by any shorter path that leaves no street unvisited, then do so. This
differs from the pruning optimization in that the subpaths are allowed to
contain edges that are not otherwise visited; of course, the replacement subpath
will also need to traverse these edges (possibly in a different order).
Balancing
The time limit of 15 hours has been ignored until now, but it has to be obeyed
by all cars, so we shouldn't return a solution where one car does all the work
and the others do not do much.
Further, in case we do manage to traverse all edges (which we did!), the quality
of a solution is judged according to its remaining time, which is
T−maxc∈1,…,8tc
where T is the time limit and tc the time of car c.
Hence, we would ideally want all the cars to follow routes that take the same
time. This implies that the busiest cars should try to swap longer parts of
their paths with shorter parts of less busy cars.
In practice, look for any two vertices x and y and two cars such that both
cars will pass through x before passing through y. Now, in general the paths
followed by the two cars between x and y will take a different amount of
time; if swapping the paths will decrease the time difference between the two
cars' routes, then swap them.
This is a simple greedy scheme that can be iterated as long as we want. In fact,
as the paths of cars will tend to intersect very often, it works very well and
quickly converges to a state where all cars take the same time to do their
routes, no matter the initial routes.
Results
The clever strategy described here managed to traverse all streets on Paris in
the alloted time (in fact, in 9 minutes less than the alloted time). It's a pity
that this didn't happen during the contest proper, but still, it is nice. :)
Related work
We developed all of this on our own, and we didn't look seriously into the
related work of people who already tried to solve similar problems. It turns out
that our problem is known as the
postman problem,
because of course it is also useful to optimize mail delivery.
More specifically, our
version is known as the "k-mixed postman problem", because we have multiple
cars (here, k=8), and because the graph is mixed: some of the edges are
directed while others are undirected. Surprisingly, the directed postman problem
and the undirected postman problem (for k=1) can be optimally solved in polynomial
time, whereas the mixed postman problem is
NP-hard.
It would be nice to compare the results of our method to state-of-the-art
approaches. We did a preliminary comparison to
existing datasets and we have no
reason to believe that our algorithms are competitive. But, still, we had fun. :)
Credits
The ideas in this post are not mine: essentially everything was done by
other ENS students. All steps of the clever solution except for the
post-processing were conceived and implemented by
gnomnain
(code).
The optimizations were implemented by
elarnon
(code),
and
Mc and
mehdi
(code).
(All code is purely write-only and there is no documentation of any kind, so
don't try to use it.)
This post was re-written from an earlier version authored by Mc.
Thanks to Jill-Jênn,
elarnon, Armavica
and gprano for proofreading drafts and offering feedback.
There is another
write-up about the
topic, in French, by Jill-Jênn.