2013-05-02

In this article, we want to find the town with the greatest number of
inhabitants near a given location.



A very localized example

We first need to find and import some data, and I found at the following
place a CSV listing of french cities with coordinates and population and
some numbers of interest for the exercise here.

To import the data set, we first need a table, then a COPY command:

With that data in place, we can find the 10 nearest towns of a random
choosing of us, let's pick Villeurbanne which is in the region of Lyon.

We find Lyon in our list in there, and we want the query now to return only
that one as it has the greatest number of inhabitants in the list:

Well, thank you PostgreSQL, that was easy!

Note that you can actually index such queries, that's called a KNN index.
PostgreSQL knows how to use some kind of indexes to fetch data matching an
expression such as ORDER BY a
b, which allow you to consider a KNN
search in your application.

Let's get worldwide

The real scope of our exercise is to associate every known town in the world
with some big city around, so let's first fetch and import some worldwide
data this time, from
http://download.maxmind.com/download/worldcities/worldcitiespop.txt.gz.



This time you can see that I created an extra column with the location in
there, so that I don't have to compute it each time I need it, like I did
before.

Now is the time to test that data set and hopefully fetch the same result as
before when we only had french cities loaded:

Ok, looks like we're all set for the real problem. Now we want to pick for
each of those cities it's nearest neighboor, so here's how to do that:

To be fair, I have to tell you that this query took almost 2 hours to
complete on my laptop here, but as I'm doing that for friend and a blog
article, I've been lazy and didn't try to optimise it. It could be using
LATERAL for sure, I don't know if that would help very much with
performances: I didn't try.

With that in hands we can now check some cities and their biggest
neighbours, as in the following query:

And looking for New-York City suburbs I did find a chinatown, which is a
pretty common smaller town name apparently:

Big Cities in the big world



We might need to change some of our views

So, let's see how many smaller towns each of those random big cities have:

And now let's be fair and see where are the cities with the greatest number
of towns nearby them, with the following query:

Show more