Sunday, February 11, 2007

Using POSTGIS to find points of interest within a radius

I've recently looked into using PostGIS, an extension that enables support for spatial objects in PostgresSQL database.

The problem I was trying to solve is, given a point with (longitude, latitude), how can I find all other points within a certain radius. I've solved it before using formula to calculate distances between 2 points but it's pretty slow if your table has a lot of records. PostGIS can help solve this problem faster and more elegant.

First thing first, let's get us some data to play around with. I found this excellent tutorial that teaches you how to get zipcode data from the US Census. I'll skim through it here, but you should definitely check out that article.

Assume you have a zipcode table with these columns:

state zip latitude longitude
CA 92230 33.911404 -116.768347
CA 92234 33.807761 -116.464731
CA 92236 33.679872 -116.176562

Now you need to add a column to this table to hold the geometric data represents by [longitude, latitude] pair.

SELECT AddGeometryColumn( 'public', 'zipcode', 'geom', 32661, 'POINT', 2 );

* add column 'geom' into table 'zipcode' under schema 'public'
* the column holds a point with 2 dimensions and has spherical reference id (srid) 32661

* now we populate the table with data converted from srid 4269 (longitude/latitude measurement) to srid 32661 (WGS 84 system)

UPDATE zipcode
SET geom = transform(PointFromText('POINT(' || longitude || ' ' || latitude || ')',4269),32661) ;

The transformation from one spherical reference to another is needed for calculating the surface distance. WGS84 is measured in meters by the way.

Now your table will look like this

state zip latitude longitude geom
CA 92230 33.911404 -116.768347 0101000020957F0000B7FB4ED5F2C44EC166AFB20D1A3D5341
CA 92234 33.807761 -116.464731 0101000020957F0000AA8102ECECFD4EC18284302439245341
CA 92236 33.679872 -116.176562 0101000020957F000067A4B44D2D3B4FC1596E974B370E5341

Now we only need to write a query, say, find me all the zipcode within 10 mile radius of "92230". (or 16093 meters)

SELECT state, zip
FROM zipcode
   transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
   geom) < 16093

Here function distance(here_geom, there_geom) returns meters. Notice how you would read in a point(long/lat) from string using PointFromText() then do the transform.

However, this query isn't optimal. It calculates distances from zip 92230 to all other zipcodes. We would rather have a bounding box of 10 miles around our interested point, filtering out the points that interact with this box then calculate the exact distances to the center. Sounds like a lot of work but in reality, it can be achieve by adding another condition:

SELECT state, zip
FROM zipcode
geom && expand(transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661), 16093)
   transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
   geom) < 16093

The new query is much much faster. My benchmark shows 200% in speed compared to the first query. Pretty sweet!