## Sunday, February 11, 2007

### Using POSTGIS to find points of interest within a radius

I've recently looked into using PostGIS, an extension that enables support for spatial objects in PostgresSQL database.

The problem I was trying to solve is, given a point with (longitude, latitude), how can I find all other points within a certain radius. I've solved it before using formula to calculate distances between 2 points but it's pretty slow if your table has a lot of records. PostGIS can help solve this problem faster and more elegant.

First thing first, let's get us some data to play around with. I found this excellent tutorial that teaches you how to get zipcode data from the US Census. I'll skim through it here, but you should definitely check out that article.

Assume you have a zipcode table with these columns:
`state    zip     latitude        longitude-------------------------------------------CA 92230 33.911404 -116.768347CA 92234 33.807761 -116.464731CA 92236 33.679872 -116.176562`

Now you need to add a column to this table to hold the geometric data represents by [longitude, latitude] pair.
`SELECT AddGeometryColumn( 'public', 'zipcode', 'geom', 32661, 'POINT', 2 );`
* add column 'geom' into table 'zipcode' under schema 'public'
* the column holds a point with 2 dimensions and has spherical reference id (srid) 32661

* now we populate the table with data converted from srid 4269 (longitude/latitude measurement) to srid 32661 (WGS 84 system)
`UPDATE zipcodeSET geom = transform(PointFromText('POINT(' || longitude || ' ' || latitude || ')',4269),32661) ;`
The transformation from one spherical reference to another is needed for calculating the surface distance. WGS84 is measured in meters by the way.

Now your table will look like this
`state zip  latitude   longitude    geom-----------------------------------------------CA  92230  33.911404  -116.768347  0101000020957F0000B7FB4ED5F2C44EC166AFB20D1A3D5341CA  92234  33.807761  -116.464731  0101000020957F0000AA8102ECECFD4EC18284302439245341CA  92236  33.679872  -116.176562  0101000020957F000067A4B44D2D3B4FC1596E974B370E5341`

Now we only need to write a query, say, find me all the zipcode within 10 mile radius of "92230". (or 16093 meters)

SELECT state, zip
FROM zipcode
WHERE
distance(
transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
geom) < 16093

Here function distance(here_geom, there_geom) returns meters. Notice how you would read in a point(long/lat) from string using PointFromText() then do the transform.

However, this query isn't optimal. It calculates distances from zip 92230 to all other zipcodes. We would rather have a bounding box of 10 miles around our interested point, filtering out the points that interact with this box then calculate the exact distances to the center. Sounds like a lot of work but in reality, it can be achieve by adding another condition:

SELECT state, zip
FROM zipcode
WHERE
geom && expand(transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661), 16093)
AND
distance(
transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
geom) < 16093

The new query is much much faster. My benchmark shows 200% in speed compared to the first query. Pretty sweet!