Sunday, February 11, 2007

Using POSTGIS to find points of interest within a radius

I've recently looked into using PostGIS, an extension that enables support for spatial objects in PostgresSQL database.

The problem I was trying to solve is, given a point with (longitude, latitude), how can I find all other points within a certain radius. I've solved it before using formula to calculate distances between 2 points but it's pretty slow if your table has a lot of records. PostGIS can help solve this problem faster and more elegant.

First thing first, let's get us some data to play around with. I found this excellent tutorial that teaches you how to get zipcode data from the US Census. I'll skim through it here, but you should definitely check out that article.

Assume you have a zipcode table with these columns:

state zip latitude longitude
-------------------------------------------
CA 92230 33.911404 -116.768347
CA 92234 33.807761 -116.464731
CA 92236 33.679872 -116.176562

Now you need to add a column to this table to hold the geometric data represents by [longitude, latitude] pair.

SELECT AddGeometryColumn( 'public', 'zipcode', 'geom', 32661, 'POINT', 2 );

* add column 'geom' into table 'zipcode' under schema 'public'
* the column holds a point with 2 dimensions and has spherical reference id (srid) 32661

* now we populate the table with data converted from srid 4269 (longitude/latitude measurement) to srid 32661 (WGS 84 system)

UPDATE zipcode
SET geom = transform(PointFromText('POINT(' || longitude || ' ' || latitude || ')',4269),32661) ;

The transformation from one spherical reference to another is needed for calculating the surface distance. WGS84 is measured in meters by the way.

Now your table will look like this

state zip latitude longitude geom
-----------------------------------------------
CA 92230 33.911404 -116.768347 0101000020957F0000B7FB4ED5F2C44EC166AFB20D1A3D5341
CA 92234 33.807761 -116.464731 0101000020957F0000AA8102ECECFD4EC18284302439245341
CA 92236 33.679872 -116.176562 0101000020957F000067A4B44D2D3B4FC1596E974B370E5341

Now we only need to write a query, say, find me all the zipcode within 10 mile radius of "92230". (or 16093 meters)


SELECT state, zip
FROM zipcode
WHERE
distance(
   transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
   geom) < 16093

Here function distance(here_geom, there_geom) returns meters. Notice how you would read in a point(long/lat) from string using PointFromText() then do the transform.

However, this query isn't optimal. It calculates distances from zip 92230 to all other zipcodes. We would rather have a bounding box of 10 miles around our interested point, filtering out the points that interact with this box then calculate the exact distances to the center. Sounds like a lot of work but in reality, it can be achieve by adding another condition:


SELECT state, zip
FROM zipcode
WHERE
geom && expand(transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661), 16093)
AND
distance(
   transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
   geom) < 16093


The new query is much much faster. My benchmark shows 200% in speed compared to the first query. Pretty sweet!

8 comments:

Nate said...
This comment has been removed by the author.
Nate said...

This was exactly the kind of example I was looking for. Thanks!

hari said...

I am not sure how this works. The Postgis function distance(geom,geom) gives the distance in cartitian length. In the current query you will get all points of interest pretty much within 1000 miles. If you want the right distance, you may have to use the distance_spheroid(geom,geom,spheroid).

Probably you may have to update your post.

-- Hari Gangadharan

Hung Huynh said...

Hari,

Acutally, that's why the transform() is there to convert the geom points to surface distance (srid 32661). This way, the distance() function will return the distance we want. I have tested against my other method of using lng/lat so I'm sure this is correct.

yansen said...

Why when I try you method in my database it returns 55 km and when I compared it with many online distance calculator, they return 48 km? How can the distance difference is so big like that?

The coordinates that I test:
- coordinate 1: lat = 48.11027778; long = 16.56972222
- coordinate 2: lat = 48.16666667; long = 17.21666667

Your clarification is highly appreciated.

cbs228 said...

I agree with yansen—the distances are incorrect. EPSG 32661 is only valid near the north pole (latitudes north of 60° N), and you are using it way outside of its designated area. I suggest either picking a better projection such as EPSG 3309 (which is valid for the entire state of California) or leaving the data in latitude/longitude form and using the ST_Distance_Sphere() or ST_Distance_Spheroid() functions. However, in the latter case you will need to write your own bounding box function that converts from meters to degrees of latitude and longitude. (PostGIS's support for latitude/longitude coordinate systems is abhorrently bad.)

Thebambuproject said...

use a ST_intersects(points_geom, ST_buffer(search_point_geom, Distance))
and you are done!

Neville said...

Great article. Well explained Hung. Thanks for sharing this stuff!!