Categories
Uncategorized

PostGIS vs earthdistance for spatial queries

So ya thought ya might not need PostGIS for location-based queries in PostgreSQL…

As a PostGIS afficionado, I became triggered by this post. But after a while, I decided to give earthdistance a try – it’s just a piece of technology, after all. And sticking fanatically to one or another isn’t the best way to go.

Round one: setup

For test purposes, I imported a full Geonames dump (the detailed setup is described in a previous post).

The bootstraping required for earthdistance consists of creating an extension:

CREATE EXTENSION earthdistance CASCADE;

(the CASCADE modifier is needed for “cube” extension, which is a required dependency)

The earthdistance extension can work out of the box with regular numeric “latitude” and “longitude” columns, but it has two problems: the distance units are hardcoded to statute miles, and there is no index support. Using miles in continental Europe and sequential scan on a table with 12 million rows are both bad ideas. Fortunately, there is an option of using index and meters: cube-based earthdistance functions, that use a special “earth” type. So I created an index using GiST method and ll_to_earth function:

CREATE INDEX ON geoname USING GIST(ll_to_earth(latitude, longitude));

Creating this index took 11 minutes and 57 seconds on a laptop.

For PostGIS, the CREATE EXTENSION step is also required:

CREATE EXTENSION postgis;

(no CASCADE here). After that, we can create a geography column (yes, not geometry – our test dataset has a global coverage and lon-lat coordinates, a perfect use case for geography and bad for geometry):

ALTER TABLE geoname ADD COLUMN geom GEOGRAPHY;
UPDATE geoname SET geom = ST_MakePoint(longitude, latitude)::geography;

-- for Postgres 12 and newer, if you don't plan to edit location data in specialized GIS software, you can also use a generated column instead:

ALTER TABLE geoname ADD COLUMN geom GEOGRAPHY GENERATED ALWAYS AS (ST_MakePoint(longitude, latitude)::geography) STORED;

This took 7 mins 47 secs. If you are into GIS, you may notice that I skipped all the coordinate systems-related black magic. That’s because PostGIS assumes coordinates in geography type to be WGS-84, aka “GPS coordinates”, and specyfying a coordinate system is not required. The second step is to create an index:

CREATE INDEX ON geoname USING GIST(geom);

which took 2 minutes and 20 seconds.

Despite the fact that the PostGIS setup is a bit more complicated (new column and index instead of a functional index on existing columns), PostGIS is actually a bit faster here.

Round two: querying

After the necessary bootstrapping is done, it’s time to do some spatial queries like “find 10 nearest pubs in 10 kilometer radius”. As GeoNames dataset doesn’t include pubs (you’ll need OpenStreetMap data for that), I queried for nearest mountain peaks located within 10 kilometers from Zakopane railway station.

For PostGIS, such query is done using ST_DWithin (for filtering) and ST_Distance (for precise distance calculation and sorting) functions:

SELECT name, elevation, ST_Distance(geom, ST_MakePoint(19.95,49.3)) AS dist FROM geoname WHERE fclass='T' AND elevation !='' AND ST_DWithin(ST_MakePoint(19.95,49.3), geom, 10000) ORDER BY dist ASC;

The query was executed in 3 ms.

Next, the earthdistance way:

SELECT name, elevation, earth_distance(ll_to_earth(latitude, longitude), ll_to_earth(49.3, 19.95)) AS dist FROM geoname WHERE fclass='T' AND elevation !='' AND earth_box(ll_to_earth(49.3, 19.95), 10000) @> ll_to_earth(latitude, longitude) AND earth_distance(ll_to_earth(latitude, longitude), ll_to_earth(49.3, 19.95)) < 10000 ORDER BY dist ASC;

This one was executed in 4.5 ms.

Note that the earth_box function alone is not sufficient. The WHERE clause must be refined with additional earth_distance -based filter, and the docs explictly say so:

ome points in this box are further than the specified great circle distance from the location, so a second check using earth_distance should be included in the query.

Round three: precision

The earthdistance extension assumes a spherical Earth, while PostGIS geography uses a WGS84 ellipsoid as the Earth model (same as used by GPS system). The distance calculations in PostGIS are thus more precise. But what’s the difference?

SELECT name, elevation, earth_distance(ll_to_earth(latitude, longitude), ll_to_earth(49.3, 19.95)),
ST_Distance(geom, ST_MakePoint(19.95, 49.3))
FROM geoname
WHERE fclass='T' AND elevation != '' AND
ST_DWithin(geom, ST_MakePoint(19.95, 49.3), 10000)
ORDER BY st_distance ASC LIMIT 3;

name | elevation | earth_distance | st_distance
------------------------------------------------
Gubałówka | 1126 | 1813.123 | 1816.16
Samkowa Czuba | 1189 | 3528.23 | 3527.26
Krokiew | 1378 | 3770.05 | 3767.39

At 50 degrees of latitude, the earthdistance’s results are about 1-3 meters off.

Conclusion

For very basic spatial queries, both Postgres extensions offer very similar performance, with PostGIS being slightly faster in some cases and offering better precision. The accuracy provided by earthdistance lies within the error margin of an average GPS receiver (but the hardware is continously getting better, and the sphere-based calculations aren’t!). My recommendation is: if you can use the PostGIS extension, do so. The basic setup is not that complicated (you don’t have to know anything about SRIDs and projections when using the geography type!), and you get better precision and slightly better performance for free. Also, your design will be future-proof: if any more sophisticated spatial features like geofencing, measuring distance along a line, or matching a GPS track to street network are necessary, PostGIS will be there to help. For constrained environments, where installing another software package is either impossible or requiring a mountain of paperwork (like in a bank), the earthdistance remains a viable option.