DuckDB vs PostGIS: Which is Really the King of Local Geospatial Analysis?

We were recently brought in by an urban mobility startup that was struggling with their data ops. Every afternoon, they’d receive gigantic CSV dumps of dockless scooter locations. To figure out which neighborhoods were being underserved, their analysts were importing these CSVs into a local PostGIS instance, running spatial joins against neighborhood polygons, and extracting the results.

The process was slow, clunky, and killing their iteration speed. They asked if there was a better way to do ad-hoc geospatial analysis without the overhead of moving data into a traditional database.

Enter the DuckDB spatial extension.

DuckDB is designed for fast, in-process analytical queries. With its spatial extension, it can rip through local files (CSVs, Parquet, GeoJSON) and perform spatial joins faster than you can say ST_Intersects.

To prove it, we ran a bake-off. Here’s a look at the equivalent queries we used.

The PostGIS Way:


-- ... (Data loading and indexing omitted via ogr2ogr and psql)

SELECT
    n.neighborhood_name,
    COUNT(s.scooter_id) as scooter_count
FROM
    neighborhoods n
JOIN
    scooter_drops s
ON
    ST_Intersects(n.geom, s.geom)
GROUP BY
    n.neighborhood_name
ORDER BY
    scooter_count DESC;

-- ...

Loading data and creating an index took minutes, followed by the query:

The DuckDB Way:


import duckdb

# ...

# Connect to an in-memory DuckDB instance
con = duckdb.connect(database=':memory:')

# Install and load the spatial extension
con.execute("""
    INSTALL spatial;
    LOAD spatial;
""")

# Execute the spatial join directly on the local CSV and GeoJSON files!
query = """
    SELECT
        n.neighborhood_name,
        COUNT(s.scooter_id) as scooter_count
    FROM
        ST_Read('neighborhoods.geojson') n
    JOIN
        read_csv_auto('scooter_drops_daily.csv') s
    ON
        ST_Intersects(
            n.geom,
            ST_Point(s.longitude, s.latitude)
        )
    GROUP BY
        n.neighborhood_name
    ORDER BY
        scooter_count DESC;
"""

df_result = con.execute(query).fetchdf()
print(df_result.head())

# ...

DuckDB queries the raw files directly—no database ingestion phase required. We executed this straight from Python using DuckDB’s native connector. DuckDB processed the spatial join directly from the CSV and GeoJSON in a fraction of the time it took PostGIS just to COPY the data in.

While PostGIS remains the undisputed champion for robust, multi-user geospatial backends and complex transaction processing, DuckDB has entirely replaced it for our rapid, local, file-based analytical workflows.

Godspeed!

 

Cart (0 items)

Create your account