
A few weeks ago, we got an intriguing question: “Why do I have to submit a ticket to the GIS department to pull a list of high-density commercial parcels every time? Why can’t I just type that into our dashboard like ChatGPT?”
It sounded like science fiction a year ago, but with recent advancements in Large Language Models (LLMs), enabling Text-to-Spatial-SQL is entirely feasible right now.
We built them a prototype that connects OpenAI’s GPT-4 directly to their read-only PostGIS database. By passing the exact database schema and using a solid prompt that defines the behavior of PostGIS functions, the LLM translates plain English questions into valid, complex spatial queries.
Here is a simplified Python example of how we wired this up using the langchain library:
from langchain_community.utilities import SQLDatabase
from langchain_community.chat_models import ChatOpenAI
from langchain.chains import create_sql_query_chain
# ... Setup OpenAI API keys and environment ...
# Connect to the PostGIS instance
# CRITICAL: We explicitly use an engine user with strict READ-ONLY permissions!
db = SQLDatabase.from_uri("postgresql+psycopg2://reader:password@localhost:5432/realestate_db")
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)
# Create a prompt that specifically guides the LLM to use PostGIS functions
system_prompt = """
You are a PostGIS expert. Given an input question, create a syntactically correct PostgreSQL query to run.
...
Always use proper spatial functions like ST_Intersects and ST_DWithin.
Always cast geometries to Geography types when calculating distances in meters!
"""
# Create the LangChain SQL chain
chain = create_sql_query_chain(llm, db)
# Have the user ask a spatial question
response = chain.invoke({
"question": "Which commercial parcels are within 500 meters of the new subway station?"
})
print(f"Generated Query:\n{response}")
The LLM effortlessly returns the correct spatial SQL, handling the underlying table joins and PostGIS predicate logic:
SELECT
p.parcel_id,
p.address
FROM
parcels p
JOIN
subway_stations s
ON
s.station_name = 'New Subway Station'
WHERE
p.zoning = 'Commercial'
AND ST_DWithin(p.geom::geography, s.geom::geography, 500);
While you still need strict guardrails (like restricting the database connection to a read-only role or implementing an intermediate semantic validation layer), seeing an executive ask a map a complex spatial question and get the right answer immediately feels like the future of geospatial data interaction has finally arrived.
Godspeed!