Ibis and Geospatial Operations

One of the most popular extensions to PostgreSQL is PostGIS, which adds support for storing geospatial geometries, as well as functionality for reasoning about and performing operations on those geometries.

This is a demo showing how to assemble ibis expressions for a PostGIS-enabled database. We will be using a database that has been loaded with an Open Street Map extract for Southern California. This extract can be found here, and loaded into PostGIS using a tool like ogr2ogr.

Preparation

We first need to set up a demonstration database and load it with the sample data. If you have Docker installed, you can download and launch a PostGIS database with the following:

[1]:
# Launch the postgis container.
# This may take a bit of time if it needs to download the image.
!docker run -d -p 5432:5432 --name postgis-db -e POSTGRES_PASSWORD=supersecret mdillon/postgis:9.6-alpine
cd5e5e60822770ddc628adce3f4d26975fa4cfe0325d81925a6fe70b088253cc

Next, we download our OSM extract (about 400 MB):

[2]:
!wget https://download.geofabrik.de/north-america/us/california/socal-latest.osm.pbf

Finally, we load it into the database using ogr2ogr (this may take some time):

[3]:
!ogr2ogr -f PostgreSQL PG:"dbname='postgres' user='postgres' password='supersecret' port=5432 host='localhost'" -lco OVERWRITE=yes --config PG_USE_COPY YES socal-latest.osm.pbf
0...10...20...30...40...50...60...70...80...90...100 - done.

Connecting to the database

We first make the relevant imports, and connect to the PostGIS database:

[4]:
import os
import geopandas
import ibis

%matplotlib inline
[5]:
client = ibis.postgres.connect(
    url='postgres://postgres:supersecret@localhost:5432/postgres'
)

Let’s look at the tables available in the database:

[6]:
client.list_tables()
[6]:
['geography_columns',
 'geometry_columns',
 'lines',
 'multilinestrings',
 'multipolygons',
 'other_relations',
 'points',
 'raster_columns',
 'raster_overviews',
 'spatial_ref_sys']

As you can see, this Open Street Map extract stores its data according to the geometry type. Let’s grab references to the polygon and line tables:

[7]:
polygons = client.table('multipolygons')
lines = client.table('lines')

Querying the data

We query the polygons table for shapes with an administrative level of 8, which corresponds to municipalities.

We also reproject some of the column names so we don’t have a name collision later.

[8]:
cities = polygons[polygons.admin_level == '8']

cities = cities[
    cities.name.name('city_name'),
    cities.wkb_geometry.name('city_geometry')
]

We can assemble a specific query for the city of Los Angeles, and execute it to get the geometry of the city. This will be useful later when reasoning about other geospatial relationships in the LA area:

[9]:
los_angeles = cities[cities.city_name == 'Los Angeles']
la_city = los_angeles.execute()
la_city_geom = la_city.iloc[0].city_geometry
la_city_geom
[9]:
../../_images/notebooks_tutorial_11-Geospatial-Analysis_18_0.svg

Let’s also extract freeways from the lines table, which are indicated by the value 'motorway' in the highway column:

[10]:
highways = lines[(lines.highway == 'motorway')]
highways = highways[
    highways.name.name('highway_name'),
    highways.wkb_geometry.name('highway_geometry')
]

Making a spatial join

Let’s test a spatial join by selecting all the highways that intersect the city of Los Angeles, or one if its neighbors.

We begin by assembling an expression for Los Angeles and its neighbors. We consider a city to be a neighbor if it has any point of intersection (by this critereon we also get Los Angeles itself).

We can pass in the city geometry that we selected above when making our query by marking it as a literal value in ibis:

[11]:
la_neighbors_expr = cities[
    cities.city_geometry.intersects(
        ibis.literal(la_city_geom, type='multipolygon;4326:geometry')
    )
]
[12]:
la_neighbors = la_neighbors_expr.execute().dropna()
la_neighbors
[12]:
city_name city_geometry
0 Inglewood (POLYGON ((-118.346107 33.932694, -118.346076 ...
1 Long Beach (POLYGON ((-118.232975 33.714615, -118.231803 ...
2 Vernon (POLYGON ((-118.204281 33.988934, -118.204265 ...
3 Simi Valley (POLYGON ((-118.6334489 34.2697148, -118.63339...
4 Los Angeles (POLYGON ((-118.464911 34.330061, -118.457574 ...
5 Gardena (POLYGON ((-118.326689 33.91283, -118.326502 3...
6 Commerce (POLYGON ((-118.127398 34.008033, -118.127411 ...
7 Glendale (POLYGON ((-118.183869 34.148557, -118.18392 3...
8 Rancho Palos Verdes (POLYGON ((-118.417912 33.762329, -118.415164 ...
9 Lomita (POLYGON ((-118.30879 33.8022812, -118.308786 ...
10 Torrance (POLYGON ((-118.3404761 33.7862518, -118.33789...
11 Hawthorne (POLYGON ((-118.3787037 33.901905, -118.378660...
12 Calabasas (POLYGON ((-118.6097079 34.1472647, -118.61058...
13 Hidden Hills (POLYGON ((-118.6681703 34.1767732, -118.65858...
14 Santa Monica (POLYGON ((-118.5534678 33.9843349, -118.53781...
15 Culver City (POLYGON ((-118.3865712 33.9768672, -118.38547...
16 San Fernando (POLYGON ((-118.454741 34.283709, -118.454756 ...
17 Malibu (POLYGON ((-118.9517221 33.9928639, -118.94901...
18 El Segundo (POLYGON ((-118.4758021 33.8881822, -118.49898...
19 Beverly Hills (POLYGON ((-118.3897334 34.0765007, -118.38986...
20 West Hollywood (POLYGON ((-118.3958762 34.091079, -118.395592...
21 Carson (POLYGON ((-118.286874 33.7978063, -118.287645...
22 Lynwood (POLYGON ((-118.187027 33.905485, -118.186491 ...
23 South Pasadena (POLYGON ((-118.1552947 34.09859, -118.1567112...
24 Burbank (POLYGON ((-118.292761 34.221654, -118.337462 ...
25 Pasadena (POLYGON ((-118.197819 34.237289, -118.197804 ...
26 Alhambra (POLYGON ((-118.164835 34.062283, -118.164805 ...
27 Monterey Park (POLYGON ((-118.164835 34.062283, -118.164735 ...
28 Huntington Park (POLYGON ((-118.204281 33.988934, -118.204265 ...

Now we join the neighbors expression with the freeways expression, on the condition that the highways intersect any of the city geometries:

[13]:
la_highways_expr = highways.inner_join(
    la_neighbors_expr,
    highways.highway_geometry.intersects(la_neighbors_expr.city_geometry)
).materialize()
[14]:
la_highways = la_highways_expr.execute()
la_highways.plot()
[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x22ddaec08d0>
../../_images/notebooks_tutorial_11-Geospatial-Analysis_26_1.png

Combining the results

Now that we have made a number of queries and joins, let’s combine them into a single plot. To make the plot a bit nicer, we can also load some shapefiles for the coast and land:

[15]:
ocean = geopandas.read_file('https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip')
land = geopandas.read_file('https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_land.zip')
[16]:
ax = la_neighbors.dropna().plot(figsize=(16,16), cmap='rainbow', alpha=0.9)
ax.set_autoscale_on(False)
ax.set_axis_off()
land.plot(ax=ax, color='tan', alpha=0.4)

ax = ocean.plot(ax=ax, color='navy')
la_highways.plot(ax=ax, color='maroon')
[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x22df02fb198>
../../_images/notebooks_tutorial_11-Geospatial-Analysis_29_1.png