what on earth is a geo panda?

GeoPandas is an open-source Python library that extends the popular pandas library to handle geospatial data. It provides data structures and tools for working with geospatial vector data, such as points, lines, and polygons, in a tabular format. GeoPandas combines the data manipulation and analysis capabilities of pandas with the geometric operations of shapely and the file reading and writing of libraries such as Fiona.

With GeoPandas, you can read, write, and manipulate geospatial datasets in various formats, including shapefiles, GeoJSON, and more. It allows you to perform common geospatial operations like spatial joins, overlays, and buffering. You can also apply attribute queries, create new geometries, and perform spatial transformations and projections.

GeoPandas provides an intuitive and user-friendly interface to work with geospatial data, allowing you to perform complex spatial analyses with ease. It integrates well with other Python libraries such as Matplotlib and seaborn for data visualization, and scikit-learn for machine learning tasks on geospatial data.

data structures

There are two data structures in the geopandas module: GeoSeries and GeoDataFrame. Each serves a distinct purpose in geospatial analysis.

A GeoSeries represents a single column of geometric objects, such as points, lines, or polygons. It is akin to a one-dimensional array: it holds the geometries themselves (plus an index and an optional coordinate reference system), while attribute data lives in the other columns of a GeoDataFrame.

A GeoDataFrame is a comprehensive data structure that combines geospatial and tabular data. It extends the functionality of a Pandas DataFrame by incorporating a special “geometry” column for storing geometric objects, alongside other columns for attribute data. This integration enables seamless handling, analysis, and visualisation of both spatial and attribute information within a single tabular framework.
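To make the difference concrete, here is a minimal sketch that builds one of each by hand. The names, coordinates and the choice of British National Grid (EPSG:27700) are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# A GeoSeries holds only geometries (plus an index and an optional CRS).
points = gpd.GeoSeries([Point(0, 0), Point(1, 1)], crs="EPSG:27700")

# A GeoDataFrame puts ordinary attribute columns next to a geometry column.
squares = gpd.GeoDataFrame(
    {"name": ["A", "B"]},  # hypothetical attribute column
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(2, 0), (5, 0), (5, 3), (2, 3)]),
    ],
    crs="EPSG:27700",
)

# Selecting the geometry column of a GeoDataFrame gives back a GeoSeries.
areas = squares.geometry.area
print(type(squares.geometry).__name__)
print(areas.tolist())
```

The geometry column behaves like any other pandas column, except that it also exposes spatial properties such as area.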

a simple tutorial to get you started

This tutorial takes two datasets – OS Open Greenspace and Edinburgh’s Natural Neighbourhoods. A map is then created that shows what percentage of each neighbourhood is greenspace.

The IDE I originally used for this is PyCharm, but Jupyter Notebook is a great alternative, especially when producing visual outputs.

Firstly we import the necessary libraries: geopandas, webbrowser and os.

import geopandas as gpd
import webbrowser
import os

We then read the two shapefiles (*.shp) using the geopandas.read_file() function

greenspace_data = gpd.read_file("W:/Python/geospatial/Greenspace/OS Open Greenspace (ESRI Shape File) GB/data/GB_GreenspaceSite.shp")
edi_neighbourhoods = gpd.read_file("W:/Python/geospatial/Edinburgh/Natural_Neighbourhoods.shp")
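Before clipping or measuring areas, it is worth checking that both layers use the same coordinate reference system (CRS). The tutorial does not say which CRS each shapefile uses, so the sketch below is an assumption-driven illustration: two tiny made-up GeoDataFrames stand in for the real files, one in latitude/longitude and one in British National Grid.

```python
import geopandas as gpd
from shapely.geometry import Point

# Stand-ins for the two shapefiles (made-up coordinates near Edinburgh).
layer_a = gpd.GeoDataFrame(
    {"id": [1]}, geometry=[Point(-3.19, 55.95)], crs="EPSG:4326"
)
layer_b = gpd.GeoDataFrame(
    {"id": [1]}, geometry=[Point(325000, 673000)], crs="EPSG:27700"
)

# Reproject layer_a only if the two layers disagree.
if layer_a.crs != layer_b.crs:
    layer_a = layer_a.to_crs(layer_b.crs)

print(layer_a.crs)
```

With the real data, the same check could be run on greenspace_data and edi_neighbourhoods straight after reading them.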

An empty GeoDataFrame is created with two columns that are taken from the natural neighbourhoods shapefile: Neighbourhoods (NATURALCOM) and geometry.

edi_gdf = gpd.GeoDataFrame()
edi_gdf["Neighbourhoods"] = edi_neighbourhoods["NATURALCOM"]
edi_gdf["geometry"] = edi_neighbourhoods["geometry"]

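Each row of the natural neighbourhoods GeoDataFrame is then iterated over and checked to make sure its geometry is valid. Rows with invalid geometry are skipped, so they end up without a greenspace value. For each valid row:

  • The value in the NATURALCOM column is assigned to the variable name
  • The geometry is assigned to the variable clipper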
for i, row in edi_neighbourhoods.iterrows():
    if row["geometry"].is_valid:
        name = row["NATURALCOM"]
        clipper = row["geometry"]
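The loop above skips invalid geometries. An alternative, which is not part of the original script, is to repair them first. The sketch below assumes shapely 1.8 or later for make_valid and uses a deliberately broken "bowtie" polygon as made-up test data.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.validation import make_valid

square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # self-intersecting ring

shapes = gpd.GeoSeries([square, bowtie])

# is_valid works on a whole GeoSeries as well as on a single geometry.
valid_mask = shapes.is_valid
print(valid_mask.tolist())

# Repair every geometry; valid ones are returned unchanged in shape.
repaired = gpd.GeoSeries([make_valid(geom) for geom in shapes])
print(repaired.is_valid.all())
```

Whether repairing or skipping is the better choice depends on the data, so it is worth inspecting any invalid rows before deciding.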

Still inside the loop, the clip() method cuts the greenspace data to the boundary of the current neighbourhood, and the result is assigned to the variable greenspace_clipped. Both layers need to be in the same coordinate reference system for the clip to be meaningful.

greenspace_clipped = greenspace_data.clip(clipper)
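To see what clip() does without the real datasets, here is a small sketch on made-up rectangles. The site names and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

# Two made-up greenspace sites.
green = gpd.GeoDataFrame(
    {"site": ["park", "wood"]},
    geometry=[box(0, 0, 4, 4), box(8, 8, 12, 12)],
    crs="EPSG:27700",
)

# A made-up neighbourhood boundary that overlaps a corner of each site.
clipper = box(2, 2, 10, 10)

# Only the parts of each site inside the boundary are kept.
clipped = green.clip(clipper)
clipped_area = float(clipped.geometry.area.sum())
print(clipped_area)
```

Each site is cut at the boundary rather than kept or dropped whole, which is why the clipped areas can be summed directly.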

By taking the area of each neighbourhood and the greenspace within it, we can then find what percentage of greenspace is in each neighbourhood.

la_area = float(clipper.area)
greenspace_area = float(greenspace_clipped.geometry.area.sum())
perc_greenspace = round((greenspace_area / la_area) * 100, 1)

The greenspace percentage of each neighbourhood is now assigned to a newly created column ‘Greenspace Percentage’ in the edi_gdf GeoDataFrame.

edi_gdf.loc[i, "Greenspace Percentage"] = perc_greenspace
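The row-by-row loop is easy to follow, but the same percentages can also be computed without a loop. The following is one possible alternative, not the method used in this tutorial: it intersects the two layers with overlay() and sums the pieces per neighbourhood. All data here is made up, and it assumes greenspace sites do not overlap each other (overlapping sites would be counted twice).

```python
import geopandas as gpd
from shapely.geometry import box

neigh = gpd.GeoDataFrame(
    {"Neighbourhoods": ["North", "South"]},
    geometry=[box(0, 5, 10, 10), box(0, 0, 10, 5)],
    crs="EPSG:27700",
)
green = gpd.GeoDataFrame(
    {"site": ["park", "wood"]},
    geometry=[box(0, 4, 5, 6), box(6, 0, 10, 4)],
    crs="EPSG:27700",
)

# One row per (neighbourhood, site) overlap, with the intersected geometry.
pieces = gpd.overlay(neigh, green, how="intersection")

# Total greenspace area per neighbourhood name.
green_area = pieces.geometry.area.groupby(pieces["Neighbourhoods"]).sum()

# Neighbourhoods with no greenspace get 0 rather than NaN.
neigh["Greenspace Percentage"] = (
    neigh["Neighbourhoods"].map(green_area).fillna(0)
    / neigh.geometry.area
    * 100
).round(1)

print(neigh[["Neighbourhoods", "Greenspace Percentage"]])
```

On a national dataset this tends to scale better than clipping inside a Python loop, although the loop version is simpler to debug.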

Once the loop has finished, an interactive map is created using the .explore() method and the Greenspace Percentage column. The map is then saved and opened using the os and webbrowser modules.

m = edi_gdf.explore(column="Greenspace Percentage",
                    cmap="Set2")

m.save("W:/Python/geospatial/EdinburghGreenspace.html")

webbrowser.open("file://" + os.path.realpath("W:/Python/geospatial/EdinburghGreenspace.html"))
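Joining "file://" to a path by hand works in many browsers, but pathlib can build the URL for you. This is a small optional sketch rather than part of the original script. The helper name file_url and the temporary output path are hypothetical.

```python
import tempfile
from pathlib import Path


def file_url(path):
    """Return a file:// URL for a local path (hypothetical helper)."""
    return Path(path).resolve().as_uri()


# Made-up output location; the tutorial saves to a W: drive instead.
out_path = Path(tempfile.gettempdir()) / "Greenspace.html"
url = file_url(out_path)
print(url)

# With a saved map, the browser could then be opened with:
# webbrowser.open(url)
```

as_uri() takes care of the slashes and any characters that need escaping, such as spaces in folder names.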

Here’s the entire script:

import geopandas as gpd
import webbrowser
import os

# Read the greenspace data shapefile
greenspace_data = gpd.read_file("W:/Python/geospatial/Greenspace/OS Open Greenspace (ESRI Shape File) GB/data/GB_GreenspaceSite.shp")

# Read the Edinburgh neighborhood shapefile
edi_neighbourhoods = gpd.read_file("W:/Python/geospatial/Edinburgh/Natural_Neighbourhoods.shp")

# Create an empty GeoDataFrame
edi_gdf = gpd.GeoDataFrame()

# Assign the neighborhood names to the "Neighbourhoods" column
edi_gdf["Neighbourhoods"] = edi_neighbourhoods["NATURALCOM"]

# Assign the geometry to the "geometry" column
edi_gdf["geometry"] = edi_neighbourhoods["geometry"]


# Iterate over each neighborhood
for i, row in edi_neighbourhoods.iterrows():
    if row["geometry"].is_valid:  # Check if the geometry is valid
        name = row["NATURALCOM"]  # Extract the neighborhood name
        clipper = row["geometry"]  # Extract the neighborhood geometry

        # Clip the greenspace data to the neighborhood boundary
        greenspace_clipped = greenspace_data.clip(clipper)

        # Calculate the area of the neighborhood
        la_area = float(clipper.area)

        # Calculate the area of the clipped greenspace
        greenspace_area = float(greenspace_clipped.geometry.area.sum())

        # Calculate the percentage of greenspace coverage
        perc_greenspace = round((greenspace_area / la_area) * 100, 1)

        # Print the neighborhood name and greenspace coverage percentage
        print(f"{name} Greenspace Coverage = {perc_greenspace}%")

        # Assign the greenspace coverage percentage to the corresponding row in edi_gdf
        edi_gdf.loc[i, "Greenspace Percentage"] = perc_greenspace

print(edi_gdf)  # Print the resulting GeoDataFrame

# Create a map visualization of the greenspace coverage
m = edi_gdf.explore(column="Greenspace Percentage", cmap="Set2")

# Save the map as an HTML file
m.save("W:/Python/geospatial/EdinburghGreenspace.html")

# Open the saved HTML file in the default web browser
webbrowser.open("file://" + os.path.realpath("W:/Python/geospatial/EdinburghGreenspace.html"))

This is a very simple use case for the geopandas module but it hopefully highlights how quickly a map can be created using geospatial tools.

Here’s a link to the map in codepen: click

what else can geopandas do?

tools

  1. Geometry Operations:
    • Union: Combines multiple geometries into a single geometry representing their union.
    • Intersection: Computes the intersection of two or more geometries.
    • Difference: Computes the difference between two geometries.
    • Symmetric Difference: Computes the symmetric difference between two geometries.
    • Buffer: Creates a buffer around a geometry at a specified distance.
    • Simplify: Simplifies the geometry by reducing the number of vertices.
  2. Spatial Joins:
    • Spatial Join: Performs a spatial join between two GeoDataFrames based on their spatial relationship (e.g., points within polygons, polygons intersecting polygons).
    • Nearest Neighbor Join: Performs a join between two GeoDataFrames based on the nearest neighbor relationship.
  3. Spatial Queries:
    • Point-in-Polygon: Checks if points are located within polygons.
    • Polygon Overlaps: Identifies overlapping polygons.
  4. Aggregation and Summary Statistics:
    • Dissolve: Merges geometries that share a common attribute value into a single geometry per group.
    • Aggregate: Summarises the attribute columns of each dissolved group (for example with a sum or mean), resulting in a new GeoDataFrame.
  5. Geometric Measurements:
    • Area: Calculates the area of geometries.
    • Length: Calculates the length of line geometries.
    • Centroid: Computes the centroid of polygons.
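
A few of the tools listed above can be sketched on toy data. The zones, trees and column names below are made up, and the predicate argument of sjoin assumes geopandas 0.10 or later.

```python
import geopandas as gpd
from shapely.geometry import Point, box

zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"], "region": ["city", "city"]},
    geometry=[box(0, 0, 5, 5), box(5, 0, 10, 5)],
    crs="EPSG:27700",
)
trees = gpd.GeoDataFrame(
    {"tree_id": [1, 2, 3]},
    geometry=[Point(1, 1), Point(2, 3), Point(7, 2)],
    crs="EPSG:27700",
)

# Spatial join (point-in-polygon): attach the zone each tree falls within.
joined = gpd.sjoin(trees, zones, how="left", predicate="within")
trees_per_zone = joined.groupby("zone").size().to_dict()

# Dissolve: merge all zones that share the same region value.
city = zones.dissolve(by="region")
city_area = float(city.geometry.area.iloc[0])

# Buffer and intersection.
buffered = trees.geometry.buffer(1.0)
overlap = box(0, 0, 5, 5).intersection(box(3, 3, 8, 8))

# Geometric measurements.
first_centroid = zones.geometry.centroid.iloc[0]

print(trees_per_zone, city_area, overlap.area)
```

Most of these operations are available both on single shapely geometries and on whole GeoSeries, where they are applied element by element.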

functions

  1. Read and write geospatial data: Geopandas allows you to read various geospatial data formats, such as shapefiles (.shp), GeoJSON (.geojson), and more. You can also write geospatial data in different formats.
  2. Data exploration and manipulation: Geopandas provides a familiar DataFrame interface for working with geospatial data. You can perform various data exploration and manipulation operations, such as filtering, selecting, grouping, merging, and aggregating data based on attributes or spatial relationships.
  3. Spatial operations: Geopandas enables you to perform spatial operations on geometries, such as geometric calculations, buffering, simplification, centroid calculation, intersection, union, difference, and more. These operations allow you to analyze and manipulate geometries based on their spatial relationships.
  4. Attribute and spatial joins: Geopandas allows you to join attribute data from one geospatial dataset to another based on a common attribute field. You can also perform spatial joins to combine geometries based on their spatial relationships, such as points within polygons or polygons intersecting other polygons.
  5. Geometric and attribute-based querying: Geopandas provides powerful querying capabilities to select specific features based on their attributes or spatial properties. You can apply attribute filters, perform spatial queries like point-in-polygon or polygon overlaps, and retrieve subsets of data based on specific conditions.
  6. Visualization: Geopandas integrates with popular visualization libraries, such as Matplotlib and Seaborn, to create static maps and plots of geospatial data. You can visualize points, lines, and polygons, apply color maps, customize symbology, and create choropleth maps to represent attribute data.
  7. Spatial indexing and optimization: Geopandas leverages spatial indexing techniques, such as R-tree, to efficiently handle large geospatial datasets. This allows for faster spatial queries and operations on the data.
  8. Coordinate reference system (CRS) management: Geopandas supports the handling of different coordinate reference systems. You can assign, transform, and reproject geometries to ensure consistent spatial analysis and visualization.
  9. Integration with other geospatial libraries: Geopandas integrates well with other geospatial libraries, such as Shapely, Fiona, PyProj, and GDAL. This enables seamless interoperability and access to additional functionality for geospatial analysis and data manipulation.

overall

To conclude, geopandas is a great module for geoprocessing: it lets you manipulate and combine different datasets and create maps quickly. It integrates well with many other modules and with areas such as machine learning.

 
