Python Tutorial 3 - Extract data within a shapefile using ERDDAP
This tutorial will teach you how to extract and display SST values for a particular time period or average SST over the whole time-series available within a shapefile.
The shapefile for the NOAA Marine National Monument and sanctuaries boundaries can be downloaded here:
http://sanctuaries.noaa.gov/library/imast_gis.html
Links to an external site..
We are going to extract SST data for the Papahanaumokuakea Marine National Monument (PMNM) in Hawaii. However, because the Monument boundaries cross the dateline, the shapefile provided on the website is tricky to work with. We'll work with a cleaned up version, available here:
https://oceanwatch.pifsc.noaa.gov/files/PMNM_bounds.csv
Links to an external site.
This tutorial is also available as a Jupyter notebook Links to an external site..
Load packages
import pandas as pd
import numpy as np
import urllib.request
import xarray as xr
import netCDF4 as nc
from matplotlib import pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from shapely.geometry import Point, Polygon
import geopandas as gpd
np.warnings.filterwarnings('ignore')
Load the Monument boundary
df=pd.read_csv('PMNM_bounds.csv')
Transform the boundary to a Polygon
geometry = [Point(xy) for xy in zip(df.lon, df.lat)]
poly = Polygon([(p.x, p.y) for p in geometry])
poly
Data extraction
The example below extracts monthly 5km CoralTemp SST Links to an external site. data within the monument boundary.
- We are going to download data from ERDDAP for the smallest bounding box that contains our polygon
xcoord1 = (np.min(df.lon), np.max(df.lon))
ycoord1 = (np.min(df.lat), np.max(df.lat))
- let's select a date range:
tcoord = ("2019-01-15", "2019-12-15")
- and let's build our ERDDAP URL:
url='
https://oceanwatch.pifsc.noaa.gov/erddap/griddap/CRW_sst_v1_0_monthly.nc?analysed_sst[('+
Links to an external site. tcoord[0] +'):1:('+ tcoord[1] +')][('+ str(ycoord1[0]) +'):1:('+ str(ycoord1[1]) +')][(' + str(xcoord1[0]) +'):1:('+ str(xcoord1[1]) +')]'
url
'https://oceanwatch.pifsc.noaa.gov/erddap/griddap/CRW_sst_v1_0_monthly.nc?analysed_sst[(2019-01-15):1:(2019-12-15)][(19.2345832):1:(31.79786423)][(177.84422):1:(198.9827)]'
- now we can download the data:
urllib.request.urlretrieve(url, "sst.nc")
- and load it as an xarray dataset:
ds = xr.open_dataset('sst.nc',decode_cf=False)
ds.analysed_sst.shape
(12, 252, 424)
We now have data for a box around our polygon, for 12 monthly time steps (= 1 year).
Masking the data outside the Monument boundary
The .within()
function from the shapely
package checks if a point is within a polygon. We are using it to create a mask which will take the value 1 within the polygon boundary, and NaN outside.
(This takes about 1min or less to run).
mask=np.empty((len(ds.latitude.values),len(ds.longitude.values)))
mask[:]=np.NaN
for i in range(len(ds.latitude.values)):
for j in range(len(ds.longitude.values)):
p=Point(ds.longitude.values[j],ds.latitude.values[i],)
if int(p.within(poly))==1:
mask[i,j]=int(p.within(poly))plt.contourf(ds.longitude,ds.latitude,mask)
We now multiply the SST data we downloaded by the mask values:
SST=ds.analysed_sst*mask
Plotting the data
The extracted data contains several time steps (months) of sst data in the monument boundaries. Let's make a plot of the 4th time step for example.
- setting up the colormap
np.min(SST),np.max(SST)
array(16.863333), array(28.78)
levs = np.arange(16, 29, 0.05)
jet=["blue", "#007FFF", "cyan","#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"]
cm = LinearSegmentedColormap.from_list('my_jet', jet, N=len(levs))
- loading data to plot the coastline. The file can be downloaded here Links to an external site., and was provided by https://eric.clst.org/tech/usgeojson/ Links to an external site.. Download the file and save it to your computer.
country = gpd.read_file("gz_2010_us_outline_20m.json")
- plot:
import cartopy.crs as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 10))
# Label axes of a Plate Carree projection with a central longitude of 180:
ax1 = plt.subplot(211, projection=ccrs.PlateCarree(central_longitude=180))
ax1.set_extent([177, 207, 18, 32], ccrs.PlateCarree())
ax1.coastlines()
ax1.set_xticks(range(178,207,4), crs=ccrs.PlateCarree())
ax1.set_yticks(range(20,32,2), crs=ccrs.PlateCarree())
lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax1.xaxis.set_major_formatter(lon_formatter)
ax1.yaxis.set_major_formatter(lat_formatter)
ax1=plt.contourf(ds.longitude,ds.latitude,SST[4,:,:],levs,cmap=cm,transform=ccrs.PlateCarree())
cbar=plt.colorbar(fraction=0.022)
cbar.ax.tick_params(labelsize=12)
#cs.ax.tick_params(labelsize=12)
plt.title('SST - April 2019', fontsize=20)
plt.show()
Example of loading and working which an ESRI shapefile instead of .csv file
Let's download a shapefile of the main Hawaiian Islands, from the State of Hawaii website: https://planning.hawaii.gov/gis/download-gis-data-expanded/ Links to an external site.
Download the coastline shapefile: https://files.hawaii.gov/dbedt/op/gis/data/coast_n83.shp.zip Links to an external site. and save it to your computer, and unzip.
from shapely.geometry import Point, Polygon
import geopandas as gpd
from matplotlib import pyplot as plt
fp='coast_n83.shp/coast_n83.shp'
data = gpd.read_file(fp)
data.plot(figsize=(14, 10))
plt.show()
data
AREA | PERIMETER | COASTLINE_ | COASTLIN_1 | WATER | geometry | |
---|---|---|---|---|---|---|
0 | 1.436729e+09 | 201403.677 | 2 | 6 | 0 | POLYGON ((460110.193 2457446.021, 460127.255 2... |
1 | 1.099133e+06 | 6195.871 | 3 | 14 | 0 | POLYGON ((386773.512 2434856.011, 386762.074 2... |
2 | 1.863355e+08 | 83549.716 | 4 | 15 | 0 | POLYGON ((390419.743 2432820.417, 390426.993 2... |
3 | 1.546408e+09 | 381624.267 | 5 | 17 | 0 | POLYGON ((603081.220 2399886.893, 603149.469 2... |
4 | 1.844934e+06 | 7897.713 | 6 | 18 | 0 | POLYGON ((607748.827 2363405.920, 607730.327 2... |
5 | 8.768437e+05 | 4463.074 | 7 | 19 | 1 | POLYGON ((609318.894 2357506.733, 609343.269 2... |
6 | 2.072612e+06 | 7228.803 | 8 | 20 | 0 | POLYGON ((616130.011 2357149.079, 616123.136 2... |
7 | 6.752768e+08 | 191967.661 | 9 | 12 | 0 | POLYGON ((687118.835 2346925.744, 687142.210 2... |
8 | 1.887675e+09 | 309213.835 | 10 | 10 | 0 | POLYGON ((746643.293 2326473.297, 746688.044 2... |
9 | 3.654021e+08 | 93216.920 | 11 | 8 | 0 | POLYGON ((709229.796 2315624.344, 709236.171 2... |
10 | 7.982950e+04 | 1977.211 | 12 | 4 | 0 | POLYGON ((760627.186 2283442.561, 760631.185 2... |
11 | 1.155304e+08 | 69408.819 | 13 | 3 | 0 | POLYGON ((753039.059 2280106.643, 753046.185 2... |
12 | 1.045876e+10 | 640064.228 | 14 | 1 | 0 | POLYGON ((833230.796 2243109.785, 833236.921 2... |
data.geometry[0]