# Import Libraries
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import xarray as xr
import rioxarray as rioxr
import geopandas as gpd
The Great Thomas Fire Analysis: A Tour Of Functions
Amanda G. Overbye
Have you ever dreamed of analyzing a wildfire all by yourself using Python and a handful of functions? Well if you have, today is your lucky day! The goal of this blog post is to give a quick guide on how to do just that so you can make your dream come true or live vicariously through me. First, let’s go over some background on the Thomas Fire that impacted Santa Barbara and the surrounding areas.
I am going to be doing this in two parts. In the first part, I will be making a graph to visualize how the fire impacted air quality or the AQI. For the second part, I am going to be using false color imagery to get a better picture, or just a picture at least, of the impact the fire had on vegetation in the area. As I mentioned earlier, this post is going to highlight the functions I am going to use,
And the functions are…
Concat()
Rolling()
To_file()
Squeeze()
Imshow()
Throughout this analysis, I will highlight the places where I used the functions, mention the library they are from, and explain, roughly, what they do. In addition to the functions, I will be highlighting the overall skills I will use in each section.
Some Background on the Thomas Fire
The Thomas Fire was an extremely large wildfire that burned 378,000 acres from December 4th, 2017 to January 12th, 2018. At the time, it was considered the largest wildlife in modern California history. The fire caused massive destruction both from the flames and the subsequent mudslides. It was truly devastating. Hopefully, in learning to do this analysis we can help in finding new ways to mitigate and respond to these disasters.
Some Background on the Data
I am going to be using three data sets. Each data set should be openly available to use. I will talk more about the data in the sections where I use it.
Part 1. Visualizing AQI
The data we are going to be using is about the Air Quality Index (AQI) from the US Environmental Protection Agency. The particular data sets we are using are from 2017 and 2018 and go by county. This dataset has measurements of criteria gases (things like ground-level ozone, carbon monoxide, sulfur dioxide, lead, etc.), and Particulate Matter (PM10 and PM2.5). Particulate matter is just a bunch of super tiny bits of debris that float in the air and can get into our lungs. It is measured by its diameter in microns, which are one-millionth of a meter. PM10 is larger and can sometimes be visible, whereas PM2.5 is super, super tiny and can get into our bloodstream through our lung tissue. Unsurprisingly, it is bad for us. Particulate matter is often generated by fires through all the burning, and that is why I would like to measure it here.
My goal for this section is to create a simple graph that will show the measurement of AQI over time. There are going to be a couple of steps in this process.
- Cleaning the data
- Changing the index
- Averaging data
- Visualizing data
So let’s begin!
# Read in data for the AQI analysis
= pd.read_csv('https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2017.zip', compression='zip')
aqi_17 = pd.read_csv('https://aqs.epa.gov/aqsweb/airdata/daily_aqi_by_county_2018.zip', compression='zip')
aqi_18
# Define the file path to the Landsat data file
= os.path.join('/courses/EDS220/data/hwk4_landsat_data/landsat8-2018-01-26-sb-simplified.nc')
fp
# Open the Landsat dataset using rioxarray
= rioxr.open_rasterio(fp)
landsat
# Define the file path to the Thomas Fires data
= os.path.join('/', 'Users', 'overbye', 'MEDS', 'EDS-220', 'eds220-hwk4', 'data', 'thomas_fire.geojson')
fp2
# Read the file for the Thomas Fire dataset using GeoPandas
= gpd.read_file(fp2) thomas_fire
# Read in California fire perameter data
= os.path.join('/', 'Users', 'overbye', 'MEDS', 'EDS-220', 'eds220-hwk4', 'data', 'California_Fire_Perimeters_-9021497757356455187')
fp
# Read the file for the Thomas Fire dataset using GeoPandas
= gpd.read_file(fp) ca_fires
A Sweeping Explanation of Data Cleaning
When starting a project, I am often struck by how much I do not like the format the data is in. This is something every young data scientist and coder must experience and develop on their own. We all must find a format that calls to us and spend the rest of our coding lives cleaning every data set we get until it is in that format. My personal favorite is lower snake case. It makes sense to me because we say lines of text are like strings, and snakes are the most string-like animals in my opinion.
# Combine the two data sets using pd.concat
= pd.concat([aqi_17, aqi_18], ignore_index=True)
aqi
# Format column names into lower snake case
= (aqi.columns
aqi.columns str.lower()
.str.replace(' ','_')
.
)
# Select the data from Santa Barbara County
= aqi[(aqi['county_name'] == 'Santa Barbara')]
aqi_sb
# Select columns of interest
= aqi_sb[['date', 'aqi', 'category', 'defining_parameter', 'defining_site', 'number_of_sites_reporting']] aqi_sb
Setting the Index As the Date
The data is being analyzed over time, and setting the date as the index allows for more convenient handling of time-based operations. By changing the index, I am making sure that I and the computer are on the same page beside the webpage.
# Update the `date` column to be datetime object
= pd.to_datetime(aqi_sb['date'])
aqi_sb.date
# Update the index to be the `date` column
= aqi_sb.set_index('date') aqi_sb
Averaging The AQI Over Time
In order to be able to see the impact of the fire on air quality, we need to know what the air quality was before.
The rolling()
function
Here we are using a function called rolling(). What this function does is show how averages change over time. They call them rolling window calculations because they calculate over a “window” of time, and since time is moving the function must be moving too so we say it’s “rolling.” In the code, you may notice we put ‘5D’ in the parathesis, this is specifying what I want the window of time to be. Here I do 5D because I want it to be five days.
Plot
The result of this analysis is a plot. In regards to AQI, the higher the value, the worse the air quality. There is a spike where December is, which is what I expected because that is when the fire was
# Calculate AQI rolling average over 5 days
= aqi_sb['aqi'].rolling('5D').mean()
rolling_average
# Add mean of AQI over the 5-day rolling window as a new column
'five_day_average'] = rolling_average
aqi_sb[
# Create a plot of the 5-day rolling average AQI and the daily average AQI
'aqi', 'five_day_average']].plot(kind='line',
aqi_sb[[= ['Daily AQI', 'Five Day Average AQI'],
legend =' Daily and 5-day Average AQI In Santa Barbara County',
title='AQI',
ylabel='Date',
xlabel )
Part 2. Using False Color Imagery to view the impact of the fire on vegetation.
False color imagery is a way to see different colors in an image than we would normally with just the naked eye. It can be especially helpful when dealing with wildfires because it can allow us to see beyond the smoke and it can highlight healthy vegetation. The next section is going to demonstrate how to create a false color image in Python.
The skills for this section include:
- Filtering geo data and saving it as a file
- Dropping bands on geospatial data
- Creating and plotting a “true” color RGB image
- Creating a false color image
Data
For this part of my project, I will be using data from Microsoft Landsat Collection 2 Level-2 satellite imagery and data from the California Fire Perimeters (all) dataset
Filtering The Data For the 2017 Thomas Fire and Creating a GeoJSON File
After doing all the data cleaning and filtering this next part should be relevily easy. I am going to take the data from CALFIRE and filter it by the year (there have been multiple Thomas Fires), and plot it.
The to_file()
Function
This is a function that comes from the GeoPandas library. Very practically named, it takes a GeoDataFrame and saves it as a file. Below I specified that I wanted a GeoJSON file (so that I could have one file instead of multiple). If not specified, it will default to a shapefile.
# Filter for the 2017 Thomas Fire
= ca_fires[(ca_fires['YEAR_'] == 2017) & (ca_fires['FIRE_NAME'] == 'THOMAS')]
thomas_fire
# Plot Thomas Fire
thomas_fire.plot()
# Saving Thomas_fire as a GeoJSON file.
"thomas_fire.geojson", driver="GeoJSON") thomas_fire.to_file(
Dropping bands
I am now going to be working with my Landsat data. I can see when I print the data set that it has an extra band that will not be necessary in this analysis.
The Squeeze()
Function
The squeeze() function is from the Numpy library and allows us to get rid of any dimensions with the size of 1. This keeps the data from taking up too much space. It will also help me remove the band so I can replace it with my own band.
# Print to get a general overview of the landsat data
print(landsat)
# Remove extra dimensions
= landsat.squeeze()
landsat
# Drop 'band'
= landsat.drop_vars('band') landsat
<xarray.Dataset> Size: 25MB
Dimensions: (band: 1, x: 870, y: 731)
Coordinates:
* band (band) int64 8B 1
* x (x) float64 7kB 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
* y (y) float64 6kB 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
spatial_ref int64 8B 0
Data variables:
red (band, y, x) float64 5MB ...
green (band, y, x) float64 5MB ...
blue (band, y, x) float64 5MB ...
nir08 (band, y, x) float64 5MB ...
swir22 (band, y, x) float64 5MB ...
Plotting the True Color Image
This is where I have replaced the band I dropped earlier with red, green, and blue.
The imshow(robust = True)
Function
The imshow()
function comes from the xarray library and works with a function of the same name from the matplotlib.pyplot. This is a function that shows a 2-D picture. Incorporating the the xarray library makes it extra useful here because I can use the (robust = True) argument to ignore any outlier colors, which creates a more realistic-looking picture.
# Select 'red' 'green' and 'blue' bands
# Convert to a NumPy array using .to_array()
'red', 'green', 'blue']].to_array().plot.imshow(robust=True) landsat[[
Plotting The False Color Image
I used SWIR, NIR, and Green because they can be especially helpful when looking at fires as they show healthy vegetation as bright green.
Even without the fire perimeter overlayed, it is easy to tell where the fire was based on this false color image. The areas in red are the areas that were burning.
# Select SWIR, NIR, and Red bands and plot false color image
'swir22', 'nir08', 'red']].to_array().plot.imshow(robust=True) landsat[[
Mapping the Perimeter and false color image
# Check if thomas_fire and Landsat have matching CRSs
print("Landsat CRS:", landsat.rio.crs)
print("Thomas Fire CRS:", thomas_fire.crs)
# Reproject thomas_fire to match the CRS of the landsat data if they do not match
if thomas_fire.crs != landsat.rio.crs:
= thomas_fire.to_crs(landsat.rio.crs) thomas_fire
Landsat CRS: EPSG:32611
Thomas Fire CRS: EPSG:3857
The Final Map:
# Plot False Color Image
= plt.subplots(figsize=(10, 10))
fig, ax 'swir22', 'nir08', 'red']].to_array().plot.imshow(ax=ax,
landsat[[=True,
robust=0.8)
alpha
# Plot Thomas Fire perimeter
=ax,
thomas_fire.plot(ax='maroon',
edgecolor='none',
facecolor=2)
linewidth
# Create a custom legend
= mpatches.Patch(edgecolor='maroon',
fire_legend ='none',
facecolor="Thomas Fire Perimeter")
label
# Add legend manually
=[fire_legend],
ax.legend(handles="upper right",
loc=12)
fontsize
# Label the map
"False Color Image of the 2017 Thomas Fire Perimeter In Santa Barbara County", fontsize=16)
ax.set_title("X Coordinate")
ax.set_xlabel("Y Coordinate")
ax.set_ylabel(
plt.show()
And there you have it. Here is the map with the fire perimeter overlayed. One of the reasons false color imagery can be useful is that it can highlight where vegetation was most affected. We can see that not all the vegetation in the fire perimeter was impacted by the same amount. This can shed light on why fires burn the way they do, which, in turn, can help us with fire management.
I hope you enjoyed this wordy blog post, and now have a couple more functions under your belt you can use next time you want to analyze a wildfire!
References
Andone, D. (2018, June 2). The largest wildfire in California’s modern history is finally out, more than 6 months after it started. CNN. Retrieved December 2, 2024, from https://www.cnn.com/2018/06/02/us/thomas-fire-officially-out/index.html
Riebeek, R. (2014, March 4). Why is that Forest Red and that cloud blue? NASA. https://earthobservatory.nasa.gov/features/FalseColor
U.S. Environmental Protection Agency. (2024, June 20). Particulate matter (PM) basics. EPA. Retrieved December 2, 2024, from https://www.epa.gov/pm-pollution/particulate-matter-pm-basics