In late March 2020, the EPA released a memo announcing that it would not penalize regulated industries that fail to meet their monitoring and reporting requirements due to COVID-19. Specifically, the EPA said that it:
"is not seeking penalties for noncompliance only in circumstances that involve routine monitoring and reporting requirements, if, on a case-by-case basis, EPA agrees that such noncompliance was caused by the COVID-19 pandemic."
This may have a number of public and environmental health impacts if facilities respond by increasing their emissions and discharges. Our response to this memo states that the EPA’s COVID-19 leniency is a “free pass to pollute.”
Using this notebook, you can track how facilities' releases of air and water hazards, as well as their monitoring and reporting, have changed over the past few months compared to previous years.
There are three scenarios we may see playing out:
- Monitoring and reporting violations
- Environmental violations
- Facilities that both meet their reporting obligations and stay within their permitted limits

Facilities in this last category may still pose a risk to community and environmental health, but they are not the focus of this investigation. Please see…
This investigation relies upon data from the EPA’s Enforcement & Compliance History Online (ECHO), the primary open data portal supplied by the EPA, and a major basis for EPA’s decisions around enforcement. Please see EPA’s “About the Data” page for known limitations on the data’s quality and completeness. Link to data source (https://echo.epa.gov/tools/data-downloads#downloads) and limitations (https://echo.epa.gov/resources/echo-data/about-the-data)
Organization of this notebook:
Here we load some helper code to get us going.
!git clone https://github.com/edgi-govdata-archiving/ECHO_modules.git -b database-views
!git clone https://github.com/edgi-govdata-archiving/ECHO-COVID19.git -b postresql_migrate
# Import code libraries
import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import requests
import csv
import datetime
import folium
from folium.plugins import FastMarkerCluster
import ipywidgets as widgets
from pandas.errors import EmptyDataError
%run ECHO-COVID19/utilities.py
%run ECHO_modules/DataSet.py
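The `get_data()` helper used throughout this notebook is defined in `ECHO-COVID19/utilities.py`. As a rough sketch of its likely shape (the endpoint URL below is a placeholder, not the real one): URL-encode a SQL query, read the CSV response into a DataFrame, and optionally set an index column.

```python
import urllib.parse
import pandas as pd

def get_data_sketch(sql, index_col=None,
                    base_url="http://example.org/echo?query="):  # hypothetical endpoint
    """Illustrative stand-in for the real get_data() in utilities.py."""
    url = base_url + urllib.parse.quote_plus(sql)
    df = pd.read_csv(url, encoding="iso-8859-1")
    if index_col is not None:
        df = df.set_index(df[index_col])
    return df

# The query-encoding step alone can be checked without a network call:
encoded = urllib.parse.quote_plus('select * from "ICIS-AIR_STACK_TESTS"')
```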
We'll look at trends for March and April 2020, corresponding to the early phase of the COVID pandemic and EPA's response to it, and compare them to previous Marches and Aprils.
mnth_name="March and April"
Stack tests involve measuring the volume of pollutants coming out of the facility's smokestack.
The following cell will grab EPA data on facility stack tests for every one of the Marches and Aprils on record (up to 19 years ago). Some pollutant releases may be seasonal, so by looking only at March and April, we can account for this variation and ensure an apples-to-apples comparison.
We will only look at data from 2001 on, as EPA notes that its data systems prior to that year are incomplete and hence "unknown."
# Use SQL to search for and select the data about air stack tests
stack_data = None
full_stack_data = None
try:
    sql = 'select * from "ICIS-AIR_STACK_TESTS"'
    # Fetch the data via the helper
    full_stack_data = get_data( sql, 'pgm_sys_id' )
except EmptyDataError:
    print( "No data found" )
if full_stack_data is not None:
    full_stack_data["ACTUAL_END_DATE"] = pd.to_datetime(full_stack_data["ACTUAL_END_DATE"], format="%m/%d/%Y", errors='coerce')
    full_stack_data['month'] = pd.DatetimeIndex(full_stack_data["ACTUAL_END_DATE"]).month
    # Keep only March and April records
    stack_data = full_stack_data[(full_stack_data['month'] == 3) | (full_stack_data['month'] == 4)].copy()
# This is your data!
stack_data
The height of each bar indicates how many tests there were; the orange line shows the average number of tests since 2001, and the pink line shows the average for the past three years.
charter(stack_data, 'ACTUAL_END_DATE', "%m/%d/%Y", mnth_name, 'STATE_EPA_FLAG',
        "Number of stack tests", "air_monitoring.csv", "# of stack tests",
        "count")
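The `charter()` helper (from `utilities.py`) draws those two reference lines itself; the underlying averages can be sketched in pandas like so, using made-up yearly counts purely for illustration:

```python
import pandas as pd

# Synthetic per-year tallies standing in for the counts charter() plots
counts = pd.Series(
    [40, 55, 48, 60, 52, 45, 58, 50, 47, 53],
    index=range(2011, 2021),
)

overall_avg = counts.mean()             # orange line: average across all years
recent_avg = counts.iloc[-4:-1].mean()  # pink line: average of the prior three years
```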
eval_data = None
try:
    sql = 'select * from "ICIS-AIR_FCES_PCES" where "ACTUAL_END_DATE" like \'03-%\' or "ACTUAL_END_DATE" like \'04-%\''
    eval_data = get_data( sql, "PGM_SYS_ID" )
except EmptyDataError:
    print( "No data found" )
eval_data
charter(eval_data, 'ACTUAL_END_DATE', "%m-%d-%Y", mnth_name, 'ACTIVITY_TYPE_DESC',
        "Number of inspections", "caa_inspections.csv", "# of inspections", "count")
Air pollution is associated with more severe COVID-19 outcomes. What are facilities releasing into the air in spite of this greater risk to human life?
We'll start by looking at those facilities that were found to be in violation for their emissions.
air_data = None
try:
    sql = 'select * from "ICIS-AIR_VIOLATION_HISTORY" where "HPV_DAYZERO_DATE" like \'03-%\' or "HPV_DAYZERO_DATE" like \'04-%\''
    air_data = get_data( sql, "pgm_sys_id" )
    # Remove "FACIL" violations, which are paperwork violations according to:
    # https://19january2017snapshot.epa.gov/sites/production/files/2013-10/documents/frvmemo.pdf
    air_data = air_data.loc[air_data["POLLUTANT_DESCS"] != "FACIL"]
except EmptyDataError:
    print( "No data found" )
air_data
The height of each bar indicates how many emissions violations there have been. The orange line shows the average number of emissions violations since 2001, and the pink line indicates the average for the past three years.
charter(air_data, 'HPV_DAYZERO_DATE', "%m-%d-%Y", mnth_name, 'ENF_RESPONSE_POLICY_CODE',
        "Number of violations", "air_violations_total.csv", "# of Clean Air Act violations", "count")
In other words, we'll zoom in on the right-most bar in the chart above and show which pollutants' releases led to these Clean Air Act violations.
latest = air_data[(air_data["HPV_DAYZERO_DATE"] >= '2020')]
pollutants = latest.groupby(['POLLUTANT_DESCS'])[['ACTIVITY_ID']].count()
pollutants = pollutants.rename(columns={'ACTIVITY_ID': "Violations"})
pollutants = pollutants.sort_values(by='Violations', ascending=False)
fig = plt.figure(1, figsize=(20,20))
ax = fig.add_subplot(111)
wedges, labels = ax.pie(pollutants["Violations"], labels = pollutants.index, radius = 5);
# Show labels for only the ten most common pollutants
for pos, lab in enumerate(labels):
    if pos < 10:
        lab.set_fontsize(96)
    else:
        lab.set_fontsize(0)
Even if, on the whole, there are fewer exceedances, the places that are emitting more pollutants are important to track.
Run the next two cells to set up the analysis.
# Get everything we will need from ECHO_EXPORTER in a single DB query.
# We can then use the full dataframe to specialize views of it.
full_echo_data = None
column_mapping = {
'"REGISTRY_ID"': str,
'"FAC_NAME"': str,
'"FAC_LAT"': float,
'"FAC_LONG"': float,
'"FAC_PERCENT_MINORITY"': float,
'"AIR_IDS"': str,
'"NPDES_IDS"': str,
'"CAA_QTRS_WITH_NC"': float,
'"CWA_QTRS_WITH_NC"': float,
'"FAC_QTRS_WITH_NC"': float,
'"DFR_URL"': str,
'"AIR_FLAG"': str,
'"NPDES_FLAG"': str,
'"GHG_CO2_RELEASES"': float
}
# not currently using: "FAC_INFORMAL_COUNT", "FAC_FORMAL_ACTION_COUNT"
column_names = list( column_mapping.keys() )
columns_string = ','.join( column_names )
sql = 'select ' + columns_string + ' from "ECHO_EXPORTER" where "AIR_FLAG" = \'Y\' or "NPDES_FLAG" = \'Y\''
try:
    # Don't index.
    full_echo_data = get_data( sql )
except EmptyDataError:
    print("\nThere are no EPA facilities for this query.\n")
latest = air_data[(air_data["HPV_DAYZERO_DATE"] >= '2020')]
# Pull out Ids to match ECHO_EXPORTER
ids = latest['PGM_SYS_ID'].unique()
if len(latest.index) > 0:
    # Get facility information from ECHO
    air_echo_data = full_echo_data[full_echo_data['AIR_FLAG'] == 'Y'].copy().reset_index(drop=True)
    # Filter ECHO_EXPORTER data to rows containing AIR_IDs from latest / air_data
    idxs = set()  # Use a set to get unique index positions in ECHO_EXPORTER (i.e. unique facilities)
    for index, value in air_echo_data["AIR_IDS"].items():  # For each record in E_E
        for i in value.split():  # For each AIR_ID in the record
            if i in ids:  # If the AIR_ID is in the list of violating facilities
                idxs.add(index)  # Add its E_E position
    idxs = list(idxs)
    latest = air_echo_data.iloc[idxs, :]  # Instead of a join, just use E_E, since we don't need the violation details to map facilities
    qnc = latest  # For later analysis of non-compliance trends
    print(latest)
else:
    print("Actually, there were no permit exceedances for %s" % (mnth_name))
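The nested matching loop above can also be written as a single boolean mask using set operations, which is both shorter and avoids repeated membership tests. A small self-contained sketch with made-up IDs:

```python
import pandas as pd

# Illustrative stand-ins for air_echo_data["AIR_IDS"] and the `ids` array
air_ids = pd.Series(["AK001 AK002", "TX005", "CA009 CA010"])
violator_ids = {"AK002", "CA009"}

# A row matches if any of its space-separated IDs is in the violator set
mask = air_ids.apply(lambda v: not violator_ids.isdisjoint(v.split()))
matched_positions = list(air_ids.index[mask])
```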
The map shows us all the facilities that report emitting more than their permitted levels in March and April 2020.
missing = latest[np.isnan(latest["FAC_LAT"]) | np.isnan(latest["FAC_LONG"])]
count = missing.index.unique()
print("There are " + str(len(count)) + " facilities we can't map because they have incomplete data.")
# Filter out rows missing either coordinate
latest = latest[~np.isnan(latest["FAC_LAT"]) & ~np.isnan(latest["FAC_LONG"])]
print("There are " + str(len(latest)) + " facilities mapped below.")
map_of_facilities = mapper(latest)
map_of_facilities
These may be habitual "bad actors" who should not be let off the hook (but likely will be, given EPA's non-enforcement policy).
bad_actors = qnc.groupby(qnc.index)[["CAA_QTRS_WITH_NC"]].mean()
bad_actors = bad_actors[~(np.isnan(bad_actors["CAA_QTRS_WITH_NC"]))]
plt.hist(bad_actors["CAA_QTRS_WITH_NC"], density=False, bins=np.arange(14)-0.5);
plt.xticks([0,3,6,9,12])
plt.ylabel('Number of facilities')
plt.xlabel('Number of the last 12 quarters non-compliant with the Clean Air Act');
NOTE: Because there are so many facilities that discharge into waters of the US, there's a lot of data! The following cell may take a little while to run.
# Find facilities with pollutant exceedences
exceeds = None
try:
    sql = 'select "NPDES_ID", "EXCEEDENCE_PCT", "MONITORING_PERIOD_END_DATE", "PARAMETER_DESC"' + \
        ' from "NPDES_EFF_VIOLATIONS" where "EXCEEDENCE_PCT" > 0 and ("MONITORING_PERIOD_END_DATE" like \'03/%\' or "MONITORING_PERIOD_END_DATE" like \'04/%\')'
    dis_data = get_data( sql, "NPDES_ID" )
    exceeds = dis_data
except EmptyDataError:
    print( "No data found" )
exceeds
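`get_data()` pulls the whole table in one go. If memory is tight, the same exceedance-in-March/April filter can instead be applied while streaming the CSV in chunks, discarding non-matching rows early. A miniature sketch on made-up data:

```python
import io
import pandas as pd

# A tiny stand-in for the large NPDES_EFF_VIOLATIONS download
csv_text = """NPDES_ID,EXCEEDENCE_PCT,MONITORING_PERIOD_END_DATE
AK01,25,03/31/2020
AK02,0,03/31/2020
TX03,110,04/30/2019
CA04,5,06/30/2020
"""

chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Keep only exceedances whose monitoring period ends in March or April
    keep = (chunk["EXCEEDENCE_PCT"] > 0) & chunk["MONITORING_PERIOD_END_DATE"].str.match(r"0[34]/")
    chunks.append(chunk[keep])
exceeds_small = pd.concat(chunks, ignore_index=True)
```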
Are facilities exceeding their permits more in these months in 2020 than in previous years? As with air emissions and monitoring, we need to compare month-to-month (e.g. March/April 2019 to March/April 2020) because many discharges are seasonal.
The height of each bar will indicate how many pollution permits have been exceeded, while the orange line will show us the average number of these since 2001 and the pink line indicates the average for the past three years.
charter(exceeds, 'MONITORING_PERIOD_END_DATE', "%m/%d/%Y", mnth_name, 'EXCEEDENCE_PCT',
        "Number of pollution permits exceeded", "cwa_violations_total.csv",
        "# of permit exceedances", "count")
That is, we're going to zoom in on the rightmost bar in the bar charts above.
latest = exceeds[(exceeds['MONITORING_PERIOD_END_DATE'] >= '2020')]
pollutants = latest.groupby(['PARAMETER_DESC'])[['MONITORING_PERIOD_END_DATE']].count()
pollutants = pollutants.rename(columns={'MONITORING_PERIOD_END_DATE': "Violations"})
pollutants = pollutants.sort_values(by='Violations', ascending=False)
fig = plt.figure(1, figsize=(20,20))
ax = fig.add_subplot(111)
wedges, labels = ax.pie(pollutants["Violations"], labels = pollutants.index, radius = 5);
# Show labels for only the ten most common pollutants
for pos, lab in enumerate(labels):
    if pos < 10:
        lab.set_fontsize(96)
    else:
        lab.set_fontsize(0)
Remember, these are only the facilities that chose to report exceedances. Under the current policy, facilities may legally stop reporting altogether.
latest = exceeds[(exceeds['MONITORING_PERIOD_END_DATE'] >= '2020')]
# Pull out Ids to match ECHO_EXPORTER
ids = latest.index.unique()
if len(latest.index) > 0:
    # Get facility information from ECHO
    water_echo_data = full_echo_data[full_echo_data['NPDES_FLAG'] == 'Y'].copy().reset_index(drop=True)
    # Filter ECHO_EXPORTER data to rows containing NPDES_IDs from latest
    idxs = set()  # Use a set to get unique index positions in ECHO_EXPORTER (i.e. unique facilities)
    for index, value in water_echo_data["NPDES_IDS"].items():  # For each record in E_E
        for i in value.split():  # For each NPDES_ID in the record
            if i in ids:  # If the NPDES_ID is in the list of exceeding facilities
                idxs.add(index)  # Add its E_E position
    idxs = list(idxs)
    latest = water_echo_data.iloc[idxs, :]  # Instead of a join, just use E_E, since we don't need the exceedance details to map facilities
    qnc = latest  # For later analysis of non-compliance trends
    print(latest)
else:
    print("Actually, there were no permit exceedances for %s" % (mnth_name))
missing = latest[np.isnan(latest["FAC_LAT"]) | np.isnan(latest["FAC_LONG"])]
count = missing.index.unique()
print("There are " + str(len(count)) + " facilities we can't map because they have incomplete data.")
latest = latest[~np.isnan(latest["FAC_LAT"]) & ~np.isnan(latest["FAC_LONG"])]  # Filter out rows missing either coordinate
print("There are " + str(len(latest)) + " facilities mapped below.")
map_of_facilities = mapper(latest)
map_of_facilities