Central England Temperature Analysis 1659 - 2019
Introduction
I have been following Tony Brown's articles on The Rise and Fall of Central England Temperatures on the Wattsupwiththat website for some time.
He has recently written Part 3, which looks at 2000 - 2019 and identifies a decline in temperatures over those years. I encourage all to read his articles. They are well researched and make for interesting reading.
The Met Office data sets can be found HERE, and they ask that this paper be referenced:
A New Central England Daily Temperature Series by Parker, Legg and Folland (1992), which I will call Paper 1.
It should, I think, be read in conjunction with Uncertainties in Central England Temperatures 1878 - 2003 and Some Improvements to the Maximum and Minimum Series by Parker and Horton (2005), which I will call Paper 2.
They are both required reading before undertaking any data exploration.
Initially, I will use the Monthly Mean Central England Temperature data set and see what Python and all its scientific libraries can make of it, but I may move on to other data sets as time goes on.
May I say that I have nothing but admiration for the work that goes into the preparation of these data. There are some seriously clever scientists at the Met Office.
Personally I have no axe to grind regarding the efficacy of the data as given. It is what it is.
I do have some misgivings about the adjustments made for Urban Heat Island effects. In a previous study comparing daily minimum temperatures from the Radcliffe Observatory in Oxford with those measured at Benson, some 14 miles away, I found a difference of 1.17 °C.
As a disclaimer, I also have yet to see proof of a direct correlation between CO2 levels and temperature or the theory of back radiation.
There will be a lot of Python code to follow, which you can ignore or criticise as you wish. Most of the code will contain comments as to its purpose, and my own text will be in the same format as this text, although different background colours may be used to differentiate between subjects.
#Import some libraries which will be needed for the analyses
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy.stats as stats
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
import seaborn as sns
import folium
from folium import plugins
from IPython.display import display, Image
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
Background
The Abstract from Paper 1 is as follows:
'In 1974 Manley produced a time series of monthly average temperatures representative of central England for 1659-1973. The present paper describes how a series of homogenized daily values representative of the same region has been formed. This series starts in 1772, and is consistent with Manley's monthly average values. Between 1772 and 1876 the daily series is based on a sequence of single stations whose variance has been reduced to counter the artificial increase that results from sampling single locations. For subsequent years, the series has been produced from combinations of as few stations as can reliably represent central England in the manner defined by Manley. We have used the daily series to update Manley's published monthly series in a consistent way.
We have evaluated recent urban warming influences at the chosen stations by comparison with nearby rural stations, and have corrected the series from 1974 onwards. The corrections do not (yet) exceed 0.1°C.
We present all the monthly data from 1974, along with averages and standard deviations for 1961-1990. We also show sequences of daily central England temperature for sample years. All the daily data are available on request.'
The authors cite Manley's work from 1974 and try to be consistent with his methodology and original data.
Manley's original data is shown in the image below.
display(Image(filename='../img/manley.png'))
Fig 1 Manley Central England 1659 - 1973
Paper 1 (page 319) gives a list of all the meteorological stations used to build the data set. There is an interactive map below where you can zoom in/out and get some basic data on the stations.
Before that, I want to have a quick look at station elevation in metres, for future reference with regard to temperature adjustment for altitude (a lapse-rate sketch follows the chart below).
df_stations = pd.read_csv('../Data/stations.csv')
# Drop the redundant data from the data loaded from the CSV file
df_stations.drop(['latitude','directionns','Longitude','directionew'],axis=1,inplace=True)
plt.figure(figsize=(12,6))
plt.xticks(rotation=90)
plt.figtext(.5,.9,'Elevation in metres from sea level',fontsize=20, ha='center')
plt.ylabel('Metres')
plt.bar(df_stations.station,df_stations.Elevation);
Fig 2 Elevation in metres for stations used in Paper 1
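Since I mention altitude adjustment, here is a minimal sketch of how a lapse-rate correction could be applied. The rate of 0.0065 degrees per metre is the standard environmental lapse rate and is my own assumption for illustration; neither the rate nor the function name comes from Paper 1.
#Sketch: reduce a station temperature to sea level using an assumed
#standard environmental lapse rate of 0.0065 degrees C per metre
LAPSE_RATE = 0.0065  #degrees C per metre (assumed, not from Paper 1)
def adjust_to_sea_level(temp_c, elevation_m, lapse_rate=LAPSE_RATE):
    #Temperature falls with altitude, so add the lapse term back on
    return temp_c + lapse_rate * elevation_m
#Example: a 9.5 degree reading at a hypothetical 150 m station
print(adjust_to_sea_level(9.5, 150))  #-> 10.475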
#Set up the map name and starting point
cet_stations=folium.Map(location = (52.38333,-1.48333),zoom_start=7)
#Add the station data as markers
for (index,row) in df_stations.iterrows():
    folium.Marker(
        location=[row.loc['latDec'],row.loc['lonDec']],
        popup= row.loc['station'] + ' Elevation ' + str(row.loc['Elevation'])+'m',
        tooltip= '<strong>Click for Station Info</strong>',
        icon=folium.Icon(color='cadetblue',icon='cloud')).add_to(cet_stations)
#Add terrain layer
folium.raster_layers.TileLayer('Stamen Terrain').add_to(cet_stations);
plugins.ScrollZoomToggler().add_to(cet_stations)
#Display the map
cet_stations
CET data analyses
Now I can look at the actual data file. Note that I have renamed one column from YEAR to YrAvg for clarity. However, it is not the actual mean of the monthly values; it is not even close, which is very strange. The column 'avg' is the column I calculate from the monthly data points.
I am assuming that the monthly data points are (Tmax + Tmin) / 2.
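If that assumption holds, each monthly value would be built up roughly like this. This is a sketch with made-up daily values, not the Met Office's actual procedure.
#Sketch: a monthly mean from daily (Tmax + Tmin) / 2, with made-up values
tmax = np.array([8.1, 7.4, 9.0, 6.8])  #hypothetical daily maxima
tmin = np.array([1.2, 0.5, 2.3, 0.9])  #hypothetical daily minima
daily_mean = (tmax + tmin) / 2
print(daily_mean.mean())  #the monthly mean, on this assumption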
#Read the data into pandas and work out a mean from the monthly data points
df_cet = pd.read_csv('../Data/cetmonthlymean1659.data',skiprows=5,sep=r'\s+')
df_cet['avg'] = df_cet.iloc[:,1:13].mean(axis=1)
#The 2020 row is incomplete, so it's safe to delete it
df_cet.drop(df_cet.index[361],inplace=True)
df_cet.tail(5)
The first thing to note is that the yearly averages given in the data set do not match the calculated values. No reason for this is given in the data set. Let's have a look at the basic statistics for each.
#The stats for the mean yearly temperatures
a = df_cet.avg.describe()
b = df_cet.YrAvg.describe()
#Put them into a DF for presentation purposes
dfcet_stats = pd.DataFrame(columns=['calcAvg','cetAvg'])
dfcet_stats['calcAvg'] = a
dfcet_stats['cetAvg'] = b
#Let's have a look at them
dfcet_stats
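To put a number on the mismatch, we can also look directly at the difference between the two columns.
#Quantify the discrepancy between the published and calculated yearly averages
diff = df_cet.YrAvg - df_cet.avg
diff.describe()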
I always like to look at a heat map of the data just to get a basic visual understanding.
#Create a working copy of the original df
dfcet_working = df_cet.copy()
#Drop the average columns
dfcet_working.drop(['YrAvg','avg'],axis=1,inplace=True)
#Set the index to Year to plot the heatmap
dfcet_working.set_index(keys='Year',inplace=True)
#Plot the Seaborn heatmap
plt.figure(figsize=(20,8))
plt.figtext(.5,.9,'CET Heat map of Monthly Mean Temperatures Centigrade',
fontsize=20, ha='center')
plt.xlabel('Month')
plt.ylabel('Year')
sns.heatmap(data=dfcet_working,cmap='viridis',robust=True);
Fig 3 Simple heat map of monthly temperatures for the entire data set
Now we can plot the average yearly data (the calculated number) and see how it compares with Manley's in Fig. 1. From Fig. 1 we see that Manley plotted a 10 year rolling average, so I will do the same.
#10 year rolling average of the calculated yearly mean, to match Manley's Fig 1
df_cet['rolling_average'] = df_cet['avg'].rolling(window=10).mean()
sns.set_style(style='whitegrid', rc=None)
#Plot the mean temps - use a fig size of 50,20 to make it a bit more legible
plt.figure(figsize=(50,20))
#Plot title
plt.figtext(.5,.9,'Calculated Mean Yearly Temperatures',fontsize=50, ha='center')
#Plot Y label
plt.ylabel('Temperature Centigrade',fontsize=40)
#arrange the tick marks and text to make everything clear
plt.xticks(np.arange(min(df_cet.Year)-5,
max(df_cet.Year)+1, 5.0),
rotation=60,fontsize=30)
plt.yticks(np.arange(min(df_cet.avg),
max(df_cet.avg), 0.5),
fontsize=30)
plt.plot(df_cet.Year,df_cet['avg'],label='Mean Annual Temperature')
plt.plot(df_cet.Year,df_cet.rolling_average,color='red',label='10 Year Rolling Average')
plt.legend(fontsize=30);
Fig 4 Mean Yearly Temperatures 1659-2019 with 10 year Rolling Average
And, unsurprisingly, it matches.
There are some significantly low outliers in the data set. Look up the Great Frost of 1740, especially in Ireland, to see just how bad the low temperatures were!
df_cet.loc[df_cet['Year'] == 1740]
Puzzling Anomalies
I've always been puzzled by the term 'anomaly' when it is used to refer to temperature measurements, and by why the mean of the range 1960 - 1990 is used as a baseline against which temperatures (up or down) are measured. Is it just for visual effect? Why introduce more calculations on the data than are required? I have to take a side track to have a look at this period.
#create a data frame with just the data from 1960 to 1990
df19601990 = pd.DataFrame(df_cet.loc[(df_cet['Year'] >= 1960) & (df_cet['Year'] <= 1990)])
#Basic stats for the calculated average value
df19601990.avg.describe()
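For the record, this is how I understand an 'anomaly' series to be constructed: subtract the baseline-period mean from every yearly value. This is a sketch of the general idea, not the Met Office's exact procedure.
#Sketch: an 'anomaly' series measured against the 1960-1990 baseline mean
baseline = df19601990.avg.mean()
anomalies = df_cet.avg - baseline
anomalies.tail()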
I prefer to look at z-scores and probability density. 'Anomaly' suggests that there is something unusual about a data point, whereas a z-score simply locates that point relative to the mean and standard deviation.
But first let's look at the frequency distribution histogram.
plt.figure(figsize=(12,6))
plt.title('Frequency Distribution Plot of 1960 to 1990 Average Temperatures',fontsize=20)
plt.ylabel('Frequency',fontsize=20)
sns.distplot(df19601990['avg'],bins=10,kde=False)
plt.xlabel('Temperature',fontsize=20);
Fig 5 Frequency distribution plot of 1960-1990 temperatures
Just to be sure, I can run a check to confirm whether the sample is indeed from a normal distribution.
k2, p = stats.normaltest(df19601990.avg)
alpha = 1e-3
print("p = {:g}".format(p))
if p < alpha:  #null hypothesis: x comes from a normal distribution
    print("The data does NOT come from a normal distribution")
else:
    print("The sample comes from a normal distribution")
Just for fun, let's do a cumulative frequency plot.
plt.figure(figsize=(12,8))
plt.title('Cumulative Distribution Plot of 1960 to 1990 Average Temperatures',fontsize=20)
plt.ylabel('Frequency',fontsize=20)
plt.hist(df19601990.avg,cumulative=True, density=True, bins=30)
sns.kdeplot(df19601990.avg, cumulative=True)
plt.xlabel('Temperature',fontsize=20);
plt.show()
Fig 6 Cumulative Frequency distribution plot of 1960-1990 temperatures
Now to get back to z-scores.
Simply put, a z-score (also called a standard score) gives you an idea of how far from the mean a data point is. More technically, it's a measure of how many standard deviations below or above the population mean a raw score is.
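In other words, z = (x - mean) / standard deviation. As a quick sanity check, the manual calculation should agree with scipy's helper; note that stats.zscore uses the population standard deviation, so ddof=0 is needed for the match.
#Manual z-scores: (x - mean) / std, with ddof=0 to match stats.zscore
x = df19601990.avg
z_manual = (x - x.mean()) / x.std(ddof=0)
print(np.allclose(z_manual, stats.zscore(x)))  #should print True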
zscores6090 = stats.zscore(df19601990.avg)
plt.figure(figsize=(12,8))
plt.title('Z-scores for the 1960-1990 data with a Kernel Density Curve',fontsize=20)
sns.distplot(zscores6090,bins=10,kde=True)
plt.xlabel('Standard deviation from the mean which is normalised at zero',fontsize=20);
Fig 7 Z-scores for the 1960-1990 data with a Kernel Density Curve
So what do all these simple statistical machinations prove?
The data comes from a normal, or Gaussian, distribution.
Z-scores normalise the data by expressing how far each point is from the mean. By doing this we can look at the probability of a certain temperature being measured in this sample.
In a normal distribution, roughly 95% of all data lie between -2 and +2 standard deviations from the mean (the exact figure for 2 standard deviations is 95.45%). The chance of measuring a temperature below -2 or above +2 standard deviations is therefore small, about 2.3% in each tail; the check below confirms this.
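Those coverage figures are easy to verify directly from the normal distribution using scipy.
#Check the coverage of +/-2 standard deviations under a normal distribution
within_2sd = stats.norm.cdf(2) - stats.norm.cdf(-2)  #about 0.9545
tail = stats.norm.sf(2)  #about 0.0228 in each tail
print("Within +/-2 SD: {:.4f}, beyond +2 SD: {:.4f}".format(within_2sd, tail))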
From Fig 4, it can be seen that temperatures started to rise after 1990, where before they had been falling slightly. As with all graphs, it is vitally important to look at the scale and determine the actual numerical and statistical differences rather than the eyeball differences.
My own view is that it skews the reader's perception into thinking that variations from the zero baseline of a past 30 year sample suggest something 'anomalous' to normal is going on.
All the data should be looked at as a whole.
'Anomaly', in its normally accepted meaning in the English language, is highly misleading here.
Getting back to the point!
After looking at 1960 - 1990, it's pertinent to ask the question: What's happened since?
Tony Brown's article identifies 1998 as the turning point where temperatures started to decline.
I would like to look at the 30 year period between 1989 and 2019. Why not compare it with the 1960-1990 data? That is, after all, the period all the 'anomalies' are measured against.
#Create new df for 1960 - 1990
df19601990 = pd.DataFrame(df_cet.loc[(df_cet['Year'] >= 1960) & (df_cet['Year'] <= 1990)])
#create a data frame with just the data from 1989 to 2019
df19892019 = pd.DataFrame(df_cet.loc[(df_cet['Year'] >= 1989) & (df_cet['Year'] <= 2019)])
Just looking at the summary statistics gives a quick comparison between the two thirty year periods.
df19601990.avg.describe()
df19892019.avg.describe()
plt.figure(figsize=(20,8))
plt.title('Composite Plot of two 30 year Periods in Isolation - Lowess Line Plot for Each 30 year Period',fontsize = 20)
sns.regplot(df19601990.Year,df19601990.avg,lowess=True,truncate=True)
sns.regplot(df19892019.Year,df19892019.avg,lowess=True,truncate=True)
plt.xlabel('Year',fontsize=20)
plt.yticks(np.arange(min(df19601990.avg),
max(df19601990.avg)+1, 0.25),
fontsize=10)
plt.ylabel('Average Annual Temperature Centigrade',fontsize=20);
Fig 8 Composite plot of two 30 year time periods 1960-90 & 1989-2019
It's an oddly constructed chart, being as it is actually two charts in one. However, it does show the almost 1 degree rise in measured temperatures over the twenty years between 1980 and 2000. So what happened? Temperatures were actually falling before that, and then went up quite quickly.
It's hard to say, because we are only analysing mean temperatures, which do not give much away.
Did Tmax temperatures go up, or did Tmin temperatures go up? That analysis will have to wait for now.
Let's move on to the seasonal averages for 1989 - 2019 and see what happens.
#Create the seasonal averages from same-calendar-year months
#(see the note on the DJF convention after the winter stats below)
winter = (df19892019.DEC + df19892019.JAN +df19892019.FEB) /3
spring = (df19892019.MAR + df19892019.APR +df19892019.MAY) /3
summer = (df19892019.JUN + df19892019.JUL +df19892019.AUG) /3
autumn = (df19892019.SEP + df19892019.OCT +df19892019.NOV) /3
plt.figure(figsize=(20,12))
sns.regplot(df19892019.Year,winter,lowess=True,label='Winter')
sns.regplot(df19892019.Year,spring,lowess=True,truncate=True,label='Spring')
sns.regplot(df19892019.Year,summer,lowess=True,truncate=True,label='Summer')
sns.regplot(df19892019.Year,autumn,lowess=True,truncate=True,label='Autumn')
sns.regplot(df19892019.Year,df19892019.avg,lowess=True,truncate=True,label='Yearly Average',color='black',marker='+')
plt.legend(fontsize=15,loc='lower left')
plt.xlabel('Year',fontsize=20)
plt.xticks(np.arange(min(df19892019.Year)-1,
max(df19892019.Year)+1, 2),
rotation=60,fontsize=10)
plt.yticks(np.arange(min(df19601990.JAN),
max(df19601990.JUL), 0.2),
fontsize=10)
plt.ylabel('Average Temperature Centigrade',fontsize=20)
plt.title('Average CET Seasonal Temperatures centigrade 1989 - 2019',fontsize = 20);
Fig 9 Average CET Yearly Seasonal Temperatures Centigrade with Lowess smoothing
Well, it doesn't look as though much has changed significantly in the last 30 years. There is, however, more variation in the winter season. For the last quick analysis, let's look at the winter months (December, January and February) to see what's happened. First, a look at the summary stats for winter.
winter.describe()
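One caveat: the winter average above pairs December with the January and February of the same calendar year, whereas meteorological winter (DJF) conventionally takes the previous year's December. Here is a sketch of that alignment; the variable winter_djf is my own and is not used in the plots.
#Conventional DJF winter: December of year Y with Jan/Feb of year Y+1
#(the plots here use same-calendar-year months instead)
dec_prev = df19892019.DEC.shift(1)  #previous year's December
winter_djf = (dec_prev + df19892019.JAN + df19892019.FEB) / 3
winter_djf.describe()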
There are not many conclusions to be drawn from either set of stats. How about looking at each individual month?
plt.figure(figsize=(20,10))
sns.regplot(df19892019.Year,df19892019.DEC,fit_reg=True,label='December')
sns.regplot(df19892019.Year,df19892019.JAN,lowess=True,truncate=True,label='January')
sns.regplot(df19892019.Year,df19892019.FEB,lowess=True,truncate=True,label='February')
plt.legend(fontsize=15,loc='lower left')
plt.xlabel('Year',fontsize=20)
plt.xticks(np.arange(min(df19892019.Year),
max(df19892019.Year)+1, 2),
rotation=60,fontsize=10)
plt.yticks(np.arange(min(df19892019.DEC),
max(df19892019.DEC), 0.2),
fontsize=10)
plt.ylabel('Average Temperature Centigrade',fontsize=20)
plt.title('Average CET Winter Month Temperatures Centigrade 1989 - 2019',fontsize = 20);
Fig 10 Average CET individual Winter months Temperatures Centigrade with Lowess smoothing
December uses a regular linear regression to show the large confidence intervals.
December has two huge outliers, at opposite ends of the scale, in 2010 and 2015. I have checked these points with one of my favourite sites, timeanddate.com. London reported a low of -14 °C in December 2010! You can check 2015 for yourself, but it matches up with the data point.
January and February are on a downward trend, but it's hard to say what is happening with December, although it looks as if it may have warmed slightly.
Conclusion
This has been a rather short analysis of a very small data set based on one file from the Met Office Central England data set. Tony Brown's articles on WUWT piqued my interest, so thanks to him, I spent a few rainy days knocking something together. It is not meant to be a definitive analysis of the CET data.
I do hate with a passion the use of 'anomalies' to present this kind of data, which is why I felt it necessary to look at two 30 year periods of temperature measurements instead.
The interesting part for me is not why temperatures have barely moved in the last 30 years, but why they suddenly climbed from the '80s to the '90s.
Unfortunately, there has not been much use of Python code in this short analysis. Most of the time has been spent drawing graphs. A few basic graphs are all it takes to indicate to me that there is no 'climate emergency'.
It's amazing to think that 15,000 years ago there was a mile of ice above where I am sitting right now. I count myself lucky to be living in such a generous climate. Long may it continue.