{ "cells": [ { "cell_type": "markdown", "id": "24e562c3", "metadata": {}, "source": [ "# Notebook for processing London Fire Brigade Data\n", "\n", "Author: [Huanfa Chen](https://github.com/huanfachen)\n", "\n" ] }, { "cell_type": "code", "execution_count": 34, "id": "ca908c7c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Last modified: 2026-01-08\n" ] } ], "source": [ "from datetime import date\n", "import pandas as pd\n", "# suppress warnings\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "print(\"Last modified:\", date.today())" ] }, { "cell_type": "code", "execution_count": 14, "id": "56af03f7", "metadata": {}, "outputs": [], "source": [ "# read Excel file data from this link: https://data.london.gov.uk/download/em8xy/f5066d66-c7a3-415f-9629-026fbda61822/LFB%20Incident%20data%20from%202018%20-%202023.xlsx\n", "# data from 2024:\n", "# https://data.london.gov.uk/download/em8xy/58m/LFB%20Incident%20data%20from%202024%20onwards.xlsx\n", "import pandas as pd\n", "df = pd.read_excel('https://data.london.gov.uk/download/em8xy/f5066d66-c7a3-415f-9629-026fbda61822/LFB%20Incident%20data%20from%202018%20-%202023.xlsx')" ] }, { "cell_type": "code", "execution_count": 15, "id": "b6024ebc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['IncidentNumber', 'DateOfCall', 'CalYear', 'TimeOfCall', 'HourOfCall', 'IncidentGroup', 'StopCodeDescription', 'SpecialServiceType', 'PropertyCategory', 'PropertyType', 'AddressQualifier', 'Postcode_full', 'Postcode_district', 'UPRN', 'USRN', 'IncGeo_BoroughCode', 'IncGeo_BoroughName', 'ProperCase', 'IncGeo_WardCode', 'IncGeo_WardName', 'IncGeo_WardNameNew', 'Easting_m', 'Northing_m', 'Easting_rounded', 'Northing_rounded', 'Latitude', 'Longitude', 'FRS', 'IncidentStationGround', 'FirstPumpArriving_AttendanceTime', 'FirstPumpArriving_DeployedFromStation', 'SecondPumpArriving_AttendanceTime', 'SecondPumpArriving_DeployedFromStation', 
'NumStationsWithPumpsAttending', 'NumPumpsAttending', 'PumpCount', 'PumpMinutesRounded', 'Notional Cost (£)', 'NumCalls']\n" ] } ], "source": [ "print(df.columns.tolist())" ] }, { "cell_type": "code", "execution_count": 16, "id": "fde5f559", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IncidentNumberDateOfCallCalYearTimeOfCallHourOfCallIncidentGroupStopCodeDescriptionSpecialServiceTypePropertyCategoryPropertyType...FirstPumpArriving_AttendanceTimeFirstPumpArriving_DeployedFromStationSecondPumpArriving_AttendanceTimeSecondPumpArriving_DeployedFromStationNumStationsWithPumpsAttendingNumPumpsAttendingPumpCountPumpMinutesRoundedNotional Cost (£)NumCalls
0000008-010120182018-01-01201800:04:250False AlarmAFANaNNon ResidentialMosque...348.0FinchleyNaNNaN1.01.01603281.0
1000009-010120182018-01-01201800:04:300False AlarmAFANaNNon ResidentialPub/wine bar/bar...144.0BeckenhamNaNNaN1.01.01603281.0
2000010-010120182018-01-01201800:04:340FireSecondary FireNaNOutdoor StructureCommon external bin storage area...232.0SouthgateNaNNaN1.01.01603281.0
3000011-010120182018-01-01201800:04:580Special ServiceSpecial ServiceRTCRoad VehicleMultiple Vehicles...22.0EnfieldNaNNaN1.01.01603281.0
4000014-010120182018-01-01201800:07:470FirePrimary FireNaNRoad VehicleCar...241.0StratfordNaNNaN1.01.01603286.0
\n", "

5 rows × 39 columns

\n", "
" ], "text/plain": [ " IncidentNumber DateOfCall CalYear TimeOfCall HourOfCall \\\n", "0 000008-01012018 2018-01-01 2018 00:04:25 0 \n", "1 000009-01012018 2018-01-01 2018 00:04:30 0 \n", "2 000010-01012018 2018-01-01 2018 00:04:34 0 \n", "3 000011-01012018 2018-01-01 2018 00:04:58 0 \n", "4 000014-01012018 2018-01-01 2018 00:07:47 0 \n", "\n", " IncidentGroup StopCodeDescription SpecialServiceType PropertyCategory \\\n", "0 False Alarm AFA NaN Non Residential \n", "1 False Alarm AFA NaN Non Residential \n", "2 Fire Secondary Fire NaN Outdoor Structure \n", "3 Special Service Special Service RTC Road Vehicle \n", "4 Fire Primary Fire NaN Road Vehicle \n", "\n", " PropertyType ... FirstPumpArriving_AttendanceTime \\\n", "0 Mosque ... 348.0 \n", "1 Pub/wine bar/bar ... 144.0 \n", "2 Common external bin storage area ... 232.0 \n", "3 Multiple Vehicles ... 22.0 \n", "4 Car ... 241.0 \n", "\n", " FirstPumpArriving_DeployedFromStation SecondPumpArriving_AttendanceTime \\\n", "0 Finchley NaN \n", "1 Beckenham NaN \n", "2 Southgate NaN \n", "3 Enfield NaN \n", "4 Stratford NaN \n", "\n", " SecondPumpArriving_DeployedFromStation NumStationsWithPumpsAttending \\\n", "0 NaN 1.0 \n", "1 NaN 1.0 \n", "2 NaN 1.0 \n", "3 NaN 1.0 \n", "4 NaN 1.0 \n", "\n", " NumPumpsAttending PumpCount PumpMinutesRounded Notional Cost (£) NumCalls \n", "0 1.0 1 60 328 1.0 \n", "1 1.0 1 60 328 1.0 \n", "2 1.0 1 60 328 1.0 \n", "3 1.0 1 60 328 1.0 \n", "4 1.0 1 60 328 6.0 \n", "\n", "[5 rows x 39 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "5b0031be", "metadata": {}, "outputs": [], "source": [ "# extract data of year 2023\n", "df_lfb = df[df['CalYear'] == 2023]" ] }, { "cell_type": "code", "execution_count": 18, "id": "230a2c98", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(126114, 39)" ] }, "execution_count": 18, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "df_lfb.shape" ] }, { "cell_type": "code", "execution_count": 19, "id": "dd1f5cd4", "metadata": {}, "outputs": [], "source": [ "# save df_lfb as LFB_2023_data.csv\n", "df_lfb.to_csv('LFB_2023_data.csv', index=False)" ] }, { "cell_type": "code", "execution_count": 4, "id": "73616a77", "metadata": {}, "outputs": [], "source": [ "# read CSV file data\n", "df_lfb = pd.read_csv('LFB_2023_data.csv')" ] }, { "cell_type": "code", "execution_count": 5, "id": "855892c8", "metadata": {}, "outputs": [], "source": [ "# aggregate data to daily level\n", "df_lfb_daily = df_lfb.groupby('DateOfCall').size().reset_index(name='IncidentCount')" ] }, { "cell_type": "code", "execution_count": 6, "id": "580cddec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateOfCallIncidentCount
02023-01-01402
12023-01-02293
22023-01-03267
32023-01-04285
42023-01-05305
\n", "
" ], "text/plain": [ " DateOfCall IncidentCount\n", "0 2023-01-01 402\n", "1 2023-01-02 293\n", "2 2023-01-03 267\n", "3 2023-01-04 285\n", "4 2023-01-05 305" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_daily.head()" ] }, { "cell_type": "markdown", "id": "f0a4f988", "metadata": {}, "source": [ "## Linking with daily weather data" ] }, { "cell_type": "code", "execution_count": 7, "id": "0bef82eb", "metadata": {}, "outputs": [], "source": [ "# read in weather data\n", "df_weather = pd.read_csv('./London_weather_data_1979_to_2023.csv')" ] }, { "cell_type": "code", "execution_count": 8, "id": "5059a539", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DATETXQ_TXTNQ_TNTGQ_TGSSQ_SSSD...RRQ_RRQQQ_QQPPQ_PPHUQ_HUCCQ_CC
01979010123.00-75.00-41.0070.009.0...4.0052.0010190.0083.002.00
11979010216.00-75.00-26.0017.008.0...0.0027.0010253.0076.006.00
21979010313.00-72.00-28.000.004.0...0.0013.0010205.0086.005.00
319790104-3.00-65.00-26.000.002.0...0.0013.0010084.0068.008.00
41979010556.00-14.00-8.0020.001.0...0.0029.0010225.0073.006.00
\n", "

5 rows × 21 columns

\n", "
" ], "text/plain": [ " DATE TX Q_TX TN Q_TN TG Q_TG SS Q_SS SD ... RR \\\n", "0 19790101 23.0 0 -75.0 0 -41.0 0 70.0 0 9.0 ... 4.0 \n", "1 19790102 16.0 0 -75.0 0 -26.0 0 17.0 0 8.0 ... 0.0 \n", "2 19790103 13.0 0 -72.0 0 -28.0 0 0.0 0 4.0 ... 0.0 \n", "3 19790104 -3.0 0 -65.0 0 -26.0 0 0.0 0 2.0 ... 0.0 \n", "4 19790105 56.0 0 -14.0 0 -8.0 0 20.0 0 1.0 ... 0.0 \n", "\n", " Q_RR QQ Q_QQ PP Q_PP HU Q_HU CC Q_CC \n", "0 0 52.0 0 10190.0 0 83.0 0 2.0 0 \n", "1 0 27.0 0 10253.0 0 76.0 0 6.0 0 \n", "2 0 13.0 0 10205.0 0 86.0 0 5.0 0 \n", "3 0 13.0 0 10084.0 0 68.0 0 8.0 0 \n", "4 0 29.0 0 10225.0 0 73.0 0 6.0 0 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_weather.head()" ] }, { "cell_type": "code", "execution_count": 9, "id": "645258e9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DATE int64\n", "TX float64\n", "Q_TX int64\n", "TN float64\n", "Q_TN int64\n", "TG float64\n", "Q_TG int64\n", "SS float64\n", "Q_SS int64\n", "SD float64\n", "Q_SD int64\n", "RR float64\n", "Q_RR int64\n", "QQ float64\n", "Q_QQ int64\n", "PP float64\n", "Q_PP int64\n", "HU float64\n", "Q_HU int64\n", "CC float64\n", "Q_CC int64\n", "dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_weather.dtypes" ] }, { "cell_type": "code", "execution_count": 10, "id": "640d4080", "metadata": {}, "outputs": [], "source": [ "# extract weather data of year 2023\n", "df_2023 = df_weather[df_weather['DATE'].astype(str).str.startswith('2023')]" ] }, { "cell_type": "code", "execution_count": 11, "id": "aff107fe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(365, 21)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2023.shape" ] }, { "cell_type": "code", "execution_count": 12, "id": "9075b1a7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(364, 2)" ] }, "execution_count": 12, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_daily.shape" ] }, { "cell_type": "code", "execution_count": 13, "id": "5fe288ea", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_10264/228454178.py:2: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df_2023['DATE'] = pd.to_datetime(df_2023['DATE'], format='%Y%m%d').dt.date\n" ] }, { "data": { "text/plain": [ "(364, 23)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# merge fire incident data with weather data. Note that df_lfb_daily has a column of DateOfCall (2023-01-01, str type) while df_2023 has a column of DATE (20230101, int type)\n", "df_2023['DATE'] = pd.to_datetime(df_2023['DATE'], format='%Y%m%d').dt.date\n", "df_lfb_daily['DateOfCall'] = pd.to_datetime(df_lfb_daily['DateOfCall']).dt.date\n", "df_lfb_merged = pd.merge(df_lfb_daily, df_2023, left_on='DateOfCall', right_on='DATE', how='inner')\n", "df_lfb_merged.shape\n", "# 364 rows" ] }, { "cell_type": "code", "execution_count": 14, "id": "76a93fa8", "metadata": {}, "outputs": [], "source": [ "# add new columns to df_lfb_merged: 'IsWeekend' to indicate if the day is a weekend; 'weekday' to indicate the day of the week (string rather than integer)\n", "df_lfb_merged['IsWeekend'] = df_lfb_merged['DateOfCall'].apply(lambda x: 1 if pd.to_datetime(x).weekday() >= 5 else 0)\n", "df_lfb_merged['weekday'] = df_lfb_merged['DateOfCall'].apply(lambda x: pd.to_datetime(x).strftime('%A'))\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "b6f1aa52", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateOfCallIncidentCountDATETXQ_TXTNQ_TNTGQ_TGSS...QQQ_QQPPQ_PPHUQ_HUCCQ_CCIsWeekendweekday
02023-01-014022023-01-0193.01100.01112.010.0...21.0010080.0081.007.001Sunday
12023-01-022932023-01-02134.0052.0072.0052.0...44.0010169.0086.003.000Monday
22023-01-032672023-01-03139.0021.0078.000.0...9.0010186.0093.007.000Tuesday
32023-01-042852023-01-04135.0092.00116.004.0...17.0010146.0082.006.000Wednesday
42023-01-053052023-01-05127.0089.00112.004.0...28.0010186.0082.008.000Thursday
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " DateOfCall IncidentCount DATE TX Q_TX TN Q_TN TG \\\n", "0 2023-01-01 402 2023-01-01 93.0 1 100.0 1 112.0 \n", "1 2023-01-02 293 2023-01-02 134.0 0 52.0 0 72.0 \n", "2 2023-01-03 267 2023-01-03 139.0 0 21.0 0 78.0 \n", "3 2023-01-04 285 2023-01-04 135.0 0 92.0 0 116.0 \n", "4 2023-01-05 305 2023-01-05 127.0 0 89.0 0 112.0 \n", "\n", " Q_TG SS ... QQ Q_QQ PP Q_PP HU Q_HU CC Q_CC \\\n", "0 1 0.0 ... 21.0 0 10080.0 0 81.0 0 7.0 0 \n", "1 0 52.0 ... 44.0 0 10169.0 0 86.0 0 3.0 0 \n", "2 0 0.0 ... 9.0 0 10186.0 0 93.0 0 7.0 0 \n", "3 0 4.0 ... 17.0 0 10146.0 0 82.0 0 6.0 0 \n", "4 0 4.0 ... 28.0 0 10186.0 0 82.0 0 8.0 0 \n", "\n", " IsWeekend weekday \n", "0 1 Sunday \n", "1 0 Monday \n", "2 0 Tuesday \n", "3 0 Wednesday \n", "4 0 Thursday \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_merged.head()" ] }, { "cell_type": "code", "execution_count": 16, "id": "7a2e3249", "metadata": {}, "outputs": [], "source": [ "# read in ukbankholidays-jul19.xls\n", "df_uk_holidays = pd.read_excel('./ukbankholidays-jul19.xls')\n", "# rename the first column to 'Date'\n", "df_uk_holidays.rename(columns={df_uk_holidays.columns[0]: 'Date'}, inplace=True)\n", "# ignore rows not in date format\n", "df_uk_holidays = df_uk_holidays[pd.to_datetime(df_uk_holidays['Date'], errors='coerce').notnull()]\n", "df_uk_holidays['Date'] = pd.to_datetime(df_uk_holidays['Date']).dt.date" ] }, { "cell_type": "code", "execution_count": 17, "id": "7e2e6de3", "metadata": {}, "outputs": [], "source": [ "# keep only holidays in 2023\n", "df_uk_holidays = df_uk_holidays[df_uk_holidays['Date'].astype(str).str.startswith('2023')]" ] }, { "cell_type": "code", "execution_count": 18, "id": "d0955ca0", "metadata": {}, "outputs": [], "source": [ "# merge holiday info into df_lfb_merged. 
A new column 'IsBankHoliday' is added to indicate if the day is a holiday.\n", "df_lfb_merged = pd.merge(df_lfb_merged, df_uk_holidays[['Date']], left_on='DateOfCall', right_on='Date', how='left', indicator=True)\n", "df_lfb_merged['IsBankHoliday'] = df_lfb_merged['_merge'].apply(lambda x: 1 if x == 'both' else 0)\n", "df_lfb_merged.drop(columns=['Date', '_merge'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 19, "id": "9e9a3517", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "IsBankHoliday\n", "0 356\n", "1 8\n", "Name: count, dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_merged.IsBankHoliday.value_counts()" ] }, { "cell_type": "code", "execution_count": 20, "id": "b325be8e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['DateOfCall', 'IncidentCount', 'DATE', 'TX', 'Q_TX', 'TN', 'Q_TN', 'TG',\n", " 'Q_TG', 'SS', 'Q_SS', 'SD', 'Q_SD', 'RR', 'Q_RR', 'QQ', 'Q_QQ', 'PP',\n", " 'Q_PP', 'HU', 'Q_HU', 'CC', 'Q_CC', 'IsWeekend', 'weekday',\n", " 'IsBankHoliday'],\n", " dtype='object')" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_merged.columns" ] }, { "cell_type": "code", "execution_count": 21, "id": "48305a75", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DateOfCall object\n", "IncidentCount int64\n", "DATE object\n", "TX float64\n", "Q_TX int64\n", "TN float64\n", "Q_TN int64\n", "TG float64\n", "Q_TG int64\n", "SS float64\n", "Q_SS int64\n", "SD float64\n", "Q_SD int64\n", "RR float64\n", "Q_RR int64\n", "QQ float64\n", "Q_QQ int64\n", "PP float64\n", "Q_PP int64\n", "HU float64\n", "Q_HU int64\n", "CC float64\n", "Q_CC int64\n", "IsWeekend int64\n", "weekday object\n", "IsBankHoliday int64\n", "dtype: object" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_merged.dtypes" ] }, { "cell_type": "code", "execution_count": 28, "id": 
"7c064be0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of rows in df_lfb_merged_backup: 364\n", "Number of rows in df_lfb_merged after removing rows with Q_* = 9: 364\n" ] } ], "source": [ "# save a backup of df_lfb_merged as df_lfb_merged_backup\n", "# in df_lfb_merged, remove rows where any Q_* column has value of 9.\n", "# print the number of rows of df_lfb_merged_backup and df_lfb_merged\n", "df_lfb_merged_backup = df_lfb_merged.copy()\n", "q_columns = [col for col in df_lfb_merged.columns if col.startswith('Q_')]\n", "df_lfb_merged = df_lfb_merged[~df_lfb_merged[q_columns].isin([9]).any(axis=1)]\n", "print(f'Number of rows in df_lfb_merged_backup: {df_lfb_merged_backup.shape[0]}')\n", "print(f'Number of rows in df_lfb_merged after removing rows with Q_* = 9: {df_lfb_merged.shape[0]}')" ] }, { "cell_type": "code", "execution_count": 32, "id": "37274adc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RMSE: 37.29923634910724\n", "R2 on training set: 0.9999933838844299\n", "R2 on testing set: 0.09958332777023315\n" ] } ], "source": [ "# using XGBoost to predict IncidentCount using weather, weekday, weekend, and bank holiday info\n", "import xgboost as xgb\n", "from sklearn.model_selection import train_test_split\n", "# from sklearn.preprocessing import OneHotEncoder\n", "from sklearn.metrics import mean_squared_error\n", "# prepare data for modeling\n", "feature_cols = ['TX', 'TN', 'TG', 'SS', 'SD','RR','QQ', 'PP','HU','CC', 'IsWeekend', 'IsBankHoliday', 'weekday']\n", "X = df_lfb_merged[feature_cols]\n", "y = df_lfb_merged['IncidentCount']\n", "# one-hot encode the 'weekday' column\n", "X = pd.get_dummies(X, columns=['weekday'], drop_first=True)\n", "# split data into training and testing sets\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "# train XGBoost model\n", "# model = xgb.XGBRegressor(objective='reg:squarederror', 
n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42)\n", "model = xgb.XGBRegressor(objective='reg:squarederror', random_state=42)\n", "model.fit(X_train, y_train)\n", "# make predictions\n", "y_pred = model.predict(X_test)\n", "# evaluate model performance\n", "mse = mean_squared_error(y_test, y_pred)\n", "rmse = mse ** 0.5\n", "print(f'RMSE: {rmse}')\n", "\n", "# R2 of model on training and testing sets\n", "r2_train = model.score(X_train, y_train)\n", "r2_test = model.score(X_test, y_test)\n", "print(f'R2 on training set: {r2_train}')\n", "print(f'R2 on testing set: {r2_test}')" ] }, { "cell_type": "code", "execution_count": null, "id": "0b92a2bc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Random Forest R2 on training set: 0.9136535356526865\n", "Random Forest R2 on testing set: 0.16263798826889153\n" ] } ], "source": [ "# use random forest regressor from sklearn to predict IncidentCount. Don't split data again\n", "# first, import RandomForestRegressor\n", "from sklearn.ensemble import RandomForestRegressor\n", "# train Random Forest model\n", "rf_model = RandomForestRegressor(random_state=10)\n", "rf_model.fit(X_train, y_train)\n", "# make predictions\n", "y_pred_rf = rf_model.predict(X_test)\n", "# evaluate model performance: training and testing R2\n", "r2_train_rf = rf_model.score(X_train, y_train)\n", "r2_test_rf = rf_model.score(X_test, y_test)\n", "print(f'Random Forest R2 on training set: {r2_train_rf}')\n", "print(f'Random Forest R2 on testing set: {r2_test_rf}')" ] }, { "cell_type": "code", "execution_count": 33, "id": "7c5b846b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Feature Importance\n", "2 TG 0.181695\n", "7 PP 0.157313\n", "0 TX 0.154714\n", "8 HU 0.103176\n", "6 QQ 0.093228\n", "5 RR 0.084433\n", "1 TN 0.062492\n", "9 CC 0.056843\n", "3 SS 0.055895\n", "14 weekday_Sunday 0.009379\n", "17 weekday_Wednesday 0.007540\n", "10 IsWeekend 
0.007209\n", "13 weekday_Saturday 0.006130\n", "15 weekday_Thursday 0.005779\n", "16 weekday_Tuesday 0.005241\n", "11 IsBankHoliday 0.004779\n", "12 weekday_Monday 0.004155\n", "4 SD 0.000000\n" ] } ], "source": [ "# feature importance in rf_model\n", "importances = rf_model.feature_importances_\n", "feature_names = X.columns\n", "feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': importances})\n", "feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)\n", "print(feature_importance_df)" ] }, { "cell_type": "code", "execution_count": 36, "id": "d990e991", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['DateOfCall', 'IncidentCount', 'DATE', 'TX', 'Q_TX', 'TN', 'Q_TN', 'TG',\n", " 'Q_TG', 'SS', 'Q_SS', 'SD', 'Q_SD', 'RR', 'Q_RR', 'QQ', 'Q_QQ', 'PP',\n", " 'Q_PP', 'HU', 'Q_HU', 'CC', 'Q_CC', 'IsWeekend', 'weekday',\n", " 'IsBankHoliday'],\n", " dtype='object')" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_lfb_merged.columns" ] }, { "cell_type": "code", "execution_count": 38, "id": "bf02fa24", "metadata": {}, "outputs": [], "source": [ "# save to a csv file LFB_2023_daily_data.csv\n", "df_lfb_merged.to_csv('LFB_2023_daily_data.csv', index=False)" ] }, { "cell_type": "markdown", "id": "cbb2595b", "metadata": {}, "source": [ "## Notes on location precision" ] }, { "cell_type": "markdown", "id": "c5919f60", "metadata": {}, "source": [ "Note that the easting/northing coordinates are not precise for residential buildings, as they point to the centroid of the postcode area for privacy reasons. The column of 'Postcode_district' contains the postcode district (e.g., 'E1 6') for each incident, which can be used for spatial aggregation.\n", "\n", "The dataset can be aggregated to the ward level using the 'WardCode' column, which contains the unique identifier for each ward in London. 
\n", "\n", "As ward is not a census geography, we need to use lookup table to map postcodes (``Postcode_district``) to MSOAs before linking with demographic data." ] }, { "cell_type": "code", "execution_count": 24, "id": "5bba140c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['LSOA21CD', 'LSOA21NM', 'LSOA21NMW', 'WD25CD', 'WD25NM', 'WD25NMW', 'LAD25CD', 'LAD25NM', 'LAD25NMW', 'ObjectId']\n" ] } ], "source": [ "# read in LSOA to ward lookup table\n", "df_lookup = pd.read_csv('LSOA_(2021)_to_Electoral_Ward_(2025)_to_LAD_(2025)_Best_Fit_Lookup_in_EW_v2.csv')\n", "print(df_lookup.columns.tolist())" ] } ], "metadata": { "kernelspec": { "display_name": "sds2021", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }