{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ETL Pipeline Preparation\n",
    "Follow the instructions below to help you create your ETL pipeline.\n",
    "### 1. Import libraries and load datasets.\n",
    "- Import Python libraries\n",
    "- Load `messages.csv` into a dataframe and inspect the first few lines.\n",
    "- Load `categories.csv` into a dataframe and inspect the first few lines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# import libraries\n",
    "import pandas as pd\n",
    "import numpy as n\n",
    "from sqlalchemy import create_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>message</th>\n",
       "      <th>original</th>\n",
       "      <th>genre</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Weather update - a cold front from Cuba that c...</td>\n",
       "      <td>Un front froid se retrouve sur Cuba ce matin. ...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7</td>\n",
       "      <td>Is the Hurricane over or is it not over</td>\n",
       "      <td>Cyclone nan fini osinon li pa fini</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>Looking for someone but no name</td>\n",
       "      <td>Patnm, di Maryani relem pou li banm nouvel li ...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>says: west side of Haiti, rest of the country ...</td>\n",
       "      <td>facade ouest d Haiti et le reste du pays aujou...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                                            message  \\\n",
       "0   2  Weather update - a cold front from Cuba that c...   \n",
       "1   7            Is the Hurricane over or is it not over   \n",
       "2   8                    Looking for someone but no name   \n",
       "3   9  UN reports Leogane 80-90 destroyed. Only Hospi...   \n",
       "4  12  says: west side of Haiti, rest of the country ...   \n",
       "\n",
       "                                            original   genre  \n",
       "0  Un front froid se retrouve sur Cuba ce matin. ...  direct  \n",
       "1                 Cyclone nan fini osinon li pa fini  direct  \n",
       "2  Patnm, di Maryani relem pou li banm nouvel li ...  direct  \n",
       "3  UN reports Leogane 80-90 destroyed. Only Hospi...  direct  \n",
       "4  facade ouest d Haiti et le reste du pays aujou...  direct  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load messages dataset\n",
    "messages = pd.read_csv('data/disaster_messages.csv')\n",
    "messages.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>categories</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-0;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-1;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-0;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>related-1;request-1;offer-0;aid_related-1;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-0;medi...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                                         categories\n",
       "0   2  related-1;request-0;offer-0;aid_related-0;medi...\n",
       "1   7  related-1;request-0;offer-0;aid_related-1;medi...\n",
       "2   8  related-1;request-0;offer-0;aid_related-0;medi...\n",
       "3   9  related-1;request-1;offer-0;aid_related-1;medi...\n",
       "4  12  related-1;request-0;offer-0;aid_related-0;medi..."
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load categories dataset\n",
    "categories = pd.read_csv('data/disaster_categories.csv')\n",
    "categories.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Merge datasets.\n",
    "- Merge the messages and categories datasets using the common id\n",
    "- Assign this combined dataset to `df`, which will be cleaned in the following steps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>message</th>\n",
       "      <th>original</th>\n",
       "      <th>genre</th>\n",
       "      <th>categories</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Weather update - a cold front from Cuba that c...</td>\n",
       "      <td>Un front froid se retrouve sur Cuba ce matin. ...</td>\n",
       "      <td>direct</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-0;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7</td>\n",
       "      <td>Is the Hurricane over or is it not over</td>\n",
       "      <td>Cyclone nan fini osinon li pa fini</td>\n",
       "      <td>direct</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-1;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>Looking for someone but no name</td>\n",
       "      <td>Patnm, di Maryani relem pou li banm nouvel li ...</td>\n",
       "      <td>direct</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-0;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>direct</td>\n",
       "      <td>related-1;request-1;offer-0;aid_related-1;medi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>says: west side of Haiti, rest of the country ...</td>\n",
       "      <td>facade ouest d Haiti et le reste du pays aujou...</td>\n",
       "      <td>direct</td>\n",
       "      <td>related-1;request-0;offer-0;aid_related-0;medi...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                                            message  \\\n",
       "0   2  Weather update - a cold front from Cuba that c...   \n",
       "1   7            Is the Hurricane over or is it not over   \n",
       "2   8                    Looking for someone but no name   \n",
       "3   9  UN reports Leogane 80-90 destroyed. Only Hospi...   \n",
       "4  12  says: west side of Haiti, rest of the country ...   \n",
       "\n",
       "                                            original   genre  \\\n",
       "0  Un front froid se retrouve sur Cuba ce matin. ...  direct   \n",
       "1                 Cyclone nan fini osinon li pa fini  direct   \n",
       "2  Patnm, di Maryani relem pou li banm nouvel li ...  direct   \n",
       "3  UN reports Leogane 80-90 destroyed. Only Hospi...  direct   \n",
       "4  facade ouest d Haiti et le reste du pays aujou...  direct   \n",
       "\n",
       "                                          categories  \n",
       "0  related-1;request-0;offer-0;aid_related-0;medi...  \n",
       "1  related-1;request-0;offer-0;aid_related-1;medi...  \n",
       "2  related-1;request-0;offer-0;aid_related-0;medi...  \n",
       "3  related-1;request-1;offer-0;aid_related-1;medi...  \n",
       "4  related-1;request-0;offer-0;aid_related-0;medi...  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# merge datasets\n",
    "df = pd.merge(messages, categories, on = 'id')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Split `categories` into separate category columns.\n",
    "- Split the values in the `categories` column on the `;` character so that each value becomes a separate column. You'll find [this method](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Series.str.split.html) very helpful! Make sure to set `expand=True`.\n",
    "- Use the first row of categories dataframe to create column names for the categories data.\n",
    "- Rename columns of `categories` with new column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>26</th>\n",
       "      <th>27</th>\n",
       "      <th>28</th>\n",
       "      <th>29</th>\n",
       "      <th>30</th>\n",
       "      <th>31</th>\n",
       "      <th>32</th>\n",
       "      <th>33</th>\n",
       "      <th>34</th>\n",
       "      <th>35</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-0</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-1</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-1</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-1</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-0</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-1</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-1</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-1</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-0</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 36 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          0          1        2              3               4   \\\n",
       "0  related-1  request-0  offer-0  aid_related-0  medical_help-0   \n",
       "1  related-1  request-0  offer-0  aid_related-1  medical_help-0   \n",
       "2  related-1  request-0  offer-0  aid_related-0  medical_help-0   \n",
       "3  related-1  request-1  offer-0  aid_related-1  medical_help-0   \n",
       "4  related-1  request-0  offer-0  aid_related-0  medical_help-0   \n",
       "\n",
       "                   5                    6           7           8   \\\n",
       "0  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "1  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "2  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "3  medical_products-1  search_and_rescue-0  security-0  military-0   \n",
       "4  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "\n",
       "              9        ...                    26                      27  \\\n",
       "0  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "1  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "2  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "3  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "4  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "\n",
       "                  28        29       30      31            32      33  \\\n",
       "0  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "1  weather_related-1  floods-0  storm-1  fire-0  earthquake-0  cold-0   \n",
       "2  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "3  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "4  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "\n",
       "                34               35  \n",
       "0  other_weather-0  direct_report-0  \n",
       "1  other_weather-0  direct_report-0  \n",
       "2  other_weather-0  direct_report-0  \n",
       "3  other_weather-0  direct_report-0  \n",
       "4  other_weather-0  direct_report-0  \n",
       "\n",
       "[5 rows x 36 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create a dataframe of the 36 individual category columns\n",
    "categories = df['categories'].str.split(';', expand=True)\n",
    "categories.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0                    related\n",
      "1                    request\n",
      "2                      offer\n",
      "3                aid_related\n",
      "4               medical_help\n",
      "5           medical_products\n",
      "6          search_and_rescue\n",
      "7                   security\n",
      "8                   military\n",
      "9                child_alone\n",
      "10                     water\n",
      "11                      food\n",
      "12                   shelter\n",
      "13                  clothing\n",
      "14                     money\n",
      "15            missing_people\n",
      "16                  refugees\n",
      "17                     death\n",
      "18                 other_aid\n",
      "19    infrastructure_related\n",
      "20                 transport\n",
      "21                 buildings\n",
      "22               electricity\n",
      "23                     tools\n",
      "24                 hospitals\n",
      "25                     shops\n",
      "26               aid_centers\n",
      "27      other_infrastructure\n",
      "28           weather_related\n",
      "29                    floods\n",
      "30                     storm\n",
      "31                      fire\n",
      "32                earthquake\n",
      "33                      cold\n",
      "34             other_weather\n",
      "35             direct_report\n",
      "Name: 1, dtype: object\n"
     ]
    }
   ],
   "source": [
    "# select the first row of the categories dataframe\n",
    "row = categories.iloc[1]\n",
    "\n",
    "# use this row to extract a list of new column names for categories.\n",
    "# one way is to apply a lambda function that takes everything \n",
    "# up to the second to last character of each string with slicing\n",
    "category_colnames = row.apply(lambda x: x.split('-')[0])\n",
    "print(category_colnames)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>1</th>\n",
       "      <th>related</th>\n",
       "      <th>request</th>\n",
       "      <th>offer</th>\n",
       "      <th>aid_related</th>\n",
       "      <th>medical_help</th>\n",
       "      <th>medical_products</th>\n",
       "      <th>search_and_rescue</th>\n",
       "      <th>security</th>\n",
       "      <th>military</th>\n",
       "      <th>child_alone</th>\n",
       "      <th>...</th>\n",
       "      <th>aid_centers</th>\n",
       "      <th>other_infrastructure</th>\n",
       "      <th>weather_related</th>\n",
       "      <th>floods</th>\n",
       "      <th>storm</th>\n",
       "      <th>fire</th>\n",
       "      <th>earthquake</th>\n",
       "      <th>cold</th>\n",
       "      <th>other_weather</th>\n",
       "      <th>direct_report</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-0</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-1</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-1</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-1</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-0</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-1</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-1</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-1</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>related-1</td>\n",
       "      <td>request-0</td>\n",
       "      <td>offer-0</td>\n",
       "      <td>aid_related-0</td>\n",
       "      <td>medical_help-0</td>\n",
       "      <td>medical_products-0</td>\n",
       "      <td>search_and_rescue-0</td>\n",
       "      <td>security-0</td>\n",
       "      <td>military-0</td>\n",
       "      <td>child_alone-0</td>\n",
       "      <td>...</td>\n",
       "      <td>aid_centers-0</td>\n",
       "      <td>other_infrastructure-0</td>\n",
       "      <td>weather_related-0</td>\n",
       "      <td>floods-0</td>\n",
       "      <td>storm-0</td>\n",
       "      <td>fire-0</td>\n",
       "      <td>earthquake-0</td>\n",
       "      <td>cold-0</td>\n",
       "      <td>other_weather-0</td>\n",
       "      <td>direct_report-0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 36 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "1    related    request    offer    aid_related    medical_help  \\\n",
       "0  related-1  request-0  offer-0  aid_related-0  medical_help-0   \n",
       "1  related-1  request-0  offer-0  aid_related-1  medical_help-0   \n",
       "2  related-1  request-0  offer-0  aid_related-0  medical_help-0   \n",
       "3  related-1  request-1  offer-0  aid_related-1  medical_help-0   \n",
       "4  related-1  request-0  offer-0  aid_related-0  medical_help-0   \n",
       "\n",
       "1    medical_products    search_and_rescue    security    military  \\\n",
       "0  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "1  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "2  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "3  medical_products-1  search_and_rescue-0  security-0  military-0   \n",
       "4  medical_products-0  search_and_rescue-0  security-0  military-0   \n",
       "\n",
       "1    child_alone       ...           aid_centers    other_infrastructure  \\\n",
       "0  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "1  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "2  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "3  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "4  child_alone-0       ...         aid_centers-0  other_infrastructure-0   \n",
       "\n",
       "1    weather_related    floods    storm    fire    earthquake    cold  \\\n",
       "0  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "1  weather_related-1  floods-0  storm-1  fire-0  earthquake-0  cold-0   \n",
       "2  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "3  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "4  weather_related-0  floods-0  storm-0  fire-0  earthquake-0  cold-0   \n",
       "\n",
       "1    other_weather    direct_report  \n",
       "0  other_weather-0  direct_report-0  \n",
       "1  other_weather-0  direct_report-0  \n",
       "2  other_weather-0  direct_report-0  \n",
       "3  other_weather-0  direct_report-0  \n",
       "4  other_weather-0  direct_report-0  \n",
       "\n",
       "[5 rows x 36 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rename the columns of `categories`\n",
    "categories.columns = category_colnames\n",
    "categories.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Convert category values to just numbers 0 or 1.\n",
    "- Iterate through the category columns in df to keep only the last character of each string (the 1 or 0). For example, `related-0` becomes `0`, `related-1` becomes `1`. Convert the string to a numeric value.\n",
    "- You can perform [normal string actions on Pandas Series](https://pandas.pydata.org/pandas-docs/stable/text.html#indexing-with-str), like indexing, by including `.str` after the Series. You may need to first convert the Series to be of type string, which you can do with `astype(str)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>1</th>\n",
       "      <th>related</th>\n",
       "      <th>request</th>\n",
       "      <th>offer</th>\n",
       "      <th>aid_related</th>\n",
       "      <th>medical_help</th>\n",
       "      <th>medical_products</th>\n",
       "      <th>search_and_rescue</th>\n",
       "      <th>security</th>\n",
       "      <th>military</th>\n",
       "      <th>child_alone</th>\n",
       "      <th>...</th>\n",
       "      <th>aid_centers</th>\n",
       "      <th>other_infrastructure</th>\n",
       "      <th>weather_related</th>\n",
       "      <th>floods</th>\n",
       "      <th>storm</th>\n",
       "      <th>fire</th>\n",
       "      <th>earthquake</th>\n",
       "      <th>cold</th>\n",
       "      <th>other_weather</th>\n",
       "      <th>direct_report</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 36 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "1  related  request  offer  aid_related  medical_help  medical_products  \\\n",
       "0        1        0      0            0             0                 0   \n",
       "1        1        0      0            1             0                 0   \n",
       "2        1        0      0            0             0                 0   \n",
       "3        1        1      0            1             0                 1   \n",
       "4        1        0      0            0             0                 0   \n",
       "\n",
       "1  search_and_rescue  security  military  child_alone      ...        \\\n",
       "0                  0         0         0            0      ...         \n",
       "1                  0         0         0            0      ...         \n",
       "2                  0         0         0            0      ...         \n",
       "3                  0         0         0            0      ...         \n",
       "4                  0         0         0            0      ...         \n",
       "\n",
       "1  aid_centers  other_infrastructure  weather_related  floods  storm  fire  \\\n",
       "0            0                     0                0       0      0     0   \n",
       "1            0                     0                1       0      1     0   \n",
       "2            0                     0                0       0      0     0   \n",
       "3            0                     0                0       0      0     0   \n",
       "4            0                     0                0       0      0     0   \n",
       "\n",
       "1  earthquake  cold  other_weather  direct_report  \n",
       "0           0     0              0              0  \n",
       "1           0     0              0              0  \n",
       "2           0     0              0              0  \n",
       "3           0     0              0              0  \n",
       "4           0     0              0              0  \n",
       "\n",
       "[5 rows x 36 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "for column in categories:\n",
    "    # set each value to be the last character of the string\n",
    "    categories[column] = categories[column].apply(lambda x: int(x.split('-')[1]))\n",
    "    \n",
    "categories.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Replace `categories` column in `df` with new category columns.\n",
    "- Drop the categories column from the df dataframe since it is no longer needed.\n",
    "- Concatenate df and categories data frames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>message</th>\n",
       "      <th>original</th>\n",
       "      <th>genre</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Weather update - a cold front from Cuba that c...</td>\n",
       "      <td>Un front froid se retrouve sur Cuba ce matin. ...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7</td>\n",
       "      <td>Is the Hurricane over or is it not over</td>\n",
       "      <td>Cyclone nan fini osinon li pa fini</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>Looking for someone but no name</td>\n",
       "      <td>Patnm, di Maryani relem pou li banm nouvel li ...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>says: west side of Haiti, rest of the country ...</td>\n",
       "      <td>facade ouest d Haiti et le reste du pays aujou...</td>\n",
       "      <td>direct</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                                            message  \\\n",
       "0   2  Weather update - a cold front from Cuba that c...   \n",
       "1   7            Is the Hurricane over or is it not over   \n",
       "2   8                    Looking for someone but no name   \n",
       "3   9  UN reports Leogane 80-90 destroyed. Only Hospi...   \n",
       "4  12  says: west side of Haiti, rest of the country ...   \n",
       "\n",
       "                                            original   genre  \n",
       "0  Un front froid se retrouve sur Cuba ce matin. ...  direct  \n",
       "1                 Cyclone nan fini osinon li pa fini  direct  \n",
       "2  Patnm, di Maryani relem pou li banm nouvel li ...  direct  \n",
       "3  UN reports Leogane 80-90 destroyed. Only Hospi...  direct  \n",
       "4  facade ouest d Haiti et le reste du pays aujou...  direct  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# drop the original categories column from `df`\n",
    "df.drop('categories', axis = 1, inplace = True)\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>message</th>\n",
       "      <th>original</th>\n",
       "      <th>genre</th>\n",
       "      <th>related</th>\n",
       "      <th>request</th>\n",
       "      <th>offer</th>\n",
       "      <th>aid_related</th>\n",
       "      <th>medical_help</th>\n",
       "      <th>medical_products</th>\n",
       "      <th>...</th>\n",
       "      <th>aid_centers</th>\n",
       "      <th>other_infrastructure</th>\n",
       "      <th>weather_related</th>\n",
       "      <th>floods</th>\n",
       "      <th>storm</th>\n",
       "      <th>fire</th>\n",
       "      <th>earthquake</th>\n",
       "      <th>cold</th>\n",
       "      <th>other_weather</th>\n",
       "      <th>direct_report</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>Weather update - a cold front from Cuba that c...</td>\n",
       "      <td>Un front froid se retrouve sur Cuba ce matin. ...</td>\n",
       "      <td>direct</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7</td>\n",
       "      <td>Is the Hurricane over or is it not over</td>\n",
       "      <td>Cyclone nan fini osinon li pa fini</td>\n",
       "      <td>direct</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>Looking for someone but no name</td>\n",
       "      <td>Patnm, di Maryani relem pou li banm nouvel li ...</td>\n",
       "      <td>direct</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>UN reports Leogane 80-90 destroyed. Only Hospi...</td>\n",
       "      <td>direct</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>says: west side of Haiti, rest of the country ...</td>\n",
       "      <td>facade ouest d Haiti et le reste du pays aujou...</td>\n",
       "      <td>direct</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 40 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                                            message  \\\n",
       "0   2  Weather update - a cold front from Cuba that c...   \n",
       "1   7            Is the Hurricane over or is it not over   \n",
       "2   8                    Looking for someone but no name   \n",
       "3   9  UN reports Leogane 80-90 destroyed. Only Hospi...   \n",
       "4  12  says: west side of Haiti, rest of the country ...   \n",
       "\n",
       "                                            original   genre  related  \\\n",
       "0  Un front froid se retrouve sur Cuba ce matin. ...  direct        1   \n",
       "1                 Cyclone nan fini osinon li pa fini  direct        1   \n",
       "2  Patnm, di Maryani relem pou li banm nouvel li ...  direct        1   \n",
       "3  UN reports Leogane 80-90 destroyed. Only Hospi...  direct        1   \n",
       "4  facade ouest d Haiti et le reste du pays aujou...  direct        1   \n",
       "\n",
       "   request  offer  aid_related  medical_help  medical_products      ...        \\\n",
       "0        0      0            0             0                 0      ...         \n",
       "1        0      0            1             0                 0      ...         \n",
       "2        0      0            0             0                 0      ...         \n",
       "3        1      0            1             0                 1      ...         \n",
       "4        0      0            0             0                 0      ...         \n",
       "\n",
       "   aid_centers  other_infrastructure  weather_related  floods  storm  fire  \\\n",
       "0            0                     0                0       0      0     0   \n",
       "1            0                     0                1       0      1     0   \n",
       "2            0                     0                0       0      0     0   \n",
       "3            0                     0                0       0      0     0   \n",
       "4            0                     0                0       0      0     0   \n",
       "\n",
       "   earthquake  cold  other_weather  direct_report  \n",
       "0           0     0              0              0  \n",
       "1           0     0              0              0  \n",
       "2           0     0              0              0  \n",
       "3           0     0              0              0  \n",
       "4           0     0              0              0  \n",
       "\n",
       "[5 rows x 40 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# concatenate the original dataframe with the new `categories` dataframe\n",
    "df = df.join(categories)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Remove duplicates.\n",
    "- Check how many duplicates are in this dataset.\n",
    "- Drop the duplicates.\n",
    "- Confirm duplicates were removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "170"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check number of duplicates\n",
    "sum(df.duplicated())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# drop duplicates\n",
    "df = df.drop_duplicates()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check number of duplicates\n",
    "sum(df.duplicated())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7. Save the clean dataset into an sqlite database.\n",
    "You can do this with pandas [`to_sql` method](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html) combined with the SQLAlchemy library. Remember to import SQLAlchemy's `create_engine` in the first cell of this notebook to use it below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "engine = create_engine('sqlite:///DisasterResponse.db')\n",
    "df.to_sql('DisasterResponse', engine, index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8. Use this notebook to complete `etl_pipeline.py`\n",
    "Use the template file attached in the Resources folder to write a script that runs the steps above to create a database based on new datasets specified by the user. Alternatively, you can complete `etl_pipeline.py` in the classroom on the `Project Workspace IDE` coming later."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}