{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Corona Virus analysis using 'Novel Corona Virus 2019-20 Dataset'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "GitHub does not render folium maps and some other outputs, so to see the fully rendered notebook, open the link below:\n", "\n", "https://nbviewer.jupyter.org/github/Mr-Piyush-Kumar/Data_Science_Projects/blob/master/Corona_virus_analysis/CoronaVirusAnalysis.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1 Background\n", "\n", "From the World Health Organization: On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.\n", "\n", "Daily-level information on affected people can therefore give useful insights once it is made available to the broader data science community." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Data acquisition and cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All datasets are downloaded from the Kaggle pages listed below. 
\n", " \n", "Corona Virus dataset from here \n", "World Coordinates dataset from here \n", "World geoJson file from here \n", "China geoJson file from here" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: wget in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (3.2)\n", "file name = 2019_nCoV_data.csv\n" ] } ], "source": [ "# Downloading Corona Virus dataset\n", "# (the signed link below carries an Expires timestamp and stops working after it)\n", "\n", "!pip install wget\n", "\n", "import wget\n", "url = 'https://storage.googleapis.com/kagglesdsdata/datasets/494724/966440/2019_nCoV_data.csv?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1582868558&Signature=JXjN269XVMhWbNH3BAyQozqwNFBePHHrrxpRMk%2BpD3ANojzrkvE5Qer9z1fk1uprgM0njfohll0yPHB8mZs4eN0Ucke7I4JtTTRyISKFWSO%2FCcDZanS9pWX%2FdnEBUV6QindIhmE%2FZeeeRAeKx%2Fd8h3RGlHpCeBm6%2BN7L6XnZj8N%2FsxlmIz%2FXwbxc%2BW2VRi4pG1P0AmJI5PjIc5OVcQISdy0RCGDaJuhAGm4TTarYuZt9j36rFlheXNXitn2EeutI4OI%2B%2BBqTV7fIR5mdBXCa1%2FKiutt3Lu%2BffT7N62dlMXLR4Z0aga4OvzzzwdVuf7TbLWBpqcx6LTQGGD03Fj5kEw%3D%3D&response-content-disposition=attachment%3B+filename%3D2019_nCoV_data.csv'\n", "file = wget.download(url)\n", "print('file name = ',file)\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file name = world_coordinates.csv\n" ] } ], "source": [ "# Downloading world-coordinates data for visualization on the world map\n", "\n", "url1 = 
'https://storage.googleapis.com/kagglesdsdata/datasets/500356/927175/world_coordinates.csv?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1582868625&Signature=soCAku6MmJLB%2FL48Oxs%2FSxaPP11EASBJG%2BPMCRfhPcavNJBDhviiE12SKcfAje07vuDsWsuRiUxRu61KgBObNNe2brA9JHiEL4I4D1dgYk%2F6SKLJz6RFWCwYnr5b%2FAiHfQW2dLxNtRF35MEfR4ncnOW0FHLzgSTd8wq12QrV6y9%2B8XAOyjwcFdkCnLk0eTws0ERhUMC0CSeqFXqkS7leuBxSAUW2jUblrLvir3938JSZ2aqqbywj4eIT5jy0XvmpxepDsYOu1kJpkkyKq4lzyWTwNmr5ljZGWSoJrOVMAIU17z%2FnI3Is2s2YyysoUI6I0XoXRPqyN%2B8pZy0%2F0GPw1Q%3D%3D&response-content-disposition=attachment%3B+filename%3Dworld_coordinates.csv'\n", "file1 = wget.download(url1)\n", "print('file name = ',file1)\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file name = world-countries.json\n" ] } ], "source": [ "# downloading world geoJson file to draw choropleth map of world\n", "\n", "url2 = 'https://storage.googleapis.com/kagglesdsdata/datasets/7923/11172/world-countries.json?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1582868666&Signature=ca8CoGxJ1mYdOR%2BKnaYdfO5mW15iNPvFWNOElOCDMUMRAU4wfO9G2W5980D6zYDFO94SrgiFm07enAWeuF3SmUF5XuU70bn%2FDID%2F%2F2%2FU4JHE00szhJWsIyLHm66R%2FInBNR4Wqg1OYgA0qlqw6Ewy9%2F0jAl4P%2Bv62voX7oO1rI%2BEGPwS0DBaHsUHCGwL4GNdxjJJ%2Ba05c6jpksTvP4V%2FWk%2FvLeyGOMa%2BE5FzExHInBlbiCCPX5bsUo%2BjG87kpIokoSIt2eoP7fiVzPnhB6Z83Td3zwr5ld0gJA%2FyxXagrDbB70JwmdgVO%2B3k2%2BySZqgev%2F6rLk7Lm%2FJFy%2B6ZMgPIVcw%3D%3D&response-content-disposition=attachment%3B+filename%3Dworld-countries.json'\n", "file2 = wget.download(url2)\n", "print('file name = ',file2)\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file name = china.json\n" ] } ], "source": [ "# downloading china geoJson file to draw choropleth map of China\n", "\n", "url3 = 
'https://storage.googleapis.com/kagglesdsdata/datasets/496669/922532/china.json?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1582868706&Signature=L4Xo4fynYwWPvUio3j1NWc4sQBFeAje1N3kZ4%2BzrLJdFZsQiyMJB3lisf%2FVS%2BD7Kq0Pe0aCgxhS1tpmaTbQFF87255Fy%2B7In%2BJOJZnmytgc8NGc6m9ZaOlzuLkgoAT3tgfLYWmRcWLvvBl1cdAArfhl%2BibxSl7%2BbjNvGo3Q0XolnhfdjKZCZ5egh%2FmSV8HUfrf0OBm0%2BXbUEcmV4wkJm8RM8UFSvE1t5HFrhvzO29WmZm9rcwiHKuLPxjl3KGtXpp4ysTJPpScM50UyAgWcrZH1LYeARmGhbnGRarblLGfeHoj2sbO9Nh%2FQAd5XmiNEMtdxTnR99p65Dd07v76979Q%3D%3D&response-content-disposition=attachment%3B+filename%3Dchina.json'\n", "file3 = wget.download(url3)\n", "print('file name = ',file3)\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Sno Date Province/State Country Last Update \\\n", "0 1 01/22/2020 12:00:00 Anhui China 01/22/2020 12:00:00 \n", "1 2 01/22/2020 12:00:00 Beijing China 01/22/2020 12:00:00 \n", "2 3 01/22/2020 12:00:00 Chongqing China 01/22/2020 12:00:00 \n", "3 4 01/22/2020 12:00:00 Fujian China 01/22/2020 12:00:00 \n", "4 5 01/22/2020 12:00:00 Gansu China 01/22/2020 12:00:00 \n", "\n", " Confirmed Deaths Recovered \n", "0 1.0 0.0 0.0 \n", "1 14.0 0.0 0.0 \n", "2 6.0 0.0 0.0 \n", "3 1.0 0.0 0.0 \n", "4 0.0 0.0 0.0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading the corona virus dataset as a pandas DataFrame\n", "import pandas as pd\n", "\n", "corona_df = pd.read_csv('2019_nCoV_data.csv')\n", "corona_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Corona dataset column descriptions:**\n", "\n", "- Sno: serial number. \n", "- Date: date and time at which the record was written to the dataset. \n", "- Province/State: name of the province or state within the country. \n", "- Country: country name. \n", "- Last Update: time at which the figures for that location were last updated (this can be earlier than Date). \n", "- Confirmed: cumulative number of confirmed infections at that location up to that date. \n", "- Deaths: cumulative number of deaths due to Corona virus at that location up to that date. \n", "- Recovered: cumulative number of recovered patients at that location up to that date. 
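
Because Confirmed, Deaths and Recovered are cumulative, the number of new cases a location reports on a given day is the difference between its current record and its previous one. A minimal sketch, assuming pandas and a hypothetical toy frame that only borrows the column names of corona_df (the numbers are invented for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data shaped like corona_df; all numbers are invented
df = pd.DataFrame({
    'Date': ['01/22/2020', '01/23/2020', '01/24/2020'] * 2,
    'Province/State': ['Anhui'] * 3 + ['Beijing'] * 3,
    'Confirmed': [1.0, 9.0, 15.0, 14.0, 22.0, 36.0],
})

# Parse the dates so that sorting is chronological, not lexicographic
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df = df.sort_values(['Province/State', 'Date']).reset_index(drop=True)

# New cases = cumulative total minus the previous record of the same location;
# the first record of a location has no predecessor, so its total is used as is
df['New_Confirmed'] = (
    df.groupby('Province/State')['Confirmed'].diff().fillna(df['Confirmed'])
)
print(df)
```

On the real Date column, which also carries a time of day, the format string would need the time part too (for example '%m/%d/%Y %H:%M:%S'), and grouping by both Country and Province/State would be the safer key.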
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# storing the paths to both geoJson files\n", "\n", "world_geoJson = 'world-countries.json' # Path to file\n", "china_geoJson = 'china.json' # Path to file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**About GeoJSON files**\n", "\n", "GeoJSON is an open standard geospatial data interchange format that represents simple geographic features and their nonspatial attributes. Based on JavaScript Object Notation (JSON), GeoJSON is a format for encoding a variety of geographic data structures. It uses a geographic coordinate reference system, World Geodetic System 1984, and units of decimal degrees.\n", "\n", "Read more about GeoJSON files here.\n", "\n", "I will use these two files to draw choropleth maps of the world and of China that show the corona virus outbreak by location.\n", "\n", "Read about choropleth maps here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dropping Sno from the dataset, as it is not needed for the analysis." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Province/State Country Last Update Confirmed \\\n", "0 01/22/2020 12:00:00 Anhui China 01/22/2020 12:00:00 1.0 \n", "1 01/22/2020 12:00:00 Beijing China 01/22/2020 12:00:00 14.0 \n", "2 01/22/2020 12:00:00 Chongqing China 01/22/2020 12:00:00 6.0 \n", "3 01/22/2020 12:00:00 Fujian China 01/22/2020 12:00:00 1.0 \n", "4 01/22/2020 12:00:00 Gansu China 01/22/2020 12:00:00 0.0 \n", "\n", " Deaths Recovered \n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "4 0.0 0.0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corona_df.drop('Sno',axis=1,inplace=True)\n", "corona_df.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['China', 'US', 'Japan', 'Thailand', 'South Korea',\n", " 'Mainland China', 'Hong Kong', 'Macau', 'Taiwan', 'Singapore',\n", " 'Philippines', 'Malaysia', 'Vietnam', 'Australia', 'Mexico',\n", " 'Brazil', 'France', 'Nepal', 'Canada', 'Cambodia', 'Sri Lanka',\n", " 'Ivory Coast', 'Germany', 'Finland', 'United Arab Emirates',\n", " 'India', 'Italy', 'Sweden', 'Russia', 'Spain', 'UK', 'Belgium',\n", " 'Others', 'Egypt'], dtype=object)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corona_df.Country.unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "China and Mainland China appear as two different countries in the dataset. Let's replace Mainland China with China." 
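, "

Calling replace() on the whole DataFrame changes matching values in every column. A minimal sketch that restricts the cleanup to the Country column, assuming a hypothetical toy frame and a hypothetical alias dictionary (only the Mainland China entry comes from this dataset):

```python
import pandas as pd

# Hypothetical toy rows; only the Country column matters here
df = pd.DataFrame({
    'Province/State': ['Hubei', 'Beijing', 'Seattle, WA', 'Anhui'],
    'Country': ['China', 'Mainland China', 'US', 'Mainland China'],
})

# Map every alias to one canonical country name (hypothetical mapping)
country_aliases = {'Mainland China': 'China'}
df['Country'] = df['Country'].replace(country_aliases)

print(sorted(df['Country'].unique()))  # ['China', 'US']
```

Keeping the aliases in one dictionary makes it easy to add further spellings if any show up.
"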
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['China', 'US', 'Japan', 'Thailand', 'South Korea', 'Hong Kong',\n", " 'Macau', 'Taiwan', 'Singapore', 'Philippines', 'Malaysia',\n", " 'Vietnam', 'Australia', 'Mexico', 'Brazil', 'France', 'Nepal',\n", " 'Canada', 'Cambodia', 'Sri Lanka', 'Ivory Coast', 'Germany',\n", " 'Finland', 'United Arab Emirates', 'India', 'Italy', 'Sweden',\n", " 'Russia', 'Spain', 'UK', 'Belgium', 'Others', 'Egypt'],\n", " dtype=object)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corona_df.replace('Mainland China','China',inplace=True)\n", "corona_df.Country.unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Country column contains the value 'Others'. Since there is no information about which or how many countries are grouped under Others, I drop the rows whose Country value is Others." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Province/State Country Last Update \\\n", "0 01/22/2020 12:00:00 Anhui China 01/22/2020 12:00:00 \n", "1 01/22/2020 12:00:00 Beijing China 01/22/2020 12:00:00 \n", "2 01/22/2020 12:00:00 Chongqing China 01/22/2020 12:00:00 \n", "3 01/22/2020 12:00:00 Fujian China 01/22/2020 12:00:00 \n", "4 01/22/2020 12:00:00 Gansu China 01/22/2020 12:00:00 \n", "... ... ... ... ... \n", "1703 02/17/2020 22:00:00 Madison, WI US 2020-02-05T21:53:02 \n", "1704 02/17/2020 22:00:00 Orange, CA US 2020-02-01T19:53:03 \n", "1705 02/17/2020 22:00:00 San Antonio, TX US 2020-02-13T18:53:02 \n", "1706 02/17/2020 22:00:00 Seattle, WA US 2020-02-09T07:03:04 \n", "1707 02/17/2020 22:00:00 Tempe, AZ US 2020-02-01T19:43:03 \n", "\n", " Confirmed Deaths Recovered \n", "0 1.0 0.0 0.0 \n", "1 14.0 0.0 0.0 \n", "2 6.0 0.0 0.0 \n", "3 1.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "... ... ... ... \n", "1703 1.0 0.0 0.0 \n", "1704 1.0 0.0 0.0 \n", "1705 1.0 0.0 0.0 \n", "1706 1.0 0.0 1.0 \n", "1707 1.0 0.0 0.0 \n", "\n", "[1708 rows x 7 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corona_df = corona_df[corona_df['Country']!='Others'].reset_index(drop=True)\n", "corona_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Data Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Handling NaN values**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1708, 7)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# shape of our dataset\n", "corona_df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 1708 records in the corona dataset after dropping the 'Others' rows." 
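, "

Besides the shape, it helps to see how many values each column is missing and what share of the rows that represents. A minimal sketch on a hypothetical toy frame (running the same summary on corona_df would describe the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing provinces
df = pd.DataFrame({
    'Province/State': ['Anhui', np.nan, 'Beijing', np.nan],
    'Country': ['China', 'Japan', 'China', 'Thailand'],
    'Confirmed': [1.0, 2.0, 14.0, 2.0],
})

# Count and percentage of missing values per column
missing = pd.DataFrame({
    'n_missing': df.isna().sum(),
    'pct_missing': df.isna().mean() * 100,
})
print(missing)
```

isna().mean() gives the fraction of missing values directly, because True counts as 1 and False as 0 when averaging.
"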
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date 0\n", "Province/State 462\n", "Country 0\n", "Last Update 0\n", "Confirmed 0\n", "Deaths 0\n", "Recovered 0\n", "dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# looking for null values present in the corona dataset\n", "\n", "corona_df.isna().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the Province/State column, 462 records have null values." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Province/State Country Last Update \\\n", "35 01/22/2020 12:00:00 NaN Japan 01/22/2020 12:00:00 \n", "36 01/22/2020 12:00:00 NaN Thailand 01/22/2020 12:00:00 \n", "37 01/22/2020 12:00:00 NaN South Korea 01/22/2020 12:00:00 \n", "73 01/23/2020 12:00:00 NaN Japan 01/23/2020 12:00:00 \n", "74 01/23/2020 12:00:00 NaN Thailand 01/23/2020 12:00:00 \n", "... ... ... ... ... \n", "1695 02/17/2020 22:00:00 NaN Egypt 2020-02-14T23:53:02 \n", "1696 02/17/2020 22:00:00 NaN Finland 2020-02-12T00:03:12 \n", "1698 02/17/2020 22:00:00 NaN Nepal 2020-02-12T14:43:03 \n", "1699 02/17/2020 22:00:00 NaN Sri Lanka 2020-02-08T03:43:03 \n", "1700 02/17/2020 22:00:00 NaN Sweden 2020-02-01T02:13:26 \n", "\n", " Confirmed Deaths Recovered \n", "35 2.0 0.0 0.0 \n", "36 2.0 0.0 0.0 \n", "37 1.0 0.0 0.0 \n", "73 1.0 0.0 0.0 \n", "74 3.0 0.0 0.0 \n", "... ... ... ... \n", "1695 1.0 0.0 0.0 \n", "1696 1.0 0.0 1.0 \n", "1698 1.0 0.0 1.0 \n", "1699 1.0 0.0 1.0 \n", "1700 1.0 0.0 0.0 \n", "\n", "[462 rows x 7 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "corona_province_nan = corona_df[corona_df['Province/State'].isna()]\n", "corona_province_nan" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2., 1., 3., 0., 5., 7., 4., 8., 14., 11., 10., 6., 17.,\n", " 19., 16., 20., 18., 15., 12., 25., 24., 22., 28., 45., 23., 30.,\n", " 13., 40., 32., 43., 27., 26., 47., 33., 50., 9., 58., 67., 29.,\n", " 72., 75., 59., 34., 77., 66., 35.])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corona_province_nan.Confirmed.unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All confirmed counts in these rows are small (at most 77), which suggests that when a country has few cases, the dataset creators did not split its data by Province/State." 
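, "

This conclusion can be checked per country. A minimal sketch on a hypothetical toy frame: it takes the largest Confirmed value per country among rows without a Province/State, and lists countries that appear both with and without one (in the notebook, corona_province_nan already holds the rows without a Province/State):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: Japan and Thailand without provinces, China with one
df = pd.DataFrame({
    'Province/State': [np.nan, np.nan, np.nan, 'Hubei'],
    'Country': ['Japan', 'Japan', 'Thailand', 'China'],
    'Confirmed': [2.0, 66.0, 35.0, 444.0],
})

# Largest confirmed count per country among rows with no Province/State
no_province = df[df['Province/State'].isna()]
max_per_country = no_province.groupby('Country')['Confirmed'].max()
print(max_per_country)

# Countries reported both with and without a Province/State value
with_province = set(df.loc[df['Province/State'].notna(), 'Country'])
mixed = sorted(set(no_province['Country']) & with_province)
print(mixed)  # []
```

An empty mixed list means every country is reported either entirely by province or only at country level, which suggests that filling a missing Province/State with the country name will not mix the two levels within one country.
"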
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Replacing NaN values with the respective country name." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# fillna with a Series fills each missing Province/State with the Country of the same row;\n", "# unlike chained assignment, this reliably updates corona_df\n", "corona_df['Province/State'] = corona_df['Province/State'].fillna(corona_df['Country'])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total 32 countries have been affected by corona virus till now.\n" ] } ], "source": [ "print('Total',len(corona_df.Country.unique()),'countries have been affected by corona virus till now.')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country Confirmed Deaths Recovered\n", "0 Australia 19.0 0.0 10.0\n", "1 Belgium 1.0 0.0 1.0\n", "2 Brazil 0.0 0.0 0.0\n", "3 Cambodia 1.0 0.0 1.0\n", "4 Canada 11.0 0.0 1.0\n", "5 China 72366.0 1864.0 12455.0\n", "6 Egypt 1.0 0.0 0.0\n", "7 Finland 1.0 0.0 1.0\n", "8 France 12.0 1.0 4.0\n", "9 Germany 23.0 0.0 1.0\n", "10 Hong Kong 60.0 1.0 2.0\n", "11 India 3.0 0.0 3.0\n", "12 Italy 3.0 0.0 0.0\n", "13 Ivory Coast 0.0 0.0 0.0\n", "14 Japan 66.0 1.0 12.0\n", "15 Macau 10.0 0.0 5.0\n", "16 Malaysia 22.0 0.0 7.0\n", "17 Mexico 0.0 0.0 0.0\n", "18 Nepal 1.0 0.0 1.0\n", "19 Philippines 3.0 1.0 1.0\n", "20 Russia 2.0 0.0 2.0\n", "21 Singapore 77.0 0.0 24.0\n", "22 South Korea 30.0 0.0 10.0\n", "23 Spain 2.0 0.0 2.0\n", "24 Sri Lanka 1.0 0.0 1.0\n", "25 Sweden 1.0 0.0 0.0\n", "26 Taiwan 22.0 1.0 2.0\n", "27 Thailand 35.0 0.0 15.0\n", "28 UK 9.0 0.0 8.0\n", "29 US 23.0 0.0 3.0\n", "30 United Arab Emirates 9.0 0.0 4.0\n", "31 Vietnam 16.0 0.0 7.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# total cases so far, country-wise: the counts are cumulative, so the max per province is its most\n", "# recent total (assuming counts never decrease); summing these per country gives the country total\n", "cases = ['Confirmed','Deaths','Recovered']\n", "countrywise_df = corona_df.groupby(['Country','Province/State'])[cases].max()\n", "countrywise_df.reset_index(inplace = True)\n", "countrywise_df = countrywise_df.groupby('Country')[cases].sum()\n", "countrywise_df.reset_index(inplace = True)\n", "countrywise_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "China is by far the most affected country, with 72,366 confirmed cases and 1,864 deaths." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Visualizing the day-by-day spread of corona virus from the beginning until now** \n", "\n", "Since the dataset has per-day data for each country, the daily totals can be plotted directly." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Confirmed Deaths Recovered\n", "0 01/22/2020 12:00:00 555.0 0.0 0.0\n", "1 01/23/2020 12:00:00 653.0 18.0 30.0\n", "2 01/24/2020 12:00:00 941.0 26.0 36.0\n", "3 01/25/2020 22:00:00 2019.0 56.0 49.0\n", "4 01/26/2020 23:00:00 2794.0 80.0 54.0\n", "5 01/27/2020 20:30:00 4473.0 107.0 63.0\n", "6 01/28/2020 23:00:00 6057.0 132.0 110.0\n", "7 01/29/2020 21:00:00 7783.0 170.0 133.0\n", "8 01/30/2020 21:30:00 9776.0 213.0 187.0\n", "9 01/31/2020 19:00:00 11374.0 259.0 252.0\n", "10 02/01/2020 23:00:00 14549.0 305.0 340.0\n", "11 02/02/2020 21:00:00 17295.0 362.0 487.0\n", "12 02/03/2020 21:40:00 20588.0 426.0 644.0\n", "13 02/04/2020 22:00:00 24503.0 492.0 899.0\n", "14 02/05/2020 12:20:00 24630.0 494.0 1029.0\n", "15 02/06/2020 20:05:00 30806.0 634.0 1487.0\n", "16 02/07/2020 20:24:00 31471.0 638.0 1763.0\n", "17 02/08/2020 23:04:00 37488.0 813.0 2701.0\n", "18 02/09/2020 23:20:00 40472.0 910.0 3312.0\n", "19 02/10/2020 19:30:00 42632.0 1013.0 3950.0\n", "20 02/11/2020 20:44:00 44982.0 1115.0 4781.0\n", "21 02/12/2020 22:00:00 60153.0 1368.0 5986.0\n", "22 02/13/2020 21:15:00 64204.0 1491.0 7064.0\n", "23 02/14/2020 22:00:00 66669.0 1523.0 8058.0\n", "24 02/15/2020 22:00:00 68747.0 1666.0 9395.0\n", "25 02/16/2020 22:00:00 70871.0 1770.0 10865.0\n", "26 02/17/2020 22:00:00 72806.0 1868.0 12583.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sum the counts of all locations for each date (numeric columns only)\n", "Day_wise_world_data = corona_df.groupby('Date')[['Confirmed','Deaths','Recovered']].sum().reset_index()\n", "Day_wise_world_data" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Date Confirmed Deaths Recovered\n", "0 01/22/2020 12:00:00 549.0 0.0 0.0\n", "1 01/23/2020 12:00:00 639.0 18.0 30.0\n", "2 01/24/2020 12:00:00 916.0 26.0 36.0\n", "3 01/25/2020 22:00:00 1979.0 56.0 49.0\n", "4 01/26/2020 23:00:00 2737.0 80.0 51.0\n", "5 01/27/2020 20:30:00 4409.0 107.0 60.0\n", "6 01/28/2020 23:00:00 5970.0 132.0 104.0\n", "7 01/29/2020 21:00:00 7678.0 170.0 127.0\n", "8 01/30/2020 21:30:00 9658.0 213.0 179.0\n", "9 01/31/2020 19:00:00 11221.0 259.0 242.0\n", "10 02/01/2020 23:00:00 14375.0 304.0 331.0\n", "11 02/02/2020 21:00:00 17114.0 361.0 478.0\n", "12 02/03/2020 21:40:00 20400.0 425.0 635.0\n", "13 02/04/2020 22:00:00 24290.0 490.0 890.0\n", "14 02/05/2020 12:20:00 24405.0 492.0 1020.0\n", "15 02/06/2020 20:05:00 30541.0 632.0 1476.0\n", "16 02/07/2020 20:24:00 31215.0 636.0 1750.0\n", "17 02/08/2020 23:04:00 37198.0 811.0 2678.0\n", "18 02/09/2020 23:20:00 40160.0 908.0 3286.0\n", "19 02/10/2020 19:30:00 42310.0 1011.0 3921.0\n", "20 02/11/2020 20:44:00 44641.0 1113.0 4730.0\n", "21 02/12/2020 22:00:00 59805.0 1366.0 5915.0\n", "22 02/13/2020 21:15:00 63841.0 1488.0 6982.0\n", "23 02/14/2020 22:00:00 66292.0 1520.0 7973.0\n", "24 02/15/2020 22:00:00 68347.0 1662.0 9294.0\n", "25 02/16/2020 22:00:00 70446.0 1765.0 10748.0\n", "26 02/17/2020 22:00:00 72364.0 1863.0 12455.0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Day_wise_china_data = corona_df[corona_df['Country']=='China'].groupby('Date').sum().reset_index()\n", "Day_wise_china_data" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# World\n", "plt.figure(figsize=(20,10))\n", "plt.subplot(1,2,1)\n", "plt.plot(Day_wise_world_data.Date,Day_wise_world_data.Confirmed,label='Confirmed',color='r',marker='o')\n", "plt.plot(Day_wise_world_data.Date,Day_wise_world_data.Recovered,label='Recovered',color='g',marker='o')\n", "plt.plot(Day_wise_world_data.Date,Day_wise_world_data.Deaths,label='Deaths',color='b',marker='o')\n", "plt.title('World',size=14)\n", "plt.legend()\n", "plt.xticks(rotation=90)\n", "plt.xlabel('Date',size=14)\n", "plt.ylabel('No. of cases',size=14)\n", "\n", "# China\n", "plt.subplot(1,2,2)\n", "plt.plot(Day_wise_china_data.Date,Day_wise_china_data.Confirmed,label='Confirmed',color='r',marker='o')\n", "plt.plot(Day_wise_china_data.Date,Day_wise_china_data.Recovered,label='Recovered',color='g',marker='o')\n", "plt.plot(Day_wise_china_data.Date,Day_wise_china_data.Deaths,label='Deaths',color='b',marker='o')\n", "plt.title('China',size=14)\n", "plt.legend()\n", "plt.xticks(rotation=90)\n", "plt.xlabel('Date',size=14)\n", "plt.ylabel('No. of cases',size=14)\n", "plt.suptitle('No. of cases of Corona virus day by day',size=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The spread of the coronavirus follows the same pattern for the whole world and for China. This is because most of the confirmed cases are in China, so the worldwide totals are dominated by the Chinese figures.\n", "\n", "So the trends in the coronavirus dataset can be studied using the Chinese data alone." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of confirmed cases in China rose very rapidly, from 549 on 22 January to 72,364 on 17 February 2020, while recoveries grew much more slowly (12,455 by 17 February)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Analyzing China Data State-wise**" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Date Province/State Country Last Update Confirmed \\\n", "0 01/22/2020 12:00:00 Anhui China 01/22/2020 12:00:00 1.0 \n", "1 01/22/2020 12:00:00 Beijing China 01/22/2020 12:00:00 14.0 \n", "2 01/22/2020 12:00:00 Chongqing China 01/22/2020 12:00:00 6.0 \n", "3 01/22/2020 12:00:00 Fujian China 01/22/2020 12:00:00 1.0 \n", "4 01/22/2020 12:00:00 Gansu China 01/22/2020 12:00:00 0.0 \n", "\n", " Deaths Recovered \n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "4 0.0 0.0 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "China_data = corona_df[corona_df['Country']=='China']\n", "China_data.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Province/State Date Country Last Update Confirmed \\\n", "0 Anhui 02/17/2020 22:00:00 China 2020-09-02 01:23:00 973.0 \n", "1 Beijing 02/17/2020 22:00:00 China 2020-09-02 03:43:00 381.0 \n", "2 Chongqing 02/17/2020 22:00:00 China 2020-09-02 00:43:00 553.0 \n", "3 Fujian 02/17/2020 22:00:00 China 2020-09-02 03:43:00 290.0 \n", "4 Gansu 02/17/2020 22:00:00 China 2020-08-02 15:13:00 91.0 \n", "\n", " Deaths Recovered \n", "0 6.0 280.0 \n", "1 4.0 114.0 \n", "2 5.0 225.0 \n", "3 1.0 90.0 \n", "4 2.0 58.0 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "China_data_state_wise = China_data.groupby('Province/State').max().reset_index()\n", "China_data_state_wise.head()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(15,12))\n", "ax = sns.barplot(x='Confirmed',y='Province/State',data=China_data_state_wise,errwidth=0)\n", "plt.title('No. of Cases in China till now (State wise)',size=15)\n", "plt.xlabel('No. of Confirmed cases',size=14)\n", "plt.ylabel('Province/State',size=14)\n", "\n", "y=0\n", "for p in ax.patches :\n", " ax.annotate(str(int(p.get_width())),(p.get_width()+100,y+0.1),color='b')\n", " y+=1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hubei is the most affected province in China, as the outbreak started in Wuhan, the capital of Hubei province." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Country Confirmed Deaths Recovered\n", "0 Australia 19.0 0.0 10.0\n", "1 Belgium 1.0 0.0 1.0\n", "2 Brazil 0.0 0.0 0.0\n", "3 Cambodia 1.0 0.0 1.0\n", "4 Canada 11.0 0.0 1.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Country_wise_except_china = countrywise_df[countrywise_df['Country']!='China']\n", "Country_wise_except_china.head()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(15,12))\n", "sns.set_color_codes('pastel')\n", "sns.barplot(x='Confirmed',y='Country',data=Country_wise_except_china,errwidth=0,color='r',label='Confirmed')\n", "sns.barplot(x='Recovered',y='Country',data=Country_wise_except_china,errwidth=0,color='g',label='Recovered')\n", "sns.barplot(x='Deaths',y='Country',data=Country_wise_except_china,errwidth=0,color='b',label='Deaths')\n", "plt.title('No. of cases till now country wise except China',size=14)\n", "plt.xlabel('No. of cases',size=13)\n", "plt.ylabel('Country',size=13)\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Geographical Visualization**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the geographical visualization, I am using the Folium library." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting folium==0.10\n", "\u001b[?25l Downloading https://files.pythonhosted.org/packages/72/ff/004bfe344150a064e558cb2aedeaa02ecbf75e60e148a55a9198f0c41765/folium-0.10.0-py2.py3-none-any.whl (91kB)\n", "\u001b[K |████████████████████████████████| 92kB 5.7MB/s eta 0:00:011\n", "\u001b[?25hRequirement already satisfied: branca>=0.3.0 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from folium==0.10) (0.3.1)\n", "Requirement already satisfied: numpy in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from folium==0.10) (1.16.2)\n", "Requirement already satisfied: jinja2>=2.9 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from folium==0.10) (2.10.3)\n", "Requirement already satisfied: requests in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from folium==0.10) (2.22.0)\n", "Requirement already satisfied: six in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from 
branca>=0.3.0->folium==0.10) (1.13.0)\n", "Requirement already satisfied: MarkupSafe>=0.23 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from jinja2>=2.9->folium==0.10) (1.1.1)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->folium==0.10) (1.25.7)\n", "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->folium==0.10) (3.0.4)\n", "Requirement already satisfied: idna<2.9,>=2.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->folium==0.10) (2.8)\n", "Requirement already satisfied: certifi>=2017.4.17 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->folium==0.10) (2019.9.11)\n", "Installing collected packages: folium\n", " Found existing installation: folium 0.5.0\n", " Uninstalling folium-0.5.0:\n", " Successfully uninstalled folium-0.5.0\n", "Successfully installed folium-0.10.0\n" ] } ], "source": [ "!pip install folium==0.10" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "import folium as flm" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Code Country latitude longitude\n", "0 AD Andorra 42.546245 1.601554\n", "1 AE United Arab Emirates 23.424076 53.847818\n", "2 AF Afghanistan 33.939110 67.709953\n", "3 AG Antigua and Barbuda 17.060816 -61.796428\n", "4 AI Anguilla 18.220554 -63.068615" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading world coordinates file\n", "\n", "world_coordinates = pd.read_csv('world_coordinates.csv')\n", "world_coordinates.head() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dropping the `Code` column from the world_coordinates DataFrame, since only the country name and its coordinates are needed." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Country latitude longitude\n", "0 Andorra 42.546245 1.601554\n", "1 United Arab Emirates 23.424076 53.847818\n", "2 Afghanistan 33.939110 67.709953\n", "3 Antigua and Barbuda 17.060816 -61.796428\n", "4 Anguilla 18.220554 -63.068615" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "world_coordinates.drop('Code',axis=1,inplace=True)\n", "world_coordinates.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Merging world_coordinates with the country-wise DataFrame using a left join on `Country`, so countries without coordinates get NaN latitude/longitude." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Country Confirmed Deaths Recovered latitude longitude\n", "0 Australia 19.0 0.0 10.0 -25.274398 133.775136\n", "1 Belgium 1.0 0.0 1.0 50.503887 4.469936\n", "2 Brazil 0.0 0.0 0.0 -14.235004 -51.925280\n", "3 Cambodia 1.0 0.0 1.0 12.565679 104.990963\n", "4 Canada 11.0 0.0 1.0 56.130366 -106.346771\n", "5 China 72366.0 1864.0 12455.0 35.861660 104.195397\n", "6 Egypt 1.0 0.0 0.0 26.820553 30.802498\n", "7 Finland 1.0 0.0 1.0 61.924110 25.748151\n", "8 France 12.0 1.0 4.0 46.227638 2.213749\n", "9 Germany 23.0 0.0 1.0 51.165691 10.451526\n", "10 Hong Kong 60.0 1.0 2.0 22.396428 114.109497\n", "11 India 3.0 0.0 3.0 20.593684 78.962880\n", "12 Italy 3.0 0.0 0.0 41.871940 12.567380\n", "13 Ivory Coast 0.0 0.0 0.0 NaN NaN\n", "14 Japan 66.0 1.0 12.0 36.204824 138.252924\n", "15 Macau 10.0 0.0 5.0 22.198745 113.543873\n", "16 Malaysia 22.0 0.0 7.0 4.210484 101.975766\n", "17 Mexico 0.0 0.0 0.0 23.634501 -102.552784\n", "18 Nepal 1.0 0.0 1.0 28.394857 84.124008\n", "19 Philippines 3.0 1.0 1.0 12.879721 121.774017\n", "20 Russia 2.0 0.0 2.0 61.524010 105.318756\n", "21 Singapore 77.0 0.0 24.0 1.352083 103.819836\n", "22 South Korea 30.0 0.0 10.0 35.907757 127.766922\n", "23 Spain 2.0 0.0 2.0 40.463667 -3.749220\n", "24 Sri Lanka 1.0 0.0 1.0 7.873054 80.771797\n", "25 Sweden 1.0 0.0 0.0 60.128161 18.643501\n", "26 Taiwan 22.0 1.0 2.0 23.697810 120.960515\n", "27 Thailand 35.0 0.0 15.0 15.870032 100.992541\n", "28 UK 9.0 0.0 8.0 55.378051 -3.435973\n", "29 US 23.0 0.0 3.0 37.090240 -95.712891\n", "30 United Arab Emirates 9.0 0.0 4.0 23.424076 53.847818\n", "31 Vietnam 16.0 0.0 7.0 14.058324 108.277199" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corona_df_with_geo_location = countrywise_df.merge(world_coordinates,how='left',on='Country')\n", "corona_df_with_geo_location" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Country Confirmed Deaths Recovered latitude longitude\n", "0 Australia 19.0 0.0 10.0 -25.274398 133.775136\n", "1 Belgium 1.0 0.0 1.0 50.503887 4.469936\n", "2 Brazil 0.0 0.0 0.0 -14.235004 -51.925280\n", "3 Cambodia 1.0 0.0 1.0 12.565679 104.990963\n", "4 Canada 11.0 0.0 1.0 56.130366 -106.346771\n", "5 China 72366.0 1864.0 12455.0 35.861660 104.195397\n", "6 Egypt 1.0 0.0 0.0 26.820553 30.802498\n", "7 Finland 1.0 0.0 1.0 61.924110 25.748151\n", "8 France 12.0 1.0 4.0 46.227638 2.213749\n", "9 Germany 23.0 0.0 1.0 51.165691 10.451526\n", "10 Hong Kong 60.0 1.0 2.0 22.396428 114.109497\n", "11 India 3.0 0.0 3.0 20.593684 78.962880\n", "12 Italy 3.0 0.0 0.0 41.871940 12.567380\n", "14 Japan 66.0 1.0 12.0 36.204824 138.252924\n", "15 Macau 10.0 0.0 5.0 22.198745 113.543873\n", "16 Malaysia 22.0 0.0 7.0 4.210484 101.975766\n", "17 Mexico 0.0 0.0 0.0 23.634501 -102.552784\n", "18 Nepal 1.0 0.0 1.0 28.394857 84.124008\n", "19 Philippines 3.0 1.0 1.0 12.879721 121.774017\n", "20 Russia 2.0 0.0 2.0 61.524010 105.318756\n", "21 Singapore 77.0 0.0 24.0 1.352083 103.819836\n", "22 South Korea 30.0 0.0 10.0 35.907757 127.766922\n", "23 Spain 2.0 0.0 2.0 40.463667 -3.749220\n", "24 Sri Lanka 1.0 0.0 1.0 7.873054 80.771797\n", "25 Sweden 1.0 0.0 0.0 60.128161 18.643501\n", "26 Taiwan 22.0 1.0 2.0 23.697810 120.960515\n", "27 Thailand 35.0 0.0 15.0 15.870032 100.992541\n", "28 UK 9.0 0.0 8.0 55.378051 -3.435973\n", "29 US 23.0 0.0 3.0 37.090240 -95.712891\n", "30 United Arab Emirates 9.0 0.0 4.0 23.424076 53.847818\n", "31 Vietnam 16.0 0.0 7.0 14.058324 108.277199" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dropping rows with NaN values\n", "corona_df_with_geo_location.dropna(axis=0,inplace=True)\n", "corona_df_with_geo_location" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "world_map = flm.Map(location=[10,-10],zoom_start=2)\n", "for country,confirmed,lat,lng in zip(corona_df_with_geo_location.Country,\n", " corona_df_with_geo_location.Confirmed,\n", " corona_df_with_geo_location.latitude,\n", " corona_df_with_geo_location.longitude):\n", " \n", " flm.CircleMarker(\n", " location=[lat,lng],\n", " radius=5,\n", " tooltip=str(confirmed),\n", " fill=True,\n", " fill_color='red',\n", " fill_opacity=0.8).add_to(world_map)\n", "world_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the world map above, each circle marker marks a country that appears in the dataset, and its tooltip shows that country's number of confirmed cases. A few of these countries (Brazil and Mexico) are listed with 0 confirmed cases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Choropleth map**" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myscale = [0,15000,30000,45000,60000,73090] # manually defining threshold scale\n", "\n", "choropleth_map = flm.Map(location=[10,-10],zoom_start=2)\n", "\n", "\n", "flm.Choropleth(\n", " geo_data = world_geoJson,\n", " data = corona_df_with_geo_location,\n", " columns = ['Country','Confirmed'],\n", " key_on='feature.properties.name',\n", " fill_color= 'YlOrRd',\n", " fill_opacity=0.7, \n", " line_opacity=0.2,\n", " legend_name='No. of confirmed cases',\n", " threshold_scale = myscale,\n", " nan_fill_color='black',\n", " nan_fill_opacity=0.4,\n", " highlight=True\n", " ).add_to(choropleth_map)\n", "choropleth_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Black areas are countries for which no case count could be matched: either the country is not in the dataset, or its name in the dataset differs from its name in the GeoJSON file.\n", "- Light yellow areas have fewer cases.\n", "- China is colored red because it has by far the largest number of cases (72,366 confirmed).\n", "\n", "For the exact ranges, refer to the threshold scale (legend) on the map." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now visualizing the choropleth map for China only (province-wise)**" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# loading china geoJson file\n", "\n", "import json\n", "\n", "with open(china_geoJson) as file:\n", " china = json.load(file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking whether the province/state names in the China GeoJSON file are in Chinese or in English." 
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['新疆维吾尔自治区',\n", " '西藏自治区',\n", " '内蒙古自治区',\n", " '青海省',\n", " '四川省',\n", " '黑龙江省',\n", " '甘肃省',\n", " '云南省',\n", " '广西壮族自治区',\n", " '湖南省',\n", " '陕西省',\n", " '广东省',\n", " '吉林省',\n", " '河北省',\n", " '湖北省',\n", " '贵州省',\n", " '山东省',\n", " '江西省',\n", " '河南省',\n", " '辽宁省',\n", " '山西省',\n", " '安徽省',\n", " '福建省',\n", " '浙江省',\n", " '江苏省',\n", " '重庆市',\n", " '宁夏回族自治区',\n", " '海南省',\n", " '台湾省',\n", " '北京市',\n", " '天津市',\n", " '上海市',\n", " '香港特别行政区',\n", " '澳门特别行政区']" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name_lst = []\n", "for i in range(len(china['features'])):\n", " name_lst.append(china['features'][i]['properties']['name'])\n", "name_lst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The province/state names are in Chinese, so they need to be translated into English to match the names used in the dataset." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting googletrans\n", " Downloading https://files.pythonhosted.org/packages/fd/f0/a22d41d3846d1f46a4f20086141e0428ccc9c6d644aacbfd30990cf46886/googletrans-2.4.0.tar.gz\n", "Requirement already satisfied: requests in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from googletrans) (2.22.0)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->googletrans) (1.25.7)\n", "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->googletrans) (3.0.4)\n", "Requirement already satisfied: idna<2.9,>=2.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->googletrans) 
(2.8)\n", "Requirement already satisfied: certifi>=2017.4.17 in 
/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests->googletrans) (2019.9.11)\n", "Building wheels for collected packages: googletrans\n", " Building wheel for googletrans (setup.py) ... \u001b[?25ldone\n", "\u001b[?25h Stored in directory: /home/jupyterlab/.cache/pip/wheels/50/d6/e7/a8efd5f2427d5eb258070048718fa56ee5ac57fd6f53505f95\n", "Successfully built googletrans\n", "Installing collected packages: googletrans\n", "Successfully installed googletrans-2.4.0\n" ] } ], "source": [ "# installing google translator library \n", "\n", "!pip install googletrans" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Xinjiang Uygur Autonomous Region',\n", " 'Tibet Autonomous Region',\n", " 'Inner Mongolia Autonomous Region',\n", " 'Qinghai Province',\n", " 'Sichuan Province',\n", " 'Heilongjiang Province',\n", " 'Gansu province',\n", " 'Yunnan Province',\n", " 'Guangxi Zhuang Autonomous Region',\n", " 'Hunan Province',\n", " 'Shaanxi Province',\n", " 'Guangdong Province',\n", " 'Jilin Province',\n", " 'Hebei Province',\n", " 'Hubei Province',\n", " 'Guizhou Province',\n", " 'Shandong Province',\n", " 'Jiangxi',\n", " 'Henan Province',\n", " 'Liaoning Province',\n", " 'Shanxi Province',\n", " 'Anhui Province',\n", " 'Fujian Province',\n", " 'Zhejiang Province',\n", " 'Jiangsu Province',\n", " 'Chongqing',\n", " 'Ningxia Hui Autonomous Region',\n", " 'Hainan',\n", " 'Taiwan Province',\n", " 'Beijing',\n", " 'Tianjin',\n", " 'Shanghai',\n", " 'Hong Kong SAR',\n", " 'Macao Special Administrative Region']" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from googletrans import Translator # importing translator \n", "translator = Translator() # creating an object of translator\n", "\n", "name_lst_en = []\n", "for text in name_lst:\n", " name_lst_en.append(translator.translate(text).text)\n", "name_lst_en" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "**Comparing the above list of provinces/states with the Province/State column of the corona dataset.**" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Anhui\n", "1 Beijing\n", "2 Chongqing\n", "3 Fujian\n", "4 Gansu\n", "5 Guangdong\n", "6 Guangxi\n", "7 Guizhou\n", "8 Hainan\n", "9 Hebei\n", "10 Heilongjiang\n", "11 Henan\n", "12 Hong Kong\n", "13 Hubei\n", "14 Hunan\n", "15 Inner Mongolia\n", "16 Jiangsu\n", "17 Jiangxi\n", "18 Jilin\n", "19 Liaoning\n", "20 Macau\n", "21 Ningxia\n", "22 Qinghai\n", "23 Shaanxi\n", "24 Shandong\n", "25 Shanghai\n", "26 Shanxi\n", "27 Sichuan\n", "28 Taiwan\n", "29 Tianjin\n", "30 Tibet\n", "31 Xinjiang\n", "32 Yunnan\n", "33 Zhejiang\n", "Name: Province/State, dtype: object" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "China_data_state_wise['Province/State']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The names in name_lst_en need to be modified to match the names used in the dataset.**" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Xinjiang',\n", " 'Tibet',\n", " 'Inner Mongolia',\n", " 'Qinghai',\n", " 'Sichuan',\n", " 'Heilongjiang',\n", " 'Gansu',\n", " 'Yunnan',\n", " 'Guangxi',\n", " 'Hunan',\n", " 'Shaanxi',\n", " 'Guangdong',\n", " 'Jilin',\n", " 'Hebei',\n", " 'Hubei',\n", " 'Guizhou',\n", " 'Shandong',\n", " 'Jiangxi',\n", " 'Henan',\n", " 'Liaoning',\n", " 'Shanxi',\n", " 'Anhui',\n", " 'Fujian',\n", " 'Zhejiang',\n", " 'Jiangsu',\n", " 'Chongqing',\n", " 'Ningxia',\n", " 'Hainan',\n", " 'Taiwan',\n", " 'Beijing',\n", " 'Tianjin',\n", " 'Shanghai',\n", " 'Hong Kong',\n", " 'Macau']" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Keeping only the first word of every item in name_lst_en\n", "\n", "for i in range(len(name_lst_en)):\n", " name_lst_en[i] = name_lst_en[i].split(\" \")[0]\n", "\n", "# The first-word rule breaks multi-word names, so restore the spellings used in the dataset\n", "name_fixes = {'Inner': 'Inner Mongolia', 'Hong': 'Hong Kong', 'Macao': 'Macau'}\n", "name_lst_en = [name_fixes.get(name, name) for name in name_lst_en]\n", "name_lst_en" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "**Now most translated names match the names in the dataset. Three still differ: 'Inner', 'Hong' and 'Macao' do not match 'Inner Mongolia', 'Hong Kong' and 'Macau'.**" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "# Write the translated province/state names into the China geoJson file\n", "\n", "for i in range(len(name_lst_en)):\n", " china['features'][i]['properties']['name']=name_lst_en[i]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Approximate geographic center of China, used to center the map\n", "China_lat = 35.8617\n", "China_lng = 104.1954\n", "\n", "China_map = flm.Map(location=[China_lat,China_lng],zoom_start=4)\n", "\n", "# Shade each province by its number of confirmed cases;\n", "# provinces whose name has no match in the data are filled black\n", "flm.Choropleth(\n", " geo_data=china,\n", " data=China_data_state_wise,\n", " columns=['Province/State','Confirmed'],\n", " key_on='feature.properties.name',\n", " fill_color='YlOrRd',\n", " fill_opacity=0.7,\n", " nan_fill_color='black',\n", " nan_fill_opacity=0.7,\n", " highlight=True,\n", " legend_name='No. of confirmed cases',\n", " ).add_to(China_map)\n", "China_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Black areas are provinces for which no matching Province/State value was found in the dataset. With the names above, this applies to Inner Mongolia, Hong Kong and Macao.\n", "- Light-yellow areas have fewer confirmed cases.\n", "\n", "For the exact ranges, refer to the color scale in the map legend." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only one province, Hubei, is shaded red, which indicates that most of the confirmed cases are concentrated there." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thank you for reading" ] } ], "metadata": { "kernelspec": { "display_name": "Python", "language": "python", "name": "conda-env-python-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 4 }