{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Information\n", "__Section__: Examine the datasets \n", "__Goal__: Understand the content and the distribution of the datasets we are using in this part. \n", "__Time needed__: 30 min \n", "__Prerequisites__: Understanding of AIS data\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Examine the datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import the data " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you understand the meaning of each attribute, you can take a deeper look into the dataset itself, to learn from the distribution of the attributes and their possible relationships and interdependencies. This step will allow you to have a good overview on your data to better solve the diverse tasks for your customers later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "First, you need to load the data again.\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dynamic_data = pd.read_csv('./dynamic_data.csv')\n", "static_data = pd.read_csv('./static_data.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dynamic data - let's do it together " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we can use various methods to get an overview of the dataset. 
Let's have a look at those together:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "The method [info()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) returns the count of instances and attributes in the dataset, the method [describe()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) prints the distribution of each numerical attribute.\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 100000 entries, 0 to 99999\n", "Data columns (total 26 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 MMSI 100000 non-null int64 \n", " 1 BaseDateTime 100000 non-null object \n", " 2 LAT 100000 non-null float64\n", " 3 LON 100000 non-null float64\n", " 4 SOG 100000 non-null float64\n", " 5 COG 100000 non-null float64\n", " 6 Heading 100000 non-null float64\n", " 7 VesselName 95405 non-null object \n", " 8 IMO 47669 non-null object \n", " 9 CallSign 80589 non-null object \n", " 10 VesselType 88926 non-null float64\n", " 11 Status 70249 non-null object \n", " 12 Length 85415 non-null float64\n", " 13 Width 71123 non-null float64\n", " 14 Draft 42189 non-null float64\n", " 15 TripID 100000 non-null int64 \n", " 16 DepTime 100000 non-null object \n", " 17 ArrTime 100000 non-null object \n", " 18 DepLat 100000 non-null float64\n", " 19 DepLon 100000 non-null float64\n", " 20 ArrLat 100000 non-null float64\n", " 21 ArrLon 100000 non-null float64\n", " 22 DepCountry 100000 non-null object \n", " 23 DepCity 100000 non-null object \n", " 24 ArrCountry 100000 non-null object \n", " 25 ArrCity 100000 non-null object \n", "dtypes: float64(13), int64(2), object(11)\n", "memory usage: 19.8+ MB\n" ] } ], "source": [ "dynamic_data.info()" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "+ ``RangeIndex: 100000 entries, 0 to 99999``: this means that the dataset contains 100000 instances, or 100000 lines. This is the number of AIS messages that are represented in the dataset.\n", "+ ``Data columns (total 26 columns)``: this shows that the dataset contains 26 attributes (represented as columns in the dataset).\n", "+ Then follows a list of each attribute, with the number of recorded (non-null) values for each and their [type](./../../introduction/0-2-supervised-learning.html). \n", "+ Finally, we see a summary of the types and the number of attributes of each type, and the memory used by this dataset." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " MMSI BaseDateTime LAT LON SOG COG Heading \\\n", "0 367114690 2017-01-01 00:00:06 48.51094 -122.60705 0.0 -49.6 511.0 \n", "\n", " VesselName IMO CallSign VesselType Status Length Width \\\n", "0 NaN NaN NaN NaN under way using engine NaN NaN \n", "\n", " Draft TripID DepTime ArrTime DepLat \\\n", "0 NaN 1 2017-01-01 00:00:06 2017-01-01 02:40:45 48.51094 \n", "\n", " DepLon ArrLat ArrLon DepCountry DepCity ArrCountry ArrCity \n", "0 -122.60705 48.51095 -122.60705 US Anacortes US Anacortes " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_data.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above, we printed the first instance of the dataset. As we see, some attributes have the value ``NaN``. This means that this value is missing (it has not been recorded or saved). This is the reason why some attributes above don't have ``100000 non-null`` but rather a lower number of instances: some of these values are missing, and we call them __missing values__." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " MMSI LAT LON SOG \\\n", "count 1.000000e+05 100000.000000 100000.000000 100000.000000 \n", "mean 3.590440e+08 46.221127 -122.893343 1.775973 \n", "std 5.927431e+07 3.850865 0.703268 4.491950 \n", "min 3.160089e+06 32.209370 -125.998590 -0.100000 \n", "25% 3.160334e+08 46.137208 -123.190290 0.000000 \n", "50% 3.669768e+08 47.645370 -122.688210 0.000000 \n", "75% 3.675157e+08 48.621405 -122.385730 0.100000 \n", "max 9.876543e+08 49.890740 -120.002420 42.100000 \n", "\n", " COG Heading VesselType Length Width \\\n", "count 100000.000000 100000.000000 88926.000000 85415.000000 71123.000000 \n", "mean -16.296728 369.527000 952.373142 60.899165 13.653366 \n", "std 118.459501 174.132613 237.057544 74.529841 10.556948 \n", "min -204.800000 0.000000 0.000000 6.710000 0.000000 \n", "25% -116.600000 205.000000 1004.000000 18.140000 6.430000 \n", "50% -49.600000 511.000000 1018.000000 26.490000 9.350000 \n", "75% 77.900000 511.000000 1019.000000 60.840000 18.280000 \n", "max 204.700000 511.000000 1025.000000 349.000000 50.000000 \n", "\n", " Draft TripID DepLat DepLon \\\n", "count 42189.000000 100000.000000 100000.000000 100000.000000 \n", "mean 6.055567 668.135890 46.222409 -122.892360 \n", "std 4.340102 409.245317 3.854998 0.706539 \n", "min 0.000000 1.000000 32.220640 -125.995610 \n", "25% 3.000000 317.000000 46.137290 -123.204420 \n", "50% 4.500000 646.000000 47.645390 -122.683800 \n", "75% 9.100000 1001.000000 48.621300 -122.385500 \n", "max 18.800000 1520.000000 49.890740 -120.002920 \n", "\n", " ArrLat ArrLon \n", "count 100000.000000 100000.000000 \n", "mean 46.220749 -122.897869 \n", "std 3.846843 0.705792 \n", "min 32.209370 -125.998590 \n", "25% 46.144140 -123.192990 \n", "50% 47.645110 -122.688210 \n", "75% 48.621200 -122.385970 \n", "max 49.832120 -120.002420 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"This method returns summary statistics for the numerical attributes in the dataset. For example, the 3rd quartile ('75%') of the attribute SOG is 0.1, which means that 75% of the recorded SOG values are at most 0.1 knots: we can conclude that most of the recorded datapoints in this dataset come from ships that are stationary or nearly so." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, with some simple histograms, we can visualize the distribution of each attribute in the dataset. This lets us look a little deeper than the methods above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "For that, we use the [plot](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html) accessor of a Pandas Series, which can produce different types of plots for an attribute. Here we choose [hist()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.hist.html).\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "dynamic_data['LAT'].plot.hist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This plot shows the distribution of the latitude attribute in the dataset. We can see that many values fall in the range [45.0 ; 50.0]. This means that many positions are recorded in this area, while the range [32.5 ; 45.0] contains far fewer recorded positions. If you want to know more about histograms and the other kinds of plots used in the course, you can visit [this page](./../../introduction/0-3-graphs.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can create this type of plot for every numerical attribute in the dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "In the previous cell, try to change the name of the plotted attribute to visualize the other attributes. You can even try the name of a non-numerical attribute to see what happens.\n", "```" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4a9c6ab969a341bcb3fd5db75b808ba3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='Attribute:', options=('MMSI', 'LAT', 'LON', 'SOG', 'COG', 'Heading…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import numpy as np\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "num_attributes = dynamic_data.select_dtypes([np.number]).columns\n", "\n", "def plot_hist(att):\n", " dynamic_data[att].plot.hist()\n", "\n", "interact(plot_hist, att = widgets.Dropdown(\n", " options = num_attributes,\n", " 
value = num_attributes[0],\n", " description = 'Attribute:',\n", " disabled = False,\n", "))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can list the distinct values of each attribute." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "For that, we use the method [unique()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html):\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "array([ nan, 1012., 1019., 1001., 1025., 1024., 1004., 1018., 1005.,\n", " 70., 30., 1020., 99., 1003., 80., 1013., 31., 1002.,\n", " 52., 53., 1011., 50., 35., 1023., 69., 37., 7.,\n", " 60., 1022., 90., 51., 1010., 0., 79., 71., 1017.])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_data['VesselType'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this result, and by comparing the values with the AIS documentation available [here](https://coast.noaa.gov/data/marinecadastre/ais/VesselTypeCodes2018.pdf), you can analyze what kinds of ships are present in the dataset." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "Have a look at the different values for the other attributes by changing the name of the attribute in the previous cell.\n", "```" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "448006a60f634a9d9e191df60b34f168", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='Attribute:', options=('MMSI', 'BaseDateTime', 'LAT', 'LON', 'SOG',…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "attributes = dynamic_data.columns\n", "\n", "def get_unique(att):\n", " print(dynamic_data[att].unique())\n", "\n", "interact(get_unique, att = widgets.Dropdown(\n", " options = attributes,\n", " value = attributes[0],\n", " description = 'Attribute:',\n", " disabled = False,\n", "))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Additionally, we can create a two-dimensional plot to show two attributes against each other. For example, in this figure, we plot longitude against latitude to get a geographical representation of the recorded datapoints:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "For this, we use the library [pyplot](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.html) and the function [plot()](https://matplotlib.org/3.1.3/api/_as_gen/matplotlib.pyplot.plot.html). 
\n", "```" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [ "hide-input", "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.figure(figsize = (12, 8))\n", "plt.plot(dynamic_data['LON'], dynamic_data['LAT'], 'x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The datapoints are drawn as crosses, and on this plot, each separate blue line is very likely one recorded trip." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{toggle} Advanced level\n", "Change the name of the attributes and try to compare other (numerical) attributes.\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6553ac13c3f44ff2bbca2695ef14bea8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='Attribute (x):', options=('MMSI', 'LAT', 'LON', 'SOG', 'COG', 'Hea…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "num_attributes = dynamic_data.select_dtypes([np.number]).columns\n", "\n", "def plot_2att(att1, att2):\n", " plt.figure(figsize = (12, 8))\n", " plt.plot(dynamic_data[att1], dynamic_data[att2], 'x')\n", " plt.xlabel(att1)\n", " plt.ylabel(att2)\n", "\n", "interact(plot_2att,\n", " att1 = widgets.Dropdown(options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute (x):',\n", " disabled = False,),\n", " att2 = widgets.Dropdown(options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute (y):',\n", " disabled = False,))" ] }, { "cell_type": "markdown", "metadata": {}, "source": 
[ "You can also choose to visualize the information of this dataset for only one trip. For example, we want to see the longitude and latitude values for the trip with the TripID 106:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [ "hide-input", "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# for one trip\n", "\n", "trip = dynamic_data.loc[dynamic_data['TripID'] == 106]\n", "\n", "plt.figure(figsize = (12, 8))\n", "plt.plot(trip['LON'], trip['LAT'], 'x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Change the value of the TripID attribute, and the names of the attributes, to plot anything else." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ab3e33e3223346d59eeb338d3e8285ee", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(BoundedIntText(value=1, description='TripID [1 ; 1520]:', max=1520, min=1), Dropdown(des…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "num_attributes = dynamic_data.select_dtypes([np.number]).columns\n", "\n", "def plot_1trip(tripid, att1, att2):\n", " trip = dynamic_data.loc[dynamic_data['TripID'] == tripid]\n", " plt.figure(figsize = (12, 8))\n", " plt.plot(trip[att1], trip[att2], 'x')\n", " plt.xlabel(att1)\n", " plt.ylabel(att2)\n", "\n", "interact(plot_1trip,\n", " tripid = widgets.BoundedIntText(value = 1,\n", " min = 1,\n", " max = 1520,\n", " step = 1,\n", " description = 'TripID [1 ; 1520]:',\n", " disabled = False),\n", " att1 = widgets.Dropdown(options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute (x):',\n", " disabled = False,),\n", " att2 = widgets.Dropdown(options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute (y):',\n", " disabled = False,))" ] }, { 
"cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import IFrame\n", "IFrame(\"https://h5p.org/h5p/embed/742748\", \"694\", \"600\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Static data - your turn " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the same tools as for the dynamic dataset, you can now analyze the static dataset." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 1520 entries, 0 to 1519\n", "Data columns (total 22 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 TripID 1520 non-null int64 \n", " 1 MMSI 1520 non-null int64 \n", " 2 MeanSOG 1520 non-null float64\n", " 3 VesselName 1442 non-null object \n", " 4 IMO 538 non-null object \n", " 5 CallSign 1137 non-null object \n", " 6 VesselType 1287 non-null float64\n", " 7 Length 1220 non-null float64\n", " 8 Width 911 non-null float64\n", " 9 Draft 496 non-null float64\n", " 10 Cargo 378 non-null float64\n", " 11 DepTime 1520 non-null object \n", " 12 ArrTime 1520 non-null object \n", " 13 DepLat 1520 non-null float64\n", " 14 DepLon 1520 non-null float64\n", " 15 ArrLat 1520 non-null float64\n", " 16 ArrLon 1520 non-null float64\n", " 17 DepCountry 1520 non-null object \n", " 18 DepCity 1520 non-null object \n", " 19 ArrCountry 1520 non-null object \n", " 20 ArrCity 1520 non-null object \n", " 21 Duration 1520 non-null object \n", "dtypes: float64(10), int64(2), object(10)\n", "memory usage: 261.4+ KB\n" ] } ], "source": [ "static_data.info()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { 
"text/html": [ "
" ], "text/plain": [ " TripID MMSI MeanSOG VesselType Length \\\n", "count 1520.000000 1.520000e+03 1520.000000 1287.000000 1220.000000 \n", "mean 760.500000 3.597421e+08 1.034825 971.680653 56.769590 \n", "std 438.930518 6.263661e+07 2.936439 198.957887 74.739358 \n", "min 1.000000 3.160089e+06 -0.100000 0.000000 6.710000 \n", "25% 380.750000 3.380724e+08 0.000000 1004.000000 14.840000 \n", "50% 760.500000 3.669802e+08 0.012633 1019.000000 22.340000 \n", "75% 1140.250000 3.675663e+08 0.072000 1019.000000 41.277500 \n", "max 1520.000000 9.876543e+08 20.360811 1025.000000 349.000000 \n", "\n", " Width Draft Cargo DepLat DepLon \\\n", "count 911.000000 496.000000 378.000000 1520.000000 1520.000000 \n", "mean 13.104501 6.457056 50.515873 46.354331 -122.868905 \n", "std 10.903338 4.607529 22.693810 3.766705 0.681947 \n", "min 0.000000 0.000000 0.000000 32.220640 -125.995610 \n", "25% 5.500000 3.000000 31.000000 46.168652 -123.178480 \n", "50% 8.000000 4.650000 52.000000 47.647795 -122.651365 \n", "75% 16.350000 10.025000 70.000000 48.656940 -122.386562 \n", "max 50.000000 18.800000 99.000000 49.890740 -120.002920 \n", "\n", " ArrLat ArrLon \n", "count 1520.000000 1520.000000 \n", "mean 46.353671 -122.871346 \n", "std 3.762056 0.680604 \n", "min 32.209370 -125.998590 \n", "25% 46.168460 -123.168262 \n", "50% 47.646925 -122.645290 \n", "75% 48.665710 -122.386607 \n", "max 49.832120 -120.002420 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "static_data.describe()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "#static_data[''].plot.hist()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "10865dee12914bffb708dc20b7f5b734", "version_major": 2, "version_minor": 0 }, "text/plain": [ 
"interactive(children=(Dropdown(description='Attribute:', options=('TripID', 'MMSI', 'MeanSOG', 'VesselType', '…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import numpy as np\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "num_attributes = static_data.select_dtypes([np.number]).columns\n", "\n", "def plot_hist(att):\n", " static_data[att].plot.hist()\n", "\n", "interact(plot_hist, att = widgets.Dropdown(\n", " options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute:',\n", " disabled = False,\n", "))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "#static_data[''].unique()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1bc01167517a4c57807e79fdcae088c9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='Attribute:', options=('TripID', 'MMSI', 'MeanSOG', 'VesselName', '…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "attributes = static_data.columns\n", "\n", "def get_unique(att):\n", " print(static_data[att].unique())\n", "\n", "interact(get_unique, att = widgets.Dropdown(\n", " options = attributes,\n", " value = attributes[0],\n", " description = 'Attribute:',\n", " disabled = False,\n", "))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [ "hide-input", "hide-output" ] }, "outputs": 
[ { "data": { "text/plain": [ "
" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.figure(figsize = (12, 8))\n", "#plt.plot(static_data[''], static_data[''], 'x')" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9f05898b2c124d80bf5542f6edeee257", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='Attribute (x):', options=('TripID', 'MMSI', 'MeanSOG', 'VesselType…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For beginner version: cell to hide\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import ipywidgets as widgets\n", "from ipywidgets import interact\n", "\n", "num_attributes = static_data.select_dtypes([np.number]).columns\n", "\n", "def plot_2att(att1, att2):\n", " plt.figure(figsize = (12, 8))\n", " plt.plot(static_data[att1], static_data[att2], 'x')\n", " plt.xlabel(att1)\n", " plt.ylabel(att2)\n", "\n", "interact(plot_2att,\n", " att1 = widgets.Dropdown(options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute (x):',\n", " disabled = False,),\n", " att2 = widgets.Dropdown(options = num_attributes,\n", " value = num_attributes[0],\n", " description = 'Attribute (y):',\n", " disabled = False,))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import IFrame\n", "IFrame(\"https://h5p.org/h5p/embed/742791\", \"694\", \"600\")" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", 
"language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }