{
"cells": [
{
"cell_type": "markdown",
"id": "experienced-letter",
"metadata": {},
"source": [
"## Step - 2: A Framework for Unlocking and Linking WWII Japanese American Incarceration Biographical Data - Context Based Data Manipulation and Analysis - Part 2"
]
},
{
"cell_type": "markdown",
"id": "alternate-dragon",
"metadata": {},
"source": [
"This module focuses on manipulating the geographical data collected in part 1 to explore a variety of structures for visualizing spatial data. \n",
"\n",
"The steps taken in part 1 to locate the place of origin, assembly center, camp relocations, residence at Tule Lake, and final movement for George Kuratomi were repeated for the other 24 selected individuals and aggregated into an Excel spreadsheet, which can be seen below. Latitude and longitude coordinates were then added for all five movements of the 25 individuals. A separate spreadsheet, also created in Excel and included below, structures the latitude and longitude coordinates in a format that allows us to map out the path of each person. **Note:** The paths spreadsheet could be created in Python, but structuring the data in a useful way would require a lot of manipulation, so for ease the data was formatted in Excel. \n",
"\n",
"To begin working with the geographical data, save each spreadsheet as a comma-separated values (.csv) file, then read it into the Jupyter notebook following the same process as in part 1, as shown below. **Note:** Excel files can also be read in directly; it is a matter of personal preference. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "limiting-slide",
"metadata": {},
"outputs": [],
"source": [
"# Import libraries used for dataframe (table-like) operations, and numeric data structure operations\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "scenic-stanley",
"metadata": {},
"outputs": [],
"source": [
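"# Read both CSV files into dataframes. dtype=object keeps every column as text (including lat and long),\n",
"# and keep_default_na=False with na_values=[] leaves blank cells as empty strings instead of NaN.\n",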
"fullstackeddf = pd.read_csv('python-fullmovements-stacked.csv',dtype=object,na_values=[],keep_default_na=False)\n",
"pathsdf = pd.read_csv('python-paths.csv',dtype=object,na_values=[],keep_default_na=False)\n"
]
},
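{
"cell_type": "markdown",
"id": "sketch-read-options",
"metadata": {},
"source": [
"As a minimal sketch of what the `read_csv` options above do, the cell below reads a small in-memory sample instead of the real file. The sample rows and the names `sample_csv`, `sample_df`, and `blank_date` are illustrative and not part of the original workflow. It suggests that blank cells stay as empty strings and that the coordinates arrive as text, so they would likely need converting (here with `pd.to_numeric`) before any numeric or mapping work."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-read-options-code",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical two-row sample standing in for python-fullmovements-stacked.csv\n",
"sample_csv = io.StringIO('''name,lat,long,city,dates\n",
"george kuratomi,32.7157,-117.1611,san diego,\n",
"george kuratomi,34.1333,-118.0333,santa anita,1942-10-30\n",
"''')\n",
"\n",
"# Same options as the read_csv call above\n",
"sample_df = pd.read_csv(sample_csv, dtype=object, na_values=[], keep_default_na=False)\n",
"\n",
"# Blank cells stay as empty strings instead of becoming NaN\n",
"blank_date = sample_df.loc[0, 'dates']\n",
"\n",
"# Coordinates arrive as strings, so convert them before numeric or mapping work\n",
"sample_df['lat'] = pd.to_numeric(sample_df['lat'])\n",
"sample_df['long'] = pd.to_numeric(sample_df['long'])"
]
},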
{
"cell_type": "markdown",
"id": "fabulous-access",
"metadata": {},
"source": [
"### Creation of Points"
]
},
{
"cell_type": "markdown",
"id": "intense-scratch",
"metadata": {},
"source": [
"Spatial data is geographic information about the earth and typically references a specific geospatial area or location. To perform this kind of spatial analysis, a dataset must include latitude and longitude coordinates. Additional fields such as name, city, state, and dates make other visualizations possible and provide more context about the dataset.\n",
"\n",
"The column headers for this dataset are name, lat, long, city, state, order, dates, fid, abbrev, Notes, iso_alpha, and iso_no, as shown below. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "imposed-mileage",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name lat long city state \\\n",
"0 george kuratomi 32.7157 -117.1611 san diego california \n",
"1 george kuratomi 34.1333 -118.0333 santa anita california \n",
"2 george kuratomi 33.3833 -91.4667 jerome arkansas \n",
"3 george kuratomi 41.8931 -121.3735 tule lake california \n",
"4 george kuratomi 40.7300 -77.9380 pennsylvania pennsylvania \n",
"5 tom (yoshio) kobayashi 34.0522 -118.2437 los angeles california \n",
"6 tom (yoshio) kobayashi 34.1404 -118.0442 santa anita california \n",
"7 tom (yoshio) kobayashi 44.5167 -109.0501 heart mountain wyoming \n",
"8 tom (yoshio) kobayashi 41.8814 -121.3556 tule lake california \n",
"9 tom (yoshio) kobayashi 46.8000 -100.7833 north dakota north dakota \n",
"\n",
" order dates fid abbrev Notes \\\n",
"0 origin 1 CA \n",
"1 assembly 1942-10-30 1 CA \n",
"2 first camp 1943-09-26 1 AR \n",
"3 second camp 1943-09-30 1 CA \n",
"4 final departure 1946-01-10 1 PA terminal departure with grant \n",
"5 origin 2 CA \n",
"6 assembly 1942-09-04 2 CA \n",
"7 first camp 1943-09-27 2 WY \n",
"8 second camp 1943-09-30 2 CA \n",
"9 final departure 1945-02-11 2 ND terminal internment \n",
"\n",
" iso_alpha iso_no \n",
"0 USA 840 \n",
"1 USA 840 \n",
"2 USA 840 \n",
"3 USA 840 \n",
"4 USA 840 \n",
"5 USA 840 \n",
"6 USA 840 \n",
"7 USA 840 \n",
"8 USA 840 \n",
"9 USA 840 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The below command shows the first ten rows of the dataset\n",
"fullstackeddf.head(10)"
]
},
{
"cell_type": "markdown",
"id": "express-angel",
"metadata": {},
"source": [
"To view all reported movements of one person, we can use the pandas `str.contains` method to return only the rows whose name contains a given piece of text. The data is already structured in a way that makes it easy to explore and plot locations for other individuals or groups. \n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "worse-spectacular",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name lat long city state \\\n",
"0 george kuratomi 32.7157 -117.1611 san diego california \n",
"1 george kuratomi 34.1333 -118.0333 santa anita california \n",
"2 george kuratomi 33.3833 -91.4667 jerome arkansas \n",
"3 george kuratomi 41.8931 -121.3735 tule lake california \n",
"4 george kuratomi 40.7300 -77.9380 pennsylvania pennsylvania \n",
"\n",
" order dates fid abbrev Notes \\\n",
"0 origin 1 CA \n",
"1 assembly 1942-10-30 1 CA \n",
"2 first camp 1943-09-26 1 AR \n",
"3 second camp 1943-09-30 1 CA \n",
"4 final departure 1946-01-10 1 PA terminal departure with grant \n",
"\n",
" iso_alpha iso_no \n",
"0 USA 840 \n",
"1 USA 840 \n",
"2 USA 840 \n",
"3 USA 840 \n",
"4 USA 840 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# str.contains keeps only the rows whose name contains the given text\n",
"kuratomi = fullstackeddf[fullstackeddf['name'].str.contains('kuratomi')]\n",
"kuratomi"
]
},
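{
"cell_type": "markdown",
"id": "sketch-substring-match",
"metadata": {},
"source": [
"One caveat worth sketching: `str.contains` matches substrings, so a shared surname returns more than one person. The cell below uses a hypothetical three-row table (`names_df` is an illustrative name, with names and `fid` values taken from the tables in this notebook) to compare a substring search with an exact comparison. The `case=False` and `regex=False` arguments are optional choices made for this sketch, not settings used elsewhere in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-substring-match-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-table using three names that appear in the dataset\n",
"names_df = pd.DataFrame({\n",
"    'name': ['george kuratomi', 'tom (yoshio) kobayashi', 'yukio kobayashi'],\n",
"    'fid': ['1', '2', '24'],\n",
"})\n",
"\n",
"# A substring search returns every name containing the text\n",
"both_kobayashis = names_df[names_df['name'].str.contains('kobayashi', case=False, regex=False)]\n",
"\n",
"# An exact comparison returns only the one full name\n",
"only_yukio = names_df[names_df['name'] == 'yukio kobayashi']"
]
},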
{
"cell_type": "markdown",
"id": "explicit-graduate",
"metadata": {},
"source": [
"Similarly, the `str.contains` method can return results for specific cities, states, orders, and dates. \n",
"\n",
"This is particularly useful for viewing the data through a different lens, for example to analyze where individuals or groups were on a particular date or at a particular location. In the example table below, `str.contains` was used to pull the rows whose state contains 'california'. When mapped, the result shows a point for every movement (point of origin, assembly center, first or second incarceration camp, or final departure) that was located in California. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "specialized-minority",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name lat long city \\\n",
"0 george kuratomi 32.7157 -117.1611 san diego \n",
"1 george kuratomi 34.1333 -118.0333 santa anita \n",
"3 george kuratomi 41.8931 -121.3735 tule lake \n",
"5 tom (yoshio) kobayashi 34.0522 -118.2437 los angeles \n",
"6 tom (yoshio) kobayashi 34.1404 -118.0442 santa anita \n",
".. ... ... ... ... \n",
"118 yukio kobayashi 41.8936 -121.3678 tule lake \n",
"120 kazuo uneda 33.8910 -118.3010 gardena \n",
"121 kazuo uneda 38.5737 -121.4945 sacramento (walerga) \n",
"122 kazuo uneda 41.8904 -121.3721 tule lake \n",
"123 kazuo uneda 41.8936 -121.3590 tule lake \n",
"\n",
" state order dates fid abbrev Notes iso_alpha iso_no \n",
"0 california origin 1 CA USA 840 \n",
"1 california assembly 1942-10-30 1 CA USA 840 \n",
"3 california second camp 1943-09-30 1 CA USA 840 \n",
"5 california origin 2 CA USA 840 \n",
"6 california assembly 1942-09-04 2 CA USA 840 \n",
".. ... ... ... .. ... ... ... ... \n",
"118 california second camp 1943-09-30 24 CA USA 840 \n",
"120 california origin 25 CA USA 840 \n",
"121 california assembly 1942-06-21 25 CA USA 840 \n",
"122 california first camp 25 CA USA 840 \n",
"123 california second camp 25 CA USA 840 \n",
"\n",
"[74 rows x 12 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The below command displays values that contain california\n",
"california = fullstackeddf[fullstackeddf['state'].str.contains('california')]\n",
"california"
]
},
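{
"cell_type": "markdown",
"id": "sketch-date-filter",
"metadata": {},
"source": [
"Because every column was read as text, the `dates` column holds strings, and some of them are blank. As a hedged sketch of how a date-based view might be built, the cell below parses the strings with `pd.to_datetime` and keeps the rows dated on or before a chosen day. The table `moves_df`, the `parsed_date` column, and the `cutoff` value are hypothetical and chosen only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-date-filter-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical rows modeled on the table above; blank dates are empty strings\n",
"moves_df = pd.DataFrame({\n",
"    'name': ['george kuratomi', 'george kuratomi', 'kazuo uneda'],\n",
"    'city': ['san diego', 'tule lake', 'tule lake'],\n",
"    'dates': ['', '1943-09-30', ''],\n",
"})\n",
"\n",
"# errors='coerce' turns blank or unparseable dates into NaT (missing)\n",
"moves_df['parsed_date'] = pd.to_datetime(moves_df['dates'], errors='coerce')\n",
"\n",
"# Keep rows dated on or before a chosen day; rows with a missing date never match\n",
"cutoff = pd.Timestamp('1943-12-31')\n",
"dated_by_cutoff = moves_df[moves_df['parsed_date'] <= cutoff]"
]
},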
{
"cell_type": "markdown",
"id": "configured-warehouse",
"metadata": {},
"source": [
"### Creation of Cluster Data"
]
},
{
"cell_type": "markdown",
"id": "tamil-quantity",
"metadata": {},
"source": [
"So far we have only constructed tables that plot points on a map. An alternative way of viewing our data is to create tables that show the relative size, or clustering, of given variables. \n",
"\n",
"Clustering data to view relative sizes is useful for surface-level analysis and can give us a better understanding of where large groups were concentrated at each point in time. \n",
"\n",
"To view the number of individuals at each location across all movements, we can apply the `value_counts` method introduced in part 1 to return counts of unique values. Because each row is one movement, a person recorded at the same location twice (for example, Tule Lake as both first and second camp) is counted twice. As seen below, the city column includes a few state names, such as California, Pennsylvania, North Dakota, New Mexico, and Hawaii. This was done deliberately for some of the 25 individuals: because of missing data in the FAR, their final departure city was unclear. If these cells had been left blank in the spreadsheet, the blank would be counted as a unique value when `value_counts` is run, and in this case we do not want that value included. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "authorized-annotation",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tule lake 30\n",
"jerome 9\n",
"california 8\n",
"sand island 7\n",
"new mexico 7\n",
"santa anita 6\n",
"oahu 6\n",
"topaz 5\n",
"los angeles 4\n",
"heart mountain 3\n",
"pennsylvania 3\n",
"sacramento (walerga) 3\n",
"sacramento 3\n",
"north dakota 2\n",
"salinas 2\n",
"fresno 2\n",
"tanforan 2\n",
"manzanar 2\n",
"hawaii 2\n",
"gardena 2\n",
"san francisco 2\n",
"tokyo 2\n",
"none 1\n",
"terminal island 1\n",
"menlo park 1\n",
"artesia 1\n",
"rohwer 1\n",
"auburn 1\n",
"pomona 1\n",
"tulare 1\n",
"poston 1\n",
"gila river 1\n",
"san diego 1\n",
"garden grove 1\n",
"waikele 1\n",
"Name: city, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The below command returns count values of the cities\n",
"fullstackeddf['city'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "magnetic-electronics",
"metadata": {},
"source": [
"Once the counts are retrieved, they need to be added to the table as a new column. This can be achieved with the pandas `groupby` method. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "healthy-encyclopedia",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name lat long city state \\\n",
"0 george kuratomi 32.7157 -117.1611 san diego california \n",
"1 george kuratomi 34.1333 -118.0333 santa anita california \n",
"2 george kuratomi 33.3833 -91.4667 jerome arkansas \n",
"3 george kuratomi 41.8931 -121.3735 tule lake california \n",
"4 george kuratomi 40.7300 -77.9380 pennsylvania pennsylvania \n",
"\n",
" order dates fid abbrev Notes \\\n",
"0 origin 1 CA \n",
"1 assembly 1942-10-30 1 CA \n",
"2 first camp 1943-09-26 1 AR \n",
"3 second camp 1943-09-30 1 CA \n",
"4 final departure 1946-01-10 1 PA terminal departure with grant \n",
"\n",
" iso_alpha iso_no counts \n",
"0 USA 840 1 \n",
"1 USA 840 6 \n",
"2 USA 840 9 \n",
"3 USA 840 30 \n",
"4 USA 840 3 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Group the rows by city, count the rows in each group, and store that count in a new column named counts\n",
"fullstackeddf['counts'] = fullstackeddf.groupby(['city'])['order'].transform('count')\n",
"fullstackeddf.head()"
]
},
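{
"cell_type": "markdown",
"id": "sketch-stage-counts",
"metadata": {},
"source": [
"The `counts` column above totals every row for a city, regardless of movement stage. As a sketch of one possible refinement, the cell below groups on both `city` and `order` so that a location's total is split by stage. The table `stage_df`, its placeholder names, and the `stage_counts` column are assumptions made for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-stage-counts-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical rows; the city and order values follow the dataset's conventions\n",
"stage_df = pd.DataFrame({\n",
"    'name': ['person a', 'person b', 'person c', 'person d'],\n",
"    'city': ['santa anita', 'santa anita', 'tule lake', 'tule lake'],\n",
"    'order': ['assembly', 'assembly', 'first camp', 'second camp'],\n",
"})\n",
"\n",
"# transform('count') returns one value per original row, so it can be assigned as a column\n",
"stage_df['counts'] = stage_df.groupby('city')['order'].transform('count')\n",
"\n",
"# Grouping on both columns splits each city's total by movement stage\n",
"stage_df['stage_counts'] = stage_df.groupby(['city', 'order'])['name'].transform('count')"
]
},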
{
"cell_type": "markdown",
"id": "broken-australia",
"metadata": {},
"source": [
"### Creation of Paths"
]
},
{
"cell_type": "markdown",
"id": "rental-skating",
"metadata": {},
"source": [
"The paths dataset lets us view and analyze the movement of a person or group spatially. Mapping paths connects the points plotted on the map so that we can see the routes and the distances between locations. We can also use the paths data to identify if and where individual paths cross, giving us a glimpse of where individuals might have met or at what point families were separated from one another. \n",
"\n",
"As seen below, the `str.contains` method can be used to extract path data for one or more persons. The pandas `|` (OR) operator combines two `str.contains` conditions so that rows matching either one are returned."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "complex-breakfast",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" startlat startlong endlat endlong name loc1 \\\n",
"0 32.7157 -117.1611 34.1333 -118.0333 george kuratomi san diego \n",
"1 34.1333 -118.0333 33.3833 -91.4667 george kuratomi santa anita \n",
"2 33.3833 -91.4667 41.8931 -121.3735 george kuratomi jerome \n",
"3 41.8931 -121.3735 40.7300 -77.9380 george kuratomi tule lake \n",
"\n",
" loc2 uid dates year iso_alpha \n",
"0 santa anita 1 1942-10-30 1945 US-CA \n",
"1 jerome 1 1943-09-26 1943 US-CA \n",
"2 tule lake 1 1943-09-30 1943 US-AR \n",
"3 pennsylvania 1 1946-01-10 1946 US-CA "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The below command will return path results for Kuratomi\n",
"kuratomipaths = pathsdf[pathsdf['name'].str.contains('kuratomi')]\n",
"kuratomipaths"
]
},
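{
"cell_type": "markdown",
"id": "sketch-derive-paths",
"metadata": {},
"source": [
"The paths table used here was prepared in Excel. As a rough sketch of how a similar start/end structure might be derived in pandas instead, the cell below pairs each stacked row with the next row for the same person using `groupby` and `shift`. The table `stacked`, the helper `next_stop`, and the result `derived_paths` are hypothetical names, the rows are assumed to be in movement order already, and the coordinates are assumed to be numeric. This is not the method used to build `python-paths.csv`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-derive-paths-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stacked rows for one person, already in movement order\n",
"stacked = pd.DataFrame({\n",
"    'name': ['george kuratomi'] * 3,\n",
"    'lat': [32.7157, 34.1333, 33.3833],\n",
"    'long': [-117.1611, -118.0333, -91.4667],\n",
"    'city': ['san diego', 'santa anita', 'jerome'],\n",
"})\n",
"\n",
"# Within each person, line every row up with the row that follows it\n",
"next_stop = stacked.groupby('name')[['lat', 'long', 'city']].shift(-1)\n",
"\n",
"derived_paths = pd.DataFrame({\n",
"    'startlat': stacked['lat'],\n",
"    'startlong': stacked['long'],\n",
"    'endlat': next_stop['lat'],\n",
"    'endlong': next_stop['long'],\n",
"    'name': stacked['name'],\n",
"    'loc1': stacked['city'],\n",
"    'loc2': next_stop['city'],\n",
"})\n",
"\n",
"# The last stop of each person has no following row, so drop it\n",
"derived_paths = derived_paths.dropna(subset=['endlat']).reset_index(drop=True)"
]
},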
{
"cell_type": "code",
"execution_count": 10,
"id": "bronze-skating",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" startlat startlong endlat endlong name loc1 \\\n",
"0 32.7157 -117.1611 34.1333 -118.0333 george kuratomi san diego \n",
"1 34.1333 -118.0333 33.3833 -91.4667 george kuratomi santa anita \n",
"2 33.3833 -91.4667 41.8931 -121.3735 george kuratomi jerome \n",
"3 41.8931 -121.3735 40.7300 -77.9380 george kuratomi tule lake \n",
"24 34.0430 -118.2190 34.1396 -118.0430 singer terada los angeles \n",
"25 34.1396 -118.0430 33.6284 -91.3957 singer terada santa anita \n",
"26 33.6284 -91.3957 41.8866 -121.3575 singer terada jerome \n",
"27 41.8866 -121.3575 40.6230 -77.8520 singer terada tule lake \n",
"\n",
" loc2 uid dates year iso_alpha \n",
"0 santa anita 1 1942-10-30 1945 US-CA \n",
"1 jerome 1 1943-09-26 1943 US-CA \n",
"2 tule lake 1 1943-09-30 1943 US-AR \n",
"3 pennsylvania 1 1946-01-10 1946 US-CA \n",
"24 santa anita 7 1942-10-30 1942 US-CA \n",
"25 jerome 7 1943-09-15 1943 US-CA \n",
"26 tule lake 7 1943-09-19 1943 US-CA \n",
"27 pennsylvania 7 1946-01-10 1946 US-AZ "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Combine two str.contains conditions with | (OR) to return the paths for either name\n",
"kuratomiandterada = pathsdf[pathsdf['name'].str.contains('kuratomi')| pathsdf['name'].str.contains('terada')]\n",
"kuratomiandterada"
]
},
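{
"cell_type": "markdown",
"id": "sketch-combine-masks",
"metadata": {},
"source": [
"To make the role of `|` more concrete, the cell below is a small sketch on a hypothetical table (`paths_demo` and the `mask_*` variables are illustrative names). Each `str.contains` call produces a column of True/False values, and `|` keeps a row when either value is True. The sketch also shows an equivalent single search that uses a regular expression, and contrasts both with `&` (AND), which requires both conditions to hold on the same row."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-combine-masks-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-table of path rows\n",
"paths_demo = pd.DataFrame({\n",
"    'name': ['george kuratomi', 'singer terada', 'kazuo uneda'],\n",
"    'loc1': ['san diego', 'los angeles', 'gardena'],\n",
"})\n",
"\n",
"# Two boolean masks joined with | keep rows that satisfy either condition\n",
"mask_or = paths_demo['name'].str.contains('kuratomi') | paths_demo['name'].str.contains('terada')\n",
"\n",
"# The same selection written as one regular expression with alternation\n",
"mask_regex = paths_demo['name'].str.contains('kuratomi|terada')\n",
"\n",
"# & (AND) instead requires both conditions to be true on the same row\n",
"mask_and = paths_demo['name'].str.contains('kuratomi') & paths_demo['loc1'].str.contains('san diego')\n",
"\n",
"either_person = paths_demo[mask_or]"
]
},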
{
"cell_type": "markdown",
"id": "continuous-greece",
"metadata": {},
"source": [
"In this second module, we used the pandas `str.contains` method to search for and pull rows that match a given value, and combined two conditions with the `|` (OR) operator. We used the `value_counts` method to return the number of individuals located at each city in our dataset, which lets us view the concentration of groups. We also filtered the paths dataset to view the paths for George Kuratomi and Singer Terada; this result is saved in the cell below and used in part 3. \n",
"\n",
"In the following module, we will look at how to use the datasets we created outside of the notebook, as well as the data that we processed and prepared in this module, to create spatial and graph visualizations. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "corrected-modem",
"metadata": {},
"outputs": [],
"source": [
"# The below command lets us save the modified dataframe to a new output csv file. \n",
"# This can be useful when using these files for further steps of processing.\n",
"kuratomiandterada.to_csv('kuratomiandterada.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "federal-health",
"metadata": {},
"source": [
"## Notebooks\n",
"\n",
"The module below is organized into a sequential set of Python notebooks that allow us to interact with the collections related to the Framework for Unlocking and Linking WWII Japanese American Incarceration Biographical Data in order to explore, clean, prepare, visualize, and analyze them from a historical-context perspective.\n",
"\n",
"1. A Framework for Unlocking and Linking WWII Japanese American Incarceration Biographical Data - Data Visualization \n",
"2. A Framework for Unlocking and Linking WWII Japanese American Incarceration Biographical Data - Context Based Data Manipulation and Analysis - Part 1"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}