{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Context-Based Data Manipulation and Analysis Process - Part 1\n", "\n", "According to Weintrop et al. (2015), \"Data manipulation includes sorting, filtering, cleaning, normalizing, and joining disparate datasets. There are many strategies that can be employed when analyzing data for use in a scientific or mathematical context, including looking for patterns or anomalies, defining rules to categorize data, and identifying trends and correlations.\" Below are the steps performed for the data manipulation and analysis process, using the Python programming language, on the Certificates of Freedom dataset. The modules are split into two parts: Part 1 and Part 2.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Acquiring or Accessing the Data\n", "The data for this project was originally crawled from the Maryland State Archives **Legacy of Slavery** collections. The data source is included in this module as a comma-separated values (CSV) file. The link below will take you to a view of the data file:\n", "* [LoS_CoF.csv](Datasets/LoS_CoF.csv)\n", "\n", "The dataset has 23,655 rows of data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To process a CSV file in Python, one of the first steps is to import a Python library called 'pandas', which helps the program convert the CSV file into a dataframe, commonly described as a table format.
We import the library into the program as shown below:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import libraries: pandas (used for data science, data analysis, and machine learning tasks) and numpy (which provides support for multi-dimensional arrays)\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the pandas library, we create a new dataframe named 'df' with the read_csv function, as shown below. After creating the dataframe, the print() function is used to display the first 10 rows loaded into the dataframe." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | DataID | DataItem | County | Owner_FirstName | Owner_LastName | Witness | Date | Freed_FirstName | Freed_LastName | Alias | ... | Folder | Document | Page | Entry | DatasetName | Notes | isWorking | isError | ChangeDate | CreateDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AR7-46 | 1 | AA | Ann | Ailsworth | NaN | NaN | Keziah | Cromwell | NaN | ... | NaN | NaN | 42686.0 | 12.0 | FF | NaN | 0 | 0 | 39:20.3 | 39:20.3 |
| 1 | AR7-46 | 2 | AA | Ann | Ailsworth | Zachariah Duvall | 1811-06-24 | Resiah | Cromwell | NaN | ... | NaN | NaN | 24.0 | 3.0 | FF | NaN | 0 | 0 | 39:20.3 | 39:20.3 |
| 2 | AR7-46 | 3 | AA | Ann | Ailsworth | Jenifer Duvall | 1811-06-24 | Kesiah | Cromwell | NaN | ... | 55.0 | NaN | NaN | NaN | FF | Freed by will of Mrs. Ann Ailsworth. | 0 | 0 | 39:20.3 | 39:20.3 |
| 3 | AR7-46 | 4 | AA | William | Alexander | NaN | 1815-03-28 | Handy | McCeomey | NaN | ... | NaN | NaN | 50.0 | 2.0 | FF | Freed by manumission, dated 27 March 1815. Rai... | 0 | 0 | 39:20.3 | 39:20.3 |
| 4 | AR7-46 | 5 | AA | Thomas | Allen | NaN | 1837-07-10 | Nancy | Ennis | NaN | ... | NaN | NaN | 257.0 | 1.0 | FF | Freed by petition to Anne Arundel County Court... | 0 | 0 | 39:20.3 | 39:20.3 |
| 5 | AR7-46 | 6 | AA | Thomas | Allen | NaN | 1837-08-03 | Jim | Sharpe | NaN | ... | NaN | NaN | 257.0 | 2.0 | FF | Freed by petition to Anne Arundel County Court... | 0 | 0 | 39:20.3 | 39:20.3 |
| 6 | AR7-46 | 7 | AA | James | Alleson | NaN | 1826-10-28 | Belly | NaN | NaN | ... | NaN | NaN | 242.0 | 1.0 | FF | Freed by manumission, dated 28 Oct 1826. Raise... | 0 | 0 | 39:20.3 | 39:20.3 |
| 7 | AR7-46 | 8 | AA | Mary | Alwell | NaN | 1844-11-08 | Howard | Davis | NaN | ... | NaN | NaN | 372.0 | 1.0 | FF | son of Nelly. Freed by manumission, dated 12 A... | 0 | 0 | 39:20.3 | 39:20.3 |
| 8 | AR7-46 | 9 | AA | Mary | Armiger | NaN | 1819-01-27 | Abigail | NaN | NaN | ... | NaN | NaN | 126.0 | 2.0 | FF | along with Richard G. Stetton. Freed by manumi... | 0 | 0 | 39:20.3 | 39:20.3 |
| 9 | AR7-46 | 10 | AA | Mary | Atcock | Jacob Franklin, Jr. | 1812-12-30 | Ned | NaN | NaN | ... | NaN | NaN | 31.0 | 3.0 | FF | NaN | 0 | 0 | 39:20.3 | 39:20.3 |

10 rows × 28 columns
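
The loading step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's original cell: it reads a hypothetical two-row, five-column in-memory sample (values copied from the preview above) through `io.StringIO` so that it runs without the data file. With the real file, the path `Datasets/LoS_CoF.csv` from the link above would presumably be passed to `read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for Datasets/LoS_CoF.csv: two rows and
# five columns copied from the preview above, so the sketch runs anywhere.
sample_csv = io.StringIO(
    'DataID,DataItem,County,Owner_FirstName,Owner_LastName\n'
    'AR7-46,2,AA,Ann,Ailsworth\n'
    'AR7-46,4,AA,William,Alexander\n'
)

# read_csv parses the comma-separated text into a DataFrame (a table).
df = pd.read_csv(sample_csv)

# head(10) returns at most the first 10 rows; this sample only has 2.
print(df.head(10))

# shape is a (number of rows, number of columns) tuple.
print(df.shape)
```

Calling `df.head(10)` rather than printing the whole dataframe keeps the preview short even when the full file has tens of thousands of rows.
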

|  | Date | DateFormatted |
|---|---|---|
| 0 | None | NaT |
| 1 | 1811-06-24 | 1811-06-24 |
| 2 | 1811-06-24 | 1811-06-24 |
| 3 | 1815-03-28 | 1815-03-28 |
| 4 | 1837-07-10 | 1837-07-10 |
| ... | ... | ... |
| 23650 | 18430826 | 1843-08-26 |
| 23651 | 18430905 | 1843-09-05 |
| 23652 | 18430912 | 1843-09-12 |
| 23653 | 18430913 | 1843-09-13 |
| 23654 | 18430916 | 1843-09-16 |

23655 rows × 2 columns
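
The table above pairs raw `Date` strings in two layouts (`1811-06-24` and `18430826`) with a parsed `DateFormatted` column. The code that produced it is not shown here, so the following is only one possible sketch, assuming `pd.to_datetime` is applied once per layout and the results are combined. The four sample values are taken from the tables in this section; the variable names `dashed` and `compact` are illustrative.

```python
import pandas as pd

# Small sample of Date values seen in the tables: a missing value, a dashed
# date, a compact YYYYMMDD date, and a year-month-only value.
df = pd.DataFrame({'Date': [None, '1811-06-24', '18430826', '184006']})

# Parse each known layout separately; errors='coerce' turns anything that
# cannot be parsed into NaT instead of raising an error.
dashed = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')
compact = pd.to_datetime(df['Date'], format='%Y%m%d', errors='coerce')

# Prefer the dashed parse and fall back to the compact one where it is NaT.
df['DateFormatted'] = dashed.fillna(compact)

print(df)
```

Parsing each layout with an explicit `format` avoids relying on format inference, whose behavior on mixed-layout columns differs between pandas versions.
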

|  | DataID | DataItem | County | Owner_FirstName | Owner_LastName | Witness | Date | Freed_FirstName | Freed_LastName | Alias | ... | Document | Page | Entry | DatasetName | Notes | isWorking | isError | ChangeDate | CreateDate | DateFormatted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23307 | AR7-46 | 23310 | BA | Geo | Gillingham | NaN | 184006 | Jeremiah W. | Brown | Jerry | ... | NaN | 224.0 | 5.0 | FF | Freed by manumission, dated 15 June 1824, reco... | 0 | 0 | 37:45.8 | 03:44.1 | NaT |
| 23308 | AR7-46 | 23311 | BA | NaN | NaN | NaN | 184006 | Rachael | Brown | NaN | ... | NaN | 224.0 | 6.0 | FF | NaN | 0 | 0 | 37:45.8 | 05:54.2 | NaT |

2 rows × 29 columns
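
The two rows above have a `Date` value (`184006`) but no parsed date (`NaT`). One way such rows could be surfaced, sketched here on a small hypothetical sample rather than the full dataset, is a boolean mask that keeps records where `Date` is present and `DateFormatted` is missing. The original filter is not shown in this section, so this is an assumption about the approach.

```python
import pandas as pd

# Hypothetical three-row sample; the values are copied from the tables above.
df = pd.DataFrame({
    'DataItem': [2, 23310, 23311],
    'Date': ['1811-06-24', '184006', '184006'],
    'DateFormatted': pd.to_datetime(['1811-06-24', None, None]),
})

# Keep rows that had a Date value but ended up without a parsed date (NaT).
unparsed = df[df['Date'].notna() & df['DateFormatted'].isna()]

print(unparsed)

# Count how often each unparsed raw value occurs, to guide manual cleaning.
print(unparsed['Date'].value_counts())
```
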