{
"cells": [
{
"cell_type": "markdown",
"id": "68b92712-4df4-4762-8eeb-1b7a9ec6ca56",
"metadata": {},
"source": [
"# Working with messy data with Python and pandas"
]
},
{
"cell_type": "markdown",
"id": "952c0972-0434-44e6-99c3-2594c5158ba1",
"metadata": {},
"source": [
"The goal of this workshop is to provide you with some tools that will help you work with real-world data.\n",
"\n",
"There are many **excellent** textbooks and online tutorials that will teach you how to analyze and visualize data.\n",
"\n",
"However, these tutorials rely on data that are pre-formatted in a very specific way (i.e., example datasets are ready to analyze).\n",
"\n",
"There is good reason for this: statistics and plotting libraries contain functions that are designed to have data organized with a precise structure.\n",
"- A great resource that describes how data *should* be structured can be found here: https://vita.had.co.nz/papers/tidy-data.pdf\n",
"\n",
"If data are not arranged appropriately:\n",
"- bad: it will be difficult to get your analysis working properly\n",
"- worse: you might not be able to get the analysis to work at all\n",
"- worst: the analysis will work, but not as you expected!!!\n",
"\n",
"## Problem\n",
"\n",
"Real-world data are not usually pre-formatted for analyses.\n",
"\n",
"Few textbooks or tutorials focus on the process of cleaning up data---that's where this workshop comes in!\n",
"\n",
"Here, you will learn:\n",
"- About the Python library called `pandas`\n",
"- How to use Python/pandas to\n",
" - Read csv files\n",
" - Filter/select data\n",
" - Create a clean dataset\n",
"\n",
"We will work with a small set of example files stored in the `raw_data` folder.\n",
"- These files can be downloaded using this link: https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/scds/dash-webinars/tree/main/assets/data\n",
"\n",
"These files represent an experiment scenario where participants completed a questionnaire and a separate reaction time task.\n",
"\n",
"Each participant has their data stored in two separate csv files.\n",
"\n",
"We need to combine all of these files into one clean data set for analysis."
]
},
{
"cell_type": "markdown",
"id": "15b7ac87-876a-474d-a829-aabbfc03f1c3",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"How will we approach this problem?\n",
"\n",
"1. Select all the survey and trial raw data files\n",
"2. Read each file\n",
"3. Select the appropriate data in each file and store in a DataFrame\n",
"4. Combine all individual data into a single clean data file\n",
"5. Write the clean file\n"
]
},
{
"cell_type": "markdown",
"id": "82cd08e7-85d2-41f7-ba98-a6259e5eb238",
"metadata": {},
"source": [
"## Organizing Files\n",
"The `os` module provides functions that allow you to interact with some basic parts of the operating system.\n",
"A very useful function is the `listdir()` function, which can read the contents of a folder on your computer!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "96c6ba92-565e-485e-8711-e1bb91b5ae5a",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "49def607-ec20-4baf-9400-b993f183bbc2",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"['id_007_trial.csv',\n",
" 'id_005_survey.csv',\n",
" 'id_008_survey.csv',\n",
" '.DS_Store',\n",
" 'id_001_trial.csv',\n",
" 'id_006_trial.csv',\n",
" 'id_007_survey.csv',\n",
" 'id_002_survey.csv',\n",
" 'id_010_survey.csv',\n",
" 'id_005_trial.csv',\n",
" 'id_008_trial.csv',\n",
" 'id_001_survey.csv',\n",
" 'id_003_trial.csv',\n",
" 'id_004_survey.csv',\n",
" 'id_009_survey.csv',\n",
" '.ipynb_checkpoints',\n",
" 'id_004_trial.csv',\n",
" 'id_010_trial.csv',\n",
" 'id_003_survey.csv',\n",
" 'id_009_trial.csv',\n",
" 'id_002_trial.csv',\n",
" 'id_006_survey.csv']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dir_raw = \"raw_data/\"\n",
"files_raw = os.listdir(dir_raw)\n",
"files_raw"
]
},
{
"cell_type": "markdown",
"id": "8b89b803-822d-46cb-80ff-1d12a319e11e",
"metadata": {},
"source": [
"There is a problem if .ipynb files are included. We need to exclude non-csv files.\n",
"A list comprehension can quickly check each filename for a string pattern."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "160c331f-5b46-41a7-8c66-ffc643acf5b9",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"['id_007_trial.csv',\n",
" 'id_005_survey.csv',\n",
" 'id_008_survey.csv',\n",
" 'id_001_trial.csv',\n",
" 'id_006_trial.csv',\n",
" 'id_007_survey.csv',\n",
" 'id_002_survey.csv',\n",
" 'id_010_survey.csv',\n",
" 'id_005_trial.csv',\n",
" 'id_008_trial.csv',\n",
" 'id_001_survey.csv',\n",
" 'id_003_trial.csv',\n",
" 'id_004_survey.csv',\n",
" 'id_009_survey.csv',\n",
" 'id_004_trial.csv',\n",
" 'id_010_trial.csv',\n",
" 'id_003_survey.csv',\n",
" 'id_009_trial.csv',\n",
" 'id_002_trial.csv',\n",
" 'id_006_survey.csv']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"files_raw = [file for file in files_raw if \".csv\" in file]\n",
"files_raw"
]
},
{
"cell_type": "markdown",
"id": "c37dbea8-4a59-4813-be6c-8320ac0d3d34",
"metadata": {},
"source": [
"We can do this to separate 'survey' files from 'trial' files as well.\n",
"This is important because survey and trial files have different structure and will require different selection and filtering tools."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "92ccf2a7-c59a-424a-b0f0-ad06071797ad",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['id_005_survey.csv', 'id_008_survey.csv', 'id_007_survey.csv', 'id_002_survey.csv', 'id_010_survey.csv', 'id_001_survey.csv', 'id_004_survey.csv', 'id_009_survey.csv', 'id_003_survey.csv', 'id_006_survey.csv']\n",
"['id_007_trial.csv', 'id_001_trial.csv', 'id_006_trial.csv', 'id_005_trial.csv', 'id_008_trial.csv', 'id_003_trial.csv', 'id_004_trial.csv', 'id_010_trial.csv', 'id_009_trial.csv', 'id_002_trial.csv']\n"
]
}
],
"source": [
"files_raw_survey = [f for f in files_raw if \"survey\" in f]\n",
"files_raw_trials = [f for f in files_raw if \"trial\" in f]\n",
"\n",
"print(files_raw_survey)\n",
"print(files_raw_trials)"
]
},
{
"cell_type": "markdown",
"id": "1c03f624-cea7-4908-9037-5d5e78eb672f",
"metadata": {},
"source": [
"Now we have two lists of file names. One that holds all of the survey files and one that holds all of the experiment files."
]
},
{
"cell_type": "markdown",
"id": "2b9b3ef8-a218-471b-b1f8-8e1bf4aa9a83",
"metadata": {},
"source": [
"## pandas\n",
"\n",
"The `pandas` library lets us easily work with data structures.\n",
"\n",
"The pandas documentation is a great resource!\n",
"\n",
"Check out the Intro to pandas: https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html#intro-to-pandas"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "afb39251-c315-4c99-80de-c70ab0e3df88",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "0a996d97-fe8c-44eb-a3ea-a23778eb2d2b",
"metadata": {},
"source": [
"### Creating DataFrames"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a8e01259-8594-4099-963e-3a53f8742d7d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_A | \n",
" col_B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 20 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 30 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_A col_B\n",
"0 A 20\n",
"1 B 30"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df01 = pd.DataFrame({\n",
" \"col_A\": [\"A\", \"B\"],\n",
" \"col_B\": [20, 30]\n",
"})\n",
"df01.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "df14b49e-b0db-4580-b93a-893c3affba33",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0 A\n",
"1 B\n",
"Name: col_A, dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df01[\"col_A\"]"
]
},
{
"cell_type": "markdown",
"id": "beb4d072-6ee3-4cbe-ae0c-82d5978d9bce",
"metadata": {},
"source": [
"### DataFrames from Files"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8c481ede-90ed-4d48-b43f-597cf9257f8d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Participant ID | \n",
" NaN | \n",
" 001 | \n",
" 21 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Consent | \n",
" NaN | \n",
" Complete | \n",
" 47 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 63 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Trial 2 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
" 91 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Trial 3 | \n",
" red_red_Congruent | \n",
" 51 | \n",
" 116 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed\n",
"0 1 Participant ID NaN 001 21\n",
"1 2 Consent NaN Complete 47\n",
"2 3 Trial 1 blue_red_Incongruent 67 63\n",
"3 4 Trial 2 blue_blue_Congruent 51 91\n",
"4 5 Trial 3 red_red_Congruent 51 116"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example = pd.read_csv(\"raw_data/id_001_trial.csv\")\n",
"data_read_example.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "79218861-92f0-4ac7-9609-7a73adc54f0e",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" 19 | \n",
" 20 | \n",
" Trial 18 | \n",
" blue_blue_Congruent | \n",
" 26 | \n",
" 538 | \n",
"
\n",
" \n",
" 20 | \n",
" 21 | \n",
" Trial 19 | \n",
" red_red_Congruent | \n",
" 1 | \n",
" 585 | \n",
"
\n",
" \n",
" 21 | \n",
" 22 | \n",
" Trial 20 | \n",
" red_red_Congruent | \n",
" 37 | \n",
" 613 | \n",
"
\n",
" \n",
" 22 | \n",
" 23 | \n",
" Debrief | \n",
" NaN | \n",
" Complete | \n",
" 642 | \n",
"
\n",
" \n",
" 23 | \n",
" 24 | \n",
" Complete | \n",
" NaN | \n",
" Done | \n",
" 666 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed\n",
"19 20 Trial 18 blue_blue_Congruent 26 538\n",
"20 21 Trial 19 red_red_Congruent 1 585\n",
"21 22 Trial 20 red_red_Congruent 37 613\n",
"22 23 Debrief NaN Complete 642\n",
"23 24 Complete NaN Done 666"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.tail()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1ac01988-b54c-4232-850f-e3a1dd9dbc0b",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 24 entries, 0 to 23\n",
"Data columns (total 5 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 trial_index 24 non-null int64 \n",
" 1 task 24 non-null object\n",
" 2 stimulus 20 non-null object\n",
" 3 value 24 non-null object\n",
" 4 time_elapsed 24 non-null int64 \n",
"dtypes: int64(2), object(3)\n",
"memory usage: 1.1+ KB\n"
]
}
],
"source": [
"data_read_example.info()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a382e46f-00ab-48e1-b82e-f0a0299ed00f",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 24.000000 | \n",
" 24.000000 | \n",
"
\n",
" \n",
" mean | \n",
" 12.500000 | \n",
" 341.666667 | \n",
"
\n",
" \n",
" std | \n",
" 7.071068 | \n",
" 205.208837 | \n",
"
\n",
" \n",
" min | \n",
" 1.000000 | \n",
" 21.000000 | \n",
"
\n",
" \n",
" 25% | \n",
" 6.750000 | \n",
" 147.250000 | \n",
"
\n",
" \n",
" 50% | \n",
" 12.500000 | \n",
" 355.500000 | \n",
"
\n",
" \n",
" 75% | \n",
" 18.250000 | \n",
" 501.000000 | \n",
"
\n",
" \n",
" max | \n",
" 24.000000 | \n",
" 666.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index time_elapsed\n",
"count 24.000000 24.000000\n",
"mean 12.500000 341.666667\n",
"std 7.071068 205.208837\n",
"min 1.000000 21.000000\n",
"25% 6.750000 147.250000\n",
"50% 12.500000 355.500000\n",
"75% 18.250000 501.000000\n",
"max 24.000000 666.000000"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.describe()"
]
},
{
"cell_type": "markdown",
"id": "d229c9de-9ce0-4b46-94dd-978deb94fdb0",
"metadata": {},
"source": [
"### Subsetting DataFrames\n",
"\n",
"#### Columns"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "37c7fcea-eecf-499b-9a8d-01d31718fce8",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0 21\n",
"1 47\n",
"2 63\n",
"3 91\n",
"4 116\n",
"5 130\n",
"6 153\n",
"7 200\n",
"8 229\n",
"9 275\n",
"10 321\n",
"11 345\n",
"12 366\n",
"13 408\n",
"14 430\n",
"15 461\n",
"16 488\n",
"17 496\n",
"18 516\n",
"19 538\n",
"20 585\n",
"21 613\n",
"22 642\n",
"23 666\n",
"Name: time_elapsed, dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"time_elapsed\"]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "dc17cacc-ffc0-4a66-9df4-b5e243ca750f",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" time_elapsed | \n",
" task | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 21 | \n",
" Participant ID | \n",
"
\n",
" \n",
" 1 | \n",
" 47 | \n",
" Consent | \n",
"
\n",
" \n",
" 2 | \n",
" 63 | \n",
" Trial 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 91 | \n",
" Trial 2 | \n",
"
\n",
" \n",
" 4 | \n",
" 116 | \n",
" Trial 3 | \n",
"
\n",
" \n",
" 5 | \n",
" 130 | \n",
" Trial 4 | \n",
"
\n",
" \n",
" 6 | \n",
" 153 | \n",
" Trial 5 | \n",
"
\n",
" \n",
" 7 | \n",
" 200 | \n",
" Trial 6 | \n",
"
\n",
" \n",
" 8 | \n",
" 229 | \n",
" Trial 7 | \n",
"
\n",
" \n",
" 9 | \n",
" 275 | \n",
" Trial 8 | \n",
"
\n",
" \n",
" 10 | \n",
" 321 | \n",
" Trial 9 | \n",
"
\n",
" \n",
" 11 | \n",
" 345 | \n",
" Trial 10 | \n",
"
\n",
" \n",
" 12 | \n",
" 366 | \n",
" Trial 11 | \n",
"
\n",
" \n",
" 13 | \n",
" 408 | \n",
" Trial 12 | \n",
"
\n",
" \n",
" 14 | \n",
" 430 | \n",
" Trial 13 | \n",
"
\n",
" \n",
" 15 | \n",
" 461 | \n",
" Trial 14 | \n",
"
\n",
" \n",
" 16 | \n",
" 488 | \n",
" Trial 15 | \n",
"
\n",
" \n",
" 17 | \n",
" 496 | \n",
" Trial 16 | \n",
"
\n",
" \n",
" 18 | \n",
" 516 | \n",
" Trial 17 | \n",
"
\n",
" \n",
" 19 | \n",
" 538 | \n",
" Trial 18 | \n",
"
\n",
" \n",
" 20 | \n",
" 585 | \n",
" Trial 19 | \n",
"
\n",
" \n",
" 21 | \n",
" 613 | \n",
" Trial 20 | \n",
"
\n",
" \n",
" 22 | \n",
" 642 | \n",
" Debrief | \n",
"
\n",
" \n",
" 23 | \n",
" 666 | \n",
" Complete | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" time_elapsed task\n",
"0 21 Participant ID\n",
"1 47 Consent\n",
"2 63 Trial 1\n",
"3 91 Trial 2\n",
"4 116 Trial 3\n",
"5 130 Trial 4\n",
"6 153 Trial 5\n",
"7 200 Trial 6\n",
"8 229 Trial 7\n",
"9 275 Trial 8\n",
"10 321 Trial 9\n",
"11 345 Trial 10\n",
"12 366 Trial 11\n",
"13 408 Trial 12\n",
"14 430 Trial 13\n",
"15 461 Trial 14\n",
"16 488 Trial 15\n",
"17 496 Trial 16\n",
"18 516 Trial 17\n",
"19 538 Trial 18\n",
"20 585 Trial 19\n",
"21 613 Trial 20\n",
"22 642 Debrief\n",
"23 666 Complete"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[[\"time_elapsed\", \"task\"]]"
]
},
{
"cell_type": "markdown",
"id": "c48a0f3d-1bbd-4d74-868a-c79b38147b0b",
"metadata": {},
"source": [
"#### Rows"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "bb43eace-9c27-40a4-a39a-263378ae8bdb",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" 13 | \n",
" 14 | \n",
" Trial 12 | \n",
" red_red_Congruent | \n",
" 15 | \n",
" 408 | \n",
"
\n",
" \n",
" 14 | \n",
" 15 | \n",
" Trial 13 | \n",
" red_red_Congruent | \n",
" 22 | \n",
" 430 | \n",
"
\n",
" \n",
" 15 | \n",
" 16 | \n",
" Trial 14 | \n",
" blue_red_Incongruent | \n",
" 55 | \n",
" 461 | \n",
"
\n",
" \n",
" 16 | \n",
" 17 | \n",
" Trial 15 | \n",
" blue_red_Incongruent | \n",
" 66 | \n",
" 488 | \n",
"
\n",
" \n",
" 17 | \n",
" 18 | \n",
" Trial 16 | \n",
" red_red_Congruent | \n",
" 48 | \n",
" 496 | \n",
"
\n",
" \n",
" 18 | \n",
" 19 | \n",
" Trial 17 | \n",
" red_blue_Incongruent | \n",
" 75 | \n",
" 516 | \n",
"
\n",
" \n",
" 19 | \n",
" 20 | \n",
" Trial 18 | \n",
" blue_blue_Congruent | \n",
" 26 | \n",
" 538 | \n",
"
\n",
" \n",
" 20 | \n",
" 21 | \n",
" Trial 19 | \n",
" red_red_Congruent | \n",
" 1 | \n",
" 585 | \n",
"
\n",
" \n",
" 21 | \n",
" 22 | \n",
" Trial 20 | \n",
" red_red_Congruent | \n",
" 37 | \n",
" 613 | \n",
"
\n",
" \n",
" 22 | \n",
" 23 | \n",
" Debrief | \n",
" NaN | \n",
" Complete | \n",
" 642 | \n",
"
\n",
" \n",
" 23 | \n",
" 24 | \n",
" Complete | \n",
" NaN | \n",
" Done | \n",
" 666 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed\n",
"13 14 Trial 12 red_red_Congruent 15 408\n",
"14 15 Trial 13 red_red_Congruent 22 430\n",
"15 16 Trial 14 blue_red_Incongruent 55 461\n",
"16 17 Trial 15 blue_red_Incongruent 66 488\n",
"17 18 Trial 16 red_red_Congruent 48 496\n",
"18 19 Trial 17 red_blue_Incongruent 75 516\n",
"19 20 Trial 18 blue_blue_Congruent 26 538\n",
"20 21 Trial 19 red_red_Congruent 1 585\n",
"21 22 Trial 20 red_red_Congruent 37 613\n",
"22 23 Debrief NaN Complete 642\n",
"23 24 Complete NaN Done 666"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[data_read_example[\"time_elapsed\"] > 400]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2499275d-cb5b-4f9c-8f7f-8de239b89b17",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 False\n",
"13 True\n",
"14 True\n",
"15 True\n",
"16 True\n",
"17 True\n",
"18 True\n",
"19 True\n",
"20 True\n",
"21 True\n",
"22 True\n",
"23 True\n",
"Name: time_elapsed, dtype: bool"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"time_elapsed\"] > 400"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "7a43bbed-24b6-4c82-beba-e95c55e90d6c",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 False\n",
"13 False\n",
"14 True\n",
"15 False\n",
"16 False\n",
"17 False\n",
"18 True\n",
"19 False\n",
"20 False\n",
"21 False\n",
"22 False\n",
"23 False\n",
"Name: task, dtype: bool"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"task\"].isin([\"Trial 13\", \"Trial 17\"])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "c09ca6b8-5897-4671-91b4-16cb52da2a9e",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" 14 | \n",
" 15 | \n",
" Trial 13 | \n",
" red_red_Congruent | \n",
" 22 | \n",
" 430 | \n",
"
\n",
" \n",
" 18 | \n",
" 19 | \n",
" Trial 17 | \n",
" red_blue_Incongruent | \n",
" 75 | \n",
" 516 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed\n",
"14 15 Trial 13 red_red_Congruent 22 430\n",
"18 19 Trial 17 red_blue_Incongruent 75 516"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[data_read_example[\"task\"].isin([\"Trial 13\", \"Trial 17\"])]"
]
},
{
"cell_type": "markdown",
"id": "ae2349ad-6edd-4503-8a91-44c01f359bc7",
"metadata": {},
"source": [
"#### Rows and Columns"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0750cb15-8cdf-4401-966b-b4ed6db8ff78",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" stimulus | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 14 | \n",
" red_red_Congruent | \n",
" 22 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" stimulus value\n",
"14 red_red_Congruent 22"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.loc[data_read_example[\"task\"].isin([\"Trial 13\"]), [\"stimulus\", \"value\"]]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "a5c04042-474b-482b-a706-c70bbd806ccb",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" stimulus | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 3 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
"
\n",
" \n",
" 4 | \n",
" red_red_Congruent | \n",
" 51 | \n",
"
\n",
" \n",
" 5 | \n",
" blue_blue_Congruent | \n",
" 16 | \n",
"
\n",
" \n",
" 6 | \n",
" blue_red_Incongruent | \n",
" 81 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" stimulus value\n",
"3 blue_blue_Congruent 51\n",
"4 red_red_Congruent 51\n",
"5 blue_blue_Congruent 16\n",
"6 blue_red_Incongruent 81"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.iloc[3:7,2:4]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "39dfa042-0abc-4641-a7f8-c7b991e1966d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Participant ID | \n",
" NaN | \n",
" 001 | \n",
" 21 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Consent | \n",
" NaN | \n",
" Complete | \n",
" 222 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 222 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Trial 2 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
" 91 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Trial 3 | \n",
" red_red_Congruent | \n",
" 51 | \n",
" 116 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed\n",
"0 1 Participant ID NaN 001 21\n",
"1 2 Consent NaN Complete 222\n",
"2 3 Trial 1 blue_red_Incongruent 67 222\n",
"3 4 Trial 2 blue_blue_Congruent 51 91\n",
"4 5 Trial 3 red_red_Congruent 51 116"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.iloc[1:3, 4] = 222\n",
"data_read_example.head()"
]
},
{
"cell_type": "markdown",
"id": "f721fec7-1ecf-4c7a-aea1-3b11a8745cb2",
"metadata": {},
"source": [
"### Creating New Columns"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "88adb964-fed6-4515-b4a4-ba6232229718",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
" time_seconds | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Participant ID | \n",
" NaN | \n",
" 001 | \n",
" 21 | \n",
" 0.021 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Consent | \n",
" NaN | \n",
" Complete | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Trial 2 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
" 91 | \n",
" 0.091 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Trial 3 | \n",
" red_red_Congruent | \n",
" 51 | \n",
" 116 | \n",
" 0.116 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed \\\n",
"0 1 Participant ID NaN 001 21 \n",
"1 2 Consent NaN Complete 222 \n",
"2 3 Trial 1 blue_red_Incongruent 67 222 \n",
"3 4 Trial 2 blue_blue_Congruent 51 91 \n",
"4 5 Trial 3 red_red_Congruent 51 116 \n",
"\n",
" time_seconds \n",
"0 0.021 \n",
"1 0.222 \n",
"2 0.222 \n",
"3 0.091 \n",
"4 0.116 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"time_seconds\"] = data_read_example[\"time_elapsed\"] / 1000\n",
"data_read_example.head()"
]
},
{
"cell_type": "markdown",
"id": "d6ef3065-8f4d-46e7-9648-dfcd0cbc9d1d",
"metadata": {},
"source": [
"#### Renaming Columns"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "42960600-9ef7-46ba-bf4a-ace024f9f2f4",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
" time_s | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Participant ID | \n",
" NaN | \n",
" 001 | \n",
" 21 | \n",
" 0.021 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Consent | \n",
" NaN | \n",
" Complete | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Trial 2 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
" 91 | \n",
" 0.091 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Trial 3 | \n",
" red_red_Congruent | \n",
" 51 | \n",
" 116 | \n",
" 0.116 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed \\\n",
"0 1 Participant ID NaN 001 21 \n",
"1 2 Consent NaN Complete 222 \n",
"2 3 Trial 1 blue_red_Incongruent 67 222 \n",
"3 4 Trial 2 blue_blue_Congruent 51 91 \n",
"4 5 Trial 3 red_red_Congruent 51 116 \n",
"\n",
" time_s \n",
"0 0.021 \n",
"1 0.222 \n",
"2 0.222 \n",
"3 0.091 \n",
"4 0.116 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.rename(columns={\"time_seconds\": \"time_s\"}, inplace=True)\n",
"data_read_example.head()"
]
},
{
"cell_type": "markdown",
"id": "031e90c1-528d-4d46-86b5-be189da55264",
"metadata": {},
"source": [
"### Summarizing"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f1ac2866-34c3-46c3-9b78-bc14556e639c",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0.3555833333333333"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"time_s\"].mean()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "14c5479a-84f4-489c-bd19-f9005a7f922d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" time_s | \n",
"
\n",
" \n",
" \n",
" \n",
" mean | \n",
" 0.355583 | \n",
"
\n",
" \n",
" std | \n",
" 0.189743 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" time_s\n",
"mean 0.355583\n",
"std 0.189743"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.agg(\n",
" {\n",
" \"time_s\": [\"mean\", \"std\"]\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "dcd6ac3e-5e34-4a81-91bb-83b8aa54214b",
"metadata": {},
"source": [
"#### Grouping"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "3742041f-f799-4b40-a0f1-b5d5e5119bdf",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" time_elapsed | \n",
" time_s | \n",
"
\n",
" \n",
" stimulus | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" blue_blue_Congruent | \n",
" 253.000000 | \n",
" 0.253000 | \n",
"
\n",
" \n",
" blue_red_Incongruent | \n",
" 331.000000 | \n",
" 0.331000 | \n",
"
\n",
" \n",
" red_blue_Incongruent | \n",
" 337.166667 | \n",
" 0.337167 | \n",
"
\n",
" \n",
" red_red_Congruent | \n",
" 411.000000 | \n",
" 0.411000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" time_elapsed time_s\n",
"stimulus \n",
"blue_blue_Congruent 253.000000 0.253000\n",
"blue_red_Incongruent 331.000000 0.331000\n",
"red_blue_Incongruent 337.166667 0.337167\n",
"red_red_Congruent 411.000000 0.411000"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.groupby(\"stimulus\").mean(numeric_only=True)[[\"time_elapsed\", \"time_s\"]]"
]
},
{
"cell_type": "markdown",
"id": "192963fb-36f8-47e1-971d-649b87da3e96",
"metadata": {},
"source": [
"#### Counting"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "95926985-8970-4eb9-93c7-ed8bebf04521",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"51 2\n",
"75 2\n",
"48 2\n",
"Complete 2\n",
"001 1\n",
"15 1\n",
"37 1\n",
"1 1\n",
"26 1\n",
"66 1\n",
"55 1\n",
"22 1\n",
"31 1\n",
"58 1\n",
"34 1\n",
"61 1\n",
"81 1\n",
"16 1\n",
"67 1\n",
"Done 1\n",
"Name: value, dtype: int64"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"value\"].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "4f061ad1-6516-4f1a-b3eb-64f6101e6404",
"metadata": {},
"source": [
"### Sorting"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "b349fe3c-492d-478c-8282-96571f9a3de4",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
" time_s | \n",
"
\n",
" \n",
" \n",
" \n",
" 23 | \n",
" 24 | \n",
" Complete | \n",
" NaN | \n",
" Done | \n",
" 666 | \n",
" 0.666 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Consent | \n",
" NaN | \n",
" Complete | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 22 | \n",
" 23 | \n",
" Debrief | \n",
" NaN | \n",
" Complete | \n",
" 642 | \n",
" 0.642 | \n",
"
\n",
" \n",
" 0 | \n",
" 1 | \n",
" Participant ID | \n",
" NaN | \n",
" 001 | \n",
" 21 | \n",
" 0.021 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed \\\n",
"23 24 Complete NaN Done 666 \n",
"1 2 Consent NaN Complete 222 \n",
"22 23 Debrief NaN Complete 642 \n",
"0 1 Participant ID NaN 001 21 \n",
"2 3 Trial 1 blue_red_Incongruent 67 222 \n",
"\n",
" time_s \n",
"23 0.666 \n",
"1 0.222 \n",
"22 0.642 \n",
"0 0.021 \n",
"2 0.222 "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.sort_values(by=\"task\").head()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "1c91333d-9fba-4ebc-a963-e79f5dd6ef24",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
" time_s | \n",
"
\n",
" \n",
" \n",
" \n",
" 21 | \n",
" 22 | \n",
" Trial 20 | \n",
" red_red_Congruent | \n",
" 37 | \n",
" 613 | \n",
" 0.613 | \n",
"
\n",
" \n",
" 20 | \n",
" 21 | \n",
" Trial 19 | \n",
" red_red_Congruent | \n",
" 1 | \n",
" 585 | \n",
" 0.585 | \n",
"
\n",
" \n",
" 17 | \n",
" 18 | \n",
" Trial 16 | \n",
" red_red_Congruent | \n",
" 48 | \n",
" 496 | \n",
" 0.496 | \n",
"
\n",
" \n",
" 14 | \n",
" 15 | \n",
" Trial 13 | \n",
" red_red_Congruent | \n",
" 22 | \n",
" 430 | \n",
" 0.430 | \n",
"
\n",
" \n",
" 13 | \n",
" 14 | \n",
" Trial 12 | \n",
" red_red_Congruent | \n",
" 15 | \n",
" 408 | \n",
" 0.408 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed time_s\n",
"21 22 Trial 20 red_red_Congruent 37 613 0.613\n",
"20 21 Trial 19 red_red_Congruent 1 585 0.585\n",
"17 18 Trial 16 red_red_Congruent 48 496 0.496\n",
"14 15 Trial 13 red_red_Congruent 22 430 0.430\n",
"13 14 Trial 12 red_red_Congruent 15 408 0.408"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.sort_values(by=[\"stimulus\", \"time_elapsed\"], ascending=False).head()"
]
},
{
"cell_type": "markdown",
"id": "d1611f84-4018-4497-a6bb-a833cd5a0859",
"metadata": {},
"source": [
"### Formatting: Long to Wide"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "1a0839d7-ae90-41cd-a38b-7127f8afa8e5",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Type | \n",
" Value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 20 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 30 | \n",
"
\n",
" \n",
" 2 | \n",
" C | \n",
" 40 | \n",
"
\n",
" \n",
" 3 | \n",
" A | \n",
" 50 | \n",
"
\n",
" \n",
" 4 | \n",
" B | \n",
" 30 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Type Value\n",
"0 A 20\n",
"1 B 30\n",
"2 C 40\n",
"3 A 50\n",
"4 B 30"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df02 = pd.DataFrame({\n",
" \"Type\": [\"A\", \"B\", \"C\", \"A\", \"B\"],\n",
" \"Value\": [20, 30, 40, 50, 30]\n",
"})\n",
"df02"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "a49b64bd-84d8-4cf0-80fa-0e69e12a7efd",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Type | \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 20.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" NaN | \n",
" 30.0 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" NaN | \n",
" 40.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 50.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" NaN | \n",
" 30.0 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Type A B C\n",
"0 20.0 NaN NaN\n",
"1 NaN 30.0 NaN\n",
"2 NaN NaN 40.0\n",
"3 50.0 NaN NaN\n",
"4 NaN 30.0 NaN"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df02.pivot(columns=\"Type\", values=\"Value\")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "9665983a-b1a9-4ced-8dd2-5d623a43ac1d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" Type | \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" \n",
" \n",
" Value | \n",
" 35 | \n",
" 30 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Type A B C\n",
"Value 35 30 40"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df02.pivot_table(columns=\"Type\", values=\"Value\", aggfunc=np.mean)"
]
},
{
"cell_type": "markdown",
"id": "5868d644-f52d-4fbb-b0d4-dda6b727fce2",
"metadata": {},
"source": [
"### Formatting: Wide to Long"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "1c410c4a-8ce3-4b55-b7f7-8922c0c3f3e5",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Participant ID | \n",
" Task 1 | \n",
" Task 2 | \n",
" Task 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 001 | \n",
" 20 | \n",
" 22 | \n",
" 18 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Participant ID Task 1 Task 2 Task 3\n",
"0 001 20 22 18"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df03 = pd.DataFrame({\n",
" \"Participant ID\": [\"001\"],\n",
" \"Task 1\": [20],\n",
" \"Task 2\": [22],\n",
" \"Task 3\": [18],\n",
"})\n",
"df03"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "95a92d0c-f894-45bf-81e4-9d774191d4e9",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Participant ID | \n",
" task | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 001 | \n",
" Task 1 | \n",
" 20 | \n",
"
\n",
" \n",
" 1 | \n",
" 001 | \n",
" Task 2 | \n",
" 22 | \n",
"
\n",
" \n",
" 2 | \n",
" 001 | \n",
" Task 3 | \n",
" 18 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Participant ID task value\n",
"0 001 Task 1 20\n",
"1 001 Task 2 22\n",
"2 001 Task 3 18"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df03.melt(id_vars=\"Participant ID\", var_name=\"task\", value_name=\"value\")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "7b41dd32-e019-485f-aaed-646287927016",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Participant ID | \n",
" Task 3 | \n",
" task | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 001 | \n",
" 18 | \n",
" Task 1 | \n",
" 20 | \n",
"
\n",
" \n",
" 1 | \n",
" 001 | \n",
" 18 | \n",
" Task 2 | \n",
" 22 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Participant ID Task 3 task value\n",
"0 001 18 Task 1 20\n",
"1 001 18 Task 2 22"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df03.melt(\n",
" id_vars=[\"Participant ID\", \"Task 3\"], \n",
" value_vars=[\"Task 1\", \"Task 2\"],\n",
" var_name=\"task\", \n",
" value_name=\"value\")"
]
},
{
"cell_type": "markdown",
"id": "73828442-8818-4159-b545-45ee08454541",
"metadata": {
"tags": []
},
"source": [
"### Combining Data"
]
},
{
"cell_type": "markdown",
"id": "429b6be8-9031-4d32-a828-2644c2d47266",
"metadata": {},
"source": [
"#### Concatenation"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "36e3bba1-fe72-4b22-874e-f9bc1b13f06d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_A | \n",
" col_B | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 22 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 33 | \n",
"
\n",
" \n",
" 2 | \n",
" C | \n",
" 30 | \n",
"
\n",
" \n",
" 3 | \n",
" D | \n",
" 10 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_A col_B\n",
"0 A 22\n",
"1 B 33\n",
"2 C 30\n",
"3 D 10"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df01 = pd.DataFrame({\n",
" \"col_A\": [\"A\", \"B\"],\n",
" \"col_B\": [22, 33],\n",
"})\n",
"\n",
"df02 = pd.DataFrame({\n",
" \"col_A\": [\"C\", \"D\"],\n",
" \"col_B\": [30, 10],\n",
"})\n",
"\n",
"pd.concat([df01, df02], ignore_index=True)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "bb7f99e7-e97a-4e20-a5d3-5af0288a3abb",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 22 | \n",
" C | \n",
" 30 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 33 | \n",
" D | \n",
" 10 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3\n",
"0 A 22 C 30\n",
"1 B 33 D 10"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([df01, df02], ignore_index=True, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "75238f80-13ab-4263-94dd-777168bb4796",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_A | \n",
" col_B | \n",
" col_C | \n",
" col_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 22.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 33.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" NaN | \n",
" C | \n",
" 30.0 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" NaN | \n",
" D | \n",
" 10.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_A col_B col_C col_D\n",
"0 A 22.0 NaN NaN\n",
"1 B 33.0 NaN NaN\n",
"2 NaN NaN C 30.0\n",
"3 NaN NaN D 10.0"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df01 = pd.DataFrame({\n",
" \"col_A\": [\"A\", \"B\"],\n",
" \"col_B\": [22, 33],\n",
"})\n",
"\n",
"df02 = pd.DataFrame({\n",
" \"col_C\": [\"C\", \"D\"],\n",
" \"col_D\": [30, 10],\n",
"})\n",
"\n",
"pd.concat([df01, df02], ignore_index=True)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "e524efba-627a-4d36-9809-49d10289cbce",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_A | \n",
" col_B | \n",
" col_C | \n",
" col_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 22.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 33.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" NaN | \n",
" C | \n",
" 30.0 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" NaN | \n",
" D | \n",
" 10.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_A col_B col_C col_D\n",
"0 A 22.0 NaN NaN\n",
"1 B 33.0 NaN NaN\n",
"2 NaN NaN C 30.0\n",
"3 NaN NaN D 10.0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([df01, df02], ignore_index=True, keys=[\"col_A\", \"col_C\"])"
]
},
{
"cell_type": "markdown",
"id": "ac3dc7ac-8989-4c69-b5d4-fd1216ceccb7",
"metadata": {},
"source": [
"#### Merging"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "ded5917c-4eaa-4d34-a5f9-481cca0ed87d",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_A | \n",
" col_B | \n",
" col_C | \n",
" col_D | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 22 | \n",
" A | \n",
" 30 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_A col_B col_C col_D\n",
"0 A 22 A 30"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df01 = pd.DataFrame({\n",
" \"col_A\": [\"A\", \"B\"],\n",
" \"col_B\": [22, 33],\n",
"})\n",
"\n",
"df02 = pd.DataFrame({\n",
" \"col_C\": [\"A\", \"C\"],\n",
" \"col_D\": [30, 10],\n",
"})\n",
"\n",
"pd.merge(df01, df02, left_on=\"col_A\", right_on=\"col_C\") # how=outer/inner - left/right/cross"
]
},
{
"cell_type": "markdown",
"id": "973e99d2-fa64-4938-b39d-e3f30c1ab21b",
"metadata": {},
"source": [
"### Manipulating Text"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "15984090-5725-4af3-af66-3cfa6806460c",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
" time_s | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" participant id | \n",
" NaN | \n",
" 001 | \n",
" 21 | \n",
" 0.021 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" consent | \n",
" NaN | \n",
" Complete | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" trial 2 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
" 91 | \n",
" 0.091 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" trial 3 | \n",
" red_red_Congruent | \n",
" 51 | \n",
" 116 | \n",
" 0.116 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed \\\n",
"0 1 participant id NaN 001 21 \n",
"1 2 consent NaN Complete 222 \n",
"2 3 trial 1 blue_red_Incongruent 67 222 \n",
"3 4 trial 2 blue_blue_Congruent 51 91 \n",
"4 5 trial 3 red_red_Congruent 51 116 \n",
"\n",
" time_s \n",
"0 0.021 \n",
"1 0.222 \n",
"2 0.222 \n",
"3 0.091 \n",
"4 0.116 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"task\"] = data_read_example[\"task\"].str.lower()\n",
"data_read_example.head()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "5175ace1-634e-483f-b4dd-463c85e3ed5a",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
" time_s | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 3 | \n",
" trial 1 | \n",
" blue_red_Incongruent | \n",
" 67 | \n",
" 222 | \n",
" 0.222 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" trial 2 | \n",
" blue_blue_Congruent | \n",
" 51 | \n",
" 91 | \n",
" 0.091 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" trial 3 | \n",
" red_red_Congruent | \n",
" 51 | \n",
" 116 | \n",
" 0.116 | \n",
"
\n",
" \n",
" 5 | \n",
" 6 | \n",
" trial 4 | \n",
" blue_blue_Congruent | \n",
" 16 | \n",
" 130 | \n",
" 0.130 | \n",
"
\n",
" \n",
" 6 | \n",
" 7 | \n",
" trial 5 | \n",
" blue_red_Incongruent | \n",
" 81 | \n",
" 153 | \n",
" 0.153 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed time_s\n",
"2 3 trial 1 blue_red_Incongruent 67 222 0.222\n",
"3 4 trial 2 blue_blue_Congruent 51 91 0.091\n",
"4 5 trial 3 red_red_Congruent 51 116 0.116\n",
"5 6 trial 4 blue_blue_Congruent 16 130 0.130\n",
"6 7 trial 5 blue_red_Incongruent 81 153 0.153"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[data_read_example[\"task\"].str.contains(\"trial\")].head()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "142af218-72ec-4f43-8af6-3ddf20361ed0",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0 NaN\n",
"1 NaN\n",
"2 [blue, red, Incongruent]\n",
"3 [blue, blue, Congruent]\n",
"4 [red, red, Congruent]\n",
"5 [blue, blue, Congruent]\n",
"6 [blue, red, Incongruent]\n",
"7 [red, blue, Incongruent]\n",
"8 [red, red, Congruent]\n",
"9 [red, blue, Incongruent]\n",
"10 [red, blue, Incongruent]\n",
"11 [red, blue, Incongruent]\n",
"12 [red, blue, Incongruent]\n",
"13 [red, red, Congruent]\n",
"14 [red, red, Congruent]\n",
"15 [blue, red, Incongruent]\n",
"16 [blue, red, Incongruent]\n",
"17 [red, red, Congruent]\n",
"18 [red, blue, Incongruent]\n",
"19 [blue, blue, Congruent]\n",
"20 [red, red, Congruent]\n",
"21 [red, red, Congruent]\n",
"22 NaN\n",
"23 NaN\n",
"Name: stimulus, dtype: object"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example[\"stimulus\"].str.split(\"_\")"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "33a3110c-a532-4580-826b-cd70e7f7d109",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Word | \n",
" Colour | \n",
" Condition | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
"
\n",
" \n",
" 3 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
"
\n",
" \n",
" 4 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
" 5 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
"
\n",
" \n",
" 6 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
"
\n",
" \n",
" 7 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
"
\n",
" \n",
" 8 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
" 9 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
"
\n",
" \n",
" 10 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
"
\n",
" \n",
" 11 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
"
\n",
" \n",
" 12 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
"
\n",
" \n",
" 13 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
" 14 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
" 15 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
"
\n",
" \n",
" 16 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
"
\n",
" \n",
" 17 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
" 18 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
"
\n",
" \n",
" 19 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
"
\n",
" \n",
" 20 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
" 21 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Word Colour Condition\n",
"2 blue red Incongruent\n",
"3 blue blue Congruent\n",
"4 red red Congruent\n",
"5 blue blue Congruent\n",
"6 blue red Incongruent\n",
"7 red blue Incongruent\n",
"8 red red Congruent\n",
"9 red blue Incongruent\n",
"10 red blue Incongruent\n",
"11 red blue Incongruent\n",
"12 red blue Incongruent\n",
"13 red red Congruent\n",
"14 red red Congruent\n",
"15 blue red Incongruent\n",
"16 blue red Incongruent\n",
"17 red red Congruent\n",
"18 red blue Incongruent\n",
"19 blue blue Congruent\n",
"20 red red Congruent\n",
"21 red red Congruent"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_read_example.loc[data_read_example[\"task\"].str.contains(\n",
" \"trial\"), \"stimulus\"].str.split(\"_\", expand=True).rename(\n",
" columns={0: \"Word\", 1: \"Colour\", 2: \"Condition\"})"
]
},
{
"cell_type": "markdown",
"id": "cd89d8fd-3fcd-4783-80ec-fcfbd6fdb74a",
"metadata": {
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"source": [
"## Putting it all Together\n",
"\n",
"We can use pandas along with the list of files to create a clean set of data"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "e8a0e1ab-1171-4fab-8374-3046902953d0",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['id_005_survey.csv', 'id_008_survey.csv', 'id_007_survey.csv', 'id_002_survey.csv', 'id_010_survey.csv', 'id_001_survey.csv', 'id_004_survey.csv', 'id_009_survey.csv', 'id_003_survey.csv', 'id_006_survey.csv']\n",
"['id_007_trial.csv', 'id_001_trial.csv', 'id_006_trial.csv', 'id_005_trial.csv', 'id_008_trial.csv', 'id_003_trial.csv', 'id_004_trial.csv', 'id_010_trial.csv', 'id_009_trial.csv', 'id_002_trial.csv']\n"
]
}
],
"source": [
"print(files_raw_survey)\n",
"print(files_raw_trials)"
]
},
{
"cell_type": "markdown",
"id": "6f02ae94-a4ab-4f74-8752-f583685ee48c",
"metadata": {
"jupyter": {
"outputs_hidden": true
}
},
"source": [
"### Create Survey Data\n",
"\n",
"Start by creating an empty dataframe to store all of the survey data.\n",
"\n",
"Then, read each file in `files_raw_survey` into a temporary dataframe and concatenate to the empty survey data."
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "198f5fec-2b7d-4393-a761-c55e6ab3d6b4",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Participant ID | \n",
" Age | \n",
" Gender | \n",
" Income | \n",
" Education | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 5 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 8 | \n",
" 23 | \n",
" Female | \n",
" High | \n",
" High school | \n",
"
\n",
" \n",
" 0 | \n",
" 7 | \n",
" 25 | \n",
" Male | \n",
" High | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 2 | \n",
" 33 | \n",
" Male | \n",
" High | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 10 | \n",
" 23 | \n",
" Female | \n",
" Medium | \n",
" High school | \n",
"
\n",
" \n",
" 0 | \n",
" 1 | \n",
" 20 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 4 | \n",
" 21 | \n",
" Male | \n",
" Low | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 9 | \n",
" 26 | \n",
" Male | \n",
" Low | \n",
" College | \n",
"
\n",
" \n",
" 0 | \n",
" 3 | \n",
" 31 | \n",
" Female | \n",
" Low | \n",
" College | \n",
"
\n",
" \n",
" 0 | \n",
" 6 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Participant ID Age Gender Income Education\n",
"0 5 38 Male Medium University\n",
"0 8 23 Female High High school\n",
"0 7 25 Male High University\n",
"0 2 33 Male High University\n",
"0 10 23 Female Medium High school\n",
"0 1 20 Female Medium University\n",
"0 4 21 Male Low University\n",
"0 9 26 Male Low College\n",
"0 3 31 Female Low College\n",
"0 6 24 Female Medium University"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_survey = pd.DataFrame()\n",
"\n",
"for file in files_raw_survey:\n",
" tmp_df = pd.read_csv(dir_raw + file)\n",
" data_survey = pd.concat([data_survey, tmp_df])\n",
" \n",
"data_survey"
]
},
{
"cell_type": "markdown",
"id": "bfd74b02-5b94-4071-aec4-ecf84ed81caa",
"metadata": {
"tags": []
},
"source": [
"### Create Trials data\n",
"\n",
"We can read the trial data in a similar way, but we will need to select and filter data as we go.\n",
"\n",
"Lets just start with one file:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "ea8cf893-5312-4c15-8a07-22bb52c3d791",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" trial_index | \n",
" task | \n",
" stimulus | \n",
" value | \n",
" time_elapsed | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Participant ID | \n",
" NaN | \n",
" 007 | \n",
" 27 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Consent | \n",
" NaN | \n",
" Complete | \n",
" 56 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Trial 1 | \n",
" red_red_Congruent | \n",
" 32 | \n",
" 80 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" Trial 2 | \n",
" red_blue_Incongruent | \n",
" 76 | \n",
" 105 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Trial 3 | \n",
" red_red_Congruent | \n",
" 43 | \n",
" 116 | \n",
"
\n",
" \n",
" 5 | \n",
" 6 | \n",
" Trial 4 | \n",
" red_blue_Incongruent | \n",
" 85 | \n",
" 130 | \n",
"
\n",
" \n",
" 6 | \n",
" 7 | \n",
" Trial 5 | \n",
" red_blue_Incongruent | \n",
" 49 | \n",
" 166 | \n",
"
\n",
" \n",
" 7 | \n",
" 8 | \n",
" Trial 6 | \n",
" blue_blue_Congruent | \n",
" 2 | \n",
" 171 | \n",
"
\n",
" \n",
" 8 | \n",
" 9 | \n",
" Trial 7 | \n",
" blue_blue_Congruent | \n",
" 55 | \n",
" 201 | \n",
"
\n",
" \n",
" 9 | \n",
" 10 | \n",
" Trial 8 | \n",
" blue_blue_Congruent | \n",
" 34 | \n",
" 241 | \n",
"
\n",
" \n",
" 10 | \n",
" 11 | \n",
" Trial 9 | \n",
" blue_red_Incongruent | \n",
" 35 | \n",
" 246 | \n",
"
\n",
" \n",
" 11 | \n",
" 12 | \n",
" Trial 10 | \n",
" red_blue_Incongruent | \n",
" 71 | \n",
" 288 | \n",
"
\n",
" \n",
" 12 | \n",
" 13 | \n",
" Trial 11 | \n",
" blue_red_Incongruent | \n",
" 60 | \n",
" 294 | \n",
"
\n",
" \n",
" 13 | \n",
" 14 | \n",
" Trial 12 | \n",
" red_red_Congruent | \n",
" 49 | \n",
" 341 | \n",
"
\n",
" \n",
" 14 | \n",
" 15 | \n",
" Trial 13 | \n",
" blue_blue_Congruent | \n",
" 20 | \n",
" 360 | \n",
"
\n",
" \n",
" 15 | \n",
" 16 | \n",
" Trial 14 | \n",
" blue_blue_Congruent | \n",
" 8 | \n",
" 365 | \n",
"
\n",
" \n",
" 16 | \n",
" 17 | \n",
" Trial 15 | \n",
" red_blue_Incongruent | \n",
" 72 | \n",
" 381 | \n",
"
\n",
" \n",
" 17 | \n",
" 18 | \n",
" Trial 16 | \n",
" blue_blue_Congruent | \n",
" 35 | \n",
" 387 | \n",
"
\n",
" \n",
" 18 | \n",
" 19 | \n",
" Trial 17 | \n",
" blue_blue_Congruent | \n",
" 56 | \n",
" 436 | \n",
"
\n",
" \n",
" 19 | \n",
" 20 | \n",
" Trial 18 | \n",
" blue_blue_Congruent | \n",
" 53 | \n",
" 449 | \n",
"
\n",
" \n",
" 20 | \n",
" 21 | \n",
" Trial 19 | \n",
" red_red_Congruent | \n",
" 58 | \n",
" 465 | \n",
"
\n",
" \n",
" 21 | \n",
" 22 | \n",
" Trial 20 | \n",
" blue_red_Incongruent | \n",
" 35 | \n",
" 514 | \n",
"
\n",
" \n",
" 22 | \n",
" 23 | \n",
" Debrief | \n",
" NaN | \n",
" Complete | \n",
" 549 | \n",
"
\n",
" \n",
" 23 | \n",
" 24 | \n",
" Complete | \n",
" NaN | \n",
" Done | \n",
" 562 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" trial_index task stimulus value time_elapsed\n",
"0 1 Participant ID NaN 007 27\n",
"1 2 Consent NaN Complete 56\n",
"2 3 Trial 1 red_red_Congruent 32 80\n",
"3 4 Trial 2 red_blue_Incongruent 76 105\n",
"4 5 Trial 3 red_red_Congruent 43 116\n",
"5 6 Trial 4 red_blue_Incongruent 85 130\n",
"6 7 Trial 5 red_blue_Incongruent 49 166\n",
"7 8 Trial 6 blue_blue_Congruent 2 171\n",
"8 9 Trial 7 blue_blue_Congruent 55 201\n",
"9 10 Trial 8 blue_blue_Congruent 34 241\n",
"10 11 Trial 9 blue_red_Incongruent 35 246\n",
"11 12 Trial 10 red_blue_Incongruent 71 288\n",
"12 13 Trial 11 blue_red_Incongruent 60 294\n",
"13 14 Trial 12 red_red_Congruent 49 341\n",
"14 15 Trial 13 blue_blue_Congruent 20 360\n",
"15 16 Trial 14 blue_blue_Congruent 8 365\n",
"16 17 Trial 15 red_blue_Incongruent 72 381\n",
"17 18 Trial 16 blue_blue_Congruent 35 387\n",
"18 19 Trial 17 blue_blue_Congruent 56 436\n",
"19 20 Trial 18 blue_blue_Congruent 53 449\n",
"20 21 Trial 19 red_red_Congruent 58 465\n",
"21 22 Trial 20 blue_red_Incongruent 35 514\n",
"22 23 Debrief NaN Complete 549\n",
"23 24 Complete NaN Done 562"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_trials = pd.DataFrame()\n",
"\n",
"for file in files_raw_trials[0:1]:\n",
" tmp_df = pd.read_csv(dir_raw + file)\n",
"\n",
"tmp_df"
]
},
{
"cell_type": "markdown",
"id": "319f9e17-746e-4050-9c2a-e5e15e0fda70",
"metadata": {},
"source": [
"'Participant ID', 'Consent', 'Debrief', and 'Complete' should be in their own columns\n",
"\n",
"'stimulus' should be broken into its component factors."
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "4aa99378-3f86-42ab-98a5-2e1aa7427c45",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Word | \n",
" Colour | \n",
" Condition | \n",
" Participant ID | \n",
" Consent | \n",
" Debrief | \n",
" Complete | \n",
" Reaction Time | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 32 | \n",
"
\n",
" \n",
" 3 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 76 | \n",
"
\n",
" \n",
" 4 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 43 | \n",
"
\n",
" \n",
" 5 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 85 | \n",
"
\n",
" \n",
" 6 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 49 | \n",
"
\n",
" \n",
" 7 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 2 | \n",
"
\n",
" \n",
" 8 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 55 | \n",
"
\n",
" \n",
" 9 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 34 | \n",
"
\n",
" \n",
" 10 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 35 | \n",
"
\n",
" \n",
" 11 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 71 | \n",
"
\n",
" \n",
" 12 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 60 | \n",
"
\n",
" \n",
" 13 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 49 | \n",
"
\n",
" \n",
" 14 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 20 | \n",
"
\n",
" \n",
" 15 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 8 | \n",
"
\n",
" \n",
" 16 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 72 | \n",
"
\n",
" \n",
" 17 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 35 | \n",
"
\n",
" \n",
" 18 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 56 | \n",
"
\n",
" \n",
" 19 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 53 | \n",
"
\n",
" \n",
" 20 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 58 | \n",
"
\n",
" \n",
" 21 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 35 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Word Colour Condition Participant ID Consent Debrief Complete \\\n",
"2 red red Congruent 007 Complete Complete 562 \n",
"3 red blue Incongruent 007 Complete Complete 562 \n",
"4 red red Congruent 007 Complete Complete 562 \n",
"5 red blue Incongruent 007 Complete Complete 562 \n",
"6 red blue Incongruent 007 Complete Complete 562 \n",
"7 blue blue Congruent 007 Complete Complete 562 \n",
"8 blue blue Congruent 007 Complete Complete 562 \n",
"9 blue blue Congruent 007 Complete Complete 562 \n",
"10 blue red Incongruent 007 Complete Complete 562 \n",
"11 red blue Incongruent 007 Complete Complete 562 \n",
"12 blue red Incongruent 007 Complete Complete 562 \n",
"13 red red Congruent 007 Complete Complete 562 \n",
"14 blue blue Congruent 007 Complete Complete 562 \n",
"15 blue blue Congruent 007 Complete Complete 562 \n",
"16 red blue Incongruent 007 Complete Complete 562 \n",
"17 blue blue Congruent 007 Complete Complete 562 \n",
"18 blue blue Congruent 007 Complete Complete 562 \n",
"19 blue blue Congruent 007 Complete Complete 562 \n",
"20 red red Congruent 007 Complete Complete 562 \n",
"21 blue red Incongruent 007 Complete Complete 562 \n",
"\n",
" Reaction Time \n",
"2 32 \n",
"3 76 \n",
"4 43 \n",
"5 85 \n",
"6 49 \n",
"7 2 \n",
"8 55 \n",
"9 34 \n",
"10 35 \n",
"11 71 \n",
"12 60 \n",
"13 49 \n",
"14 20 \n",
"15 8 \n",
"16 72 \n",
"17 35 \n",
"18 56 \n",
"19 53 \n",
"20 58 \n",
"21 35 "
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_trials = pd.DataFrame()\n",
"\n",
"for file in files_raw_trials[0:1]:\n",
" tmp_df = pd.read_csv(dir_raw + file)\n",
" tmp_trials = tmp_df.loc[tmp_df[\"task\"].str.contains(\n",
" \"Trial\"), \"stimulus\"].str.split(\"_\", expand=True).rename(\n",
" columns={0: \"Word\", 1: \"Colour\", 2: \"Condition\"})\n",
" tmp_trials[\"Participant ID\"] = tmp_df[tmp_df[\"task\"] == \"Participant ID\"][\"value\"].values[0]\n",
" tmp_trials[\"Consent\"] = tmp_df[tmp_df[\"task\"] == \"Consent\"][\"value\"].values[0]\n",
" tmp_trials[\"Debrief\"] = tmp_df[tmp_df[\"task\"] == \"Debrief\"][\"value\"].values[0]\n",
" tmp_trials[\"Complete\"] = tmp_df[tmp_df[\"task\"] == \"Complete\"][\"time_elapsed\"].values[0]\n",
" tmp_trials[\"Reaction Time\"] = tmp_df.loc[tmp_df[\"task\"].str.contains(\"Trial\"), \"value\"]\n",
"\n",
"tmp_trials"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "956d4db8-191c-40b3-b49f-0692f22ede9a",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Word | \n",
" Colour | \n",
" Condition | \n",
" Participant ID | \n",
" Consent | \n",
" Debrief | \n",
" Complete | \n",
" Reaction Time | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 32 | \n",
"
\n",
" \n",
" 3 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 76 | \n",
"
\n",
" \n",
" 4 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 43 | \n",
"
\n",
" \n",
" 5 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 85 | \n",
"
\n",
" \n",
" 6 | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" 007 | \n",
" Complete | \n",
" Complete | \n",
" 562 | \n",
" 49 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 17 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 002 | \n",
" Complete | \n",
" Complete | \n",
" 662 | \n",
" 38 | \n",
"
\n",
" \n",
" 18 | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" 002 | \n",
" Complete | \n",
" Complete | \n",
" 662 | \n",
" 15 | \n",
"
\n",
" \n",
" 19 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
" 002 | \n",
" Complete | \n",
" Complete | \n",
" 662 | \n",
" 38 | \n",
"
\n",
" \n",
" 20 | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
" 002 | \n",
" Complete | \n",
" Complete | \n",
" 662 | \n",
" 47 | \n",
"
\n",
" \n",
" 21 | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" 002 | \n",
" Complete | \n",
" Complete | \n",
" 662 | \n",
" 13 | \n",
"
\n",
" \n",
"
\n",
"
200 rows × 8 columns
\n",
"
"
],
"text/plain": [
" Word Colour Condition Participant ID Consent Debrief Complete \\\n",
"2 red red Congruent 007 Complete Complete 562 \n",
"3 red blue Incongruent 007 Complete Complete 562 \n",
"4 red red Congruent 007 Complete Complete 562 \n",
"5 red blue Incongruent 007 Complete Complete 562 \n",
"6 red blue Incongruent 007 Complete Complete 562 \n",
".. ... ... ... ... ... ... ... \n",
"17 red red Congruent 002 Complete Complete 662 \n",
"18 blue blue Congruent 002 Complete Complete 662 \n",
"19 blue red Incongruent 002 Complete Complete 662 \n",
"20 blue red Incongruent 002 Complete Complete 662 \n",
"21 red red Congruent 002 Complete Complete 662 \n",
"\n",
" Reaction Time \n",
"2 32 \n",
"3 76 \n",
"4 43 \n",
"5 85 \n",
"6 49 \n",
".. ... \n",
"17 38 \n",
"18 15 \n",
"19 38 \n",
"20 47 \n",
"21 13 \n",
"\n",
"[200 rows x 8 columns]"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_trials = pd.DataFrame()\n",
"\n",
"for file in files_raw_trials: # Loop through all trials now\n",
" tmp_df = pd.read_csv(dir_raw + file)\n",
" tmp_trials = tmp_df.loc[tmp_df[\"task\"].str.contains(\n",
" \"Trial\"), \"stimulus\"].str.split(\"_\", expand=True).rename(\n",
" columns={0: \"Word\", 1: \"Colour\", 2: \"Condition\"})\n",
" tmp_trials[\"Participant ID\"] = tmp_df[tmp_df[\"task\"] == \"Participant ID\"][\"value\"].values[0]\n",
" tmp_trials[\"Consent\"] = tmp_df[tmp_df[\"task\"] == \"Consent\"][\"value\"].values[0]\n",
" tmp_trials[\"Debrief\"] = tmp_df[tmp_df[\"task\"] == \"Debrief\"][\"value\"].values[0]\n",
" tmp_trials[\"Complete\"] = tmp_df[tmp_df[\"task\"] == \"Complete\"][\"time_elapsed\"].values[0]\n",
" tmp_trials[\"Reaction Time\"] = tmp_df.loc[tmp_df[\"task\"].str.contains(\"Trial\"), \"value\"]\n",
" \n",
" data_trials = pd.concat([data_trials, tmp_trials])\n",
"\n",
"data_trials"
]
},
{
"cell_type": "markdown",
"id": "61b874e0-e372-476b-93ea-d4023f721f2d",
"metadata": {},
"source": [
"### Merge Survey and Trials Data\n",
"\n",
"We want to merge on 'Participant ID', notice that in `survey_data` 'Participant ID' is an integer:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "06420905-2262-43de-89ff-d4c902f464b4",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Int64Index: 10 entries, 0 to 0\n",
"Data columns (total 5 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Participant ID 10 non-null int64 \n",
" 1 Age 10 non-null int64 \n",
" 2 Gender 10 non-null object\n",
" 3 Income 10 non-null object\n",
" 4 Education 10 non-null object\n",
"dtypes: int64(2), object(3)\n",
"memory usage: 480.0+ bytes\n"
]
}
],
"source": [
"data_survey.info()"
]
},
{
"cell_type": "markdown",
"id": "a84277ab-e90e-405b-85de-36af07ef296b",
"metadata": {},
"source": [
"Merge will give an error since it doesn't match the data type of 'Participant ID' in `data_trials`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "10a7acd5-fc54-48ff-966e-75d253530ab9",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Participant ID | \n",
" Age | \n",
" Gender | \n",
" Income | \n",
" Education | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 005 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 008 | \n",
" 23 | \n",
" Female | \n",
" High | \n",
" High school | \n",
"
\n",
" \n",
" 0 | \n",
" 007 | \n",
" 25 | \n",
" Male | \n",
" High | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 002 | \n",
" 33 | \n",
" Male | \n",
" High | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 010 | \n",
" 23 | \n",
" Female | \n",
" Medium | \n",
" High school | \n",
"
\n",
" \n",
" 0 | \n",
" 001 | \n",
" 20 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 004 | \n",
" 21 | \n",
" Male | \n",
" Low | \n",
" University | \n",
"
\n",
" \n",
" 0 | \n",
" 009 | \n",
" 26 | \n",
" Male | \n",
" Low | \n",
" College | \n",
"
\n",
" \n",
" 0 | \n",
" 003 | \n",
" 31 | \n",
" Female | \n",
" Low | \n",
" College | \n",
"
\n",
" \n",
" 0 | \n",
" 006 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Participant ID Age Gender Income Education\n",
"0 005 38 Male Medium University\n",
"0 008 23 Female High High school\n",
"0 007 25 Male High University\n",
"0 002 33 Male High University\n",
"0 010 23 Female Medium High school\n",
"0 001 20 Female Medium University\n",
"0 004 21 Male Low University\n",
"0 009 26 Male Low College\n",
"0 003 31 Female Low College\n",
"0 006 24 Female Medium University"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_survey[\"Participant ID\"] = data_survey[\"Participant ID\"].astype(str).str.zfill(3)\n",
"data_survey"
]
},
{
"cell_type": "markdown",
"id": "24d77ccd-99e3-4a75-8594-1b80d9a725c1",
"metadata": {},
"source": [
"Now we can merge!"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "12b5943f-20b2-4181-b248-907b672db867",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Participant ID | \n",
" Age | \n",
" Gender | \n",
" Income | \n",
" Education | \n",
" Word | \n",
" Colour | \n",
" Condition | \n",
" Consent | \n",
" Debrief | \n",
" Complete | \n",
" Reaction Time | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 005 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" Complete | \n",
" Complete | \n",
" 614 | \n",
" 41 | \n",
"
\n",
" \n",
" 1 | \n",
" 005 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" Complete | \n",
" Complete | \n",
" 614 | \n",
" 19 | \n",
"
\n",
" \n",
" 2 | \n",
" 005 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" Complete | \n",
" Complete | \n",
" 614 | \n",
" 42 | \n",
"
\n",
" \n",
" 3 | \n",
" 005 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
" red | \n",
" red | \n",
" Congruent | \n",
" Complete | \n",
" Complete | \n",
" 614 | \n",
" 54 | \n",
"
\n",
" \n",
" 4 | \n",
" 005 | \n",
" 38 | \n",
" Male | \n",
" Medium | \n",
" University | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" Complete | \n",
" Complete | \n",
" 614 | \n",
" 31 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 195 | \n",
" 006 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" Complete | \n",
" Complete | \n",
" 558 | \n",
" 40 | \n",
"
\n",
" \n",
" 196 | \n",
" 006 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" Complete | \n",
" Complete | \n",
" 558 | \n",
" 83 | \n",
"
\n",
" \n",
" 197 | \n",
" 006 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
" blue | \n",
" red | \n",
" Incongruent | \n",
" Complete | \n",
" Complete | \n",
" 558 | \n",
" 70 | \n",
"
\n",
" \n",
" 198 | \n",
" 006 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
" blue | \n",
" blue | \n",
" Congruent | \n",
" Complete | \n",
" Complete | \n",
" 558 | \n",
" 26 | \n",
"
\n",
" \n",
" 199 | \n",
" 006 | \n",
" 24 | \n",
" Female | \n",
" Medium | \n",
" University | \n",
" red | \n",
" blue | \n",
" Incongruent | \n",
" Complete | \n",
" Complete | \n",
" 558 | \n",
" 83 | \n",
"
\n",
" \n",
"
\n",
"
200 rows × 12 columns
\n",
"
"
],
"text/plain": [
" Participant ID Age Gender Income Education Word Colour Condition \\\n",
"0 005 38 Male Medium University red red Congruent \n",
"1 005 38 Male Medium University blue blue Congruent \n",
"2 005 38 Male Medium University red red Congruent \n",
"3 005 38 Male Medium University red red Congruent \n",
"4 005 38 Male Medium University red blue Incongruent \n",
".. ... ... ... ... ... ... ... ... \n",
"195 006 24 Female Medium University blue blue Congruent \n",
"196 006 24 Female Medium University red blue Incongruent \n",
"197 006 24 Female Medium University blue red Incongruent \n",
"198 006 24 Female Medium University blue blue Congruent \n",
"199 006 24 Female Medium University red blue Incongruent \n",
"\n",
" Consent Debrief Complete Reaction Time \n",
"0 Complete Complete 614 41 \n",
"1 Complete Complete 614 19 \n",
"2 Complete Complete 614 42 \n",
"3 Complete Complete 614 54 \n",
"4 Complete Complete 614 31 \n",
".. ... ... ... ... \n",
"195 Complete Complete 558 40 \n",
"196 Complete Complete 558 83 \n",
"197 Complete Complete 558 70 \n",
"198 Complete Complete 558 26 \n",
"199 Complete Complete 558 83 \n",
"\n",
"[200 rows x 12 columns]"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_clean = pd.merge(data_survey, data_trials, on=\"Participant ID\")\n",
"data_clean"
]
},
{
"cell_type": "markdown",
"id": "05544e21-ee1f-4e0b-b05b-d4567ea30b96",
"metadata": {},
"source": [
"## Congratulations!!!\n",
"\n",
"We have now combined each of the raw data files into one clean DataFrame for analysis!\n",
"\n",
"We can either continue our anlaysis in Python, or write this DataFrame to a file and continue an analysis in another language or software."
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "fa4ac710-ef5f-415c-803c-71548c774c41",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"data_clean.to_csv(\"clean_data.csv\", index=None)"
]
},
{
"cell_type": "markdown",
"id": "81c418f3-a59d-4f65-97fc-466e3a90c9f9",
"metadata": {},
"source": [
"## Thank you for attending this workshop!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}