{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Appended one-table version of ICD 2019/20 orchestra season analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reshapes the [full data sheet](https://docs.google.com/spreadsheets/d/1CrybKRmes4vz4SacUXnyAgNSrIClNRR3eHV0AMmZGiQ/edit) from the [2019/20 Orchestra Season analysis](https://www.composerdiversity.com/orchestra-seasons) conducted by the Institute for Composer Diversity.\n", "\n", "The original data sheet has each orchestra in a separate spreadsheet tab, which makes it harder to conduct new analysis or generate intersectional counts (e.g. pieces composed by Women of Colour). By appending all the tabs into one sheet, this analysis can be completed.\n", "\n", "Written by [Ben Coleman](https://github.com/tallcoleman)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import\n", "\n", "You will have to download the [ICD's workbook](https://docs.google.com/spreadsheets/d/1CrybKRmes4vz4SacUXnyAgNSrIClNRR3eHV0AMmZGiQ/edit) as an excel sheet (File > Download > Microsoft Excel .xlsx).\n", "\n", "It is easiest if you save this in the same directory as the notebook; then the filepath just needs to be the filename." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Define filepath and import workbook\n", "ImportFilePath = \"2021-01-02_BREAKDOWN_ 2019-2020 ORCHESTRAL PROGRAMMING.xlsx\"\n", "ICD1920Original = pd.read_excel(ImportFilePath, sheet_name=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Creates final data frame\n", "ICD1920Combined = pd.DataFrame(None,None,['Orchestra', 'Composer', 'WorkTitle', 'WorkCount', 'Living', 'Women', 'Heritages'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Processes and appends individual orchestra worksheets\n", "for worksheetKey in ICD1920Original.keys():\n", " if ICD1920Original[worksheetKey].columns[0:2].to_list() == ['Unnamed: 0', 'Unnamed: 1']:\n", " worksheetTemp = ICD1920Original[worksheetKey]\n", " \n", " # add orchestra column based on worksheetKey\n", " orchestraList = [worksheetKey] * len(worksheetTemp.index)\n", " worksheetTemp.insert(0,'Orchestra',orchestraList,True)\n", " worksheetTemp.columns=['Orchestra', 'Composer', 'WorkTitle', 'WorkCount', 'Living', 'Women', 'Heritages']\n", " \n", " # remove top two summary rows and occasional \"*Heritages bottom row\"\n", " worksheetTemp = worksheetTemp[worksheetTemp.WorkTitle.notnull()]\n", " \n", " # append to final dataframe\n", " ICD1920Combined = ICD1920Combined.append(worksheetTemp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export to CSV" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "ICD1920Combined.to_csv('ICD1920Combined.csv',index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.13" } }, "nbformat": 4, "nbformat_minor": 4 }