{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This program processes and plots output from GP Microbiome using functions and loops, then saves the results. It has two main forms of input file: \n", "
\n", "\n", "First, there are the observed relative abundance files, one for each participant. For the CF data, I ran the program Create_relative_abundance_files on a file containing observed relative abundance data for all participants, splitting it into individual files. For the example data, I ran the same program on fictitious participants, using data designed to resemble the actual data. You can create those files yourself before running this program, and I encourage you to do so - especially since they also serve as part of the input for Leave_One_Out_Examples. However, they are also included for your convenience in the 'Relative Abundance Files' folder, which sits inside 'Extras' within the 'Data' folder.\n", "
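\n",
"If you are curious what that splitting step looks like, the sketch below shows the general idea only - it is not the code from Create_relative_abundance_files, and the combined file name, the 'Participant' column, and the table layout are all hypothetical:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"#hypothetical combined file with a column identifying the participant for each row\n",
"rel_all = pd.read_csv('Data/all_relative_abundance.csv')\n",
"#write one file per participant, matching the {ID}_Rel.csv naming used later in this program\n",
"for pid, sub in rel_all.groupby('Participant'):\n",
"    sub.drop(columns='Participant').to_csv('Data/{}_Rel.csv'.format(pid), index=False)\n",
"```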
\n", "\n", "Second, there are the processed GP Microbiome output files. For the CF data, these input files were created from the raw GP Microbiome output by the program readsample27_with_151_edit. This version of the program instead uses randomly generated example csv files as input; they are not actual GP Microbiome output, but are designed to resemble it. I have included these files in this repository so that you can run the program exactly as written and see the output for yourself. The code itself is identical to the code I used, apart from file names and a few comments, which I have kept for the convenience of those also working with the CF data. I have also included in this repository a full explanation of how I generated the example data, for those who are interested. \n", "
\n", "\n", "The functions produce as many as twenty plots for each participant, and when they are run in a loop they generate plots for all participants at once. The plots use colour-coded markers to indicate the participant's clinical condition at each time point, a useful feature for which matplotlib has no built-in mechanism. To create plots with shaded backgrounds indicating binary variables over time - another type of visualisation that is not built directly into matplotlib - see Plots_Shaded_Backgrounds. \n", "
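\n",
"As a rough illustration of that second idea (this is not code from Plots_Shaded_Backgrounds - the line data and the shaded intervals below are invented), matplotlib's axvspan can shade the background over the periods when a binary variable is 'on':\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"fig, ax = plt.subplots()\n",
"#an invented series to draw a line through\n",
"ax.plot([0, 100, 200, 300], [0.10, 0.35, 0.20, 0.30], c='black')\n",
"#shade the invented intervals where the binary variable is 'on'\n",
"for start, end in [(80, 140), (220, 260)]:\n",
"    ax.axvspan(start, end, color='grey', alpha=0.3)\n",
"plt.show()\n",
"```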
\n", "\n", "The code can easily be adapted to other types of data, in any situation that calls for generating a number of easily comparable plots showing predicted and actual values and indicating factor variables at distinct points in time." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#import necessary libraries \n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 1\n", "The first few cells create the OTUkey_named file. If the file has already been created, you can skip down to Section 2.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Read in the original key file. We are going to add a column for only the bacteria's genus name to it.\n", "key = pd.read_excel(\"Data/OTUkey.xlsx\")\n", "#rename first column to avoid Excel mistaking it for a SYLK file due to the \"ID\" in the name\n", "key.rename(columns={'ID_OTU': 'OTU'}, inplace=True)\n", "#view the head, to get an idea of the rest of the format\n", "key.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#extract the genus from the taxonomic information and create the new 'Name' column for it\n", "pat = 'D_5__(?P<Name>.*)'\n", "key=key.join(key.Bacteria.str.extract(pat, expand=True))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#replace NaN, which occurs where the genus is \"Other\", with the word \"Other\"\n", "key['Name'].fillna('Other', inplace=True)\n", "#save edited file and review changes\n", "key.to_csv(\"Data/OTUkey_named.csv\", index=False)\n", "key.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 2\n", "Read in the OTUkey_named file below if you have previously created and saved it with Section 1. I have included a copy of it in the 'Extras' folder as well." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#read in OTUkey_named file, if it has already been created \n", "key=pd.read_csv(\"Data/OTUkey_named.csv\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#make a list of the operational taxonomic unit (OTU) IDs for our bacteria of interest\n", "bacteria=[2,30,58,59,60,63,70,80,94,104,113,167,169,170,206,221,223,227,229,234]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 3\n", "This section creates a second version of the OTUkey_named file for selected bacteria for use in other programs, and can also be skipped once you have the file. We will use an Excel version of the OTUkey_named_selection file, with plot specifications added, in DTW_All_boxplots, where we create box plots of the TIME Dynamic Time Warping output."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#this cell can be skipped if the OTUkey_named_selection file already exists, as it is not used in this particular program.\n", "#creating a second key for selected bacteria\n", "selectkey=key.iloc[[i-1 for i in bacteria],:]\n", "#saving the select key - this is optional, as the version used in the other programs is edited with specifications added\n", "#selectkey.to_csv(\"Data/OTUkey_named_selection.csv\", index=False)\n", "selectkey.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 4: Plots\n", "Read the files into dictionaries and plot them with functions. The dictionaries are to facilitate plotting multiple participants' output at once in loops. If you only have one file, you can run the function on just that participant ID or create a 1-item dictionary. In Section 5, I include some alternative plotting functions which use variable parameters that you can input manually, to provide space for those who wish to edit the plots more extensively - perhaps to apply them to other types of data.\n", "\n", "
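\n",
"For instance, using the example files in this repository, a 1-item dictionary for a single participant would simply be:\n",
"\n",
"```python\n",
"dfs = {'405': pd.read_csv('Data/405.csv')}\n",
"both_dfs = {'405': pd.read_csv('Data/405_both.csv')}\n",
"```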
\n", "See Section 6 for the code to create legends, with options to save the legends as separate files (my preferred method for the CF output data) or to copy and paste into any of the functions in Section 4 and 5 at the indicated places." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part A: Read in files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we read in our example output files and example relative abundance files. The example output csv files are intended to resemble real output and are not actually the results of running GP Microbiome and processing the raw output with the program readsample27. I have included in this repository a full explanation of how I created them for those who are interested. \n", "\n", "
\n", "See readsample27_with_151_edit for the actual code I used to process the output from running GP Microbiome on the CF Data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#create a list for the ID numbers of the participants whose data we ran through GPMicrobiome and wish to plot\n", "IDs=['405','453','480','500','511']\n", "#Create dictionaries and read in each person's output for noise-free compositions without predictions, \n", "#and with predictions added in. \n", "dfs = {i: pd.read_csv('Data/{}.csv'.format(i)) for i in IDs}\n", "both_dfs = {i: pd.read_csv('Data/{}_both.csv'.format(i)) for i in IDs}\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#if we were running this on the CF Data, we would simply change the IDs list:\n", "#IDs=['151','708','759','764','768']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#if desired, view the first few entries for one of the files without predictions to get a feel for the data\n", "dfs['511'].head() " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#rename the columns in the files containing both sets of time points based on the first row, which contains the time points\n", "#then reorder the columns in the files to make the time points consecutive, and put them in a new dictionary\n", "reordered_dfs={}\n", "for i in IDs:\n", " df=both_dfs[i].set_axis(both_dfs[i].loc[0].tolist(), axis=1, inplace=False)\n", " df=df.reindex(columns=sorted(df.columns))\n", " #save file if desired\n", " #df.to_csv('Data/{}_both_reordered.csv'.format(i), index=False)\n", " #if you did save it, you could edit the first cell in Part A to read it directly into a dictionary \n", " #I opted not to do so because it takes so little time to reorder and I wanted to save space on my computer\n", " #I wanted the unordered version saved so that I could easily examine predicted values on their own \n", " reordered_dfs[i]=df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#view the resulting reordered data frame, confirming that the redordering code ran correctly \n", "reordered_dfs['511'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell reads in files created in the program Create_relative_abundance_files. If you have not run that program yet, edit the cell as directed in the comments to use the copy in the 'Extras' folder." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#read in the files containing the observed relative abundance data for each participant, adding them to a new dictionary\n", "#the columns are the age in days at the time of each sample, and we will use this information as well in the plots\n", "#the files were created and saved in the program Create_relative_abundance_files\n", "#however, they are also in the 'Extras' folder for your convenience\n", "rel_dfs = {i: pd.read_csv(\"Data/{}_Rel.csv\".format(i)) for i in IDs}\n", "#to use the ones in the 'Extras' folder, comment out the previous line and un-comment this one:\n", "#rel_dfs = {i: pd.read_csv(\"Data/Extras/Relative Abundance Files/{}_Rel.csv\".format(i)) for i in IDs}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#if desired, examine the head of one of the files to get a feel for the data\n", "rel_dfs['511'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part B: Markers, two ways\n", "In our plots, different coloured markers indicate a participant's clinical condition at each of the time points where samples were taken. In plots without predictions, every time point is marked. In plots with predictions, predicted time points are of course not marked. I chose predicted time points to be evenly spaced between actual time points, with either 1 or 2 depending on the size of the gap; then in most cases I predicted 3 time points in the future, using intervals of 180 days. This means that positioning of markers differs for each participant and for plots with and without predictions, depending on what their statuses were, where I decided to make predictions, and how many predictions I made.\n", "
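\n",
"As a purely illustrative sketch of that scheme (the observed time deltas and the gap threshold below are made up, and the actual choices for the CF data were made before running GP Microbiome, not in this program):\n",
"\n",
"```python\n",
"observed = [0, 160, 480, 700]          #example time deltas, in days, of actual samples\n",
"preds = []\n",
"for a, b in zip(observed[:-1], observed[1:]):\n",
"    gap = b - a\n",
"    n = 2 if gap > 250 else 1          #1 or 2 evenly spaced points, depending on the gap (threshold assumed)\n",
"    preds += [round(a + gap*(k+1)/(n+1)) for k in range(n)]\n",
"preds += [observed[-1] + 180*k for k in (1, 2, 3)]   #3 future points at 180-day intervals\n",
"print(preds)\n",
"```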
\n", "\n", "There are two main ways of creating dictionaries for the markers: One is to import metadata and process it into dictionaries directly. The other is, after doing the first method once and saving the results to an Excel file (in my case, the same metadata file), to import those results into dictionaries. The first method is more flexible, since it goes directly to the metadata, but the code for the second has useful applications beyond this program. Easily generalised, it shows how to force Python to recognize lists from a saved file as lists rather than strings. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First Method: Creating markers directly\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#read in the metadata file which includes the condition and time delta for each participant's samples \n", "status=pd.read_excel(\"Data/ExampleDeltaKey.xlsx\", sheet_name=\"Metadata and time deltas\")\n", "#for the CF Data, the name of the file is 'MetaDataKey.xlsx'\n", "#otherwise, the code is identical" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#create a dictionary which, for each participant, lists the time deltas for samples taken while stable\n", "S_list={}\n", "for i in [int(x) for x in IDs]:\n", " #convert to a list, for each ID, the entries in the Time_Delta column for which the Visit_type was 'Stable'\n", " S_list[i]=list(status.query('Participant == {} and Visit_type == \"Stable\"'.format(i))['Time_Delta'])\n", "#display to confirm\n", "S_list" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#create a dictionary which, for each participant, lists the time deltas for samples taken during exacerbations\n", "E_list={}\n", "for i in [int(x) for x in IDs]:\n", " #convert to a list, for each ID, the entries in the Time_Delta column for which the Visit_type was 'Exacerbation'\n", " E_list[i]=list(status.query('Participant == {} and Visit_type == \"Exacerbation\"'.format(i))['Time_Delta'])\n", "#display to confirm\n", "E_list" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#make a dictionary depicting the order of the values for each status, to identify where to place markers\n", "#these are for use with plots without predictions\n", "#the positional values correspond to column names in the 'dfs' dictionary, which are simple index values \n", "#if they weren't numbers, we would replace int(col) with dfs[name].columns.get_loc(col)\n", "markers_gdict={}\n", "markers_rdict={}\n", "for i in IDs:\n", "#make a dictionary depicting the order of the values in the lists contained in S_list, which will be for green markers\n", " markers_gdict[i]=[int(col) for col in dfs[i].columns if dfs[i][col][0] in S_list[int(i)]]\n", "#make a dictionary depicting the order of the values in the lists contained in E_list, which will be for red markers\n", " markers_rdict[i]=[int(col) for col in dfs[i].columns if dfs[i][col][0] in E_list[int(i)]]\n", "#display to confirm\n", "markers_gdict, markers_rdict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#make versions of the dictionaries for the plots with predictions\n", "#the positional values correspond to index values of column names in the 'reordered_dfs' dictionary \n", "#which are time deltas for both actual and predicted values\n", "#if you did not predict between samples, you can use 
the same markers again since these columns are the same:\n", "#markers_r1dict=markers_rdict.copy()\n", "#markers_g1dict=markers_gdict.copy()\n", "#make a dictionary depicting the order of the values in the lists contained in S_list, which will be for green markers\n", "markers_g1dict={}\n", "markers_r1dict={}\n", "for i in IDs:\n", "#make a dictionary depicting the order of the values in the lists contained in S_list, which will be for green markers\n", " markers_g1dict[i]=[reordered_dfs[i].columns.get_loc(col) for col in reordered_dfs[i].columns \n", " if reordered_dfs[i][col][0] in S_list[int(i)]]\n", "#make a dictionary depicting the order of the values in the lists contained in E_list, which will be for red markers\n", " markers_r1dict[i]=[reordered_dfs[i].columns.get_loc(col) for col in reordered_dfs[i].columns \n", " if reordered_dfs[i][col][0] in E_list[int(i)]]\n", "#display to confirm\n", "markers_g1dict, markers_r1dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you have created the markers dictionaries, the code below will allow you to save them to a new worksheet in the Excel metadata file. If you don't save them, you can skip to Part C, but then you will have to re-run the previous cells next time you run the program." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#convert to a data frame\n", "data={'markers_r': markers_rdict, 'markers_g': markers_gdict, 'markers_r1':markers_r1dict, 'markers_g1':markers_g1dict}\n", "#make it oriented with participants as columns and the names of the dictionaries as rows\n", "df=pd.DataFrame.from_dict(data, orient='index')\n", "#view the head\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#write the data frame with the markers to a new sheet in our metadata file\n", "#import libraries to write to Excel \n", "import os\n", "from openpyxl import load_workbook\n", "file_name=\"Data/ExampleDeltaKey.xlsx\"\n", "#again, the only difference in the code for the CF data is the file name, 'MetaDataKey.xlsx'\n", "#open the file\n", "writer = pd.ExcelWriter(file_name, engine='openpyxl')\n", "if os.path.exists(file_name):\n", " book = load_workbook(file_name)\n", " writer.book = book\n", "#check if the sheet already exists, and if it does then close the file \n", "if 'markers' in book.sheetnames:\n", " writer.close()\n", "else:\n", " #create the new sheet in the existing file and save\n", " df.to_excel(writer, sheet_name='markers')\n", " writer.save()\n", " writer.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second Method: Import markers from file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#import literal_eval for use with the markers file\n", "from ast import literal_eval" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#read in the file with the lists of markers based on whether the participant is exacerbated or stable at a given time point\n", "#markers can indicate any condition, depending on your data, but you need separate lists for different conditions\n", "markers=pd.read_excel(\"Data/ExampleDeltaKey.xlsx\", sheet_name=\"markers\")\n", "#For the CF data, we would change the file name to 'MetaDataKey.xlsx'\n", "#view the columns, which will be strings of the ID numbers if you generated them with the code above\n", "#if for some reason you entered them manually in Excel you need 
to map them to strings as follows:\n", "#markers.columns=markers.columns.map(str)\n", "markers.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#force the program to view the lists from the Excel file as lists rather than as strings\n", "for i in range(1,len(markers.columns)):\n", " markers.iloc[:,i]=markers.iloc[:,i].apply(lambda x: literal_eval(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#create a dictionaries from the columns, pairing each participant with its markers \n", "#for use with the output without predictions (in the dfs dictionary) \n", "#if you don't need all the columns, you can use this code anyway or use the alternative code below\n", "#start with green (stable) markers for the output without predictions (in the dfs dictionary)\n", "markers_gdict=markers.iloc[0,1:].to_dict()\n", "#repeat for the red (exacerbated) markers for the same output\n", "markers_rdict=markers.iloc[2,1:].to_dict()\n", "#repeat for the green (stable) markers to be used with the output including predictions (reordered_dfs dictionary)\n", "#these only differ from the marker_g lists if between-time point predictions are made\n", "markers_g1dict=markers.iloc[1,1:].to_dict()\n", "#repeat for the red (exacerbated) markers to be used with the output including predictions (reordered_dfs dictionary)\n", "#these only differ from the marker_r lists if between-time point predictions are made\n", "markers_r1dict=markers.iloc[3,1:].to_dict()\n", "#display to confirm, if desired\n", "markers_gdict, markers_rdict, markers_g1dict, markers_r1dict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#alternatively, make the dictionary just for participants you are plotting \n", "#I timed both methods, and this one took about 0.000038 seconds longer on my computer - basically no difference\n", "markers_gdict={}\n", "markers_g1dict={}\n", "markers_rdict={}\n", "markers_r1dict={}\n", "for i in IDs:\n", " markers_gdict[i]=markers.loc[0,i]\n", " markers_g1dict[i]=markers.loc[1,i]\n", " markers_rdict[i]=markers.loc[2,i]\n", " markers_r1dict[i]=markers.loc[3,i]\n", "#display to confirm, if desired\n", "markers_gdict, markers_rdict, markers_g1dict, markers_r1dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part C: Creating the plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My plotting functions create as many as 20 plots per participant, and when run in loops they plot all participants' data at the same time. Before running such a function, always make sure that your input data is formatted consistently for each participant, to ensure that the plots show what they are intended to show.\n", "
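\n",
"One very simple check of that kind - illustrative only, and assuming (as with this data) that the observed relative abundance file and the noise-free output without predictions both have one column per sampling time point - is:\n",
"\n",
"```python\n",
"for i in IDs:\n",
"    assert dfs[i].shape[1] == rel_dfs[i].shape[1], 'time point mismatch for participant {}'.format(i)\n",
"```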
\n", "\n", "For one of my participants, the time points and prediction time points were formatted slightly differently from the others. Although the difference was minor, it could have resulted in inaccurate plots, so I corrected the discrepancy as early as possible, during the initial processing of the raw output files. In readsample27_with_151_edit you can see how I created files for all participants in a loop while adjusting the formatting of the one differing file to match the others, along with an alternative correction method that could instead be applied here. See that program for full details.\n", "\n", "
\n", "\n", "All of my plotting functions save the plots to a folder called 'Plots,' which is in this repository as well. Adjust the file path if you want to save them somewhere else, or comment out the line of code which saves them. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#define custom colours for the plots - light and dark red and green, for noise-free and observed values respectively\n", "l_red='#FF5959'\n", "d_red='#A40000'\n", "l_green='#14AE0E'\n", "d_green='#0B5A08'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Given the distribution of our data, it didn't make sense to define the y-axis the same way for all the plots. \n", "#However, if you did wish to do so, you would add the following code to the plot specifications section of the function:\n", "#plt.ylim(min_value, max_value) \n", "#and substitute in the minimum and maximum values you wish to use" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#function for use in a loop with the dictionaries, plotting noise-free compositions without predictions\n", "def plot_loop(name):\n", " #divide the list of bacteria of interest into groups of 4 to facilitate plotting\n", " rows=[[2,30,58,59],[60,63,70,80],[94,104,113,167],[169,170,206,221],[223,227,229,234]]\n", " s=dfs[name]\n", " rel=rel_dfs[name]\n", " days=[int(x) for x in rel.columns]\n", " markers_r = markers_rdict[name]\n", " markers_g = markers_gdict[name]\n", " ID=int(name) \n", " #run a loop to plot each group of 4 in a 2 by 2 format with our custom markers, then save the file\n", " for j in range(5):\n", " fig = plt.figure(figsize=(18,14))\n", " for i in range(4):\n", " ax = fig.add_subplot(2,2,i+1)\n", " #because I made my markers slightly transparent, I need separate plots for lines and red markers\n", " #this avoids having the line become transparent\n", " #slightly transparent markers make it easier to see subtle differences between the lines \n", " #if you opt to set alpha at the default of 1 (not transparent), you can combine the first 2 red plots this way:\n", " #ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_r, markerfacecolor=l_red, markersize=8, \n", " #linewidth=2,dashes=[2, 2,5,2], c='black')\n", " #there's no built-in way to customise marker colours by variables, so the green markers always need a dummy line\n", " ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_r, markerfacecolor='none',markersize=8, \n", " linewidth=2,dashes=[2, 2,5,2], c='black')\n", " ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_r, markerfacecolor=l_red, alpha=0.75, \n", " markersize=8,c='none')\n", " ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_g, markerfacecolor=l_green,alpha=0.75,\n", " markersize=8, c='none')\n", " #again, if you prefer alpha=1 you can combine the two lines for red markers:\n", " #ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, markersize=8, \n", " #linewidth=2, c='black')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor='none',markersize=8, \n", " linewidth=2, c='black')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red,alpha=0.75, markersize=8, \n", " c='none')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_g,markerfacecolor=d_green,alpha=0.75, markersize=8, \n", " c='none')\n", " #optional: insert code from Section 6 to add a legend 
for each plot - remember to make size/fit adjustments \n", " plt.title('{} Composition'.format(key['Name'][rows[j][i]-1]), size=15)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=13)\n", " plt.ylabel(\"Relative Abundance\", size=13)\n", " plt.savefig(\"Plots/{}_{}.png\".format(ID,j), format='png')\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#run the first function in a loop\n", "for name in IDs:\n", " plot_loop(name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#function to plot with predictions,for use in a loop with the dictionaries\n", "def plot_pred_loop(name):\n", " #divide the list of OTU's of interest into groups of 4 to facilitate plotting\n", " rows=[[2,30,58,59],[60,63,70,80],[94,104,113,167],[169,170,206,221],[223,227,229,234]]\n", " r=reordered_dfs[name]\n", " rel=rel_dfs[name]\n", " days=[int(x) for x in rel.columns]\n", " markers_r = markers_rdict[name]\n", " markers_g = markers_gdict[name]\n", " markers_r1 = markers_r1dict[name]\n", " markers_g1 = markers_g1dict[name]\n", " ID=int(name) \n", " #run a loop to plot each group of 4 in a 2 by 2 format with our custom markers, then save the file\n", " for j in range(5):\n", " fig = plt.figure(figsize=(18,14))\n", " for i in range(4):\n", " ax = fig.add_subplot(2,2,i+1)\n", " #because I made my markers slightly transparent, I need separate plots for lines and red markers\n", " #this avoids having the line become transparent\n", " #slightly transparent markers make it easier to see subtle differences between the lines\n", " #if you opt to set alpha at the default of 1 (not transparent), you can combine the first 2 red plots this way:\n", " #ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_r1, markerfacecolor=l_red,markersize=8, \n", " #linewidth=2,dashes=[2, 2,5,2], c='black') \n", " #there's no built-in way to customise marker colours by variables, so the green markers always need a dummy line \n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_r1, markerfacecolor='none',markersize=8, \n", " linewidth=2,dashes=[2, 2,5,2], c='black')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_r1, markerfacecolor=l_red, alpha=0.75, \n", " markersize=8, c='none')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_g1, markerfacecolor=l_green,alpha=0.75,\n", " markersize=8, c='none')\n", " #again, if you prefer alpha=1 you can combine the two lines for red markers:\n", " #ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, markersize=8, \n", " #linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor='none',markersize=8, \n", " linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, alpha=0.75, markersize=8, \n", " c='none')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_g,markerfacecolor=d_green,alpha=0.75, markersize=8, \n", " c='none')\n", " #optional: insert code from Section 6 to add a legend for each plot - remember to make size/fit adjustments\n", " plt.title('{} Composition with Predictions'.format(key['Name'][rows[j][i]-1]), size=15)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=13)\n", " plt.ylabel(\"Relative Abundance\", size=13)\n", " plt.savefig(\"Plots/{}_pred_{}.png\".format(ID,j), format='png')\n", " 
plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#run the function with predictions in a loop\n", "for name in IDs:\n", " plot_pred_loop(name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#function to plot with predictions for the just 3 most important bacteria in a row, using the dictionaries\n", "def plot_pred_rows(name):\n", " rows=[94,113,229]\n", " markers_r = markers_rdict[name]\n", " markers_g = markers_gdict[name]\n", " markers_r1=markers_r1dict[name]\n", " markers_g1=markers_g1dict[name]\n", " r=reordered_dfs[name]\n", " rel=rel_dfs[name]\n", " days=[int(x) for x in rel.columns]\n", " ID=int(name) \n", " fig=plt.figure(figsize=(26,7))\n", " for i in range(3):\n", " ax = fig.add_subplot(1,3,i+1)\n", " #because I made my markers slightly transparent, I need separate plots for lines and red markers\n", " #this avoids having the line become transparent\n", " #slightly transparent markers make it easier to see subtle differences between the lines\n", " #if you opt to set alpha at the default of 1 (not transparent), you can combine the first two red plots:\n", " #ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_r1, markerfacecolor=l_red,markersize=8, \n", " #linewidth=2,dashes=[2, 2,5,2], c='black') \n", " #there's no built-in way to customise marker colours by variables, so the green markers always need a dummy line \n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_r1, markerfacecolor='none',markersize=8, \n", " linewidth=2,dashes=[2, 2,5,2], c='black')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_r1, markerfacecolor=l_red, alpha=0.75, \n", " markersize=8, c='none')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_g1, markerfacecolor=l_green, alpha=0.75,\n", " markersize=8, c='none')\n", " #again, if you prefer alpha=1 you can combine the two lines for red markers:\n", " #ax.plot(days,rel.iloc[rows[i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, markersize=8, \n", " #linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[i]-1],'-gD', markevery=markers_r,markerfacecolor='none',markersize=8, \n", " linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, alpha=0.75, markersize=8, \n", " c='none')\n", " ax.plot(days,rel.iloc[rows[i]-1],'-gD', markevery=markers_g,markerfacecolor=d_green,alpha=0.75, markersize=8, \n", " c='none')\n", " #optional: insert code from Section 6 to add a legend for each plot - remember to make size/fit adjustments \n", " plt.title('{} Composition with Predictions'.format(key['Name'][rows[i]-1]), size=24)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=18)\n", " plt.ylabel(\"Relative Abundance\", size=18)\n", " plt.setp(ax.get_xticklabels(), size=14)\n", " plt.setp(ax.get_yticklabels(), size=14)\n", " #the tight_layout function reduces white space in the image. \n", " #If you turn off tight_layout you may need to adjust your text size etc. 
\n", " plt.tight_layout() \n", " plt.savefig(\"Plots/{}_pred_rows.png\".format(ID), format='png')\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#test-run the function on one participant\n", "plot_pred_rows('405')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#plotting with predictions for 2 in a row - simple edit to plot_pred_rows\n", "def plot_pred_two(name):\n", " rows=[94,229]\n", " markers_r = markers_rdict[name]\n", " markers_g = markers_gdict[name]\n", " markers_r1 = markers_r1dict[name]\n", " markers_g1 = markers_g1dict[name] \n", " r=reordered_dfs[name]\n", " rel=rel_dfs[name]\n", " days=[int(x) for x in rel.columns]\n", " ID=int(name) \n", " fig=plt.figure(figsize=(15,6))\n", " for i in range(2):\n", " ax = fig.add_subplot(1,2,i+1)\n", " #because I made my markers slightly transparent, I need separate plots for lines and red markers\n", " #this avoids having the line become transparent\n", " #slightly transparent markers make it easier to see subtle differences between the lines\n", " #if you opt to set alpha at the default of 1 (not transparent), you can combine the first two red plots:\n", " #ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_r1, markerfacecolor=l_red,markersize=8, \n", " #linewidth=2,dashes=[2, 2,5,2], c='black') \n", " #there's no built-in way to customise marker colours by variables, so the green markers always need a dummy line \n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_r1, markerfacecolor='none',markersize=8, \n", " linewidth=2,dashes=[2, 2,5,2], c='black')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_r1, markerfacecolor=l_red, alpha=0.75, \n", " markersize=8, c='none')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[i]],'-gD', markevery=markers_g1, markerfacecolor=l_green, alpha=0.75,\n", " markersize=8, c='none') \n", " #again, if you prefer alpha=1 you can combine the two lines for red markers:\n", " #ax.plot(days,rel.iloc[rows[i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, markersize=8, \n", " #linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[i]-1],'-gD', markevery=markers_r,markerfacecolor='none',markersize=8, \n", " linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, alpha=0.75, markersize=8, \n", " c='none')\n", " ax.plot(days,rel.iloc[rows[i]-1],'-gD', markevery=markers_g,markerfacecolor=d_green,alpha=0.75, markersize=8, \n", " c='none')\n", " #optional: insert code from Section 6 to add a legend for each plot - remember to make size/fit adjustments \n", " plt.title('{} Composition with Predictions'.format(key['Name'][rows[i]-1]), size=20)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=16)\n", " #alternative x axis label, if the participant's ID is in the title\n", " #plt.xlabel(\"Age(Days)\", size=16)\n", " plt.ylabel(\"Relative Abundance\", size=16)\n", " plt.setp(ax.get_xticklabels(), size=12)\n", " plt.setp(ax.get_yticklabels(), size=12)\n", " #the tight_layout function reduces white space in the image. \n", " #If you turn off tight_layout you may need to adjust your text size etc. 
\n", " plt.tight_layout()\n", " plt.savefig(\"Plots/{}_pred_two.png\".format(ID), format='png')\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#test-run the function on one participant\n", "plot_pred_two('405')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 5: Alternative plotting functions\n", "The next two functions will plot the output, with and without predictions, for individual participants if you don't want to use dictionaries. Normally there is no downside to using dictionaries, and for this data manually inputting variables only saves you creating the markers dictionaries. However, I wanted to include this to provide space for those who wish to edit the plots more extensively, perhaps to apply them to other types of data. \n", "
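\n",
"For example, once the objects from Section 4 exist, the two functions defined below could be called for a single participant like this (using the example data already read in):\n",
"\n",
"```python\n",
"plot(dfs['405'], rel_dfs['405'], markers_rdict['405'], markers_gdict['405'], 405)\n",
"plot_pred(reordered_dfs['405'], rel_dfs['405'], markers_rdict['405'], markers_gdict['405'],\n",
"          markers_r1dict['405'], markers_g1dict['405'], 405)\n",
"```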
\n", "\n", "These functions are nearly identical to the others, except for the manual input of parameters. As such, comments are kept to a minimum. For full explanatory comments see the loop versions of the functions. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#function for output for data without predictions, inputting files and data as variable parameters\n", "#of course you will need to read in the files first\n", "#s is the noise-free compositions file, rel is the relative abundance file, the markers are as explained above\n", "#ID is the participant ID\n", "#see the main versions of this function, plot_loop, for full explanatory comments\n", "def plot(s, rel, markers_r, markers_g,ID):\n", " #divide the list of bacteria of interest into groups of 4 to facilitate plotting\n", " rows=[[2,30,58,59],[60,63,70,80],[94,104,113,167],[169,170,206,221],[223,227,229,234]]\n", " days=[int(x) for x in rel.columns] \n", " #run a loop to plot each group of 4 in a 2 by 2 format with our custom markers, then save the file\n", " for j in range(5):\n", " fig = plt.figure(figsize=(18,14))\n", " for i in range(4):\n", " ax = fig.add_subplot(2,2,i+1)\n", " ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_r, markerfacecolor='none', \n", " markersize=8, linewidth=2,dashes=[2, 2,5,2], c='black')\n", " ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_r, markerfacecolor=l_red, alpha=0.75, \n", " markersize=8, c='none') \n", " ax.plot(days, s.iloc[rows[j][i]],'-gD', markevery=markers_g, markerfacecolor=l_green, alpha=0.75,\n", " markersize=8, c='none') \n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor='none',markersize=8, \n", " linewidth=2, c='black')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red,alpha=0.75, markersize=8, \n", " c='none')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_g,markerfacecolor=d_green,alpha=0.75, markersize=8, \n", " c='none') \n", " #optional: insert code from Section 6 to add a legend for each plot - remember to make size/fit adjustments\n", " plt.title('{} Composition'.format(key['Name'][rows[j][i]-1]), size=15)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=13)\n", " plt.ylabel(\"Relative Abundance\", size=13)\n", " plt.savefig(\"Plots/{}_{}.png\".format(ID,j), format='png')\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#function with predictions included for individual files, inputting files and data manually as variable parameters\n", "#of course you will need to read in the files first and have the markers handy\n", "#r is the reordered noise-free compositions with predictions file, and rel is the relative abundance file\n", "#markers are as explained above, and ID is the participant ID\n", "#see the main version of this function, plot_pred_loop, for full explanatory comments\n", "def plot_pred(r, rel, markers_r, markers_g,markers_r1,markers_g1, ID):\n", " #divide the list of bacteria of interest into groups of 4 to facilitate plotting\n", " rows=[[2,30,58,59],[60,63,70,80],[94,104,113,167],[169,170,206,221],[223,227,229,234]]\n", " days=[int(x) for x in rel.columns]\n", " #run a loop to plot each group of 4 in a 2 by 2 format with our custom markers, then save the file\n", " for j in range(5):\n", " fig = plt.figure(figsize=(18,14))\n", " for i in range(4):\n", " ax = fig.add_subplot(2,2,i+1)\n", " 
ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_r1, markerfacecolor='none',markersize=8, \n", " linewidth=2,dashes=[2, 2,5,2], c='black')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_r1, markerfacecolor=l_red, alpha=0.75, \n", " markersize=8, c='none')\n", " ax.plot(r.loc[0]+days[0], r.iloc[rows[j][i]],'-gD', markevery=markers_g1, markerfacecolor=l_green, alpha=0.75,\n", " markersize=8, c='none')\n", " ax.plot(days, rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor='none',markersize=8, \n", " linewidth=2, c='black')\n", " ax.plot(days, rel.iloc[rows[j][i]-1],'-gD', markevery=markers_r,markerfacecolor=d_red, alpha=0.75, markersize=8, \n", " c='none')\n", " ax.plot(days,rel.iloc[rows[j][i]-1],'-gD', markevery=markers_g,markerfacecolor=d_green,alpha=0.75, markersize=8, \n", " c='none')\n", " #optional: insert code from Section 6 to add a legend for each plot - remember to make size/fit adjustments\n", " plt.title('{} Composition with Predictions'.format(key['Name'][rows[j][i]-1]), size=15)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=13)\n", " plt.ylabel(\"Relative Abundance\", size=13)\n", " plt.savefig(\"Plots/{}_pred_{}.png\".format(ID,j), format='png')\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 6: Legends\n", "Here we have code for creating legends for the plots in this program and saving them as separate files. Then we provide template code which can be copied and pasted into the functions and adjusted to give every plot its own legend. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#create a legend for plots without predictions and save to a separate file using dummy plots\n", "#you may already have this legend saved from the program Leave_One_Out_Examples, in which case you can skip this\n", "fig = plt.figure()\n", "fig.patch.set_alpha(0.0)\n", "ax = fig.add_subplot()\n", "ax.plot([], [], linewidth=2, c='black',dashes=[2, 2,5,2], label=\"Noise-Free\")\n", "ax.plot([], [], 'gD', color=l_red,alpha=0.75,label=\"Noise-Free Exacerbated\")\n", "ax.plot([], [], 'gD', color=l_green,alpha=0.75,label=\"Noise-Free Stable\")\n", "ax.plot([], [], linewidth=2, c='black', label=\"Observed\")\n", "ax.plot([], [], 'gD', color=d_red,alpha=0.75,label=\"Observed Exacerbated\")\n", "ax.plot([], [], 'gD', color=d_green,alpha=0.75,label=\"Observed Stable\")\n", "ax.legend(loc='center', shadow=True, ncol=2)\n", "plt.gca().set_axis_off()\n", "plt.savefig(\"Plots/Basic_Legend.png\", format='png')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#create a legend for plots with predictions and save to a separate file using dummy plots\n", "fig = plt.figure()\n", "fig.patch.set_alpha(0.0)\n", "ax = fig.add_subplot()\n", "ax.plot([], [], linewidth=2, c='black',dashes=[2, 2,5,2], label=\"Noise-Free with Predictions\")\n", "ax.plot([], [], 'gD', color=l_red,alpha=0.75,label=\"Noise-Free Exacerbated\")\n", "ax.plot([], [], 'gD', color=l_green,alpha=0.75,label=\"Noise-Free Stable\")\n", "ax.plot([], [], linewidth=2, c='black', label=\"Observed\")\n", "ax.plot([], [], 'gD', color=d_red,alpha=0.75,label=\"Observed Exacerbated\")\n", "ax.plot([], [], 'gD', color=d_green,alpha=0.75,label=\"Observed Stable\")\n", "ax.legend(loc='center', shadow=True, ncol=2)\n", "plt.gca().set_axis_off()\n", "plt.savefig(\"Plots/Legend_with_Pred.png\", format='png')\n", 
"plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#code to paste into the functions at the indicated places - legends for plots without predictions\n", "#it is written to place the legend outside the plot, where it won't interfere\n", "#you may wish to change the position of the legend box or make other adjustments to the figsize, or make other edits\n", "#it is created using dummy plots with the same features as our actual plots\n", "ax.plot([], [], linewidth=2, c='black',dashes=[2, 2,5,2], label=\"Noise-Free\")\n", "ax.plot([], [], 'gD', color=l_red,alpha=0.75,label=\"Noise-Free Exacerbated\")\n", "ax.plot([], [], 'gD', color=l_green,alpha=0.75,label=\"Noise-Free Stable\")\n", "ax.plot([], [], linewidth=2, c='black', label=\"Observed\")\n", "ax.plot([], [], 'gD', color=d_red,alpha=0.75,label=\"Observed Exacerbated\")\n", "ax.plot([], [], 'gD', color=d_green,alpha=0.75,label=\"Observed Stable\")\n", "chartBox = ax.get_position()\n", "ax.set_position([chartBox.x0, chartBox.y0, chartBox.width*0.6, chartBox.height])\n", "#the tuple (1.25, 0.8) refers to the position relative to the width and height of the plot\n", "ax.legend(loc='upper center', bbox_to_anchor=(1.25, 0.8), shadow=True, ncol=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#code to paste into the functions at the indicated places - legends for plots with predictions\n", "#it is written to place the legend outside the plot, where it won't interfere\n", "#you may wish to change the position of the legend box or make other adjustments to the figsize, or make other edits\n", "#it is created using dummy plots with the same features as our actual plots\n", "ax.plot([], [], linewidth=2, c='black',dashes=[2, 2,5,2], label=\"Noise-Free with Predictions\")\n", "ax.plot([], [], 'gD', color=l_red,alpha=0.75,label=\"Noise-Free Exacerbated\")\n", "ax.plot([], [], 'gD', color=l_green,alpha=0.75,label=\"Noise-Free Stable\")\n", "ax.plot([], [], linewidth=2, c='black', label=\"Observed\")\n", "ax.plot([], [], 'gD', color=d_red,alpha=0.75,label=\"Observed Exacerbated\")\n", "ax.plot([], [], 'gD', color=d_green,alpha=0.75,label=\"Observed Stable\")\n", "chartBox = ax.get_position()\n", "ax.set_position([chartBox.x0, chartBox.y0, chartBox.width*0.6, chartBox.height])\n", "#the tuple (1.25, 0.8) refers to the position relative to the width and height of the plot\n", "ax.legend(loc='upper center', bbox_to_anchor=(1.25, 0.8), shadow=True, ncol=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }