{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This program processes and plots output from GP Microbiome with the background shaded according to a binary variable. This method did not suit the plotting of results for stable and exacerbated time points because the difference between a stable and exacerbated condition is somewhat subjective. However, for a binary variable (or variables) such as whether or not a participant is on antibiotics, such a plot could be quite informative. The program is meant to be something you could build upon to shade the backgrounds of many kinds of plots. It demonstrates how to convert actual dates to the participant's age in days on each of those dates, which is easily adaptable to any other type of time delta. \n", "
\n", "\n", "It plots an example output file included in this repository, with example dates for going on and off antibiotics, so that you can run it yourself. I have only included one function, but since the function is based on one of the ones in Plotsamples, you can easily edit it to make a shaded-background version of any of the plots in that program. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#import necessary libraries \n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in the OTUkey_named file, which was created by running Section 1 of either the program Plotsamples or Leave_One_Out_Examples. If you have not run it yet, edit the cell as directed in the comments to use the copy in the 'Extras' folder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key=pd.read_csv(\"Data/OTUkey_named.csv\")\n", "#to use the copy in the 'Extras' folder, comment out the previous line and un-comment the next one:\n", "#key=pd.read_csv('Data/Extras/OTUkey_named.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#make a list of the operational taxonomic unit (OTU) IDs for our bacteria of interest\n", "bacteria=[2,30,58,59,60,63,70,80,94,104,113,167,169,170,206,221,223,227,229,234]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in the files\n", "Read the output files into dictionaries, to facilitate plotting multiple participants' data.\n", "
\n", "\n", "First we read in our example output files and example relative abundance files. The example output csv files are intended to resemble real output and are not actually the results of running GP Microbiome and processing the raw output with the program readsample27. I have included in this repository a full explanation of how I randomly generated the example data for those who are interested. \n", "
\n", "\n", "Since there is a version of this section in both Plotsamples and Leave_One_Out_Examples, I've omitted the optional cells in those versions that preview the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#create a list for the ID numbers of the participants whose data we ran through GPMicrobiome and wish to plot\n", "IDs=['405','453','480','500','511']\n", "#Create dictionaries and read in each person's output for noise-free compositions without predictions \n", "#and with predictions added in. \n", "dfs = {i: pd.read_csv('Data/{}.csv'.format(i)) for i in IDs}\n", "both_dfs = {i: pd.read_csv('Data/{}_both.csv'.format(i)) for i in IDs}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#rename the columns in the files containing both sets of time points based on the first row, which contains the time points\n", "#then reorder the columns in the files to make the time points consecutive, and put them in a new dictionary\n", "reordered_dfs={}\n", "for i in IDs:\n", " df=both_dfs[i].set_axis(both_dfs[i].loc[0].tolist(), axis=1, inplace=False)\n", " df=df.reindex(columns=sorted(df.columns))\n", " #save file if desired\n", " #if you do save it, you could edit the first cell in Part A to read it directly into a dictionary \n", " #I opted not to do so because it takes so little time to reorder and I wanted to save space on my computer\n", " #I wanted the unordered version saved so that I could easily examine predicted values on their own \n", " df.to_csv('Data/{}_both_reordered.csv'.format(i), index=False)\n", " reordered_dfs[i]=df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell reads in files created in the program Create_relative_abundance_files. If you have not run that program yet, edit the cell as directed in the comments to use the copy in the 'Extras' folder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#read in the files containing the observed relative abundance data for each participant, adding them to a new dictionary\n", "#the columns are the age in days at the time of each sample, and we will use this information as well in the plots\n", "#the files were created and saved in the program Create_relative_abundance_files\n", "#however, they are also in the 'Extras' folder for your convenience\n", "rel_dfs = {i: pd.read_csv(\"Data/{}_Rel.csv\".format(i)) for i in IDs}\n", "#to use the ones in the 'Extras' folder, comment out the previous line and un-comment this one:\n", "#rel_dfs = {i: pd.read_csv(\"Data/Extras/Relative Abundance Files/{}_Rel.csv\".format(i)) for i in IDs}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our other plots, different coloured markers indicate a participant's clinical condition at the times when samples were taken.\n", "What if we want to plot a condition that lasted for a specific length of time? Courses of antibiotic treatment, for example?\n", "We will shade the background of our plots according to whether or not the participant was on antibiotics. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#define custom colours for the plots - light red and light green, for periods on and off antibiotics respectively\n", "l_red='#FF5959'\n", "l_green='#14AE0E'\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the demonstration, we will create some example dates of a participant going on and off antibiotics. First we generate an example data frame, which in a real plot we would substitute with importing metadata from Excel or from a csv file. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#create a dictionary and sample data frame\n", "data={'On':['8/18/2008','10/31/2008', '7/6/2009', '12/4/2010'], 'Off':['9/26/2008','12/25/2008', '1/13/2010','2/1/2011']}\n", "df = pd.DataFrame.from_dict(data)\n", "#take a look at the table\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#convert the participant's birth date to a datetime object\n", "#although the dates are in the default format already, I prefer to specify format to avoid errors\n", "#I interact regularly with people in countries that put days before months, and this makes it easy to edit and switch formats\n", "bd=pd.to_datetime('4/1/2007', format='%m/%d/%Y')\n", "#convert the dates in the columns from strings to datetime objects, then to the participant's age in days on those dates\n", "df['On'] = pd.to_datetime(df['On'],format='%m/%d/%Y')-bd\n", "df['Off']= pd.to_datetime(df['Off'],format='%m/%d/%Y')-bd\n", "#convert the datetime object to integer\n", "df['On'] = df['On'].dt.days.astype('int16')\n", "df['Off'] = df['Off'].dt.days.astype('int16')\n", "#display our data frame to confirm \n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#make lists of lists containing pairs of time points, between which the participant was either on or off antibiotics\n", "#for the plots, we consider the age in days on the last day of antibiotics the same as that on the first day off antibiotics\n", "On=[]\n", "for i in range(df.shape[0]):\n", " On.append([df['On'][i], df['Off'][i]])\n", "Off=[]\n", "#we add a pair for the desired start date, since the participant would not have always been on antibiotics\n", "Off.append([0,df['On'][0]])\n", "for i in range(df.shape[0]-1):\n", " Off.append([df['Off'][i],df['On'][i+1]])\n", "#add a single value for the final 'Off' point\n", "Off.append(df['Off'].iloc[-1])\n", "#view the lists of lists\n", "On, Off" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Plots\n", "This example function does not include predictions. It also only shows two of the OTU's (operational taxonomic units) with this example, but of course this can be easily edited to plot any number of them, or to include predictions. In fact, since the function is based on the ones in Plotsamples and Leave_One_Out_Examples, it can easily be edited to produce shaded background plots of any of the same data. \n", "
\n", "\n", "When running GP Microbiome, I chose predicted time points to be evenly spaced between actual time points, with either 1 or 2 depending on the size of the gap; then in most cases I predicted 3 time points in the future, using intervals of 180 days. It might be interesting to play around with including predictions in shaded background plots - for instance, to try plotting leave-one-out data with shaded backgrounds and compare how close the prediction for the last time point came to the observed value for participants who were on antibiotics when that sample was taken with those who were not on antibiotics when that sample was taken. If you included predictions, you would no doubt also want to add markers (as in Plotsamples and Leave_One_Out_Examples) to show which time points were predicted and which were actual time points. However, depending on how far your dates for antibiotic use extend, you might need to exclude some of the future predictions. \n", "
\n", "\n", "All of my plotting functions save the plots to a folder called 'Plots,' which is in this repository as well. Adjust the file path if you want to save them somewhere else, or comment out the line of code which saves them if you prefer not to save." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#set a colour for the plot lines, if desired - I found dark blue looked better on the red/green background than black\n", "#I define it here, in RGB format, so you can easily play around with the colour and decide for yourself\n", "d_blue = (0/255,0/255,153/255)\n", "#define the function\n", "def plot_background(name):\n", " rows=[94,229]\n", " s=dfs[name]\n", " rel=rel_dfs[name]\n", " days=[int(x) for x in rel.columns]\n", " ID=int(name) \n", " fig=plt.figure(figsize=(15,6))\n", " for i in range(2):\n", " ax = fig.add_subplot(1,2,i+1)\n", " ax.plot(days, s.iloc[rows[i]],label=\"Noise-Free\", linewidth=2,dashes=[2, 2,5,2], c=d_blue)\n", " ax.plot(days, rel.iloc[rows[i]-1],label=\"Observed\", linewidth=2, c=d_blue)\n", " for j in range(len(On)):\n", " ax.axvspan(On[j][0], On[j][1], facecolor=l_red, alpha=0.65, lw=0)\n", " for k in range (len(Off)-1):\n", " ax.axvspan(Off[k][0], Off[k][1], facecolor=l_green, alpha=0.4, lw=0)\n", " #add the final 'Off' period\n", " #if the last day of antibiotics happens to equal the final time point, this will not show up as shaded\n", " ax.axvspan(Off[-1], days[-1], facecolor=l_green, alpha=0.4, lw=0)\n", " #put in dummy plots for the legend\n", " ax.axvspan(0,0, facecolor=l_red, alpha=0.65, lw=0, label='Antibiotic')\n", " ax.axvspan(0,0, facecolor=l_green, alpha=0.4, lw=0, label='No Antibiotics')\n", " plt.title('{} Composition'.format(key['Name'][rows[i]-1]), size=20)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=16)\n", " plt.ylabel(\"Relative Abundance\", size=16)\n", " #put limits on the x-axis to reduce white background\n", " plt.xlim(days[0]-50, days[-1]+50)\n", " plt.setp(ax.get_xticklabels(), size=12)\n", " plt.setp(ax.get_yticklabels(), size=12)\n", " #add a legend\n", " plt.legend(loc='best')\n", " #the tight_layout function reduces white space in the image. \n", " #If you turn off tight_layout you may need to adjust your text size etc.\n", " plt.tight_layout()\n", " plt.savefig(\"Plots/{}_shaded.png\".format(ID), format='png')\n", " plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#run on participant 511's data\n", "plot_background('511')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example, I only created one set of fake dates, and it wasn't saved to a file. How would we create a loop to run this on all participants? We could use dictionaries again. Here's what that might look like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#to run the function in a loop, first we would read into a dictionary the files with the dates in them\n", "#dates = {i: pd.read_csv('{}_dates.csv'.format(i)) for i in IDs}\n", "#this could also be easily adapted for sheets in a metadata Excel file containing the dates for all participants\n", "#or for date sheets in individual participants' Excel files\n", "#so we can run this loop, I am assigning the same fictional dates to all participants in the dictionary\n", "dates={i: df for i in IDs}\n", "On_dict={}\n", "Off_dict={}\n", "for i in IDs:\n", " On_dict[i]=[]\n", " for j in range(dates[i].shape[0]):\n", " On_dict[i].append([dates[i]['On'][j], dates[i]['Off'][j]])\n", " Off_dict[i]=[]\n", " #we add a pair for the desired start date, since the participant would not have always been on antibiotics\n", " Off_dict[i].append([0,dates[i]['On'][0]])\n", " for j in range(dates[i].shape[0]-1):\n", " Off_dict[i].append([dates[i]['Off'][j], dates[i]['On'][j+1]])\n", " #add a single value for the end date\n", " Off_dict[i].append(dates[i]['Off'].iloc[-1])\n", "#view the dictionaries\n", "On_dict, Off_dict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#set a colour for the plot lines, if desired - I found dark blue looked better on the red/green background than black\n", "#I define it here, in RGB format, so you can easily play around with the colour and decide for yourself\n", "d_blue = (0/255,0/255,153/255)\n", "#define the function\n", "def plot_background_loop(name):\n", " rows=[94,229]\n", " s=dfs[name]\n", " rel=rel_dfs[name]\n", " On=On_dict[name]\n", " Off=Off_dict[name]\n", " days=[int(x) for x in rel.columns]\n", " ID=int(name) \n", " fig=plt.figure(figsize=(15,6))\n", " for i in range(2):\n", " ax = fig.add_subplot(1,2,i+1)\n", " ax.plot(days, s.iloc[rows[i]],label=\"Noise-Free\", linewidth=2,dashes=[2, 2,5,2], c=d_blue)\n", " ax.plot(days, rel.iloc[rows[i]-1],label=\"Observed\", linewidth=2, c=d_blue)\n", " for j in range(len(On)):\n", " ax.axvspan(On[j][0], On[j][1], facecolor=l_red, alpha=0.65, lw=0)\n", " for k in range (len(Off)-1):\n", " ax.axvspan(Off[k][0], Off[k][1], facecolor=l_green, alpha=0.4, lw=0)\n", " #add the final 'Off' period\n", " #if the last day of antibiotics happens to equal the final time point, this will not show up as shaded \n", " ax.axvspan(Off[-1], days[-1], facecolor=l_green, alpha=0.4, lw=0)\n", " #put in dummy plots for the legend\n", " ax.axvspan(0,0, facecolor=l_red, alpha=0.65, lw=0, label='Antibiotic')\n", " ax.axvspan(0,0, facecolor=l_green, alpha=0.4, lw=0, label='No Antibiotics')\n", " plt.title('{} Composition'.format(key['Name'][rows[i]-1]), size=20)\n", " plt.xlabel(\"Age (Days) of Participant {}\".format(ID), size=16)\n", " plt.ylabel(\"Relative Abundance\", size=16)\n", " #put limits on the x-axis to reduce white background\n", " plt.xlim(days[0]-50, days[-1]+50)\n", " plt.setp(ax.get_xticklabels(), size=12)\n", " plt.setp(ax.get_yticklabels(), size=12)\n", " #add a legend\n", " plt.legend(loc='best')\n", " #the tight_layout function reduces white space in the image. \n", " #If you turn off tight_layout you may need to adjust your text size etc.\n", " plt.tight_layout()\n", " plt.savefig(\"Plots/{}_shaded.png\".format(ID), format='png')\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for name in IDs:\n", " plot_background_loop(name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }