{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Overview of CyTOF Data\n",
    "The original data was given as two tab-separated matrices\n",
    "* ``Plasma.txt`` (original name: 160202_CGI002_Plasma_Plasma_singlets.fcs_raw_events.txt)\n",
    "* ``PMA.txt`` (original name: 160202_CGI002_PMA_PMA_singlets.fcs_raw_events.txt)\n",
    "\n",
    "These files had individual cell measurements as rows and dimensions (e.g. antibodies) as columns. I only kept the dimensions of interest surface marker and phospho marker antibody columns/dimensions and renamed these files. I then semi-automatically identified 'roughly-defined' cell types using hierarchical clustering and the surface markers associated cell types. \n",
    "\n",
    "``Plasma_CT.txt`` and ``PMA_CT.txt``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from clustergrammer_widget import *\n",
    "net = Network(clustergrammer_widget)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plasma"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(110000, 28)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/nickfernandez/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py:1382: RuntimeWarning: init_size=300 should be larger than k=1000. Setting it to 3*k\n",
      "  init_size=init_size)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1000, 28)\n"
     ]
    }
   ],
   "source": [
    "# load Plasma treated data with defined cell types\n",
    "net.load_file('../cytof_data/Plasma_UCT.txt')\n",
    "\n",
    "# subsample the data so that both treatments have the same number of cells\n",
    "net.random_sample(axis='row',num_samples=110000, random_state=99)\n",
    "df_plasma = net.export_df()\n",
    "print(df_plasma.shape)\n",
    "\n",
    "net.normalize(axis='col', norm_type='zscore', keep_orig=False)\n",
    "net.downsample(ds_type='kmeans', axis='row', num_samples=1000)\n",
    "print(net.dat['mat'].shape)\n",
    "\n",
    "# clip z-scores since we do not care about extreme outliers\n",
    "net.clip(-10,10)\n",
    "net.write_matrix_to_tsv('../cytof_data/ds_plasma.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "net.set_cat_color('row', 1, 'Majority-Treatment: Plasma', 'blue')\n",
    "net.set_cat_color('row', 1, 'Majority-Treatment: PMA', 'red')\n",
    "\n",
    "# greens\n",
    "net.set_cat_color('row', 2, 'Majority-Category: CD14hi monocytes', 'yellow')\n",
    "net.set_cat_color('row', 2, 'Majority-Category: CD4 Tcells', 'blue')\n",
    "net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD16hi', 'red')\n",
    "net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD16hi_CD57hi', 'orange')\n",
    "net.set_cat_color('row', 2, 'Majority-Category: NK cells_CD56hi', '#FF6347')\n",
    "\n",
    "net.set_cat_color('col', 1, 'Marker-type: phospho marker', 'red')\n",
    "net.set_cat_color('col', 1, 'Marker-type: surface marker', 'blue')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "239191a72ccb425b8f091b7985cd1a5d"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "net.cluster(views=[])\n",
    "net.widget()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# PMA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "22fc7751aae646f6999ead7aed0eea68"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "net.load_file('../cytof_data/PMA_UCT.txt')\n",
    "net.random_sample(axis='row',num_samples=110000, random_state=99)\n",
    "df_pma = net.export_df()\n",
    "\n",
    "net.load_df(df_pma)\n",
    "\n",
    "net.normalize(axis='col', norm_type='zscore', keep_orig=False)\n",
    "net.downsample(ds_type='kmeans', axis='row', num_samples=1000)\n",
    "net.dat['mat'].shape\n",
    "net.clip(-10,10)\n",
    "net.write_matrix_to_tsv('../cytof_data/ds_pma.txt')\n",
    "\n",
    "net.cluster(views=[])\n",
    "net.widget()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plasma vs PMA Treated\n",
    "\n",
    "### Merge Plasma and PMA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(220000, 28)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/nickfernandez/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py:1382: RuntimeWarning: init_size=300 should be larger than k=2000. Setting it to 3*k\n",
      "  init_size=init_size)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "837bba60fd0a4eebaaef8bc4a31090dc"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df_merge = pd.concat([df_plasma, df_pma])\n",
    "print(df_merge.shape)\n",
    "net.load_df(df_merge)\n",
    "net.normalize(axis='col', norm_type='zscore', keep_orig=False)\n",
    "net.downsample(ds_type='kmeans', axis='row', num_samples=2000)\n",
    "net.clip(-10,10)\n",
    "net.dat['mat'].shape\n",
    "net.cluster(views=[])\n",
    "net.widget()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plasma vs PMA based on Surface markers only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2000, 18)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "31d3fefbd5594e5ba3b39475ceda3399"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df_merge = pd.concat([df_plasma, df_pma])\n",
    "net.load_df(df_merge)\n",
    "\n",
    "net.filter_cat('col', 1, 'Marker-type: surface marker')\n",
    "net.normalize(axis='col', norm_type='zscore', keep_orig=False)\n",
    "net.downsample(ds_type='kmeans', axis='row', num_samples=2000)\n",
    "net.clip(-10,10)\n",
    "print(net.dat['mat'].shape)\n",
    "\n",
    "net.cluster(views=[])\n",
    "net.widget()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plasma vs PMA based on Phospho markers only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2000, 10)\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d5a17ef73c494b7dbac42c2f1db5baae"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df_merge = pd.concat([df_plasma, df_pma])\n",
    "net.load_df(df_merge)\n",
    "\n",
    "net.filter_cat('col', 1, 'Marker-type: phospho marker')\n",
    "net.normalize(axis='col', norm_type='zscore', keep_orig=False)\n",
    "net.downsample(ds_type='kmeans', axis='row', num_samples=2000)\n",
    "net.clip(-10,10)\n",
    "print(net.dat['mat'].shape)\n",
    "\n",
    "net.cluster(views=[])\n",
    "net.widget()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PMA and Plasma treated cells separate more based on phospho markers than based on surface markers. This makes sense since PMA treatment is expected to influence phosphorylation levels."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see a cluster of Monocytes and Granulocytes with high phosphorylation markers: pCREB, pMAPKAP2, pERK1 2, pp38. Below we will export this cluster using the interactive dendrogram and the widget DataFrame export method, widget_df, below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df_CD14hi = net.widget_df()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4ac5863b87894a07abefb6674c90975f"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "net.load_df(df_CD14hi)\n",
    "net.cluster(views=[])\n",
    "net.widget()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 1,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}