# Downsampled Datasets
In this notebook, I work with the downsampled datasets because they are quicker to load and process. Once I have set up dendrogram-category export and the related steps, I will switch to the original datasets.

In [71]:
import pandas as pd
# the star import provides Network and the clustergrammer_widget display function
from clustergrammer_widget import *
net = Network()

In [72]:
# load each downsampled dataset, clip values to [-10, 10], and keep a DataFrame copy
net.load_file('../cytof_data/ds_plasma.txt')
net.clip(-10, 10)
df_plasma = net.export_df()
net.load_file('../cytof_data/ds_pma.txt')
net.clip(-10, 10)
df_pma = net.export_df()

# Plasma Dendro-Cats
Here I generate row categories from the clusters defined by the row dendrogram.

In [73]:
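net.load_df(df_plasma)
# keep only the columns whose first category is 'Marker-type: surface marker'
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.make_clust()
# add a row category from the row dendrogram at level 5, then recluster
net.dendro_cats('row', dendro_level=5)
net.make_clust()
df_plasma_cat = net.export_df()
# color rows by their treatment category (row category 1)
net.set_cat_color('row', 1, 'Majority-Treatment: Plasma', 'blue')
net.set_cat_color('row', 1, 'Majority-Treatment: PMA', 'red')
clustergrammer_widget(network=net.widget())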
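`dendro_cats` is used above as a black box. As a rough illustration of what dendrogram-based categories are, the sketch below builds a hierarchical clustering tree with SciPy, cuts it at a fraction of its height, and labels each row with the resulting group. The helper name `dendro_categories`, the assumption that `level` means a cut at `level / 10` of the tallest merge, and the average-linkage/Euclidean defaults are all assumptions for illustration, not clustergrammer's actual implementation.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster


def dendro_categories(df, level, n_levels=10, method='average', metric='euclidean'):
    """Label each row of df with the dendrogram group it falls into.

    Hypothetical helper: the tree is cut at level / n_levels of the tallest
    merge height, an ASSUMED reading of a 'dendro_level' setting.
    """
    # build the row dendrogram
    Z = linkage(df.values, method=method, metric=metric)
    # column 2 of the linkage matrix holds the merge heights
    cut_height = Z[:, 2].max() * level / n_levels
    # rows joined below the cut height share a group id
    labels = fcluster(Z, t=cut_height, criterion='distance')
    return pd.Series(['Group {}: cat-{}'.format(level, lab) for lab in labels],
                     index=df.index, name='dendro_cat')


# toy data: three well-separated blobs of five rows each
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc, 0.1, size=(5, 4)) for loc in (0, 5, 10)])
df_demo = pd.DataFrame(data, index=['row-{}'.format(i) for i in range(15)])
cats = dendro_categories(df_demo, level=5)
print(cats.value_counts())
```

A lower `level` cuts closer to the leaves and yields more, smaller groups; a higher one yields fewer, larger groups.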

# PMA Dendro-Cats

In [74]:
# same steps as for the plasma data
net.load_df(df_pma)
net.filter_cat('col', 1, 'Marker-type: surface marker')
net.make_clust()
net.dendro_cats('row', dendro_level=5)
net.make_clust()
df_pma_cat = net.export_df()
clustergrammer_widget(network=net.widget())

# Merge Plasma and PMA with cats

In [75]:
# cell type assigned by hand to each row dendrogram group (dendro_level=5)
cell_type = {
    'plasma': {
        'Group 5: cat-4': 'Cell Types: T cells',
        'Group 5: cat-3': 'Cell Types: CD8 T cells',
        'Group 5: cat-2': 'Cell Types: Monocytes and Granulocytes',
        'Group 5: cat-1': 'Cell Types: NK cells',
    },
    'pma': {
        'Group 5: cat-6': 'Cell Types: NK cells',
        'Group 5: cat-5': 'Cell Types: NK cells',
        'Group 5: cat-4': 'Cell Types: NK cells',
        'Group 5: cat-3': 'Cell Types: Monocytes and Granulocytes',
        'Group 5: cat-2': 'Cell Types: CD8 T cells',
        'Group 5: cat-1': 'Cell Types: T cells',
    },
}

In [76]:
cell_type['plasma']['Group 5: cat-4']

'Cell Types: T cells'

In [77]:
# replace the dendrogram group (element 3 of each row tuple) with its cell type
# and set the treatment category explicitly
rows = df_plasma_cat.index.tolist()
new_rows = []
for inst_row in rows:
    inst_type = cell_type['plasma'][inst_row[3]]
    new_row = (inst_row[0], 'Majority-Treatment: Plasma', inst_type)
    new_rows.append(new_row)

df_plasma_cat.index = new_rows

# same replacement for the PMA rows
rows = df_pma_cat.index.tolist()
new_rows = []
for inst_row in rows:
    inst_type = cell_type['pma'][inst_row[3]]
    new_row = (inst_row[0], 'Majority-Treatment: PMA', inst_type)
    new_rows.append(new_row)

df_pma_cat.index = new_rows
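The two loops above rebuild each row tuple by hand. A sketch of the same relabeling as one reusable function, which also fails loudly when a dendrogram group has no assigned cell type, might look like this. Like the loops, it assumes element 0 of each row tuple is the row name and element 3 is the `Group 5: cat-k` label; the function name `relabel_dendro_cats` and the toy data are hypothetical.

```python
import pandas as pd


def relabel_dendro_cats(df, treatment, group_to_type, group_pos=3):
    """Return a copy of df whose row tuples are (name, treatment, cell type).

    Hypothetical helper: assumes element 0 of each row tuple is the row name
    and element `group_pos` is the dendrogram group label.
    """
    # collect any dendrogram groups that have no cell type assigned
    missing = {row[group_pos] for row in df.index} - set(group_to_type)
    if missing:
        raise KeyError('no cell type assigned for: {}'.format(sorted(missing)))
    new_rows = [
        (row[0], 'Majority-Treatment: ' + treatment, group_to_type[row[group_pos]])
        for row in df.index
    ]
    out = df.copy()
    out.index = pd.MultiIndex.from_tuples(new_rows)
    return out


# toy input mimicking four-element row tuples (values are made up)
toy = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=pd.MultiIndex.from_tuples([
        ('Cluster: cluster-0', 'Majority-Treatment: Plasma', 'other-cat', 'Group 5: cat-4'),
        ('Cluster: cluster-1', 'Majority-Treatment: Plasma', 'other-cat', 'Group 5: cat-1'),
    ]),
    columns=['marker-a', 'marker-b'],
)
mapping = {'Group 5: cat-4': 'Cell Types: T cells',
           'Group 5: cat-1': 'Cell Types: NK cells'}
relabeled = relabel_dendro_cats(toy, 'Plasma', mapping)
print(relabeled.index.tolist()[0])
# ('Cluster: cluster-0', 'Majority-Treatment: Plasma', 'Cell Types: T cells')
```

In this notebook it would apply to the frames exactly as returned by `export_df()`, e.g. `relabel_dendro_cats(df_plasma_cat, 'Plasma', cell_type['plasma'])`, before the loops have replaced the four-element tuples.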

In [84]:
# net.load_df(df_plasma_cat)
# net.make_clust()
# clustergrammer_widget(network=net.widget())

In [86]:
# net.load_df(df_pma_cat)
# net.make_clust()
# clustergrammer_widget(network=net.widget())

In [80]:
df_merge_cat = pd.concat([df_plasma_cat, df_pma_cat])

In [81]:
df_merge_cat.index.tolist()[0]

('Cluster: cluster-0', 'Majority-Treatment: Plasma', 'Cell Types: T cells')

In [82]:
net.load_df(df_merge_cat)

In [83]:
net.make_clust()
clustergrammer_widget(network=net.widget())

I will also need to transfer the categories found by hierarchical clustering of the downsampled data to the full (non-downsampled) data. Here I had to make the row names unique manually, but this should not be necessary when I work with the original non-downsampled data.
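When repeated row names do need to be disambiguated, one option is to append a running counter to every repeat while leaving the first occurrence unchanged. This is a minimal sketch; the `make_unique` helper and its `-dup` suffix are hypothetical and not necessarily how the names were made unique in this notebook.

```python
import pandas as pd


def make_unique(names, sep='-dup'):
    """Append a running counter to repeated names so every label is unique.

    Hypothetical helper: the first occurrence of each name is kept as-is.
    Assumes no original name already ends in the generated suffix.
    """
    s = pd.Series(list(names))
    # 0 for the first occurrence of a name, 1 for the second, and so on
    counts = s.groupby(s).cumcount()
    return [n if c == 0 else '{}{}{}'.format(n, sep, c) for n, c in zip(s, counts)]


print(make_unique(['cluster-0', 'cluster-1', 'cluster-0', 'cluster-0']))
# ['cluster-0', 'cluster-1', 'cluster-0-dup1', 'cluster-0-dup2']
```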

# Categorize Original Data
I need to generate the same categories from the downsampled data and then transfer them to the original data.
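One way to transfer the categories, assuming each row of the original data can be matched to a categorized row through a shared key (here element 0 of the row tuple, such as the cluster name), is a simple lookup. The function name `transfer_categories` and the toy frames are hypothetical. The sketch refuses to run when the key is repeated among the categorized rows, since the lookup would then be ambiguous.

```python
import pandas as pd


def transfer_categories(cat_rows, full_df, key_pos=0):
    """Give each row of full_df the category tuple of its matching categorized row.

    Hypothetical helper: assumes full_df is indexed by the same key that
    appears at position `key_pos` of each categorized row tuple.
    """
    keys = [row[key_pos] for row in cat_rows]
    # a repeated key would make the lookup ambiguous
    if len(set(keys)) != len(keys):
        raise ValueError('category key is not unique; make the names unique first')
    lookup = dict(zip(keys, cat_rows))
    missing = sorted(set(full_df.index) - set(lookup))
    if missing:
        raise KeyError('rows without a category: {}'.format(missing))
    out = full_df.copy()
    # several original rows may share one key and therefore one category tuple
    out.index = pd.MultiIndex.from_tuples([lookup[name] for name in full_df.index])
    return out


# toy categorized rows and toy "original" data (values are made up)
cat_rows = [
    ('Cluster: cluster-0', 'Majority-Treatment: Plasma', 'Cell Types: T cells'),
    ('Cluster: cluster-1', 'Majority-Treatment: PMA', 'Cell Types: NK cells'),
]
full = pd.DataFrame({'marker-a': [1.0, 2.0, 3.0]},
                    index=['Cluster: cluster-1', 'Cluster: cluster-0', 'Cluster: cluster-1'])
out = transfer_categories(cat_rows, full)
print(out.index.tolist()[0])
# ('Cluster: cluster-1', 'Majority-Treatment: PMA', 'Cell Types: NK cells')
```

With the real data, `cat_rows` could be `df_merge_cat.index.tolist()`; if the same row name appears under both treatments, the uniqueness check will raise instead of silently keeping only one of them.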