In [1]:
from pymeda import Meda

import pandas as pd
import sklearn.datasets as datasets

### Running PyMeda
First, PyMeda expects the input data to be either a string pointing to a CSV file or a pandas DataFrame. Make sure that the first row contains the feature names, and remove any columns that carry no meaningful information (e.g. unique identifiers).
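
As a minimal sketch of that preparation step, the snippet below reads a small CSV whose first row holds the feature names and drops an identifier column. It uses only pandas; the CSV content and the `sample_id` column name are hypothetical and chosen purely for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first row holds the feature names.
raw_csv = io.StringIO(
    "sample_id,sepal_length,sepal_width\n"
    "a1,5.1,3.5\n"
    "a2,4.9,3.0\n"
    "a3,6.2,3.4\n"
)

# pandas treats the first row as the header by default.
df = pd.read_csv(raw_csv)

# Drop columns without meaningful information, e.g. unique identifiers.
features = df.drop(columns=["sample_id"])

print(features.columns.tolist())
```

The resulting `features` DataFrame contains only numeric feature columns, which is the shape of input described above.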

Secondly, the following plots can be generated using PyMeda:
1. Representative Heatmap - plots all data points as a heatmap
2. Ridge Line Plot - plots each feature as a density
3. Location Heatmap - plots the mean and median of each feature as a heatmap
4. Location Lines - plots the mean and median of each feature as lines
5. Scree Plot - plots the variance explained by each principal component after running PCA
6. Correlation Matrix - correlation of features
7. Hierarchical Gaussian Mixture Model Plots - clustering of data
    1. Dendrogram
    2. Pair Plot
    3. Stacked Means
    4. Mean Heatmap
    5. Mean Lines
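
To make the Scree Plot item concrete, here is a rough sketch of how the explained variance per principal component can be computed with NumPy. This is not necessarily how PyMeda computes it internally; the helper name `explained_variance_ratio` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio(X):
    """Return the fraction of total variance explained by each principal component."""
    X = np.asarray(X, dtype=float)
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    # Variance along each component is s^2 / (n_samples - 1).
    variances = singular_values ** 2 / (X.shape[0] - 1)
    return variances / variances.sum()


# Synthetic data with four features of decreasing spread (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

ratios = explained_variance_ratio(X)
print(ratios)
```

A scree plot draws these ratios against the component index, so a sharp drop after the first few components suggests that most of the structure lives in a low-dimensional subspace.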

You can generate each plot individually (if you know what you want), or use the `run_all` function to generate all of the above plots. Note: the pair plot for clustering is not generated when the number of samples exceeds 1000, due to issues with plotly.
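
If the pair plot matters for a larger dataset, one possible workaround is to subsample the rows before constructing the `Meda` object. The sketch below shows only the pandas side of that idea; subsampling is an assumption on our part rather than a documented PyMeda feature, and the synthetic DataFrame and the `MAX_SAMPLES` name are illustrative.

```python
import numpy as np
import pandas as pd

MAX_SAMPLES = 1000  # threshold quoted in the note above

# Synthetic dataset with more rows than the threshold (illustrative only).
rng = np.random.default_rng(0)
big_df = pd.DataFrame(
    rng.normal(size=(5000, 4)),
    columns=["f1", "f2", "f3", "f4"],
)

# Randomly keep at most MAX_SAMPLES rows; random_state makes the draw reproducible.
if len(big_df) > MAX_SAMPLES:
    small_df = big_df.sample(n=MAX_SAMPLES, random_state=0)
else:
    small_df = big_df

print(len(small_df))
```

The subsampled DataFrame could then be passed as `data` in the same way as `iris_df` in the cells below, keeping in mind that any plots then describe the sample rather than the full dataset.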


Lastly, you can save the outputs as an HTML file by using the `generate_report` function.

In [3]:
# Load the iris dataset into a pandas DataFrame
iris = datasets.load_iris()
data = iris.data
columns = iris.feature_names
iris_df = pd.DataFrame(data=data, columns=columns)

title = 'Iris Dataset'  # Title used for the plots and the report
cluster_levels = 2  # Number of times to cluster

meda = Meda(data=iris_df, title=title, cluster_levels=cluster_levels)

In [4]:
# You can make individual plots by calling the class methods
meda.correlation_matrix()

In [5]:
meda.cluster_pair_plot()

In [6]:
# You can also make all plots at once
meda.run_all()

In [7]:
# Generate a static HTML report
out_dir = './'
meda.generate_report(out_dir)

Report saved at ./2018-10-01_Iris Dataset.html
