Advanced usage
==============

This notebook replicates what was done in the *simple_usage* notebook, but this time with the advanced API. The advanced API is required if we want to use non-standard affinity methods that better preserve global structure.

If you are comfortable with the advanced API, please refer to the *preserving_global_structure* notebook for a guide on how to obtain better embeddings and preserve more global structure.

.. code:: ipython3

    from openTSNE import TSNEEmbedding
    from openTSNE import affinity
    from openTSNE import initialization

    from examples import utils

    import numpy as np
    from sklearn.model_selection import train_test_split

    import matplotlib.pyplot as plt

Load data
---------

The preprocessed data set can be downloaded from

.. code:: ipython3

    import gzip
    import pickle

    # The data set is stored as a gzip-compressed pickle
    with gzip.open("data/macosko_2015.pkl.gz", "rb") as f:
        data = pickle.load(f)

    x = data["pca_50"]
    y = data["CellType1"].astype(str)

.. code:: ipython3

    print("Data set contains %d samples with %d features" % x.shape)

.. parsed-literal::

    Data set contains 44808 samples with 50 features

Create train/test split
-----------------------

.. code:: ipython3

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.33, random_state=42)

.. code:: ipython3

    print("%d training samples" % x_train.shape[0])
    print("%d test samples" % x_test.shape[0])

.. parsed-literal::

    30021 training samples
    14787 test samples

Create a t-SNE embedding
------------------------

As in the *simple_usage* notebook, this example runs the standard t-SNE optimization. Much more can be done to better preserve global structure and improve embedding quality; please refer to the *preserving_global_structure* notebook for some examples.

**1. Compute the affinities between data points**

.. code:: ipython3

    %%time
    affinities_train = affinity.PerplexityBasedNN(
        x_train,
        perplexity=30,
        metric="euclidean",
        n_jobs=8,
        random_state=42,
        verbose=True,
    )

.. parsed-literal::

    ===> Finding 90 nearest neighbors using Annoy approximate search using euclidean distance...
       --> Time elapsed: 8.72 seconds
    ===> Calculating affinity matrix...
       --> Time elapsed: 0.58 seconds
    CPU times: user 31 s, sys: 1.63 s, total: 32.7 s
    Wall time: 14.1 s

**2. Generate initial coordinates for our embedding**

.. code:: ipython3

    %time init_train = initialization.pca(x_train, random_state=42)

.. parsed-literal::

    CPU times: user 742 ms, sys: 418 ms, total: 1.16 s
    Wall time: 213 ms

**3. Construct the ``TSNEEmbedding`` object**

.. code:: ipython3

    # Combine the initial coordinates with the precomputed affinities
    embedding_train = TSNEEmbedding(
        init_train,
        affinities_train,
        negative_gradient_method="fft",
        n_jobs=8,
        verbose=True,
    )

**4. Optimize embedding**

1. Early exaggeration phase

.. code:: ipython3

    %time embedding_train_1 = embedding_train.optimize(n_iter=250, exaggeration=12)

.. parsed-literal::

    ===> Running optimization with exaggeration=12.00, lr=2501.75 for 250 iterations...
    Iteration 50, KL divergence 5.1633, 50 iterations in 2.3390 sec
    Iteration 100, KL divergence 5.0975, 50 iterations in 2.5052 sec
    Iteration 150, KL divergence 5.0648, 50 iterations in 2.3208 sec
    Iteration 200, KL divergence 5.0510, 50 iterations in 2.3077 sec
    Iteration 250, KL divergence 5.0430, 50 iterations in 2.3200 sec
       --> Time elapsed: 11.79 seconds
    CPU times: user 31.7 s, sys: 343 ms, total: 32.1 s
    Wall time: 11.9 s

.. code:: ipython3

    utils.plot(embedding_train_1, y_train, colors=utils.MACOSKO_COLORS)

.. image:: output_18_0.png

2. Regular optimization

.. code:: ipython3

    %time embedding_train_2 = embedding_train_1.optimize(n_iter=500)

.. parsed-literal::

    ===> Running optimization with exaggeration=1.00, lr=30021.00 for 500 iterations...
    Iteration 50, KL divergence 3.0008, 50 iterations in 2.4008 sec
    Iteration 100, KL divergence 2.7927, 50 iterations in 3.6000 sec
    Iteration 150, KL divergence 2.6962, 50 iterations in 4.8722 sec
    Iteration 200, KL divergence 2.6384, 50 iterations in 6.0994 sec
    Iteration 250, KL divergence 2.5970, 50 iterations in 7.2336 sec
    Iteration 300, KL divergence 2.5673, 50 iterations in 8.3499 sec
    Iteration 350, KL divergence 2.5431, 50 iterations in 9.6641 sec
    Iteration 400, KL divergence 2.5244, 50 iterations in 10.8648 sec
    Iteration 450, KL divergence 2.5088, 50 iterations in 11.8919 sec
    Iteration 500, KL divergence 2.4950, 50 iterations in 13.4849 sec
       --> Time elapsed: 78.46 seconds
    CPU times: user 1min 58s, sys: 442 ms, total: 1min 58s
    Wall time: 1min 18s

.. code:: ipython3

    utils.plot(embedding_train_2, y_train, colors=utils.MACOSKO_COLORS)

.. image:: output_21_0.png

Transform
---------

.. code:: ipython3

    %%time
    embedding_test = embedding_train_2.prepare_partial(x_test)

.. parsed-literal::

    ===> Finding 90 nearest neighbors in existing embedding using Annoy approximate search...
       --> Time elapsed: 3.86 seconds
    ===> Calculating affinity matrix...
       --> Time elapsed: 0.17 seconds
    CPU times: user 10.8 s, sys: 713 ms, total: 11.5 s
    Wall time: 4.06 s

.. code:: ipython3

    utils.plot(embedding_test, y_test, colors=utils.MACOSKO_COLORS)

.. image:: output_24_0.png

.. code:: ipython3

    %time embedding_test_1 = embedding_test.optimize(n_iter=250, learning_rate=0.1, exaggeration=1.5)

.. parsed-literal::

    ===> Running optimization with exaggeration=1.50, lr=0.10 for 250 iterations...
    Iteration 50, KL divergence 207802.1491, 50 iterations in 0.4943 sec
    Iteration 100, KL divergence 203381.3331, 50 iterations in 0.4801 sec
    Iteration 150, KL divergence 199053.2098, 50 iterations in 0.4745 sec
    Iteration 200, KL divergence 197220.9429, 50 iterations in 0.5029 sec
    Iteration 250, KL divergence 196404.5606, 50 iterations in 0.4917 sec
       --> Time elapsed: 2.44 seconds
    CPU times: user 7.87 s, sys: 113 ms, total: 7.98 s
    Wall time: 3.24 s

.. code:: ipython3

    utils.plot(embedding_test_1, y_test, colors=utils.MACOSKO_COLORS)

.. image:: output_26_0.png

Together
--------

We superimpose the transformed test points onto the original training embedding, drawing them with higher opacity (``alpha=0.75`` versus ``alpha=0.25``) so that they stand out.

.. code:: ipython3

    fig, ax = plt.subplots(figsize=(8, 8))
    utils.plot(embedding_train_2, y_train, colors=utils.MACOSKO_COLORS, alpha=0.25, ax=ax)
    utils.plot(embedding_test_1, y_test, colors=utils.MACOSKO_COLORS, alpha=0.75, ax=ax)

.. image:: output_28_0.png
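
Several numbers printed in the logs of this notebook can be cross-checked with a little arithmetic. The sketch below assumes three rules that are consistent with the printed output but are not stated anywhere in this notebook: that the test set size is the ceiling of ``test_size * n_samples``, that the neighbor search uses three times the perplexity, and that the ``lr`` value shown when no learning rate is passed equals the number of training samples divided by the exaggeration. The helper names are hypothetical and are not part of openTSNE or scikit-learn.

```python
import math


# Hypothetical helpers: each rule is an assumption inferred from the logs,
# not a documented guarantee of the libraries involved.

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


def neighbors_for_perplexity(perplexity):
    """Assumed rule: search for three times the perplexity in neighbors."""
    return int(3 * perplexity)


def inferred_learning_rate(n_samples, exaggeration=1.0):
    """Assumed rule: learning rate = number of samples / exaggeration."""
    return n_samples / exaggeration


n_train, n_test = split_sizes(44808, 0.33)
print("train/test sizes:", n_train, n_test)
print("neighbors for perplexity 30:", neighbors_for_perplexity(30))
print("lr with exaggeration=12:", inferred_learning_rate(n_train, 12))
print("lr with exaggeration=1:", inferred_learning_rate(n_train))
```

Under these assumptions the sketch reproduces the 30021/14787 split, the 90 nearest neighbors, and the learning rates 2501.75 and 30021.00 shown in the two training phases. The test-set optimization passes ``learning_rate=0.1`` explicitly, so the inferred rule does not apply there.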