{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
## ipyrad-analysis toolkit: Dimensionality reduction
\n", "\n", "The `pca` tool implements several dimensionality reduction methods for SNP data (PCA, t-SNE, and UMAP), and can filter and/or impute missing values in genotype matrices to reduce the effects of missing data on the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# conda install ipyrad -c conda-forge -c bioconda\n", "# conda install ipcoal -c conda-forge\n", "# conda install scikit-learn -c conda-forge" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toyplot" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9.61\n", "0.19.0\n" ] } ], "source": [ "print(ipa.__version__)\n", "print(toyplot.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The input data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# path for the example SNP database file (HDF5)\n", "SNPS = \"/tmp/oaks.snps.hdf5\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file already exists\n" ] } ], "source": [ "# download example HDF5 dataset (158 MB; takes ~2-3 minutes)\n", "URL = \"https://www.dropbox.com/s/x6a4i47xqum27fo/virentes_ref.snps.hdf5?raw=1\"\n", "ipa.download(url=URL, path=SNPS);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make IMAP and MINMAP dictionaries\n", "IMAP maps population names to lists of sample names. MINMAP sets, for each population, the minimum number of samples that must have data at a site for that site to be retained." ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "IMAP = {\n", " \"virg\": [\"LALC2\", \"TXWV2\", \"FLBA140\", \"FLSF33\", \"SCCU3\"],\n", " \"mini\": [\"FLSF47\", \"FLMO62\", \"FLSA185\", \"FLCK216\"],\n", " \"gemi\": [\"FLCK18\", \"FLSF54\", \"FLWO6\", \"FLAB109\"],\n", " \"bran\": 
[\"BJSL25\", \"BJSB3\", \"BJVL19\"],\n", " \"fusi\": [\"MXED8\", \"MXGT4\", \"TXMD3\", \"TXGR3\"],\n", " \"sagr\": [\"CUCA4\", \"CUSV6\", \"CUVN10\"],\n", " \"oleo\": [\"MXSA3017\", \"BZBB1\", \"HNDA09\", \"CRL0030\", \"CRL0001\"],\n", "}\n", "MINMAP = {\n", " \"virg\": 3,\n", " \"mini\": 3,\n", " \"gemi\": 3,\n", " \"bran\": 2,\n", " \"fusi\": 2,\n", " \"sagr\": 2,\n", " \"oleo\": 3,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize the tool with filtering options" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Samples: 26\n", "Sites before filtering: 1182005\n", "Filtered (indels): 0\n", "Filtered (bi-allel): 26249\n", "Filtered (mincov): 142749\n", "Filtered (minmap): 876036\n", "Filtered (subsample invariant): 600226\n", "Filtered (minor allele frequency): 494278\n", "Filtered (combined): 1068892\n", "Sites after filtering: 74034\n", "Sites containing missing values: 61935 (83.66%)\n", "Missing values in SNP matrix: 140810 (7.32%)\n", "Imputation: 'sampled'; (0, 1, 2) = 61.2%, 17.0%, 21.8%\n" ] } ], "source": [ "tool = ipa.pca(data=SNPS, minmaf=0.05, imap=IMAP, minmap=MINMAP, impute_method=\"sample\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run PCA\n", "One SNP is randomly sampled from each locus so that the analyzed SNPs are approximately unlinked. Setting `nreplicates=N` repeats this subsampling N times to show how the results vary across subsampled SNP sets. The `imap` dictionary passed to `.draw()` is used to color points, and it can differ from the IMAP used to initialize the tool above." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Subsampling SNPs: 25092/100093\n" ] }, { "data": { "text/html": [ "
[Figure: PCA of all samples, PC0 (16.8% explained) vs. PC1 (14.8% explained); points labeled by sample and colored by IMAP group (virg, mini, gemi, bran, fusi, sagr, oleo)]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tool.run(nreplicates=10)\n", "tool.draw(imap=IMAP);" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[Figure: three PCA panels, PC0 (16.8% explained) vs. PC1 (14.8% explained), PC0 vs. PC2 (6.6% explained), and PC1 vs. PC2; points colored by IMAP group (bran, fusi, gemi, mini, oleo, sagr, virg)]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# convenience function: plot pairwise combinations of PC axes 0, 1, and 2\n", "tool.draw_panels(0, 1, 2, imap=IMAP);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run t-SNE\n", "t-SNE is a manifold learning algorithm that can sometimes separate clusters more clearly than PCA when projecting data onto a 2-dimensional plane. However, distances between points (and especially between clusters) in a t-SNE projection are harder to interpret than in PCA." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Subsampling SNPs: 25092/100093\n" ] }, { "data": { "text/html": [ "
[Figure: t-SNE projection, TSNE component 1 vs. TSNE component 2; points colored by IMAP group (virg, mini, gemi, bran, fusi, sagr, oleo)]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tool.run_tsne(perplexity=5, seed=333)\n", "tool.draw(imap=IMAP);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run UMAP\n", "UMAP is similar to t-SNE, but distances between clusters tend to be more representative of the differences between groups. It requires an additional package (`umap-learn`); if it is not yet installed, the tool will ask you to install it." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Subsampling SNPs: 25092/100093\n" ] }, { "data": { "text/html": [ "
[Figure: UMAP projection, UMAP component 1 vs. UMAP component 2; points colored by IMAP group (virg, mini, gemi, bran, fusi, sagr, oleo)]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tool.run_umap(n_neighbors=13, seed=333)\n", "tool.draw(imap=IMAP);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Missing data with imputation\n", "Missing data can strongly affect dimensionality reduction methods, so it is best to (1) minimize the amount of missing data in your input data set by filtering, and (2) impute the remaining missing values. In the examples above, data are imputed using the 'sample' method, which probabilistically samples alleles based on the allele frequencies of the IMAP group to which each sample is assigned. It is good practice to compare this with a run in which imputation is performed without IMAP assignments, to assess the impact of the *a priori* assignments. Although this comparison is useful, imputation that uses IMAP group assignments is still expected to be more accurate." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[Figure: PCA with imputation ignoring IMAP groups, PC0 (15.9% explained) vs. PC1 (15.1% explained); points colored by IMAP group (virg, mini, gemi, bran, fusi, sagr, oleo)]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# keep only sites with data for at least 90% of samples (mincov=0.9), and\n", "# impute without IMAP assignments by placing all samples in a single group\n", "import itertools\n", "tool = ipa.pca(\n", " data=SNPS, \n", " imap={'samples': list(itertools.chain(*IMAP.values()))},\n", " minmaf=0.05, \n", " mincov=0.9, \n", " impute_method=\"sample\", \n", " quiet=True,\n", ")\n", "tool.run(nreplicates=10, seed=123)\n", "tool.draw(imap=IMAP);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistics" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([0.16, 0.15, 0.06, 0.05, 0.04, 0.04, 0.03, 0.03, 0.03, 0.03, 0.03,\n", " 0.03, 0.03, 0.03, 0.03, 0.03, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02,\n", " 0.02, 0.02, 0.02, 0.01, 0.01, 0. ])" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# proportion of variance explained by each PC axis in the first replicate run\n", "tool.variances[0].round(2)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 \\\n", "BJSB3 -4.610 74.151 -18.954 -26.073 0.587 0.241 -1.056 -2.311 \n", "BJSL25 -4.897 70.815 -17.631 -22.660 -0.514 -0.194 -3.377 -1.688 \n", "BJVL19 -5.211 70.611 -19.109 -23.022 0.183 -0.428 -2.558 -1.428 \n", "BZBB1 59.008 -7.891 3.739 1.181 9.771 -21.487 -5.505 -10.850 \n", "CRL0001 48.873 -14.312 -3.244 -1.591 -4.756 24.637 7.354 -10.712 \n", "CRL0030 65.379 -10.158 1.379 0.016 7.581 -12.391 -2.594 -8.762 \n", "CUCA4 28.689 -17.756 -5.968 -4.747 -25.342 -13.972 -15.904 60.494 \n", "CUSV6 25.507 -18.772 -5.325 -5.905 -13.657 25.755 -3.049 23.135 \n", "CUVN10 36.071 -18.435 -5.100 -3.779 -13.754 53.098 14.693 -10.917 \n", "FLAB109 -27.347 -23.411 -22.541 6.585 -17.245 -6.095 3.553 -13.351 \n", "FLBA140 -27.255 -18.524 24.450 -17.597 1.065 1.099 -6.548 -4.178 \n", "FLCK18 -27.442 -22.547 -22.679 6.941 -19.645 -12.994 0.340 -9.506 \n", "FLCK216 -25.219 -19.694 -21.351 8.456 20.746 3.235 -6.158 2.458 \n", "FLMO62 -24.339 -17.004 -18.577 10.045 21.703 2.052 -2.756 7.499 \n", "FLSA185 -23.002 -14.798 -21.469 14.032 55.138 13.869 -1.326 15.814 \n", "FLSF33 -29.098 -20.067 28.700 -18.672 1.083 -1.706 -6.625 -2.339 \n", "FLSF47 -25.578 -21.277 -17.202 6.783 0.406 -5.679 0.162 -5.437 \n", "FLSF54 -26.729 -23.262 -15.358 3.753 -16.081 -7.160 2.704 -7.128 \n", "FLWO6 -27.152 -23.355 -17.863 5.182 -25.241 -11.059 3.364 -13.860 \n", "HNDA09 63.228 -9.392 1.370 -0.739 8.133 -14.948 -3.471 -9.066 \n", "LALC2 -27.999 -20.481 26.148 -17.820 1.279 -2.240 -1.612 -0.461 \n", "MXED8 -0.865 37.463 17.114 34.215 -2.978 -1.335 10.300 8.005 \n", "MXGT4 -6.112 44.646 17.533 29.862 -4.557 2.026 6.206 6.791 \n", "MXSA3017 53.256 -7.713 3.466 2.372 12.366 -18.793 -1.507 -10.907 \n", "SCCU3 -26.268 -17.772 34.307 -26.482 6.734 8.784 -32.502 -6.384 \n", "TXGR3 -11.573 29.699 25.049 30.186 -8.234 5.636 -9.980 -4.686 \n", "TXMD3 -13.722 28.559 25.621 30.245 -3.552 3.446 -5.903 -3.708 \n", "TXWV2 -15.595 -9.321 23.495 -20.767 8.781 -13.396 63.757 13.482 
\n", "\n", " 8 9 ... 18 19 20 21 22 23 \\\n", "BJSB3 -1.129 0.826 ... 1.739 1.861 -0.803 -2.532 0.292 45.451 \n", "BJSL25 -0.420 2.316 ... 0.284 -1.698 2.079 0.757 0.322 -17.017 \n", "BJVL19 -0.932 1.365 ... -0.139 -0.828 -0.356 0.900 -0.154 -31.057 \n", "BZBB1 4.033 2.517 ... -23.674 -7.355 44.886 -0.201 0.726 0.436 \n", "CRL0001 -12.492 -3.943 ... -2.002 -0.992 -1.080 -1.470 0.523 -0.487 \n", "CRL0030 0.429 1.645 ... -16.870 -1.965 -15.548 -1.647 2.670 3.929 \n", "CUCA4 -27.650 -2.318 ... 3.390 0.035 1.886 1.354 -1.875 0.528 \n", "CUSV6 60.079 13.485 ... -0.083 1.558 0.302 -0.601 -1.812 0.539 \n", "CUVN10 -24.551 -6.453 ... 4.404 0.415 7.293 -0.120 -0.499 -0.757 \n", "FLAB109 0.285 -0.432 ... -9.764 2.632 -4.347 -0.406 -20.536 -0.271 \n", "FLBA140 -0.478 -2.337 ... 0.620 -20.521 -2.257 49.330 -7.602 2.767 \n", "FLCK18 -0.153 1.499 ... 0.083 3.186 1.081 -0.757 -10.247 -0.010 \n", "FLCK216 6.057 -11.410 ... -1.026 -4.151 0.246 -0.832 -1.569 -0.220 \n", "FLMO62 -2.931 6.725 ... -2.491 3.808 0.269 0.365 -0.727 -0.144 \n", "FLSA185 -4.724 4.430 ... 0.218 0.192 -0.342 -0.226 1.249 -0.003 \n", "FLSF33 -4.928 -1.446 ... -9.273 40.859 1.740 -7.499 -2.054 -1.543 \n", "FLSF47 -10.473 4.616 ... 1.728 -2.037 0.717 -1.482 -4.525 2.779 \n", "FLSF54 1.987 -1.853 ... 2.786 2.569 1.066 7.329 53.344 0.025 \n", "FLWO6 8.296 -4.614 ... 4.424 -1.943 -0.669 -6.099 -8.897 -0.303 \n", "HNDA09 -0.608 3.730 ... -9.506 -0.481 -37.978 0.261 2.431 -3.157 \n", "LALC2 -1.580 0.043 ... 8.728 -39.895 -2.325 -32.678 2.609 -0.370 \n", "MXED8 3.227 -28.786 ... -5.141 -3.268 -1.844 -2.208 -1.518 -0.215 \n", "MXGT4 8.412 -22.120 ... -6.112 0.630 -2.181 2.240 1.766 1.183 \n", "MXSA3017 6.243 -4.876 ... 52.928 10.552 6.829 2.753 -4.458 -0.931 \n", "SCCU3 2.695 -13.612 ... -2.683 8.748 -0.858 -6.687 1.802 -0.599 \n", "TXGR3 -9.879 56.818 ... 2.577 -0.045 0.119 -0.108 0.866 0.746 \n", "TXMD3 -0.026 -5.823 ... 6.395 2.715 1.365 -0.384 -1.193 -0.126 \n", "TXWV2 1.209 10.009 ... 
-1.541 5.421 0.710 0.646 -0.936 -1.173 \n", "\n", " 24 25 26 27 \n", "BJSB3 -3.763 -6.647 0.511 5.150e-15 \n", "BJSL25 4.234 39.465 -0.777 -3.872e-15 \n", "BJVL19 0.534 -32.138 0.011 5.343e-15 \n", "BZBB1 -12.788 -0.580 -1.093 1.349e-14 \n", "CRL0001 7.434 0.335 35.415 1.801e-14 \n", "CRL0030 41.397 -2.348 -11.342 1.248e-14 \n", "CUCA4 0.130 -0.005 0.234 -5.325e-15 \n", "CUSV6 -1.068 -0.345 -0.248 7.175e-15 \n", "CUVN10 -6.956 -0.045 -20.352 1.455e-14 \n", "FLAB109 1.277 1.406 -0.214 5.870e-15 \n", "FLBA140 1.443 -0.570 0.496 6.641e-15 \n", "FLCK18 -0.548 -0.900 -0.054 -1.586e-15 \n", "FLCK216 -0.991 0.159 -0.406 8.143e-15 \n", "FLMO62 0.970 0.575 0.632 1.325e-15 \n", "FLSA185 0.303 0.127 -0.266 5.993e-16 \n", "FLSF33 0.060 0.526 0.382 1.520e-14 \n", "FLSF47 -1.255 -0.620 0.816 3.046e-15 \n", "FLSF54 0.218 0.002 -0.061 4.989e-15 \n", "FLWO6 0.600 -0.089 -0.744 5.412e-16 \n", "HNDA09 -33.027 2.726 -1.791 6.453e-15 \n", "LALC2 0.287 -0.300 -0.307 6.276e-15 \n", "MXED8 -0.886 0.676 -0.322 1.025e-14 \n", "MXGT4 -1.157 -0.528 0.432 1.382e-14 \n", "MXSA3017 2.271 -0.021 -1.213 5.627e-15 \n", "SCCU3 -0.454 -0.106 0.195 6.228e-16 \n", "TXGR3 0.604 -0.698 0.366 -1.291e-15 \n", "TXMD3 1.150 -0.409 -0.478 -3.965e-14 \n", "TXWV2 -0.021 0.352 0.178 -4.576e-14 \n", "\n", "[28 rows x 28 columns]" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# PC coordinates (scores) of each sample in the first replicate\n", "tool.pcs(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Styling plots (see toyplot documentation)\n", "The `.draw()` function returns toyplot canvas and axes objects, which can be further modified and styled." ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
[Figure: restyled PCA plot, PC0 (15.9% explained) vs. PC1 (15.1% explained), with a styled x axis; points colored by IMAP group (virg, mini, gemi, bran, fusi, sagr, oleo)]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# draw() returns the toyplot canvas and axes objects; size and width are styling options\n", "canvas, axes = tool.draw(imap=IMAP, size=8, width=400);\n", "\n", "# some axis styling options, shown here for the x axis\n", "axes.x.ticks.show = True\n", "axes.x.spine.style['stroke-width'] = 1.5\n", "axes.x.ticks.labels.style['font-size'] = '13px'\n", "axes.x.label.style['font-size'] = '15px'\n", "axes.x.label.offset = '22px'" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }
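The group-aware 'sample' imputation followed by PCA, as used throughout this notebook, can be illustrated with a minimal NumPy sketch. This is not ipyrad's implementation, and it may differ from it in preprocessing details. It assumes genotypes coded as 0/1/2 allele counts with -1 for missing calls. The helper names `sample_impute` and `pca_svd` are hypothetical, and the data are simulated toy genotypes for two made-up populations. `sample_impute` fills each missing genotype by drawing from the allele frequencies of the sample's own group, which is the role IMAP plays. `pca_svd` centers each SNP column and takes an SVD.

```python
import numpy as np


def sample_impute(geno, groups, rng):
    """Hypothetical helper: fill missing genotypes (-1) by sampling from group allele frequencies.

    geno   : (n_samples, n_snps) int array of 0/1/2 allele counts; -1 = missing
    groups : list of integer index arrays, one per population (the role IMAP plays)
    rng    : numpy.random.Generator
    """
    out = geno.copy()
    for idx in groups:
        sub = geno[idx]
        observed = sub >= 0
        # per-SNP allele frequency estimated from this group's non-missing genotypes
        allele_counts = np.where(observed, sub, 0).sum(axis=0)
        n_called = observed.sum(axis=0)
        freq = np.divide(allele_counts, 2 * n_called,
                         out=np.zeros(sub.shape[1]), where=n_called > 0)
        # draw a diploid genotype (two allele draws) for every cell in the group,
        # then keep the draws only where the genotype was missing
        draws = rng.binomial(2, freq, size=sub.shape)
        out[idx] = np.where(observed, sub, draws)
    return out


def pca_svd(geno, n_components=2):
    """Hypothetical helper: center each SNP column, then take the SVD."""
    centered = geno - geno.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # proportion of variance per PC
    coords = u[:, :n_components] * s[:n_components]   # sample scores on the first PCs
    return coords, explained


# simulated toy data: two hypothetical populations of 10 samples each, 200 SNPs
rng = np.random.default_rng(123)
pop_freqs = rng.uniform(0.05, 0.95, size=(2, 200))
geno = np.vstack([rng.binomial(2, f, size=(10, 200)) for f in pop_freqs])

# mark ~10% of genotype calls as missing
mask = rng.random(geno.shape) < 0.10
geno[mask] = -1

groups = [np.arange(0, 10), np.arange(10, 20)]
filled = sample_impute(geno, groups, rng)
coords, explained = pca_svd(filled.astype(float))
print(explained.round(2))
```

Imputing without population assignments, as in the missing-data section above, corresponds to passing a single index array that covers every sample. Because every SNP column is centered, a matrix of n samples has at most n - 1 non-zero components, so the last entry of `explained` is effectively zero. This is consistent with the statistics output above, where the final PC (column 27 for 28 samples) has values on the order of 1e-15 and 0 variance explained.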