{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Front Matter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Abstract" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This thesis is a general overview of spectral methods on networks, and how you can use tools from a network’s eigenspace to understand and explain the network more deeply.\n", "\n", "Networks are some of the most fundamental building blocks of the universe. Atoms and molecules are connected to each other with chemical bonds. Neurons connect to each other through synapses, and the different parts of the brain connect to each other through groups of neurons interacting with each other. At a larger level, we are interconnected with other humans through social networks, and our economy is a global, interconnected trade network. The Earth’s food chain is an ecological network, and larger still, every object with mass in the universe is connected to every other object through a gravitational network.\n", "\n", "This thesis covers the fundamentals of spectral methods with respect to network data science, focusing on developing intuition on networks as statistical objects, while paired with relevant Python code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Big thanks to everybody who has been reading the thesis as I write and giving feedback. This list includes Dax Pryce, Ross Lawrence, Geoff Loftus, Alexandra McCoy, Olivia Taylor Peter Brown, Sambit Panda, Eric Bridgeford, Josh Vogelstein, and Ali Sad-Aldin. \n", "\n", "I am grateful to my advisor, Joshua Vogelstein, for his insights and strong feedback. The value he puts on clarity and simplicity in any mathematical model has been an enormous help throughout this process.\n", "\n", "I am also especially grateful to Eric Bridgeford, who has been giving me constant feedback throughout the writing process. I would be lost in a sea of papers without his help." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **Dedication**\n", "\n", "This thesis is dedicated to my father, Geoffrey Loftus, for teaching me the value of rigor in science and for being a resoundingly positive role model throughout my life; to my mother, Susan Loftus, for teaching me to never give up in the face of adversity; to my sister, Emma Loftus, for accepting me no matter what; to my stepfather, Matthew Voorsanger, for showing me when to be pragmatic and when to use humor; to my stepmother, Willa Rose, for always being there to listen; and to my stepsiblings for illuminating and enlarging my life." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Abstract** \n", "**Acknowledgements** \n", "**List of Figures** \n", "**1. Matrix Representations of Networks** \n", " The Adjacency Matrix \n", " The Incidence Matrix \n", " The Oriented Incidence Matrix \n", " The Degree Matrix \n", " The Laplacian Matrix \n", "**2. Why Embed Networks?** \n", " High Dimensionality of Network Data \n", " Latent Estimation \n", " The Latent Position Matrix \n", " Edge Probability Estimation\n", " Block Probability Matrices \n", " Geometry of Latent Positions \n", "**3. 
" Singular Vectors and Singular Value Decomposition \n", " Breaking Down the Laplacian \n", " Matrix Rank \n", " Sums of Rank One Matrices \n", " Laplacian Approximation Through Summation \n", " Increased Usefulness of Approximation with Larger Networks \n", " Matrix Rank and Spectral Embedding \n", " Dimensionality Estimation \n", " The Two-Truths Phenomenon \n", "**4. Multiple-Network Representation Learning** \n", " Data Generation \n", " Simple Embedding Methods on Multiple Networks \n", " Averaging Separately \n", " Averaging Together \n", " Different Types of Multi-Network Representation Learning \n", " Network Combination: Together \n", " Network Combination: Separate \n", " Embedding Combination \n", " Multiple-Adjacency Spectral Embedding \n", " Overview of MASE \n", " Data Generation \n", " Embedding \n", " Combining Embeddings \n", " Joint Embedding of Combinations \n", " Score Matrices \n", " Omnibus Embedding \n", " OMNI on Four Networks \n", " Overview of OMNI \n", " The Omnibus Matrix \n", " Embedding the Omnibus Matrix \n", " Using the Omnibus Embedding \n", "**5. Joint Representation Learning** \n", " Data Generation \n", " Covariates \n", " Covariate-Assisted Spectral Embedding \n", " Weight Exploration \n", " Weight Estimation \n", " Omnibus Joint Embedding \n", " MASE Joint Embedding \n", "**6. Single-Network Vertex Nomination** \n", " Spectral Vertex Nomination \n", " Finding a Single Set of Nominations \n", " Nominations for Each Node \n", "**7. Out-of-Sample Embedding** \n", " Data Generation \n", " Probability Vector Estimation \n", " Inversion of Probability Vector Estimation \n", " The Moore-Penrose Pseudoinverse \n", " Using the Pseudoinverse for Out-of-Sample Estimation \n", "**8. Anomaly Detection for Timeseries of Networks** \n", " Simulating Timeseries Data \n", " Approaches for Anomaly Detection \n", " Detecting if the First Time-Point is an Anomaly \n", " Hypothesis Testing a Test Statistic \n", " Bootstrapped Distribution Estimation \n", " P-Value Estimation \n", " Testing the Remaining Time-Points \n", " The Distribution of the Bootstrapped Test Statistic " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List of Figures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1.1 A Three-Node Network \n", "1.2 Adjacency Matrix and Layout Plot \n", "1.3 Incidence Matrix and Layout Plot \n", "1.4 Oriented Incidence Matrix and Layout Plot \n", "2.1 Euclidean data represented as a data matrix and represented in Euclidean space \n", "2.2 Clustered data after K-Means \n", "2.3 A Network With Two Groups \n", "2.4 Latent Position Estimation \n", "2.5 Estimated and True Block Probability Matrices \n", "2.6 Geometry of Latent Positions \n", "3.1 The Spectral Embedding Algorithm \n", "3.2 A Simple Network \n", "3.3 The Laplacian Is Just a Function of the Adjacency Matrix \n", "3.4 Decomposing our Simple Laplacian into Eigenvectors and Eigenvalues with SVD \n", "3.5 We Can Recreate our Simple Laplacian by Summing All the Low-Rank Matrices \n", "3.6 The Sum of an Increasing Number of Low-Rank Matrices \n", "3.7 Summing Only Two Low-Rank Matrices Approximates the Normalized Laplacian \n", "3.8 The Latent Position Matrix \n", "3.9 Low-Rank Matrices Contain the Same Information As Columns of the Latent Position Matrix \n", "3.10 Expressing the Sum With Columns of the Latent Position Matrix \n", "3.11 Scree Plot \n", "3.12 The Adjacency Spectral Embedding \n", "3.13 The Laplacian Spectral Embedding \n", "3.14 Affinity vs Core-Periphery Structure \n",
"4.1 Different Sets of Brain Networks \n", "4.2 Averaged Embedded Networks \n", "4.3 Clustering with GMM \n", "4.4 Embedding when we Average Everything Together \n", "4.5 Network Comparison \n", "4.6 Averaged Brain Network \n", "4.7 Combined Network Group Embedding \n", "4.8 Separate Network Group Embedding \n", "4.9 Combined Embedding \n", "4.10 MASE Embedding On Network Groups \n", "4.11 Four Different Networks \n", "4.12 ASE on Four Networks \n", "4.13 Latent Position Matrices for Four Embeddings \n", "4.14 Combined Embeddings For Four Networks \n", "4.15 Visualizing the Joint Embedding \n", "4.16 MASE Embedding \n", "4.17 Score Matrices and Edge Probabilities \n", "4.18 ASE for Four Networks \n", "4.19 Omnibus Embedding for Four Networks \n", "4.20 The Omnibus Matrix for Two Networks \n", "4.21 Full Omnibus Matrix for All Four Networks \n", "4.22 Latent Positions for the Omnibus Embedding in Matrix Form \n", "4.23 Latent Positions for the Omnibus Embedding in Euclidean Space \n", "4.24 Mouse Networks Corresponding to a Single Node after Omnibus Embedding \n", "5.1 Stochastic Block Model with Three Communities \n", "5.2 Laplacian-Embedded Latent Positions \n", "5.3 Covariate Visualization \n", "5.4 Laplacian and Covariates \n", "5.5 Embedding Without Weights \n", "5.6 Comparison of Embeddings for Different Weights on $YY^\\top$ \n", "5.7 Embedding With Weights \n", "5.8 The Benefit Of Using Two Types of Information \n", "5.9 Embedding with Graspologic \n", "5.10 Omni Embedding for Topology and Covariates \n", "5.11 MASE Joint Embedding \n", "6.1 Latent Positions and Seeds for an SBM With Three Communities \n", "6.2 Centroid for Seed Latent Positions \n", "6.3 Nomination List \n", "6.4 Nomination List: Network Plot \n", "6.5 Nominations for Each Seed Node \n", "7.1 Adjacency Matrix and Vector For Additional Node \n", "7.2 Latent Positions for Original Network \n", "7.3 Estimated Probability Vector for First Node \n", "7.4 A Noninvertible Linear Transformation \n", "7.5 The Best Approximation the Pseudoinverse Can Do \n", "7.6 Estimating the Out-of-Sample Latent Position \n", "7.7 Latent Positions with Out-of-Sample Estimate \n", "8.1 Network Timeseries Data \n", "8.2 Distribution of test statistics with the same latent positions \n", "8.3 Test Statistics for Each Timeseries \n", "8.4 Distribution of test statistics with the same latent positions \n", "8.5 Distribution Comparison for Bootstrapped and True Test Statistic" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }