This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases. Artificial Intelligence Ontology 2024-06-26 The official definition, explaining the meaning of a class or property. Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions. definition Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. description A legal document giving official permission to do something with the resource. license A name given to the resource. title subset_property Bias Subset Class Subset Function Subset Layer Subset Machine Learning Subset Model Subset Network Subset Preprocessing Subset A core relation that holds between a part and its whole part of An abstract layer object representing an RNN cell that is the base class for implementing RNN cells with custom behavior. AbstractRNNCell An abstract layer object representing an RNN cell that is the base class for implementing RNN cells with custom behavior. A layer that applies an activation function to an output. Applies an activation function to an output. Activation Layer A layer that applies an activation function to an output. A type of machine learning focused on methods that interactively query a user or another information source to label new data points with the desired outputs. Query Learning Active Learning A type of machine learning focused on methods that interactively query a user or another information source to label new data points with the desired outputs. A use and interpretation bias occurring when systems/platforms get training data from their most active users rather than less active or inactive users. Activity Bias A use and interpretation bias occurring when systems/platforms get training data from their most active users rather than less active or inactive users. A regularization layer that applies an update to the cost function based on input activity. ActivityRegularization Layer A regularization layer that applies an update to the cost function based on input activity. A pooling layer that applies a 1D adaptive average pooling over an input signal composed of several input planes. AdaptiveAvgPool1D AdaptiveAvgPool1D Layer A pooling layer that applies a 1D adaptive average pooling over an input signal composed of several input planes. A pooling layer that applies a 2D adaptive average pooling over an input signal composed of several input planes. AdaptiveAvgPool2D AdaptiveAvgPool2D Layer A pooling layer that applies a 2D adaptive average pooling over an input signal composed of several input planes. A pooling layer that applies a 3D adaptive average pooling over an input signal composed of several input planes. AdaptiveAvgPool3D AdaptiveAvgPool3D Layer A pooling layer that applies a 3D adaptive average pooling over an input signal composed of several input planes. A pooling layer that applies a 1D adaptive max pooling over an input signal composed of several input planes. AdaptiveMaxPool1D AdaptiveMaxPool1D Layer A pooling layer that applies a 1D adaptive max pooling over an input signal composed of several input planes. A pooling layer that applies a 2D adaptive max pooling over an input signal composed of several input planes. AdaptiveMaxPool2D AdaptiveMaxPool2D Layer A pooling layer that applies a 2D adaptive max pooling over an input signal composed of several input planes. 
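To make the adaptive pooling layers above concrete, here is a minimal PyTorch sketch using nn.AdaptiveAvgPool2d; the batch size, channel count, and output size are illustrative choices, not part of the ontology.

```python
import torch
import torch.nn as nn

# Adaptive average pooling: the caller fixes the output spatial size,
# and the layer chooses pooling windows to match it.
pool = nn.AdaptiveAvgPool2d(output_size=(4, 4))

x = torch.randn(8, 3, 32, 32)   # (batch, channels, height, width)
y = pool(x)                     # -> torch.Size([8, 3, 4, 4])
print(y.shape)
```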
A pooling layer that applies a 3D adaptive max pooling over an input signal composed of several input planes. AdaptiveMaxPool3D AdaptiveMaxPool3D Layer A pooling layer that applies a 3D adaptive max pooling over an input signal composed of several input planes. A merging layer that adds a list of inputs taking as input a list of tensors all of the same shape. Layer that adds a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Add Layer A merging layer that adds a list of inputs taking as input a list of tensors all of the same shape. A layer that adds inputs from one or more other layers to cells or neurons of a target layer. Addition Layer An attention layer that implements additive attention also known as Bahdanau-style attention. Additive attention layer, a.k.a. Bahdanau-style attention. AdditiveAttention Layer An attention layer that implements additive attention also known as Bahdanau-style attention. A regularization layer that applies Alpha Dropout to the input keeping mean and variance of inputs to ensure self-normalizing property. Applies Alpha Dropout to the input. Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled Exponential Linear Units by randomly setting activations to the negative saturation value. AlphaDropout Layer A regularization layer that applies Alpha Dropout to the input keeping mean and variance of inputs to ensure self-normalizing property. A processing bias arising when the distribution over prediction outputs is skewed compared to the prior distribution of the prediction target. Amplification Bias A processing bias arising when the distribution over prediction outputs is skewed compared to the prior distribution of the prediction target. A cognitive bias characterized by the influence of a reference point or anchor on decisions leading to insufficient adjustment from that anchor point. Anchoring Bias A cognitive bias characterized by the influence of a reference point or anchor on decisions leading to insufficient adjustment from that anchor point. An individual bias occurring when users rely on automation as a heuristic replacement for their own information seeking and processing. Annotator Reporting Bias An individual bias occurring when users rely on automation as a heuristic replacement for their own information seeking and processing. A network based on a collection of connected units called artificial neurons modeled after biological neurons. ANN NN An artificial neural network (ANN) is based on a collection of connected units or nodes called artificial neurons, modeled after biological neurons, with connections transmitting signals processed by non-linear functions. Artificial Neural Network A network based on a collection of connected units called artificial neurons modeled after biological neurons. A supervised learning method focused on a rule-based approach for discovering interesting relations between variables in large databases. Association Rule Learning A supervised learning method focused on a rule-based approach for discovering interesting relations between variables in large databases. A layer that implements dot-product attention also known as Luong-style attention. Dot-product attention layer, a.k.a. Luong-style attention. 
Attention Layer A layer that implements dot-product attention also known as Luong-style attention. An unsupervised pretrained network that learns efficient codings of unlabeled data by training to ignore insignificant data and regenerate input from encoding. AE Layers: Input, Hidden, Matched Output-Input Auto Encoder Network An unsupervised pretrained network that learns efficient codings of unlabeled data by training to ignore insignificant data and regenerate input from encoding. A bias characterized by over-reliance on automated systems leading to attenuated human skills. Automation Complacency Over-reliance on automated systems, leading to attenuated human skills, such as with spelling and autocorrect. Automation Complacency Bias A bias characterized by over-reliance on automated systems leading to attenuated human skills. A model that describes the variance of the current error term as a function of the previous periods' error terms, capturing volatility clustering. Used for time series data. ARCH Autoregressive Conditional Heteroskedasticity A model that includes lagged values of both the dependent variable and one or more independent variables, capturing dynamic relationships over time. Used in time series analysis. ARDL Autoregressive Distributed Lag A model which combines autoregression (AR), differencing (I), and moving average (MA) components. Used for analyzing and forecasting time series data. ARIMA Autoregressive Integrated Moving Average A language model that generates text sequentially predicting one token at a time based on the previously generated tokens excelling at natural language generation tasks by modeling the probability distribution over sequences of tokens. generative language model sequence-to-sequence model Autoregressive Language Model A model that combines autoregressive (AR) and moving average (MA) components to represent time series data, suitable for stationary series without the need for differencing. ARMA Autoregressive Moving Average A cognitive bias characterized by a mental shortcut where easily recalled information is overweighted in judgment and decision-making. Availability Bias Availability Heuristic Availability Heuristic Bias A cognitive bias characterized by a mental shortcut where easily recalled information is overweighted in judgment and decision-making. A merging layer that averages a list of inputs element-wise taking as input a list of tensors all of the same shape. Layer that averages a list of inputs element-wise. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Average Layer A merging layer that averages a list of inputs element-wise taking as input a list of tensors all of the same shape. A pooling layer that performs average pooling for temporal data. AvgPool1D Average pooling for temporal data. Downsamples the input representation by taking the average value over the window defined by pool_size. The window is shifted by strides. The resulting output when using "valid" padding option has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides. AveragePooling1D Layer A pooling layer that performs average pooling for temporal data. A pooling layer that performs average pooling for spatial data. AvgPool2D Average pooling operation for spatial data.
Downsamples the input along its spatial dimensions (height and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. The resulting output when using "valid" padding option has a shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size). The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1. AveragePooling2D Layer A pooling layer that performs average pooling for spatial data. A pooling layer that performs average pooling for 3D data (spatial or spatio-temporal). AvgPool3D Average pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. AveragePooling3D Layer A pooling layer that performs average pooling for 3D data (spatial or spatio-temporal). A pooling layer that applies a 1D average pooling over an input signal composed of several input planes. AvgPool1D AvgPool1D Layer A pooling layer that applies a 1D average pooling over an input signal composed of several input planes. A pooling layer that applies a 2D average pooling over an input signal composed of several input planes. AvgPool2D AvgPool2D Layer A pooling layer that applies a 2D average pooling over an input signal composed of several input planes. A pooling layer that applies a 3D average pooling over an input signal composed of several input planes. AvgPool3D AvgPool3D Layer A pooling layer that applies a 3D average pooling over an input signal composed of several input planes. An input layer that receives values from another layer. Backfed Input Layer A batch normalization layer that applies Batch Normalization over a 2D or 3D input. BatchNorm1D Applies Batch Normalization over a 2D or 3D input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . BatchNorm1D Layer A batch normalization layer that applies Batch Normalization over a 2D or 3D input. A batch normalization layer that applies Batch Normalization over a 4D input. BatchNorm2D Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . BatchNorm2D Layer A batch normalization layer that applies Batch Normalization over a 4D input. A batch normalization layer that applies Batch Normalization over a 5D input. BatchNorm3D Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . BatchNorm3D Layer A batch normalization layer that applies Batch Normalization over a 5D input. A normalization layer that normalizes its inputs applying a transformation that maintains the mean close to 0 and the standard deviation close to 1. BatchNorm Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. 
Importantly, batch normalization works differently during training and during inference. During training (i.e. when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs. That is to say, for each channel being normalized, the layer returns gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta, where: epsilon is a small constant (configurable as part of the constructor arguments), gamma is a learned scaling factor (initialized as 1), which can be disabled by passing scale=False to the constructor. beta is a learned offset factor (initialized as 0), which can be disabled by passing center=False to the constructor. During inference (i.e. when using evaluate() or predict() or when calling the layer/model with the argument training=False (which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta. self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as such: moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum) moving_var = moving_var * momentum + var(batch) * (1 - momentum). BatchNormalization Layer A normalization layer that normalizes its inputs applying a transformation that maintains the mean close to 0 and the standard deviation close to 1. A network that is a probabilistic graphical model representing variables and their conditional dependencies via a directed acyclic graph. Bayesian Network A network that is a probabilistic graphical model representing variables and their conditional dependencies via a directed acyclic graph. An individual bias characterized by systematic distortions in user behavior across platforms or contexts or across users represented in different datasets. Systematic distortions in user behavior across platforms or contexts, or across users represented in different datasets. Behavioral Bias An individual bias characterized by systematic distortions in user behavior across platforms or contexts or across users represented in different datasets. A systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others. Bias A systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others. A machine learning task focused on methods that simultaneously cluster the rows and columns of a matrix to identify submatrices with coherent patterns. Block Clustering Co-clustering Joint Clustering Two-mode Clustering Two-way Clustering Biclustering A machine learning task focused on methods that simultaneously cluster the rows and columns of a matrix to identify submatrices with coherent patterns. A recurrent layer that is a bidirectional wrapper for RNNs. Bidirectional wrapper for RNNs. Bidirectional Layer A recurrent layer that is a bidirectional wrapper for RNNs. A transformer language model such as BERT that uses the transformer architecture to build deep bidirectional representations by predicting masked tokens based on their context.
BERT Bidirectional Transformer LM Bidirectional Transformer Language Model A transformer language model such as BERT that uses the transformer architecture to build deep bidirectional representations by predicting masked tokens based on their context. A machine learning task focused on methods that classify elements into two groups based on a classification rule. Binary Classification A machine learning task focused on methods that classify elements into two groups based on a classification rule. A symmetrically connected network that is a type of stochastic recurrent neural network and Markov random field. BM Sherrington–Kirkpatrick model with external field stochastic Hopfield network with hidden units stochastic Ising-Lenz-Little model Layers: Backfed Input, Probabilistic Hidden Boltzmann Machine Network A symmetrically connected network that is a type of stochastic recurrent neural network and Markov random field. A layer that performs categorical data preprocessing operations. Categorical Features Preprocessing Layer A layer that performs categorical data preprocessing operations. A categorical features preprocessing layer that encodes integer features providing options for condensing data into a categorical encoding. A preprocessing layer which encodes integer features. This layer provides options for condensing data into a categorical encoding when the total number of tokens is known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use tf.keras.layers.IntegerLookup instead. CategoryEncoding Layer A categorical features preprocessing layer that encodes integer features providing options for condensing data into a categorical encoding. A probabilistic graphical model used to encode assumptions about the data-generating process. Causal Bayesian Network Causal Graph DAG Directed Acyclic Graph Path Diagram Causal Graphical Model A probabilistic graphical model used to encode assumptions about the data-generating process. A large language model that only attends to previous tokens in the sequence when generating text modeling the probability distribution autoregressively from left-to-right or causally. Causal Large Language Model autoregressive unidirectional Causal LLM An image preprocessing layer that crops the central portion of images to a target size. A preprocessing layer which crops images. This layer crops the central portion of the images to a target size. If an image is smaller than the target size, it will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. CenterCrop Layer An image preprocessing layer that crops the central portion of images to a target size. A supervised learning task focused on methods that distinguish and distribute kinds of "things" into different groups. Methods that distinguish and distribute kinds of "things" into different groups. Classification A supervised learning task focused on methods that distinguish and distribute kinds of "things" into different groups. The process of removing noise inconsistencies and irrelevant information from data to enhance its quality and prepare it for analysis or further processing.
Data Cleansing Standardization Data cleaning Text normalization Cleaning A machine learning task focused on methods that group a set of objects such that objects in the same group are more similar to each other than to those in other groups. Cluster analysis Clustering A machine learning task focused on methods that group a set of objects such that objects in the same group are more similar to each other than to those in other groups. A systematic deviation from rational judgment and decision-making including adaptive mental shortcuts known as heuristics. Cognitive Bias A systematic deviation from rational judgment and decision-making including adaptive mental shortcuts known as heuristics. A large language model that is trained to understand and recombine the underlying compositional structures in language enabling better generalization to novel combinations and out-of-distribution examples. Compositional Generalization Large Language Model out-of-distribution generalization systematic generalization Compositional Generalization LLM A bias caused by differences between results and facts in the process of data analysis (including the source of data the estimator chose) and analysis methods. Statistical Bias Computational Bias A bias caused by differences between results and facts in the process of data analysis (including the source of data the estimator chose) and analysis methods. A merging layer that concatenates a list of inputs taking as input a list of tensors all of the same shape except for the concatenation axis. Layer that concatenates a list of inputs. It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs. Concatenate Layer A merging layer that concatenates a list of inputs taking as input a list of tensors all of the same shape except for the concatenation axis. A use and interpretation bias due to the use of a system outside its planned domain of application causing performance gaps between laboratory settings and the real world. Concept Drift Concept Drift Bias A use and interpretation bias due to the use of a system outside its planned domain of application causing performance gaps between laboratory settings and the real world. A cognitive bias characterized by the tendency to prefer information that confirms existing beliefs influencing the search for interpretation of and recall of information. The tendency to prefer information that confirms existing beliefs, influencing the search for, interpretation of, and recall of information. Confirmation Bias A cognitive bias characterized by the tendency to prefer information that confirms existing beliefs influencing the search for interpretation of and recall of information. A bias arising when an algorithm or platform provides users a venue to express their biases occurring from either side in a digital interaction. Consumer Bias A bias arising when an algorithm or platform provides users a venue to express their biases occurring from either side in a digital interaction. A use and interpretation bias arising from structural lexical semantic and syntactic differences in user-generated content. Bias from structural, lexical, semantic, and syntactic differences in user-generated content. Content Production Bias A use and interpretation bias arising from structural lexical semantic and syntactic differences in user-generated content. 
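To illustrate the shape requirement of the Concatenate layer above, a minimal Keras sketch (the tensor shapes are arbitrary examples): the inputs must agree on every axis except the concatenation axis.

```python
import tensorflow as tf

# Concatenate joins tensors along one axis; all other axes must match.
a = tf.random.normal((2, 5, 8))
b = tf.random.normal((2, 3, 8))
out = tf.keras.layers.Concatenate(axis=1)([a, b])  # -> shape (2, 8, 8)
print(out.shape)
```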
A deep neural network that learns sequential tasks without forgetting knowledge from preceding tasks and without access to old task data during new task training. Incremental Learning Life-Long Learning Learning a model for sequential tasks without forgetting knowledge from preceding tasks, with no access to old task data during new task training. Continual Learning A deep neural network that learns sequential tasks without forgetting knowledge from preceding tasks and without access to old task data during new task training. A large language model that continually acquires new knowledge and skills over time without forgetting previously learned information allowing the model to adapt and expand its capabilities as new data becomes available. CL-Large Language Model Continual Learning Large Language Model catastrophic forgetting lifelong learning Continual Learning LLM A deep neural network self-supervised learning approach that learns to distinguish between similar and dissimilar data samples. Contrastive learning is a self-supervised learning approach in which the model learns to distinguish between similar and dissimilar pairs of data samples. By maximizing the similarity between positive pairs (similar samples) and minimizing the similarity between negative pairs (dissimilar samples), the model learns to capture meaningful representations of the data. This method is particularly effective for representation learning and is widely used in tasks such as image classification, clustering, and retrieval. Contrastive learning techniques often employ loss functions such as the contrastive loss or the triplet loss to achieve these objectives. Contrastive Learning A deep neural network self-supervised learning approach that learns to distinguish between similar and dissimilar data samples. A large language model that is trained to pull semantically similar samples closer together and push dissimilar samples apart in the representation space learning high-quality features useful for downstream tasks. Representation learning Contrastive Learning LLM A large language model that allows for explicit control over certain attributes of the generated text such as style tone topic or other desired characteristics through conditioning or specialized training objectives. Controllable Large Language Model conditional generation guided generation A controllable LLM allows for explicit control over certain attributes of the generated text, such as style, tone, topic, or other desired characteristics, through conditioning or specialized training objectives. Controllable LLM A convolutional layer that implements a 1D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations. 1D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. ConvLSTM1D Layer A convolutional layer that implements a 1D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations. A convolutional layer that implements a 2D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations. 2D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. ConvLSTM2D Layer A convolutional layer that implements a 2D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations. 
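The contrastive loss mentioned in the Contrastive Learning entry above can be sketched as a pairwise formulation in PyTorch; the margin value, tensor shapes, and function name here are illustrative assumptions, not part of the ontology.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, is_similar, margin=1.0):
    """Pairwise contrastive loss: pull positive (similar) pairs together,
    push negative (dissimilar) pairs at least `margin` apart."""
    dist = F.pairwise_distance(z1, z2)
    pos = is_similar * dist.pow(2)                          # similar pairs: minimize distance
    neg = (1 - is_similar) * F.relu(margin - dist).pow(2)   # dissimilar pairs: enforce margin
    return (pos + neg).mean()

# Example usage with random embeddings and pair labels (1 = similar, 0 = dissimilar).
z1, z2 = torch.randn(4, 16), torch.randn(4, 16)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(contrastive_loss(z1, z2, labels))
```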
A convolutional layer that implements a 3D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations. 3D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. ConvLSTM3D Layer A convolutional layer that implements a 3D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations. A layer that implements 1D convolution (e.g. temporal convolution). Conv1D Conv1D Layer Convolution1D nn.Conv1D Convolution1D Layer A layer that implements 1D convolution (e.g. temporal convolution). A layer that implements transposed 1D convolution sometimes called deconvolution. Conv1DTranspose Layer ConvTranspose1D Convolution1DTranspose nn.ConvTranspose1D Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 3) for data with 128 time steps and 3 channels. Convolution1DTranspose Layer A layer that implements transposed 1D convolution sometimes called deconvolution. A layer that implements 2D convolution (e.g. spatial convolution over images). Conv2D Conv2D Layer Convolution2D nn.Conv2D 2D convolution layer (e.g. spatial convolution over images). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size. Convolution2D Layer A layer that implements 2D convolution (e.g. spatial convolution over images). A layer that implements transposed 2D convolution Conv2DTranspose Layer ConvTranspose2D Convolution2DTranspose nn.ConvTranspose2D Transposed convolution layer (sometimes called Deconvolution). Convolution2DTranspose Layer A layer that implements transposed 2D convolution A layer that implements 3D convolution (e.g. spatial convolution over volumes). Conv3D Conv3D Layer Convolution3D nn.Conv3D 3D convolution layer (e.g. spatial convolution over volumes). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 1) for 128x128x128 volumes with a single channel, in data_format="channels_last". Convolution3D Layer A layer that implements 3D convolution (e.g. spatial convolution over volumes). 
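A minimal Keras sketch of the Convolution2D usage described above, paired with its transposed counterpart; the doc text passes input_shape=(128, 128, 3) to the layer itself, while this sketch uses an equivalent Input object, and the filter counts and kernel sizes are arbitrary choices.

```python
import tensorflow as tf

# Conv2D on 128x128 RGB images (channels_last); the batch axis is omitted
# from the input shape. Conv2DTranspose is the transposed ("deconvolution")
# counterpart, mapping back toward the input shape.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation="relu"),
    tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=3),
])
model.summary()
```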
A layer that implements transposed 3D convolution. Conv3DTranspose Layer ConvTranspose3D Convolution3DTranspose nn.ConvTranspose3D Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 3) for a 128x128x128 volume with 3 channels if data_format="channels_last". Convolution3DTranspose Layer A layer that implements transposed 3D convolution. A layer that contains a set of filters (or kernels) parameters of which are to be learned throughout the training. A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map. Convolutional Layer A layer that contains a set of filters (or kernels) parameters of which are to be learned throughout the training. A layer that crops along the time dimension (axis 1) for 1D input. Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1). Cropping1D Layer A layer that crops along the time dimension (axis 1) for 1D input. A layer that crops along spatial dimensions (i.e. height and width) for 2D input. Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. height and width. Cropping2D Layer A layer that crops along spatial dimensions (i.e. height and width) for 2D input. A layer that crops along spatial dimensions (depth, height, and width) for 3D data. Cropping layer for 3D data (e.g. spatial or spatio-temporal). Cropping3D Layer A layer that crops along spatial dimensions (depth, height, and width) for 3D data. An LLM that performs well across a wide range of domains without significant loss in performance, facilitated by advanced domain adaptation techniques. Domain-General LLM cross-domain transfer domain adaptation Cross-Domain LLM A training strategy in machine learning where models are trained on data in a meaningful order starting with simpler examples and gradually increasing the complexity to improve learning efficiency and model performance. Sequential Learning Structured Learning Complexity grading Sequential learning Curriculum Learning A large language model that is trained by presenting learning examples in a meaningful order from simple to complex mimicking the learning trajectory followed by humans. Learning progression Curriculum Learning LLM A technique used to increase the diversity and quantity of training data by applying various transformations such as rotation scaling flipping and cropping to existing data samples enhancing the robustness and performance of machine learning models. Data Enrichment Data Expansion Paraphrasing Synonym replacement Data Augmentation A use and interpretation bias where testing many hypotheses in a dataset may yield apparent statistical significance even when results are nonsignificant.
Data Dredging Data Dredging Bias A use and interpretation bias where testing many hypotheses in a dataset may yield apparent statistical significance even when results are nonsignificant. Techniques used to improve the quality diversity and volume of data available for training machine learning models such as data augmentation synthesis and enrichment to enhance model robustness and accuracy. Data Enhancement A selection and sampling bias arising from adding synthetic or redundant data samples to a dataset. Bias from adding synthetic or redundant data samples to a dataset. Data Generation Bias A selection and sampling bias arising from adding synthetic or redundant data samples to a dataset. A machine learning task focused on methods that replace missing data with substituted values. Methods that replace missing data with substituted values. Data Imputation A machine learning task focused on methods that replace missing data with substituted values. The process of cleaning transforming and organizing raw data into a suitable format for analysis and modeling ensuring the quality and relevance of the data for machine learning tasks. Data Assembly Data Curation Data Processing Data Preparation An LLM that generates natural language descriptions from structured data sources like tables, graphs, and knowledge bases, requiring grounding in meaning representations. Meaning representation Data-to-Text LLM A machine learning model that uses a tree-like model of decisions and their possible consequences including chance event outcomes resource costs and utilities. A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utilities. Decision Tree A machine learning model that uses a tree-like model of decisions and their possible consequences including chance event outcomes resource costs and utilities. A large language model that uses a decoder-only architecture consisting of only a decoder trained to predict the next token in a sequence given the previous tokens. A decoder-only architecture consisting of only a decoder, trained to predict the next token in a sequence given the previous tokens. Unlike the encoder-decoder architecture, it does not have an explicit encoder and encodes information implicitly in the hidden state of the decoder, updated at each step of the generation process. Decoder LLM A large language model that uses a decoder-only architecture consisting of only a decoder trained to predict the next token in a sequence given the previous tokens. A deep neural network that uses deconvolution for unsupervised construction of hierarchical image representations. DN Layers: Input, Kernel, Convolutional/Pool, Output Deconvolutional Network A deep neural network that uses deconvolution for unsupervised construction of hierarchical image representations. A deep neural network that combines deep learning and active learning to maximize model performance while annotating the fewest samples possible. DeepAL Deep Active Learning A deep neural network that combines deep learning and active learning to maximize model performance while annotating the fewest samples possible. An unsupervised pretrained network composed of multiple layers of latent variables that learns to probabilistically reconstruct inputs and perform classification.
DBN Layers: Backfed Input, Probabilistic Hidden, Hidden, Matched Output-Input Deep Belief Network An unsupervised pretrained network composed of multiple layers of latent variables that learns to probabilistically reconstruct inputs and perform classification. An autoencoder network that learns interpretable disentangled image representations through convolution and de-convolution layers trained with the stochastic gradient variational Bayes algorithm. DCIGN Layers: Input, Kernel, Convolutional/Pool, Probabilistic Hidden, Convolutional/Pool, Kernel, Output Deep Convolutional Inverse Graphics Network A deep neural network specialized for analyzing visual imagery using shared-weight architecture and translation-equivariant feature maps. CNN ConvNet Convolutional Neural Network DCN Layers: Input, Kernel, Convolutional/Pool, Hidden, Output Deep Convolutional Network A deep neural network specialized for analyzing visual imagery using shared-weight architecture and translation-equivariant feature maps. A deep neural network that processes information in one direction—from input nodes through hidden nodes to output nodes—without cycles or loops. DFF MLP Multilayer Perceptron Layers: Input, Hidden, Output Deep Feed-Forward Network A deep neural network that processes information in one direction—from input nodes through hidden nodes to output nodes—without cycles or loops. An artificial neural network characterized by multiple hidden layers between the input and output layers. DNN A deep neural network (DNN) is a type of artificial neural network (ANN) characterized by multiple hidden layers between the input and output layers. Each layer consists of interconnected neurons that process and transmit information. DNNs can model complex patterns and representations in data through their hierarchical structure, where each layer extracts increasingly abstract features from the input. DNNs are widely used in various applications, including image and speech recognition, natural language processing, and more, due to their ability to learn and generalize from large amounts of data. Deep Neural Network A deep neural network that relaxes the hypothesis that training data must be independent and identically distributed with test data to address insufficient training data. Deep Transfer Learning A deep neural network that relaxes the hypothesis that training data must be independent and identically distributed with test data to address insufficient training data. An autoencoder network trained to reconstruct the original undistorted input from a partially corrupted input. DAE Denoising Autoencoder Layers: Noisy Input, Hidden, Matched Output-Input Denoising Auto Encoder An autoencoder network trained to reconstruct the original undistorted input from a partially corrupted input. A layer that produces a dense tensor based on given feature columns. A layer that produces a dense Tensor based on given feature_columns. Generally a single example in training data is described with FeatureColumns. At the first layer of the model, this column oriented data should be converted to a single Tensor. This layer can be called multiple times with different features. This is the V2 version of this layer that uses name_scopes to create variables instead of variable_scopes. But this approach currently lacks support for partitioned variables. In that case, use the V1 version instead. DenseFeatures Layer A layer that produces a dense tensor based on given feature columns.
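A minimal Keras sketch of the Deep Feed-Forward Network (MLP) layout described above, with information flowing from input through hidden layers to output; the layer widths, input dimension, and 10-class softmax output are arbitrary assumptions for illustration.

```python
import tensorflow as tf

# A deep feed-forward network (MLP): input -> hidden -> output, no cycles.
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                      # input layer
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer
])
mlp.summary()
```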
A layer that is a regular densely-connected neural network layer. Just your regular densely-connected NN layer. Dense Layer A layer that is a regular densely-connected neural network layer. A bias arising when systems are used as decision aids for humans since the human intermediary may act on predictions in ways that are typically not modeled in the system. Deployment Bias A bias arising when systems are used as decision aids for humans since the human intermediary may act on predictions in ways that are typically not modeled in the system. A layer that performs depthwise 1D convolution. Depthwise 1D convolution. Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution. It is implemented via the following steps: Split the input into individual channels. Convolve each channel with an individual depthwise kernel with depth_multiplier output channels. Concatenate the convolved outputs along the channels axis. Unlike a regular 1D convolution, depthwise convolution does not mix information across different input channels. The depth_multiplier argument determines how many filters are applied to one input channel. As such, it controls the amount of output channels that are generated per input channel in the depthwise step. DepthwiseConv1D Layer A layer that performs depthwise 1D convolution. A layer that performs depthwise 2D convolution. Depthwise 2D convolution. DepthwiseConv2D Layer A layer that performs depthwise 2D convolution. A selection and sampling bias characterized by systematic differences between groups in how outcomes are determined potentially over- or underestimating effect size. Systematic differences between groups in how outcomes are determined, potentially over- or underestimating effect size. Detection Bias A selection and sampling bias characterized by systematic differences between groups in how outcomes are determined potentially over- or underestimating effect size. A large language model that is optimized for engaging in multi-turn conversations understanding context and generating relevant coherent responses continuously over many dialogue turns. Dialogue Large Language Model conversational AI multi-turn dialogue Dialogue LLM A large language model that has an architecture amenable to full end-to-end training via backpropagation without relying on teacher forcing or unlikelihood training objectives. Differentiable Large Language Model end-to-end training fully backpropagable Differentiable LLM A machine learning task focused on the process of transforming data from a high-dimensional space into a lower-dimensional space while retaining meaningful properties of the original data. Dimension Reduction Dimensionality Reduction A machine learning task focused on the process of transforming data from a high-dimensional space into a lower-dimensional space while retaining meaningful properties of the original data. A preprocessing layer which buckets continuous features by ranges. Discretization Layer A preprocessing layer which buckets continuous features by ranges. The process of training a smaller model to replicate the behavior of a larger model aiming to compress the knowledge into a more compact form without significant loss of performance.
Purification Refining Knowledge compression Teacher-student model Distillation The process of training a smaller model to replicate the behavior of a larger model aiming to compress the knowledge into a more compact form without significant loss of performance. An LLM which is pre-trained on a broad corpus and then fine-tuned on domain-specific data to specialize its capabilities for particular domains or applications, like scientific literature or code generation. Domain-Adapted Large Language Model domain robustness transfer learning Domain-Adapted LLM A layer that computes a dot product between samples in two tensors. Layer that computes a dot product between samples in two tensors. E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i]. Dot Layer A layer that computes a dot product between samples in two tensors. A regularization layer that applies Dropout to the input. Applies Dropout to the input. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged. Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. (This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.) Dropout Layer A regularization layer that applies Dropout to the input. A cognitive bias in which people with low ability in an area overestimate that ability. Often measured by comparing self-assessment with objective performance. Dunning-Kruger Effect Dunning-Kruger Effect Bias A cognitive bias in which people with low ability in an area overestimate that ability. Often measured by comparing self-assessment with objective performance. A model that allows for time-varying correlations between different time series, used in financial econometrics to model and forecast covariances. DCC Dynamic Conditional Correlation An activation function that is x if x > 0 and alpha * (exp(x) - 1) if x < 0 where alpha controls the value to which an ELU saturates for negative net inputs. ELU Exponential Linear Unit The exponential linear unit (ELU) with alpha > 0 is: x if x > 0 and alpha * (exp(x) - 1) if x < 0. The ELU hyperparameter alpha controls the value to which an ELU saturates for negative net inputs. ELUs diminish the vanishing gradient effect. ELUs have negative values which push the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. ELUs saturate to a negative value when the argument gets smaller. Saturation means a small derivative which decreases the variation and the information that is propagated to the next layer. ELU Function An activation function that is x if x > 0 and alpha * (exp(x) - 1) if x < 0 where alpha controls the value to which an ELU saturates for negative net inputs. An activation layer that applies the Exponential Linear Unit (ELU) function element-wise. Exponential Linear Unit.
ELU Layer An activation layer that applies the Exponential Linear Unit (ELU) function element-wise. A recurrent neural network with a recurrent hidden layer and sparsely connected hidden neurons that learns output weights to produce temporal patterns. ESN Layers: Input, Recurrent, Output Echo State Network A recurrent neural network with a recurrent hidden layer and sparsely connected hidden neurons that learns output weights to produce temporal patterns. A selection and sampling bias occurring when an inference about an individual is made based on their group membership. Ecological Fallacy Ecological Fallacy Bias A selection and sampling bias occurring when an inference about an individual is made based on their group membership. A layer that turns positive integers (indexes) into dense vectors of fixed size. Embedding Layer A layer that turns positive integers (indexes) into dense vectors of fixed size. A large language model that integrates language with other modalities like vision audio and robotics to enable grounded language understanding in real-world environments. Embodied Large Language Model multimodal grounding An embodied LLM integrates language with other modalities like vision, audio, and robotics to enable grounded language understanding in real-world environments. Embodied LLM A use and interpretation bias resulting from the use and reliance on algorithms across new or unanticipated contexts. Emergent Bias A use and interpretation bias resulting from the use and reliance on algorithms across new or unanticipated contexts. The LLM introduced in the "Attention Is All You Need" paper. The encoder processes the input sequence to generate a hidden representation summarizing the input information, while the decoder uses this hidden representation to generate the desired output sequence. Encoder-Decoder LLM The LLM introduced in the "Attention Is All You Need" paper. The encoder processes the input sequence to generate a hidden representation summarizing the input information, while the decoder uses this hidden representation to generate the desired output sequence. A large language model that uses an encoder-only architecture to encode the input sequence into a fixed-length representation which is then used as input to a classifier or regressor for prediction. An encoder-only architecture that encodes the input sequence into a fixed-length representation, which is then used as input to a classifier or regressor for prediction. The model has a pre-trained general-purpose encoder that requires fine-tuning for specific tasks. Encoder LLM A large language model that uses an encoder-only architecture to encode the input sequence into a fixed-length representation which is then used as input to a classifier or regressor for prediction. A LLM which models the explicit probability density over token sequences using an energy function, rather than an autoregressive factorization. This can improve modeling of long-range dependencies and global coherence. Energy-Based Large Language Model energy scoring explicit density modeling Energy-Based LLM A type of machine learning focused on methods that use multiple learning algorithms to achieve better predictive performance than any of the constituent algorithms alone. Ensemble Learning A type of machine learning focused on methods that use multiple learning algorithms to achieve better predictive performance than any of the constituent algorithms alone. 
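A minimal Keras sketch of the Embedding layer entry above, turning integer indices into dense vectors; the vocabulary size, embedding dimension, and token ids are arbitrary example values.

```python
import tensorflow as tf

# Embedding maps positive integer indices to dense vectors of fixed size.
emb = tf.keras.layers.Embedding(input_dim=1000, output_dim=64)

tokens = tf.constant([[4, 17, 256]])  # a batch of integer token indices
vectors = emb(tokens)                 # -> shape (1, 3, 64)
print(vectors.shape)
```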
A processing bias characterized by the effect of variables' uncertainties (or errors more specifically random errors) on the uncertainty of a function based on them. Error Propagation Error Propagation Bias A processing bias characterized by the effect of variables' uncertainties (or errors more specifically random errors) on the uncertainty of a function based on them. A large language model that is trained to uphold certain ethical principles values or rules in its language generation to increase safety and trustworthiness. Ethical Large Language Model constitutional AI value alignment Ethical LLM A selection and sampling bias arising when testing populations do not equally represent user populations or when inappropriate performance metrics are used. Evaluation Bias A selection and sampling bias arising when testing populations do not equally represent user populations or when inappropriate performance metrics are used. A large language model that applies principles of evolutionary computation to optimize its structure and parameters evolving over time to improve performance. Evolutionary Language Model evolutionary algorithms genetic programming Evolutionary LLM A selection and sampling bias occurring when specific groups of user populations are excluded from testing and analysis. Exclusion Bias A selection and sampling bias occurring when specific groups of user populations are excluded from testing and analysis. A large language model that is designed to provide insights into its decision-making process making it easier for users to understand and trust the model's outputs by incorporating mechanisms for interpreting and explaining its predictions in human-understandable terms. Explainable Language Model XAI LLM interpretability model understanding Explainable LLM An activation function that is the mathematical function denoted by f(x)=exp(x) or e^{x}. The exponential function is a mathematical function denoted by f(x)=exp(x) or e^{x}. Exponential Function An activation function that is the mathematical function denoted by f(x)=exp(x) or e^{x}. A model that combines exponential smoothing with state space modeling, allowing for the inclusion of both trend and seasonal components. Used in forecasting. ETS Exponential Smoothing State Space Model A feedback network with randomly assigned hidden nodes that are not updated during training. ELM Layers: Input, Hidden, Output Extreme Learning Machine A feedback network with randomly assigned hidden nodes that are not updated during training. A language model that views each word as a vector of multiple factors such as part-of-speech morphology and semantics to improve language modeling. Factorized Language Model Factored Language Model A language model that views each word as a vector of multiple factors such as part-of-speech morphology and semantics to improve language modeling. A large language model that decomposes the full language modeling task into multiple sub-components or experts that each focus on a subset of the information enabling more efficient scaling. Factorized Large Language Model Factorized Learning Assisted with Large Language Model Conditional masking Product of experts Factorized LLM A large language model that decomposes the full language modeling task into multiple sub-components or experts that each focus on a subset of the information enabling more efficient scaling.
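The Exponential Function activation above is simply f(x) = e^x applied element-wise; a quick Keras check (the input values are arbitrary):

```python
import tensorflow as tf

# Exponential activation: f(x) = e^x, element-wise.
x = tf.constant([-1.0, 0.0, 1.0])
y = tf.keras.activations.exponential(x)
print(y.numpy())  # approximately [0.3679, 1.0, 2.7183]
```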
The process of transforming raw data into a set of measurable characteristics that can be used as input for machine learning algorithms enhancing the ability to make accurate predictions. Attribute Extraction Feature Isolation Semantic embeddings Syntactic information Feature Extraction A large language model that is trained in a decentralized manner across multiple devices or silos without directly sharing private data enabling collaborative training while preserving data privacy and security. Federated Large Language Model decentralized training privacy-preserving Federated LLM A deep neural network trained across decentralized edge devices or servers holding local data samples without exchanging them. Training an algorithm across multiple decentralized edge devices or servers holding local data samples without exchanging them. Federated Learning A deep neural network trained across decentralized edge devices or servers holding local data samples without exchanging them. A use and interpretation bias occurring when an algorithm learns from user behavior and feeds that behavior back into the model. Feedback Loop Bias A use and interpretation bias occurring when an algorithm learns from user behavior and feeds that behavior back into the model. An artificial neural network that refines its representations iteratively based on feedback from previous outputs. FBN Layers: Input, Hidden, Output, Hidden Feedback Network A regression analysis model in which the model parameters are fixed or non-random quantities. FEM Fixed Effects Model A regression analysis model in which the model parameters are fixed or non-random quantities. A layer that flattens the input Flattens the input. Does not affect the batch size. Flatten Layer A layer that flattens the input A pooling layer that applies a 2D fractional max pooling over an input signal composed of several input planes. FractionalMaxPool2D FractionalMaxPool2D Layer A pooling layer that applies a 2D fractional max pooling over an input signal composed of several input planes. A pooling layer that applies a 3D fractional max pooling over an input signal composed of several input planes. FractionalMaxPool3D FractionalMaxPool3D Layer A pooling layer that applies a 3D fractional max pooling over an input signal composed of several input planes. A mathematical rule that gives the value of a dependent variable corresponding to specified values of independent variables. Function A mathematical rule that gives the value of a dependent variable corresponding to specified values of independent variables. A bias arising when biased results are reported to support or satisfy the funding agency or financial supporter of a research study. Funding Bias A bias arising when biased results are reported to support or satisfy the funding agency or financial supporter of a research study. An activation function that computes x * P(X <= x) where P(X) ~ N(0 1) weighting inputs by their value rather than gating inputs by their sign as in ReLU. GELU Gaussian Error Linear Unit Gaussian error linear unit (GELU) computes x * P(X <= x), where P(X) ~ N(0, 1). The (GELU) nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLU. GELU Function An activation function that computes x * P(X <= x) where P(X) ~ N(0 1) weighting inputs by their value rather than gating inputs by their sign as in ReLU. A layer that processes one step within the whole time sequence input for a GRU layer. Cell class for the GRU layer. 
This class processes one step within the whole time sequence input, whereas tf.keras.layer.GRU processes the whole sequence. GRUCell Layer A layer that processes one step within the whole time sequence input for a GRU layer. A recurrent layer that implements the Gated Recurrent Unit architecture. Gated Recurrent Unit - Cho et al. 2014. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0, unroll is False, use_bias is True, reset_after is True. Inputs, if use masking, are strictly right-padded. Eager execution is enabled in the outermost context. There are two variants of the GRU implementation. The default one is based on v3 and has reset gate applied to hidden state before matrix multiplication. The other one is based on original and has the order reversed. The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid'. GRU Layer A recurrent layer that implements the Gated Recurrent Unit architecture. A long short-term memory network that is a gating mechanism in recurrent neural networks similar to LSTMs but with fewer parameters and no output gate. GRU Layers: Input, Memory Cell, Output Gated Recurrent Unit A long short-term memory network that is a gating mechanism in recurrent neural networks similar to LSTMs but with fewer parameters and no output gate. A regularization layer that applies multiplicative 1-centered Gaussian noise. Apply multiplicative 1-centered Gaussian noise. As it is a regularization layer, it is only active at training time. GaussianDropout Layer A regularization layer that applies multiplicative 1-centered Gaussian noise. A regularization layer that applies additive zero-centered Gaussian noise. Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs. As it is a regularization layer, it is only active at training time. GaussianNoise Layer A regularization layer that applies additive zero-centered Gaussian noise. A model that incorporates lagged conditional variances, allowing for more flexibility in modeling time-varying volatility. GARCH Generalized Autoregressive Conditional Heteroskedasticity A deep neural network that learns novel classes from few samples per class, preventing catastrophic forgetting of base classes and ensuring classifier calibration. GFSL Generalized Few-shot Learning A deep neural network that learns novel classes from few samples per class, preventing catastrophic forgetting of base classes and ensuring classifier calibration. A machine learning model that generalizes linear regression by relating the linear model to the response variable via a link function and allowing the variance of each measurement to be a function of its predicted value. 
GLM Generalized Linear Model A machine learning model that generalizes linear regression by relating the linear model to the response variable via a link function and allowing the variance of each measurement to be a function of its predicted value. An unsupervised pretrained network framework where two neural networks contest in a game to generate new data with the same statistics as the training set. GAN Layers: Backfed Input, Hidden, Matched Output-Input, Hidden, Matched Output-Input Generative Adversarial Network An unsupervised pretrained network framework where two neural networks contest in a game to generate new data with the same statistics as the training set. A LLM which incorporates a generative adversarial network (GAN) into its training process, using a discriminator network to provide a signal for generating more realistic and coherent text. This adversarial training can improve the quality and diversity of generated text. GAN-Large Language Model Generative Adversarial Network-Augmented Large Language Model adversarial training text generation Generative Adversarial Network-Augmented LLM A large language model that is trained to understand and model basic physics causality and common sense about how the real world works. Generative Commonsense Large Language Model World Model causal modeling physical reasoning Generative Commonsense LLM A large language model that is trained to understand and model basic physics causality and common sense about how the real world works. A language model that enables users to engage in an interactive dialogue with an LLM providing feedback to guide and refine the generated outputs iteratively. Interactive generation Generative Language Interface A pooling layer that performs global average pooling operation for temporal data. GlobalAvgPool1D Global average pooling operation for temporal data. GlobalAveragePooling1D Layer A pooling layer that performs global average pooling operation for temporal data. A pooling layer that performs global average pooling operation for spatial data. GlobalAvgPool2D Global average pooling operation for spatial data. GlobalAveragePooling2D Layer A pooling layer that performs global average pooling operation for spatial data. A pooling layer that performs global average pooling operation for 3D data. GlobalAvgPool3D Global Average pooling operation for 3D data. GlobalAveragePooling3D Layer A pooling layer that performs global average pooling operation for 3D data. A pooling layer that performs global max pooling operation for temporal data. GlobalMaxPool1D Global max pooling operation for 1D temporal data. GlobalMaxPooling1D Layer A pooling layer that performs global max pooling operation for temporal data. A pooling layer that performs global max pooling operation for spatial data. GlobalMaxPool2D Global max pooling operation for spatial data. GlobalMaxPooling2D Layer A pooling layer that performs global max pooling operation for spatial data. A pooling layer that performs global max pooling operation for 3D data. GlobalMaxPool3D Global Max pooling operation for 3D data. GlobalMaxPooling3D Layer A pooling layer that performs global max pooling operation for 3D data. A deep neural network that operates directly on graph structures utilizing structural information. GCN Layers: Input, Hidden, Hidden, Output Graph Convolutional Network A deep neural network that operates directly on graph structures utilizing structural information. 
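The GRU Layer entry above lists the argument settings required for the fast cuDNN kernel. The following is a minimal Keras sketch of a GRU configured to meet those requirements (assuming TensorFlow 2.x; the unit count and input shape are illustrative):

```python
import tensorflow as tf

# GRU configured to satisfy the cuDNN requirements quoted above:
# activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0,
# unroll is False, use_bias is True, reset_after is True.
gru = tf.keras.layers.GRU(
    units=64,
    activation="tanh",
    recurrent_activation="sigmoid",
    recurrent_dropout=0.0,
    unroll=False,
    use_bias=True,
    reset_after=True,
)

# Illustrative input: a batch of 8 sequences, 10 timesteps, 16 features each.
x = tf.random.normal((8, 10, 16))
y = gru(x)
print(y.shape)  # (8, 64) -- the final hidden state for each sequence
```

With these defaults the layer can dispatch to the cuDNN implementation when a GPU is available; changing any of the listed arguments falls back to the pure-TensorFlow path.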
A graph convolutional network that generates goal-directed graphs using reinforcement learning and optimizing for rewards and adversarial loss. GPCN Layers: Input, Hidden, Hidden, Policy, Output Graph Convolutional Policy Network A graph convolutional network that generates goal-directed graphs using reinforcement learning and optimizing for rewards and adversarial loss. A language model that operates over structured inputs or outputs represented as graphs enabling reasoning over explicit relational knowledge representations during language tasks. Graph LM Structured representations Graph Language Model A language model that operates over structured inputs or outputs represented as graphs enabling reasoning over explicit relational knowledge representations during language tasks. A bias characterized by favoring members of one's in-group over out-group members expressed in evaluation resource allocation and other ways. In-group Favoritism In-group bias In-group preference In-group–out-group Bias Intergroup bias Favoring members of one's in-group over out-group members, expressed in evaluation, resource allocation, and other ways. Group Bias A bias characterized by favoring members of one's in-group over out-group members expressed in evaluation resource allocation and other ways. A normalization layer that applies Group Normalization over a mini-batch of inputs. GroupNorm Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization GroupNorm Layer A normalization layer that applies Group Normalization over a mini-batch of inputs. A psychological phenomenon where people in a group make non-optimal decisions due to a desire to conform or fear of dissent. Groupthink Groupthink Bias A psychological phenomenon where people in a group make non-optimal decisions due to a desire to conform or fear of dissent. An activation function that is a faster approximation of the sigmoid activation using a piecewise linear approximation. A faster approximation of the sigmoid activation. Piecewise linear approximation of the sigmoid function. Ref: 'https://en.wikipedia.org/wiki/Hard_sigmoid' Hard Sigmoid Function An activation function that is a faster approximation of the sigmoid activation using a piecewise linear approximation. A categorical features preprocessing layer which hashes and bins categorical features. A preprocessing layer which hashes and bins categorical features. This layer transforms categorical inputs to hashed output. It converts ints or strings to ints in a fixed range, element-wise. The stable hash function uses tensorflow::ops::Fingerprint to produce the same output consistently across all platforms. This layer uses FarmHash64 by default, which provides a consistent hashed output across different platforms and is stable across invocations, regardless of device and context, by mixing the input bits thoroughly. If you want to obfuscate the hashed output, you can also pass a random salt argument in the constructor. In that case, the layer will use the SipHash64 hash function, with the salt value serving as additional input to the hash function. Hashing Layer A categorical features preprocessing layer which hashes and bins categorical features. A layer located between the input and output that performs nonlinear transformations of the inputs entered into the network.
A hidden layer is located between the input and output of the algorithm, in which the function applies weights to the inputs and directs them through an activation function as the output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network. Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending on their associated weights. Hidden Layer A layer located between the input and output that performs nonlinear transformations of the inputs entered into the network. A classification task focused on methods that group things according to a hierarchy. Methods that group things according to a hierarchy. Hierarchical Classification A classification task focused on methods that group things according to a hierarchy. A clustering method that builds a hierarchy of clusters. HCL Methods that build a hierarchy of clusters. Hierarchical Clustering A clustering method that builds a hierarchy of clusters. A language model that represents language at multiple levels of granularity learning hierarchical representations that capture both low-level patterns and high-level abstractions. Hierarchical LM multi-scale representations Hierarchical Language Model A language model that represents language at multiple levels of granularity learning hierarchical representations that capture both low-level patterns and high-level abstractions. A bias characterized by long-standing biases encoded in society over time distinct from biases in historical description or interpretation. Long-standing biases encoded in society over time, distinct from biases in historical description or the interpretation of history, such as viewing the larger world from a Western or European perspective. Historical Bias A bias characterized by long-standing biases encoded in society over time distinct from biases in historical description or interpretation. A symmetrically connected network that is a type of recurrent artificial neural network serving as a content-addressable memory system. HN Ising model of a neural network Ising–Lenz–Little model Layers: Backfed input Hopfield Network A symmetrically connected network that is a type of recurrent artificial neural network serving as a content-addressable memory system. A use and interpretation bias where individuals perceive benign or ambiguous behaviors as hostile. Bias where individuals perceive benign or ambiguous behaviors as hostile. Hostile Attribution Bias A use and interpretation bias where individuals perceive benign or ambiguous behaviors as hostile. A systematic error in human thought based on heuristic principles leading to simplified judgmental operations. Human Bias A systematic error in human thought based on heuristic principles leading to simplified judgmental operations. An individual bias that arises when users depend on automated systems as heuristic substitutes for their own information-seeking and processing efforts. Human Reporting Bias An individual bias that arises when users depend on automated systems as heuristic substitutes for their own information-seeking and processing efforts. A layer that performs image data preprocessing augmentations. Image Augmentation Layer A layer that performs image data preprocessing augmentations. A layer that performs image data preprocessing operations. Image Preprocessing Layer A layer that performs image data preprocessing operations. 
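The Hashing Layer entry above describes hashing and binning of categorical features, with FarmHash64 by default and SipHash64 when a salt is supplied. A minimal Keras sketch under those assumptions (TensorFlow 2.x; the strings and bin count are illustrative):

```python
import tensorflow as tf

# Default hashing: FarmHash64, stable across platforms and invocations.
hashing = tf.keras.layers.Hashing(num_bins=3)
print(hashing([["cat"], ["dog"], ["fish"], ["cat"]]))
# Identical inputs always land in the same bin; bin ids lie in [0, 3).

# Salted hashing: switches to SipHash64, with the salt mixed into the hash.
salted = tf.keras.layers.Hashing(num_bins=3, salt=[133, 137])
print(salted([["cat"], ["dog"], ["fish"], ["cat"]]))
```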
An individual bias characterized by unconscious beliefs attitudes feelings associations or stereotypes that affect information processing decision-making and actions. Confirmatory Bias Unconscious beliefs, attitudes, feelings, associations, or stereotypes that affect information processing, decision-making, and actions. Implicit Bias An individual bias characterized by unconscious beliefs attitudes feelings associations or stereotypes that affect information processing decision-making and actions. A language model that uses an energy function to score entire sequences instead of factorizing probabilities autoregressively better capturing global properties and long-range dependencies. Implicit LM Energy-based models Token-level scoring Implicit Language Model A language model that uses an energy function to score entire sequences instead of factorizing probabilities autoregressively better capturing global properties and long-range dependencies. A deep neural network trained on a base set of classes and then presented with novel classes, each with few labeled examples. IFSL Incremental Few-shot Learning A deep neural network trained on a base set of classes and then presented with novel classes, each with few labeled examples. A persistent point of view or limited list of such points of view applied by an individual. Individual Bias A persistent point of view or limited list of such points of view applied by an individual. A processing bias arising when machine learning applications generate inputs for other machine learning algorithms passing on any existing bias. Inherited Bias A processing bias arising when machine learning applications generate inputs for other machine learning algorithms passing on any existing bias. A layer composed of artificial input neurons that brings the initial data into the system for further processing by subsequent layers. The input layer of a neural network is composed of artificial input neurons, and brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow for the artificial neural network. Input Layer A layer composed of artificial input neurons that brings the initial data into the system for further processing by subsequent layers. A layer to be used as an entry point into a Network (a graph of layers). InputLayer Layer A layer to be used as an entry point into a Network (a graph of layers). A layer that specifies the rank, dtype and shape of every input to a layer. Specifies the rank, dtype and shape of every input to a layer. Layers can expose (if appropriate) an input_spec attribute: an instance of InputSpec, or a nested structure of InputSpec instances (one per input tensor). These objects enable the layer to run input compatibility checks for input structure, input rank, input shape, and input dtype. A None entry in a shape is compatible with any dimension, a None shape is compatible with any shape. InputSpec Layer A layer that specifies the rank, dtype and shape of every input to a layer. A normalization layer that applies Instance Normalization over a 2D (unbatched) or 3D (batched) input. InstanceNorm1D Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. InstanceNorm1D Layer A normalization layer that applies Instance Normalization over a 2D (unbatched) or 3D (batched) input. A normalization layer that applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. InstanceNorm2D A normalization layer that applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension). A normalization layer that applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension). InstanceNorm3D Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. InstanceNorm3D Layer A normalization layer that applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension). A bias exhibited at the level of entire institutions where practices or norms result in the favoring or disadvantaging of certain social groups. Bias exhibited at the level of entire institutions, where practices or norms result in the favoring or disadvantaging of certain social groups, such as institutional racism or sexism. Institutional Bias A bias exhibited at the level of entire institutions where practices or norms result in the favoring or disadvantaging of certain social groups. An LLM which is fine-tuned to follow natural language instructions accurately and safely, learning to map from instructions to desired model behavior in a more controlled and principled way. Instruction-Tuned Large Language Model constitutional AI natural language instructions Instruction-Tuned LLM A categorical features preprocessing layer that maps integer features to contiguous ranges. IntegerLookup Layer A categorical features preprocessing layer that maps integer features to contiguous ranges. An individual bias where users interpret algorithmic outputs according to their internalized biases and views. Interpretation Bias An individual bias where users interpret algorithmic outputs according to their internalized biases and views. A layer that obtains the dot product of input values or subsets of input values. Kernel Layer A machine learning algorithm that groups objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors. K-NN KNN K-nearest Neighbor Algorithm A machine learning algorithm that groups objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors. A classification and clustering algorithm that classifies objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors. K-NN KNN K-nearest Neighbor Classification Algorithm A classification and clustering algorithm that classifies objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors. A regression analysis algorithm that assigns the average of the values of k nearest neighbors to objects. K-NN KNN K-nearest Neighbor Regression Algorithm A regression analysis algorithm that assigns the average of the values of k nearest neighbors to objects. An LLM which incorporates external knowledge sources or knowledge bases into the model architecture, enabling it to generate more factually accurate and knowledge-aware text.
Knowledge-Grounded Large Language Model factual grounding knowledge integration Knowledge-Grounded LLM The process by which knowledge is passed from one entity such as a person organization or system to another facilitating learning and adaptation in the receiving entity through various methods such as teaching training or data exchange. Inductive Transfer Skill Acquisition Adaptation Pretrained models Knowledge Transfer The process by which knowledge is passed from one entity such as a person organization or system to another facilitating learning and adaptation in the receiving entity through various methods such as teaching training or data exchange. A network that is an unsupervised technique producing a low-dimensional representation of high-dimensional data preserving topological structure. KN SOFM SOM Self-Organizing Feature Map Self-Organizing Map Layers: Input, Hidden Kohonen Network A network that is an unsupervised technique producing a low-dimensional representation of high-dimensional data preserving topological structure. A pooling layer that applies 1D power-average pooling over an input signal composed of several input planes. LPPool1D LPPool1D Layer A pooling layer that applies 1D power-average pooling over an input signal composed of several input planes. A pooling layer that applies 2D power-average pooling over an input signal composed of several input planes. LPPool2D LPPool2D Layer A pooling layer that applies 2D power-average pooling over an input signal composed of several input planes. A layer that processes one step within the whole time sequence input for an LSTM layer. Cell class for the LSTM layer. LSTMCell Layer A layer that processes one step within the whole time sequence input for an LSTM layer. A recurrent layer that implements the Long Short-Term Memory architecture. Long Short-Term Memory layer - Hochreiter 1997. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: 1. activation == tanh, 2. recurrent_activation == sigmoid, 3. recurrent_dropout == 0, 4. unroll is False, 5. use_bias is True, 6. Inputs, if use masking, are strictly right-padded, 7. Eager execution is enabled in the outermost context. LSTM Layer A recurrent layer that implements the Long Short-Term Memory architecture. A layer that wraps arbitrary expressions as a Layer object. Wraps arbitrary expressions as a Layer object. Lambda Layer A layer that wraps arbitrary expressions as a Layer object. A large language model that supports interactive semantic parsing enabling users to provide feedback and corrections to dynamically refine and update the language model. Interactive learning Language Interface LLM A model designed to predict the next word in a sequence or assign probabilities to sequences of words in natural language. Language Model A model designed to predict the next word in a sequence or assign probabilities to sequences of words in natural language. A language model consisting of a neural network with many parameters (typically billions of weights or more) trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. 
LLM Large Language Model A language model consisting of a neural network with many parameters (typically billions of weights or more) trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. A regression analysis method that performs both variable selection and regularization to enhance prediction accuracy and interpretability. Lasso Regression A regression analysis method that performs both variable selection and regularization to enhance prediction accuracy and interpretability. A structure or network topology in a deep learning model that takes information from previous layers and passes it to the next layer. Layer A structure or network topology in a deep learning model that takes information from previous layers and passes it to the next layer. The base class from which all layers inherit. This is the class from which all layers inherit. A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves computation, defined in the call() method, and a state (weight variables). State can be created in various places, at the convenience of the subclass implementer: in __init__(); in the optional build() method, which is invoked by the first __call__() to the layer, and supplies the shape(s) of the input(s), which may not have been known at initialization time; in the first invocation of call(), with some caveats discussed below. Users will just instantiate a layer and then treat it as a callable. Layer Layer The base class from which all layers inherit. A normalization layer that applies Layer Normalization over a mini-batch of inputs. LayerNorm Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization LayerNorm Layer A normalization layer that applies Layer Normalization over a mini-batch of inputs. A normalization layer that applies Layer Normalization over the inputs. Layer normalization layer (Ba et al., 2016). Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. i.e. applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis. LayerNormalization Layer A normalization layer that applies Layer Normalization over the inputs. A batch normalization layer that lazily initializes the num_features argument from the input size for 1D data. LazyBatchNorm1D A torch.nn.BatchNorm1D module with lazy initialization of the num_features argument of the BatchNorm1D that is inferred from the input.size(1). LazyBatchNorm1D Layer A batch normalization layer that lazily initializes the num_features argument from the input size for 1D data. A batch normalization layer that lazily initializes the num_features argument from the input size for 2D data. LazyBatchNorm2D A torch.nn.BatchNorm2D module with lazy initialization of the num_features argument of the BatchNorm2D that is inferred from the input.size(1). LazyBatchNorm2D Layer A batch normalization layer that lazily initializes the num_features argument from the input size for 2D data. A batch normalization layer that lazily initializes the num_features argument from the input size for 3D data. 
LazyBatchNorm3D A torch.nn.BatchNorm3D module with lazy initialization of the num_features argument of the BatchNorm3D that is inferred from the input.size(1). LazyBatchNorm3D Layer A batch normalization layer that lazily initializes the num_features argument from the input size for 3D data. An instance normalization layer that lazily initializes the num_features argument from the input size for 1D data. LazyInstanceNorm1D A torch.nn.InstanceNorm1D module with lazy initialization of the num_features argument of the InstanceNorm1D that is inferred from the input.size(1). LazyInstanceNorm1D Layer An instance normalization layer that lazily initializes the num_features argument from the input size for 1D data. An instance normalization layer that lazily initializes the num_features argument from the input size for 2D data. LazyInstanceNorm2D A torch.nn.InstanceNorm2D module with lazy initialization of the num_features argument of the InstanceNorm2D that is inferred from the input.size(1). LazyInstanceNorm2D Layer An instance normalization layer that lazily initializes the num_features argument from the input size for 2D data. An instance normalization layer that lazily initializes the num_features argument from the input size for 3D data. LazyInstanceNorm3D A torch.nn.InstanceNorm3D module with lazy initialization of the num_features argument of the InstanceNorm3D that is inferred from the input.size(1). LazyInstanceNorm3D Layer An instance normalization layer that lazily initializes the num_features argument from the input size for 3D data. An activation layer that applies the leaky rectified linear unit function element-wise. Leaky version of a Rectified Linear Unit. LeakyReLU Layer An activation layer that applies the leaky rectified linear unit function element-wise. A regression analysis which approximates the solution of overdetermined systems by minimizing the sum of the squares of the residuals. Least-squares Analysis A regression analysis which approximates the solution of overdetermined systems by minimizing the sum of the squares of the residuals. A large language model that continually acquires new knowledge over time without forgetting previously learned information maintaining a balance between plasticity and stability. Continual Learning LLM Forever Learning Catastrophic forgetting Plasticity-Stability balance Lifelong Learning LLM An activation function that has the form f(x) = a + bx. Linear Function An activation function that has the form f(x) = a + bx. A regression analysis model that is a linear approach for modeling the relationship between a scalar response and one or more explanatory variables. Linear Regression A regression analysis model that is a linear approach for modeling the relationship between a scalar response and one or more explanatory variables. A use and interpretation bias arising when network attributes obtained from user connections activities or interactions misrepresent true user behavior. Bias arising when network attributes obtained from user connections, activities, or interactions misrepresent true user behavior. Linking Bias A use and interpretation bias arising when network attributes obtained from user connections activities or interactions misrepresent true user behavior. A network that is a type of reservoir computer turning time-varying input into spatio-temporal activation patterns. 
LSM Layers: Input, Spiking Hidden, Output Liquid State Machine Network A network that is a type of reservoir computer turning time-varying input into spatio-temporal activation patterns. A normalization layer that applies local response normalization over an input signal composed of several input planes. LocalResponseNorm Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. LocalResponseNorm Layer A normalization layer that applies local response normalization over an input signal composed of several input planes. A locally-connected layer for 1D inputs where each patch of the input is convolved with a different set of filters. Locally-connected layer for 1D inputs. The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. LocallyConnected1D Layer A locally-connected layer for 1D inputs where each patch of the input is convolved with a different set of filters. A locally-connected layer for 2D inputs where each patch of the input is convolved with a different set of filters. Locally-connected layer for 2D inputs. The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. LocallyConnected2D Layer A locally-connected layer for 2D inputs where each patch of the input is convolved with a different set of filters. A layer that works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. Locally-connected Layer A layer that works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. A regression analysis model that estimates the probability of an event occurring by modeling the log-odds of the event as a linear combination of one or more independent variables. Logistic Regression A regression analysis model that estimates the probability of an event occurring by modeling the log-odds of the event as a linear combination of one or more independent variables. A recurrent neural network with feedback connections that processes entire sequences of data. LSTM Layers: Input, Memory Cell, Output Long Short Term Memory A recurrent neural network with feedback connections that processes entire sequences of data. An individual bias occurring when automation leads to humans being unaware of their situation making them unprepared to assume control in cooperative systems. Loss Of Situational Awareness Bias An individual bias occurring when automation leads to humans being unaware of their situation making them unprepared to assume control in cooperative systems. A LLM which is optimized for performance in scenarios with limited data, computational resources, or for languages with sparse datasets. Low-Resource Language Model low-resource languages resource-efficient Low-Resource LLM A field of inquiry devoted to understanding and building methods that learn from data to improve performance on a set of tasks. Machine Learning A field of inquiry devoted to understanding and building methods that learn from data to improve performance on a set of tasks. 
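The Layer entry above (the base class from which all layers inherit) describes configuration in __init__(), state (weight) creation in build(), and computation in call(). The following is a minimal sketch of a custom subclass following that pattern (assuming TensorFlow 2.x; the SimpleDense layer itself is a hypothetical example, not an ontology class):

```python
import tensorflow as tf

class SimpleDense(tf.keras.layers.Layer):
    """Toy densely-connected layer illustrating __init__ / build / call."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units  # configuration only; no weights yet

    def build(self, input_shape):
        # State (weights) is created here, once the input shape is known;
        # build() is invoked by the first __call__ on the layer.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="glorot_uniform"
        )
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        # The computation the layer performs on its input tensor(s).
        return tf.matmul(inputs, self.w) + self.b

layer = SimpleDense(4)
print(layer(tf.ones((2, 3))).shape)  # (2, 4)
```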
A dimensionality reduction method based on the assumption that observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. Manifold Learning A dimensionality reduction method based on the assumption that observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. A network that is a stochastic model describing a sequence of possible events where the probability of each event depends only on the previous event's state. MC MP Markov Process Layers: Probabilistic Hidden Markov Chain A network that is a stochastic model describing a sequence of possible events where the probability of each event depends only on the previous event's state. A language model that is trained to predict randomly masked tokens in a sequence based on the remaining unmasked tokens allowing it to build deep bidirectional representations that can be effectively transferred to various NLP tasks via fine-tuning. bidirectional encoder denoising autoencoder Masked Language Model A layer that masks a sequence by using a mask value to skip timesteps. Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking). If any downstream layer does not support masking yet receives such an input mask, an exception will be raised. Masking Layer A layer that masks a sequence by using a mask value to skip timesteps. An input layer with a shape corresponding to that of the output layer. Matched Input-Output Layer A pooling layer that performs max pooling operation for temporal data. MaxPool1D MaxPooling1D Max pooling operation for 1D temporal data. Downsamples the input representation by taking the maximum value over a spatial window of size pool_size. The window is shifted by strides. The resulting output, when using the "valid" padding option, has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides. MaxPooling1D Layer A pooling layer that performs max pooling operation for temporal data. A pooling layer that performs max pooling operation for spatial data. MaxPool2D MaxPooling2D Max pooling operation for 2D spatial data. MaxPooling2D Layer A pooling layer that performs max pooling operation for spatial data. A pooling layer that performs max pooling operation for 3D data (spatial or spatio-temporal). MaxPool3D MaxPooling3D Max pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. MaxPooling3D Layer A pooling layer that performs max pooling operation for 3D data (spatial or spatio-temporal). A pooling layer that computes a partial inverse of MaxPool1D. MaxUnpool1D Computes a partial inverse of MaxPool1D. MaxUnpool1D Layer A pooling layer that computes a partial inverse of MaxPool1D. A pooling layer that computes a partial inverse of MaxPool2D. MaxUnpool2D Computes a partial inverse of MaxPool2D. MaxUnpool2D Layer A pooling layer that computes a partial inverse of MaxPool2D. A pooling layer that computes a partial inverse of MaxPool3D.
MaxUnpool3D Computes a partial inverse of MaxPool3D. MaxUnpool3D Layer A pooling layer that computes a partial inverse of MaxPool3D. A merging layer that computes the maximum (element-wise) of a list of inputs. Layer that computes the maximum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Maximum Layer A merging layer that computes the maximum (element-wise) of a list of inputs. A selection and sampling bias arising when features and labels are proxies for desired quantities potentially leading to differential performance. Measurement Bias A selection and sampling bias arising when features and labels are proxies for desired quantities potentially leading to differential performance. An LLM which incorporates external writable and readable memory components, allowing it to store and retrieve information over long contexts. Memory-Augmented Large Language Model external memory Memory-Augmented LLM An LLM which incorporates external writable and readable memory components, allowing it to store and retrieve information over long contexts. A layer of cells, each with an internal state or weights. Memory Cell Layer A layer of cells, each with an internal state or weights. A layer used to merge a list of inputs. Merging Layer A layer used to merge a list of inputs. A machine learning approach that automatically learns from metadata about machine learning experiments. Meta-Learning A machine learning approach that automatically learns from metadata about machine learning experiments. An LLM which is trained in a way that allows it to quickly adapt to new tasks or datasets through only a few examples or fine-tuning steps, leveraging meta-learned priors about how to efficiently learn. Meta-Learning Large Language Model few-shot adaptation learning to learn Meta-Learning LLM A deep neural network that learns a representation function mapping objects into an embedded space. Distance Metric Learning Learning a representation function that maps objects into an embedded space. Metric Learning A deep neural network that learns a representation function mapping objects into an embedded space. A merging layer that computes the minimum (element-wise) of a list of inputs. Layer that computes the minimum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Minimum Layer A merging layer that computes the minimum (element-wise) of a list of inputs. An LLM which dynamically selects and combines outputs from multiple expert submodels, allowing for efficient scaling by conditionally activating only a subset of model components for each input. Mixture-of-Experts Large Language Model MoE Large Language Model conditional computation model parallelism Mixture-of-Experts LLM An LLM which dynamically selects and combines outputs from multiple expert submodels, allowing for efficient scaling by conditionally activating only a subset of model components for each input. A bias occurring when modal interfaces confuse human operators causing actions appropriate for a different mode but incorrect for the current situation. Mode Confusion Bias A bias occurring when modal interfaces confuse human operators causing actions appropriate for a different mode but incorrect for the current situation.
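The MaxPooling1D Layer entry above gives output-length formulas for the "valid" and "same" padding options. A small sketch checking them (assuming TensorFlow 2.x; sequence length, pool size, and strides are illustrative):

```python
import tensorflow as tf

x = tf.random.normal((1, 9, 1))  # batch of 1, 9 timesteps, 1 channel

valid = tf.keras.layers.MaxPooling1D(pool_size=2, strides=1, padding="valid")(x)
same = tf.keras.layers.MaxPooling1D(pool_size=2, strides=1, padding="same")(x)

# "valid": (input_length - pool_size + 1) / strides = (9 - 2 + 1) / 1 = 8
# "same":  input_length / strides = 9 / 1 = 9
print(valid.shape)  # (1, 8, 1)
print(same.shape)   # (1, 9, 1)
```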
An abstract representation of a complex system generally assembled as a set of logical mathematical or conceptual properties to simulate or understand the system's behavior. Model An abstract representation of a complex system generally assembled as a set of logical mathematical or conceptual properties to simulate or understand the system's behavior. Techniques aimed at making models more efficient such as knowledge distillation. Computational Efficiency Model Optimization Model Efficiency Techniques aimed at making models more efficient such as knowledge distillation. A processing bias introduced when using data to select a single "best" model from many or when an explanatory variable has a weak relationship with the response variable. Model Selection Bias A processing bias introduced when using data to select a single "best" model from many or when an explanatory variable has a weak relationship with the response variable. A modular large language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks mimicking the modular structure of human cognition. Modular Large Language Model component skills skill composition Modular LLM A modular large language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks mimicking the modular structure of human cognition. A language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks mimicking the modular structure of human cognition. Modular LM Modular Language Model A language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks mimicking the modular structure of human cognition. An attention layer that allows the model to attend to information from different representation subspaces. MultiHeadAttention layer. This is an implementation of multi-headed attention as described in the paper "Attention is all you Need" (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector. This layer first projects query, key and value. These are (effectively) a list of tensors of length num_attention_heads, where the corresponding shapes are (batch_size, <query dimensions>, key_dim), (batch_size, <key/value dimensions>, key_dim), (batch_size, <key/value dimensions>, value_dim). Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor. Finally, the result tensor with the last dimension as value_dim can take a linear projection and return. When using MultiHeadAttention inside a custom Layer, the custom Layer must implement build() and call MultiHeadAttention's _build_from_signature(). This enables weights to be restored correctly when the model is loaded. MultiHeadAttention Layer An attention layer that allows the model to attend to information from different representation subspaces. An LLM which is trained jointly on multiple language tasks simultaneously, learning shared representations that transfer across tasks.
Multi-Task Large Language Model transfer learning Multi-Task LLM A machine learning task focused on methods that classify instances into one of three or more classes. Multinomial Classification Methods that classify instances into one of three or more classes. Multiclass Classification A machine learning task focused on methods that classify instances into one of three or more classes. A dimensionality reduction method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space. MDS Multidimensional Scaling A dimensionality reduction method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space. A large language model that is trained on text from multiple languages learning shared representations that enable zero-shot or few-shot transfer to new languages. Multilingual Large Language Model cross-lingual transfer Multilingual LLM A deep neural network that processes and links information using various modalities. Creating models that process and link information using various modalities. Multimodal Deep Learning A deep neural network that processes and links information using various modalities. A large language model that learns joint representations across different modalities like text vision and audio in an end-to-end fashion for better cross-modal understanding and generation. cross-modal grounding Multimodal Fusion LLM A multimodal large language model that learns joint representations across different modalities like text vision and audio in an end-to-end fashion for better cross-modal understanding and generation. Multimodal Large Language Model cross-modal grounding Multimodal LLM A multimodal large language model that learns joint representations across different modalities like text vision and audio in an end-to-end fashion for better cross-modal understanding and generation. A language model that learns joint representations across different modalities like text vision and audio in an end-to-end fashion for better cross-modal understanding and generation. Multimodal LM Multimodal Language Model A language model that learns joint representations across different modalities like text vision and audio in an end-to-end fashion for better cross-modal understanding and generation. A type of machine learning that uses multiple modalities of data such as text audio and images to improve learning outcomes. A type of deep learning that uses multiple modalities of data, such as text, audio, and images, to improve learning outcomes. Multimodal Learning A type of machine learning that uses multiple modalities of data such as text audio and images to improve learning outcomes. A multimodal LLM which processes prompts that include multiple modalities, such as both text and images, to generate relevant responses. Multimodal Prompt-based Language Model A multimodal LLM which processes prompts that include multiple modalities, such as both text and images, to generate relevant responses. A transformer network that processes and relates information from different modalities such as text images and audio using a shared embedding space and attention mechanism to learn joint representations across modalities. unified encoder vision-language model Multimodal Transformer A merging layer that multiplies (element-wise) a list of inputs. Layer that multiplies (element-wise) a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Multiply Layer A merging layer that multiplies (element-wise) a list of inputs. A subfield of machine learning focused on the interactions between computers and human language including programming computers to process and analyze large amounts of natural language data. NLP Natural Language Processing A subfield of machine learning focused on the interactions between computers and human language including programming computers to process and analyze large amounts of natural language data. A system of interconnected nodes or entities for communication computation or data exchange. Network A deep feedforward network that combines neural network pattern matching with the algorithmic power of programmable computers. NTM Layers: Input, Hidden, Spiking Hidden, Output Neural Turing Machine Network A deep feedforward network that combines neural network pattern matching with the algorithmic power of programmable computers. An LLM which combines neural language modeling with symbolic reasoning components, leveraging structured knowledge representations and logical inferences to improve reasoning capabilities. Neuro-Symbolic Large Language Model knowledge reasoning symbolic grounding Neuro-Symbolic LLM A layer that is a densely-connected neural network layer with added noise for regularization. Noisy dense layer that injects random noise to the weights of dense layer. Noisy dense layers are fully connected layers whose weights and biases are augmented by factorised Gaussian noise. The factorised Gaussian noise is controlled through gradient descent by a second weights layer. A NoisyDense layer implements the operation: $$\mathrm{NoisyDense}(x) = \mathrm{activation}(\mathrm{dot}(x, \mu + \sigma \cdot \epsilon) + \mathrm{bias})$$ where $\mu$ is the standard weights layer, $\epsilon$ is the factorised Gaussian noise, and $\sigma$ is a second weights layer which controls $\epsilon$. Noise Dense Layer A layer that is a densely-connected neural network layer with added noise for regularization. An input layer that adds noise to each value. Noisy Input Layer An input layer that adds noise to each value. The technique of transforming data into a standard format or scale typically to reduce redundancy and improve consistency often involving the adjustment of values measured on different scales to a common scale. Normalization A preprocessing layer that normalizes continuous features. Normalization Layer A preprocessing layer that normalizes continuous features. A layer that performs numerical data preprocessing operations. Numerical Features Preprocessing Layer A layer that performs numerical data preprocessing operations. A deep neural network that classifies objects from one or only a few examples. OSL One-shot Learning A deep neural network that classifies objects from one or only a few examples. A large language model that is trained to model ordinal relationships and rank outputs rather than model probability distributions over text sequences directly. Ordinal Large Language Model preference modeling ranking Ordinal LLM A layer containing the last neurons in the network that produces given outputs for the program. The output layer in an artificial neural network is the last layer of neurons that produces given outputs for the program.
Though they are made much like other artificial neurons in the neural network, output layer neurons may be built or observed in a different way, given that they are the last “actor” nodes on the network. Output Layer A layer containing the last neurons in the network that produces given outputs for the program. An activation layer that applies parametric rectified linear unit function element-wise. Parametric Rectified Linear Unit. PReLU Layer An activation layer that applies parametric rectified linear unit function element-wise. An artificial neural network with a supervised learning algorithm for binary classification using a linear predictor function. FFN Feed-Forward Network SLP Single Layer Perceptron Layers: Input, Output Perceptron A layer that permutes the dimensions of the input according to a given pattern. Permutes the dimensions of the input according to a given pattern. Useful e.g. connecting RNNs and convnets. Permute Layer A layer that permutes the dimensions of the input according to a given pattern. A large language model that adapts its language modeling and generation to the preferences style and persona of individual users or audiences. Personalized Large Language Model user adaptation LLM Personalized LLM A layer that, after taking a set of states or values as input, predicts a probability distribution of actions to take. Policy Layer A layer that serves to mitigate the sensitivity of convolutional layers to location and spatially downsample representations. Pooling layers serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations. Pooling Layer A layer that serves to mitigate the sensitivity of convolutional layers to location and spatially downsample representations. A selection and sampling bias where more popular items are more exposed under-representing less popular items. Selection bias where more popular items are more exposed, under-representing less popular items. Popularity Bias A selection and sampling bias where more popular items are more exposed under-representing less popular items. A selection and sampling bias characterized by systematic distortions in demographics or other user characteristics between represented users and the target population. Population Bias A selection and sampling bias characterized by systematic distortions in demographics or other user characteristics between represented users and the target population. The series of steps applied to raw data before it is used in a machine learning model including tasks such as normalization scaling encoding and transformation to ensure the data is in an appropriate format and quality for analysis. Preprocessing The series of steps applied to raw data before it is used in a machine learning model including tasks such as normalization scaling encoding and transformation to ensure the data is in an appropriate format and quality for analysis. A layer that performs data preprocessing operations. Preprocessing Layer A layer that performs data preprocessing operations. An individual bias arising from how information is presented on the Web via a user interface due to rating or ranking of output or through users' self-selected biased interaction. Presentation Bias An individual bias arising from how information is presented on the Web via a user interface due to rating or ranking of output or through users' self-selected biased interaction. 
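The MultiHeadAttention Layer entry above notes that when query, key, and value are the same tensor the layer performs self-attention. A minimal Keras sketch under that setting (assuming TensorFlow 2.x; the head count, key_dim, and tensor shapes are illustrative):

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)

# Illustrative batch of 4 sequences, 10 timesteps, 16-dimensional embeddings.
x = tf.random.normal((4, 10, 16))

# Self-attention: query, key, and value are all the same tensor.
out, scores = mha(query=x, value=x, key=x, return_attention_scores=True)
print(out.shape)     # (4, 10, 16) -- projected back to the query dimension
print(scores.shape)  # (4, 2, 10, 10) -- per-head attention probabilities
```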
A dimensionality reduction method for analyzing large datasets with high-dimensional features per observation increasing data interpretability while preserving maximum information and enabling visualization. PCA Principal Component Analysis A dimensionality reduction method for analyzing large datasets with high-dimensional features per observation increasing data interpretability while preserving maximum information and enabling visualization. A machine learning model in which a graph expresses the conditional dependence structure between random variables. Graphical Model PGM Structure Probabilistic Model Probabilistic Graphical Model A machine learning model in which a graph expresses the conditional dependence structure between random variables. A hidden layer that estimates the probability of a sample being within a certain category. Probabilistic Hidden Layer A probabilistic graphical model that uses statistical techniques to analyze the words in each text to discover common themes their connections and their changes over time. Probabilistic Topic Model A probabilistic graphical model that uses statistical techniques to analyze the words in each text to discover common themes their connections and their changes over time. A computational bias resulting from judgment modulated by affect influenced by the level of efficacy and efficiency in information processing. Validation Bias Judgment modulated by affect, influenced by the level of efficacy and efficiency in information processing; often referred to as aesthetic judgment in cognitive sciences. Processing Bias A computational bias resulting from judgment modulated by affect influenced by the level of efficacy and efficiency in information processing. An LLM which is fine-tuned on a small number of examples or prompts, rather than full task datasets. This allows for rapid adaptation to new tasks with limited data, leveraging the model's few-shot learning capabilities. Prompt-based Fine-Tuning Large Language Model Prompt-tuned Large Language Model few-shot learning in-context learning Prompt-based Fine-Tuning LLM A regression analysis method for survival analysis where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. Proportional Hazards Model A regression analysis method for survival analysis where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. The base class for recurrent layers. Base class for recurrent layers. RNN Layer The base class for recurrent layers. A deep feedforward network that uses radial basis functions as activation functions for pattern recognition and interpolation. RBFN RBN Radial Basis Function Network Layers: Input, Hidden, Output Radial Basis Network A deep feedforward network that uses radial basis functions as activation functions for pattern recognition and interpolation. An image preprocessing layer that randomly adjusts brightness during training. A preprocessing layer which randomly adjusts brightness during training. This layer will randomly increase/reduce the brightness for the input RGB images. At inference time, the output will be identical to the input. Call the layer with training=True to adjust the brightness of the input. Note that different brightness adjustment factors will be applied to each of the images in the batch. RandomBrightness Layer An image preprocessing layer that randomly adjusts brightness during training.
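The Principal Component Analysis entry above describes projecting high-dimensional observations onto a few components while preserving as much information as possible. A small sketch using scikit-learn (the synthetic data and component count are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 100 samples in 5 dimensions whose variance is
# concentrated in two latent directions, plus a little noise.
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

pca = PCA(n_components=2)
Z = pca.fit_transform(X)  # project onto the top two principal components
print(Z.shape)                        # (100, 2)
print(pca.explained_variance_ratio_)  # nearly all variance is retained
```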
An image preprocessing layer that randomly adjusts contrast during training. A preprocessing layer which randomly adjusts contrast during training. This layer will randomly adjust the contrast of an image or images by a random factor. Contrast is adjusted independently for each channel of each image during training. For each channel, this layer computes the mean of the image pixels in the channel and then adjusts each component x of each pixel to (x - mean) * contrast_factor + mean. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and in integer or floating point dtype. By default, the layer will output floats. The output value will be clipped to the range [0, 255], the valid range of RGB colors. RandomContrast Layer An image preprocessing layer that randomly adjusts contrast during training. An image preprocessing layer that randomly crops images during training. A preprocessing layer which randomly crops images during training. During training, this layer will randomly choose a location to crop images down to a target size. The layer will crop all the images in the same batch to the same cropping location. At inference time, and during training if an input image is smaller than the target size, the input will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. If you need to apply random cropping at inference time, set training to True when calling the layer. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomCrop Layer An image preprocessing layer that randomly crops images during training. A regression analysis model where the model parameters are random variables. REM Random Effects Model A regression analysis model where the model parameters are random variables. An image preprocessing layer that randomly flips images during training. A preprocessing layer which randomly flips images during training. This layer will flip the images horizontally and/or vertically based on the mode attribute. During inference time, the output will be identical to the input. Call the layer with training=True to flip the input. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomFlip Layer An image preprocessing layer that randomly flips images during training. An ensemble learning method for classification regression and other tasks that constructs a multitude of decision trees during training. Random Forest An ensemble learning method for classification regression and other tasks that constructs a multitude of decision trees during training. An image preprocessing layer that randomly varies image height during training. A preprocessing layer which randomly varies image height during training. This layer adjusts the height of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference. RandomHeight Layer An image preprocessing layer that randomly varies image height during training.
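The RandomContrast entry above states the per-channel update (x - mean) * contrast_factor + mean explicitly; the NumPy sketch below applies that formula to a single image. It is purely illustrative and is not the Keras implementation (which also draws the factor at random and clips the output range).

import numpy as np

def adjust_contrast(image, contrast_factor):
    # image: (height, width, channels); the mean is taken per channel, then each
    # pixel component x becomes (x - mean) * contrast_factor + mean.
    mean = image.mean(axis=(0, 1), keepdims=True)
    return (image - mean) * contrast_factor + mean

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(8, 8, 3))
lower = adjust_contrast(img, 0.5)   # pulls pixels toward the channel mean
higher = adjust_contrast(img, 1.5)  # pushes pixels away from the channel mean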
An image preprocessing layer that randomly rotates images during training. A preprocessing layer which randomly rotates images during training. RandomRotation Layer An image preprocessing layer that randomly rotates images during training. An image preprocessing layer that randomly translates images during training. A preprocessing layer which randomly translates images during training. This layer will apply random translations to each image during training, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomTranslation Layer An image preprocessing layer that randomly translates images during training. An image preprocessing layer that randomly varies image width during training. A preprocessing layer which randomly varies image width during training. This layer will randomly adjust the width of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference. RandomWidth Layer An image preprocessing layer that randomly varies image width during training. An image preprocessing layer that randomly zooms in or out on images during training. A preprocessing layer which randomly zooms images during training. This layer will randomly zoom in or out on each axis of an image independently, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomZoom Layer An image preprocessing layer that randomly zooms in or out on images during training. An anchoring bias characterized by the idea that top-ranked results are the most relevant and important leading to more clicks than other results. Ranking Bias An anchoring bias characterized by the idea that top-ranked results are the most relevant and important leading to more clicks than other results. An individual bias characterized by differences in perspective memory recall interpretation and reporting of the same event by multiple persons or witnesses. Rashomon Effect Rashomon Principle Differences in perspective, memory, recall, interpretation, and reporting of the same event by multiple persons or witnesses. Rashomon Effect Bias An individual bias characterized by differences in perspective memory recall interpretation and reporting of the same event by multiple persons or witnesses. An activation function that returns max(x 0) the element-wise maximum of 0 and the input tensor. ReLU Rectified Linear Unit The ReLU activation function returns: max(x, 0), the element-wise maximum of 0 and the input tensor. ReLU Function An activation function that returns max(x 0) the element-wise maximum of 0 and the input tensor. An activation layer that applies the rectified linear unit function element-wise. Rectified Linear Unit activation function. With default values, it returns element-wise max(x, 0). ReLU Layer An activation layer that applies the rectified linear unit function element-wise. A large language model that incorporates explicit reasoning capabilities leveraging logical rules axioms or external knowledge to make deductive inferences during language tasks.
Rational Large Language Model Reasoning Large Language Model logical inferences reasoning Reasoning LLM A large language model that incorporates explicit reasoning capabilities leveraging logical rules axioms or external knowledge to make deductive inferences during language tasks. A layer composed of recurrent units with the number equal to the hidden size of the layer. Recurrent Layer A layer composed of recurrent units with the number equal to the hidden size of the layer. A deep neural network with connections forming a directed graph along a temporal sequence enabling dynamic behavior. RN RecNN Recurrent Network Recurrent Neural Network A large language model that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities. Recursive Large Language Model Self-Attending Large Language Model iterative refinement self-attention Recursive LLM A large language model that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities. A language model that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities. RLM Compositional generalization Layers: Input, Memory Cell, Output Recursive Language Model A language model that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities. A deep neural network that recursively applies weights over structured input to generate structured or scalar predictions. RecuNN RvNN Recursive Neural Network A deep neural network that recursively applies weights over structured input to generate structured or scalar predictions. A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. Regression analysis Regression model Regression Analysis A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. A layer that applies penalties on layer parameters or layer activity during optimization summed into the loss function that the network optimizes. Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis. Regularization Layer A layer that applies penalties on layer parameters or layer activity during optimization summed into the loss function that the network optimizes. A type of machine learning focused on methods that do not require labeled input/output pairs or explicit correction of sub-optimal actions focusing instead on balancing exploration and exploitation to optimize performance over time. Reinforcement Learning A type of machine learning focused on methods that do not require labeled input/output pairs or explicit correction of sub-optimal actions focusing instead on balancing exploration and exploitation to optimize performance over time. A large language model that is fine-tuned using reinforcement learning where the model receives rewards for generating text that satisfies certain desired properties or objectives improving the quality safety or alignment of generated text. 
RL-Large Language Model Reinforcement Learning Large Language Model decision transformers reward modeling An RL-LLM is a language model fine-tuned using reinforcement learning, where the model receives rewards for generating text that satisfies certain desired properties or objectives. This can improve the quality, safety, or alignment of generated text. Reinforcement Learning LLM A layer that repeats the input n times. Repeats the input n times. RepeatVector Layer A layer that repeats the input n times. A selection and sampling bias due to non-random sampling of subgroups making trends non-generalizable to new populations. Bias due to non-random sampling of subgroups, making trends non-generalizable to new populations. Representation Bias A selection and sampling bias due to non-random sampling of subgroups making trends non-generalizable to new populations. A deep neural network that discovers representations required for feature detection or classification from raw data. Feature Learning Discovering representations required for feature detection or classification from raw data. Representation Learning A deep neural network that discovers representations required for feature detection or classification from raw data. A preprocessing layer that rescales input values to a new range. Rescaling Layer A preprocessing layer that rescales input values to a new range. A layer that reshapes the inputs into the given shape. Reshape Layer A layer that reshapes the inputs into the given shape. A layer that is used to change the shape of the input. Reshape Layer Reshape layers are used to change the shape of the input. Reshaping Layer A layer that is used to change the shape of the input. A deep neural network that employs skip connections to bypass layers facilitating learning of residual functions. DRN Deep Residual Network ResNN ResNet Layers: Input, Weight, BN, ReLU, Weight, BN, Addition, ReLU Residual Neural Network A deep neural network that employs skip connections to bypass layers facilitating learning of residual functions. A preprocessing layer that resizes images to a target size. A preprocessing layer which resizes images. This layer resizes an image input to a target height and width. The input should be a 4D (batched) or 3D (unbatched) tensor in "channels_last" format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. This layer can be called on tf.RaggedTensor batches of input images of distinct sizes, and will resize the outputs to dense tensors of uniform size. Resizing Layer A preprocessing layer that resizes images to a target size. A Boltzmann machine network that learns the probability distribution of its input data. RBM Layers: Backfed Input, Probabilistic Hidden Restricted Boltzmann Machine A Boltzmann machine network that learns the probability distribution of its input data. An LLM that combines a pre-trained language model with a retrieval system that can access external knowledge sources. This allows the model to condition its generation on relevant retrieved knowledge, improving factual accuracy and knowledge grounding. Retrieval-Augmented Large Language Model knowledge grounding open-book question answering Retrieval-Augmented LLM
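The Residual Neural Network entry above lists the block structure Input, Weight, BN, ReLU, Weight, BN, Addition, ReLU; the following is a minimal sketch of one such block with a skip connection, assuming the tf.keras functional API (filter counts and input shape are illustrative).

import tensorflow as tf

def residual_block(x, filters=64):
    # Two weight layers with batch normalization, then add the shortcut back in.
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([x, y])   # the skip connection bypassing the two layers
    return tf.keras.layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)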
A regression analysis method that estimates the coefficients of multiple regression models in scenarios where the independent variables are highly correlated. Ridge Regression A regression analysis method that estimates the coefficients of multiple regression models in scenarios where the independent variables are highly correlated. An activation function that multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs. SELU Scaled Exponential Linear Unit SELU Function An activation function that multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs. A model that extends ARIMA, explicitly supporting univariate time series data with a seasonal component, combining seasonal differencing with ARIMA modeling. SARIMA Seasonal Autoregressive Integrated Moving-Average A computational bias introduced by non-random selection of individuals groups or data failing to ensure representativeness. Sampling Bias Selection Bias Selection Effect Bias introduced by non-random selection of individuals, groups, or data, failing to ensure representativeness. Selection And Sampling Bias A computational bias introduced by non-random selection of individuals groups or data failing to ensure representativeness. An individual bias characterized by the tendency to selectively adopt algorithmic advice that matches pre-existing beliefs and stereotypes. Selective Adherence Bias An individual bias characterized by the tendency to selectively adopt algorithmic advice that matches pre-existing beliefs and stereotypes. An LLM that learns rich representations by solving pretext tasks that involve predicting parts of the input from other observed parts of the data, without relying on human-annotated labels. Pretext tasks Self-Supervised LLM A machine learning that is intermediate between supervised and unsupervised learning and predicts parts of the input data from other observed parts without relying on human-annotated labels. Self-supervised Learning A machine learning that is intermediate between supervised and unsupervised learning and predicts parts of the input data from other observed parts without relying on human-annotated labels. An LLM that combines self-supervised pretraining on unlabeled data with supervised fine-tuning on labeled task data. Semi-Supervised Large Language Model self-training Semi-Supervised LLM A layer that performs depthwise separable 1D convolution. SeparableConv1D Layer Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output. SeparableConvolution1D Layer A layer that performs depthwise separable 1D convolution. A layer that performs depthwise separable 2D convolution. SeparableConv2D Layer Depthwise separable 2D convolution. Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block. SeparableConvolution2D Layer A layer that performs depthwise separable 2D convolution.
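The SeparableConv2D entry above describes the factorization into a depthwise spatial convolution followed by a pointwise convolution; the sketch below, assuming tf.keras, builds the fused layer next to its explicit two-step counterpart (structurally equivalent, though the two paths have independently initialized weights).

import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))

# Fused form: depthwise spatial convolution, then a 1x1 pointwise channel mix.
fused = tf.keras.layers.SeparableConv2D(64, 3, padding="same")(inputs)

# Explicit factorization: one spatial filter per input channel, followed by a
# pointwise (1x1) convolution that mixes channels into 64 outputs.
depthwise = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inputs)
pointwise = tf.keras.layers.Conv2D(64, 1)(depthwise)

model = tf.keras.Model(inputs, [fused, pointwise])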
An activation function that applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)) always returning a value between 0 and 1. Applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)). For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a 2-element Softmax, where the second element is assumed to be zero. The sigmoid function always returns a value between 0 and 1. Sigmoid Function An activation function that applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)) always returning a value between 0 and 1. A layer that processes one step within the whole time sequence input for a SimpleRNN layer. Cell class for SimpleRNN. This class processes one step within the whole time sequence input, whereas tf.keras.layers.SimpleRNN processes the whole sequence. SimpleRNNCell Layer A layer that processes one step within the whole time sequence input for a SimpleRNN layer. A recurrent layer that implements a fully-connected RNN where the output is to be fed back to input. Fully-connected RNN where the output is to be fed back to input. SimpleRNN Layer A recurrent layer that implements a fully-connected RNN where the output is to be fed back to input. A bias where the association between two variables changes when controlling for another variable. Simpson's Paradox Simpson's Paradox Bias A bias where the association between two variables changes when controlling for another variable. A bias characterized by being for or against groups or individuals based on social identities demographic factors or immutable physical characteristics often manifesting as stereotypes. Societal Bias A bias characterized by being for or against groups or individuals based on social identities demographic factors or immutable physical characteristics often manifesting as stereotypes. An activation function where the elements of the output vector are in range (0 1) and sum to 1 and each vector is handled independently. The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. The axis argument sets which axis of the input the function is applied along. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution. The softmax of each vector x is computed as exp(x) / tf.reduce_sum(exp(x)). The input values are the log-odds of the resulting probability. Softmax Function An activation function where the elements of the output vector are in range (0 1) and sum to 1 and each vector is handled independently. An activation layer that applies the softmax function to the inputs. Softmax activation function. Softmax Layer An activation layer that applies the softmax function to the inputs. An activation function that is softplus(x) = log(exp(x) + 1). softplus(x) = log(exp(x) + 1) Softplus Function An activation function that is softplus(x) = log(exp(x) + 1). An activation function that is softsign(x) = x / (abs(x) + 1). softsign(x) = x / (abs(x) + 1) Softsign Function An activation function that is softsign(x) = x / (abs(x) + 1).
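The ReLU, Sigmoid, Softmax, Softplus and Softsign entries in this listing each state their formula; a NumPy sketch of those formulas follows, for illustration only (the softmax subtracts the row maximum before exponentiating, which does not change the result but avoids overflow).

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)                    # element-wise max(x, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))              # always in (0, 1)

def softmax(x, axis=-1):
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)   # components in (0, 1), summing to 1

def softplus(x):
    return np.log(np.exp(x) + 1.0)

def softsign(x):
    return x / (np.abs(x) + 1.0)

x = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
for fn in (relu, sigmoid, softmax, softplus, softsign):
    print(fn.__name__, fn(x))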
An autoencoder network with more hidden units than inputs that constrains only a few hidden units to be active at once. SAE Sparse AE Sparse Autoencoder Layers: Input, Hidden, Matched Output-Input Sparse Auto Encoder A large language model that uses techniques like pruning or quantization to reduce the number of non-zero parameters in the model making it more parameter-efficient and easier to deploy on resource-constrained devices. Sparse Large Language Model model compression parameter efficiency Sparse LLM A representation learning network that finds sparse representations of input data as a linear combination of basic elements and identifies those elements. Sparse coding Sparse dictionary Learning Sparse Learning A representation learning network that finds sparse representations of input data as a linear combination of basic elements and identifies those elements. A regularization layer that performs the same function as Dropout but drops entire 1D feature maps instead of individual elements. Spatial 1D version of Dropout. This version performs the same function as Dropout, however, it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead. SpatialDropout1D Layer A regularization layer that performs the same function as Dropout but drops entire 1D feature maps instead of individual elements. A regularization layer that performs the same function as Dropout but drops entire 2D feature maps instead of individual elements. Spatial 2D version of Dropout. This version performs the same function as Dropout, however, it drops entire 2D feature maps instead of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead. SpatialDropout2D Layer A regularization layer that performs the same function as Dropout but drops entire 2D feature maps instead of individual elements. A regularization layer that performs the same function as Dropout but drops entire 3D feature maps instead of individual elements. Spatial 3D version of Dropout. This version performs the same function as Dropout, however, it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead. SpatialDropout3D Layer A regularization layer that performs the same function as Dropout but drops entire 3D feature maps instead of individual elements. A regression analysis method used to model spatial relationships. Spatial Regression A regression analysis method used to model spatial relationships.
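The SpatialDropout entries above drop entire feature maps rather than individual elements; the NumPy sketch below illustrates the 1D case by drawing one keep/drop decision per channel and broadcasting it over the time axis. This is didactic only, not the Keras implementation.

import numpy as np

def spatial_dropout_1d(x, rate, rng):
    # x: (batch, timesteps, channels). One decision per (sample, channel),
    # broadcast over timesteps, with the usual inverted-dropout rescaling.
    keep = rng.uniform(size=(x.shape[0], 1, x.shape[2])) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((2, 5, 4))                      # 2 samples, 5 timesteps, 4 feature maps
y = spatial_dropout_1d(x, rate=0.5, rng=rng)
# Any dropped feature map is zero across all 5 timesteps of that sample.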
A hidden layer that makes connections to an additional, heterogeneous hidden layer; modeled after biological neural networks. Spiking Hidden Layer A hidden layer that makes connections to an additional, heterogeneous hidden layer; modeled after biological neural networks. A layer that allows a stack of RNN cells to behave as a single cell. Wrapper allowing a stack of RNN cells to behave as a single cell. Used to implement efficient stacked RNNs. StackedRNNCells Layer A layer that allows a stack of RNN cells to behave as a single cell. An individual bias where people search only where it is easiest to look. Streetlight Effect Streetlight Effect Bias An individual bias where people search only where it is easiest to look. A categorical features preprocessing layer that maps string features to integer indices. StringLookup Layer A categorical features preprocessing layer that maps string features to integer indices. A merging layer that subtracts two inputs. Layer that subtracts two inputs. It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape. Subtract Layer A merging layer that subtracts two inputs. The process of dividing text into subword units which are smaller than words but larger than individual characters to improve the efficiency and effectiveness of natural language processing models by capturing meaningful subunits of words. Fragmentation Part-word Division Byte Pair Encoding SentencePiece Subword Segmentation A bias characterized by the tendency to continue an endeavor due to previously invested resources despite costs outweighing benefits. Sunk Cost Fallacy The tendency to continue an endeavor due to previously invested resources, despite costs outweighing benefits. Sunk Cost Fallacy Bias A bias characterized by the tendency to continue an endeavor due to previously invested resources despite costs outweighing benefits. A biclustering task focused on methods that simultaneously cluster the rows and columns of a labeled matrix considering data labels to enhance cluster coherence. Supervised Block Clustering Supervised Co-clustering Supervised Joint Clustering Supervised Two-mode Clustering Supervised Two-way Clustering Supervised Biclustering A biclustering task focused on methods that simultaneously cluster the rows and columns of a labeled matrix considering data labels to enhance cluster coherence. A clustering task focused on methods that group labeled objects such that objects in the same group have similar labels relative to those in other groups. Cluster analysis Supervised Clustering A clustering task focused on methods that group labeled objects such that objects in the same group have similar labels relative to those in other groups. A type of machine learning focused on methods that learn a function mapping input to output based on example input-output pairs. Supervised Learning A type of machine learning focused on methods that learn a function mapping input to output based on example input-output pairs. A network with supervised learning models for classification and regression that maps training examples to points in space maximizing the gap between categories. SVM SVN Support Vector Network Layers: Input, Hidden, Output Support Vector Machine A network with supervised learning models for classification and regression that maps training examples to points in space maximizing the gap between categories. A machine learning task focused on methods for analyzing the expected duration of time until one or more events occur such as death in biological organisms or failure in mechanical systems.
Survival Analysis A machine learning task focused on methods for analyzing the expected duration of time until one or more events occur such as death in biological organisms or failure in mechanical systems. A processing bias characterized by the tendency to focus on items observations or people that "survive" a selection process overlooking those that did not. The tendency to focus on items, observations, or people that "survive" a selection process, overlooking those that did not. Survivorship Bias A processing bias characterized by the tendency to focus on items observations or people that "survive" a selection process overlooking those that did not. An activation function that is x*sigmoid(x) a smooth non-monotonic function that consistently matches or outperforms ReLU on deep networks. x*sigmoid(x). It is a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks, it is unbounded above and bounded below. Swish Function An activation function that is x*sigmoid(x) a smooth non-monotonic function that consistently matches or outperforms ReLU on deep networks. A network that is a type of recurrent neural network where connections between units are symmetrical with equal weights in both directions. SCN Symmetrically connected networks are a type of recurrent neural network where connections between units are symmetrical, meaning they have equal weights in both directions. This structure allows the network to maintain consistent information flow and equilibrium. Symmetrically Connected Network A network that is a type of recurrent neural network where connections between units are symmetrical with equal weights in both directions. A batch normalization layer that applies synchronous Batch Normalization across multiple devices. SyncBatchNorm Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift . SyncBatchNorm Layer A batch normalization layer that applies synchronous Batch Normalization across multiple devices. A bias resulting from procedures and practices of institutions that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued. Institutional Bias Societal Bias Systemic Bias A bias resulting from procedures and practices of institutions that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued. An activation function that is the hyperbolic tangent activation function. hyperbolic tangent Hyperbolic tangent activation function. Tanh Function An activation function that is the hyperbolic tangent activation function. A selection and sampling bias arising from differences in populations and behaviors over time. Temporal Bias A selection and sampling bias arising from differences in populations and behaviors over time. A layer that performs text data preprocessing operations. Text Preprocessing Layer A layer that performs text data preprocessing operations. A preprocessing layer that maps text features to integer sequences. TextVectorization Layer A preprocessing layer that maps text features to integer sequences. A model that allows for different autoregressive processes depending on the regime or state of the time series, enabling the capture of nonlinear behaviors. 
TAR Threshold Autoregressive An activation layer that applies the thresholded rectified linear unit function element-wise. Thresholded Rectified Linear Unit. ThresholdedReLU Layer An activation layer that applies the thresholded rectified linear unit function element-wise. A wrapper layer that applies a layer to every temporal slice of an input. This wrapper allows you to apply a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension. Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3). You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps independently, as shown in the sketch that follows this block of entries. TimeDistributed Layer A wrapper layer that applies a layer to every temporal slice of an input. A machine learning task focused on methods for analyzing time series data to extract meaningful statistics and characteristics. Methods for analyzing time series data to extract meaningful statistics and characteristics. Time Series Analysis A machine learning task focused on methods for analyzing time series data to extract meaningful statistics and characteristics. A machine learning task focused on methods that predict future values based on previously observed values. Methods that predict future values based on previously observed values. Time Series Forecasting A machine learning task focused on methods that predict future values based on previously observed values. The process of converting a sequence of text into smaller meaningful units called tokens typically words or subwords for the purpose of analysis or processing by language models. Lexical Analysis Text Segmentation Tokenization The methodologies and approaches used to train machine learning models including techniques such as supervised learning unsupervised learning reinforcement learning and transfer learning aimed at optimizing model performance. Instructional Methods Learning Techniques Training Strategies A type of machine learning focused on methods that reuse or transfer information from previously learned tasks to facilitate the learning of new tasks. Transfer Learning A type of machine learning focused on methods that reuse or transfer information from previously learned tasks to facilitate the learning of new tasks. A large language model that leverages knowledge acquired during training on one task to improve performance on different but related tasks facilitating more efficient learning and adaptation. Transfer LLM transfer learning Transfer Learning LLM A transformer language model with large training corpuses and sets of parameters that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation. Transformer Large Language Model Transformer LLM A transformer language model with large training corpuses and sets of parameters that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation.
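A sketch of the video example set up in the TimeDistributed entry above, assuming tf.keras; the same Conv2D instance (and therefore the same weights) is applied to each of the 10 frames.

import tensorflow as tf

# A batch of video samples: 10 timesteps of 128x128 RGB frames.
inputs = tf.keras.Input(shape=(10, 128, 128, 3))

# Apply one Conv2D layer to every temporal slice independently.
conv = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv)(inputs)

print(outputs.shape)   # (None, 10, 126, 126, 64)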
A language model that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation. Transformer LM Transformer Language Model A language model that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation. A deep neural network that utilizes attention mechanisms to weigh the significance of input data. A transformer network utilizes attention mechanisms to weigh the significance of each part of the input data, widely used in natural language processing (NLP) and computer vision (CV). Transformer Network A deep neural network that utilizes attention mechanisms to weigh the significance of input data. A selection and sampling bias favoring groups better represented in training data due to less prediction uncertainty. Bias favoring groups better represented in training data, due to less prediction uncertainty. Uncertainty Bias A selection and sampling bias favoring groups better represented in training data due to less prediction uncertainty. A normalization layer that normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1. Unit normalization layer. Normalize a batch of inputs so that each input in the batch has an L2 norm equal to 1 (across the axes specified in axis). UnitNormalization Layer A normalization layer that normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1. A biclustering task focused on methods that simultaneously cluster the rows and columns of an unlabeled input matrix to identify submatrices with coherent patterns. Block Clustering Co-clustering Joint Clustering Two-mode Clustering Two-way Clustering Unsupervised Biclustering A biclustering task focused on methods that simultaneously cluster the rows and columns of an unlabeled input matrix to identify submatrices with coherent patterns. A clustering task focused on methods that group a set of unlabeled objects such that objects in the same group are more similar to each other than to those in other groups. Cluster analysis Unsupervised Clustering A clustering task focused on methods that group a set of unlabeled objects such that objects in the same group are more similar to each other than to those in other groups. A large language model that is trained solely on unlabeled data using self-supervised objectives like masked language modeling without any supervised fine-tuning. Unsupervised Large Language Model self-supervised Unsupervised LLM A type of machine learning focused on algorithms that learn patterns from unlabeled data. Algorithms that learn patterns from unlabeled data. Unsupervised Learning A type of machine learning focused on algorithms that learn patterns from unlabeled data. A network that initializes a discriminative neural net from one trained using an unsupervised criterion. UPN Unsupervised pre-training initializes a discriminative neural net from one trained using an unsupervised criterion, aiding in optimization and overfitting issues. Unsupervised Pretrained Network A network that initializes a discriminative neural net from one trained using an unsupervised criterion. A layer that upsamples the input by repeating each temporal step size times along the time axis. Upsampling layer for 1D inputs. Repeats each temporal step size times along the time axis. UpSampling1D Layer A layer that upsamples the input by repeating each temporal step size times along the time axis.
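An illustrative NumPy sketch of the UnitNormalization entry above: each input is divided by its L2 norm along the chosen axis so every row ends up with norm 1 (didactic only; the epsilon guard is an assumption to avoid division by zero).

import numpy as np

def unit_normalize(x, axis=-1, eps=1e-12):
    # Scale each slice along `axis` to unit L2 norm.
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

x = np.array([[3.0, 4.0], [0.5, 0.5]])
y = unit_normalize(x)
print(np.linalg.norm(y, axis=-1))   # [1. 1.]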
A layer that upsamples the input by repeating each row and column size times. Upsampling layer for 2D inputs. Repeats the rows and columns of the data by size[0] and size[1] respectively. UpSampling2D Layer A layer that upsamples the input by repeating each row and column size times. A layer that upsamples the input by repeating each depth, row, and column size times. Upsampling layer for 3D inputs. UpSampling3D Layer A layer that upsamples the input by repeating each depth, row, and column size times. A computational bias characterized by inappropriately analyzing ambiguous stimuli scenarios and events. Interpretive Bias Bias inappropriately analyzing ambiguous stimuli, scenarios, and events. Use And Interpretation Bias A computational bias characterized by inappropriately analyzing ambiguous stimuli scenarios and events. An individual bias arising when a user imposes their own biases during interaction with data output results etc. Bias arising when a user imposes their own biases during interaction with data, output, results, etc. User Interaction Bias An individual bias arising when a user imposes their own biases during interaction with data output results etc. An autoencoder network that imposes a probabilistic structure on the latent space for unsupervised learning. VAE Layers: Input, Probabilistic Hidden, Matched Output-Input Variational Auto Encoder A model that captures the linear interdependencies among multiple time series, where each variable is modeled as a linear function of its own past values and the past values of all other variables in the system. VAR Vector Autoregression The technique of limiting the number of unique tokens in a language model's vocabulary by merging or eliminating less frequent tokens thereby optimizing computational efficiency and resource usage. Lexical Simplification Lexicon Pruning Vocabulary Condensation Vocabulary Reduction A layer of values to be applied to other cells or neurons in a network. Weighted Layer An abstract base class for wrappers that augment the functionality of another layer. Abstract wrapper base class. Wrappers take another layer and augment it in various ways. Do not use this class as a layer, it is only an abstract base class. Two usable wrappers are the TimeDistributed and Bidirectional wrappers. Wrapper Layer An abstract base class for wrappers that augment the functionality of another layer. A layer that zero-pads the input along the time axis. Zero-padding layer for 1D input (e.g. temporal sequence). ZeroPadding1D Layer A layer that zero-pads the input along the time axis. A layer that zero-pads the input along the height and width dimensions. Zero-padding layer for 2D input (e.g. picture). This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor. ZeroPadding2D Layer A layer that zero-pads the input along the height and width dimensions. A layer that zero-pads the input along the depth, height, and width dimensions. Zero-padding layer for 3D data (spatial or spatio-temporal). ZeroPadding3D Layer A layer that zero-pads the input along the depth, height, and width dimensions. An LLM that performs tasks or understands concepts it has not explicitly been trained on, demonstrating a high degree of generalization and understanding. Zero-Shot LLM zero-shot learning Zero-Shot Learning LLM A deep neural network that predicts classes at test time from classes not observed during training. ZSL Zero-shot Learning A deep neural network that predicts classes at test time from classes not observed during training.
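A small sketch of the ZeroPadding2D entry above, assuming tf.keras, with the equivalent NumPy padding shown alongside; the batch and channel axes are left unpadded, and the shapes are illustrative.

import numpy as np
import tensorflow as tf

images = np.ones((1, 4, 4, 3), dtype="float32")   # (batch, height, width, channels)

# Keras: pad 1 row of zeros top and bottom, 2 columns of zeros left and right.
padded_tf = tf.keras.layers.ZeroPadding2D(padding=(1, 2))(images)

# The same padding with np.pad on the height and width axes only.
padded_np = np.pad(images, ((0, 0), (1, 1), (2, 2), (0, 0)))

print(padded_tf.shape, padded_np.shape)   # (1, 6, 8, 3) for both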
A machine learning designed to learn continuous feature representations for nodes in a graph by optimizing a neighborhood-preserving objective. N2V Layers: Input, Hidden, Output node2vec A machine learning designed to learn continuous feature representations for nodes in a graph by optimizing a neighborhood-preserving objective. A node2vec that predicts the current node from a window of surrounding context nodes, with the order of context nodes not influencing prediction. N2V-CBOW CBOW Layers: Input, Hidden, Output node2vec-CBOW A node2vec that predicts the current node from a window of surrounding context nodes, with the order of context nodes not influencing prediction. A node2vec that uses the current node to predict the surrounding window of context nodes, weighing nearby context nodes more heavily than distant ones. N2V-SkipGram SkipGram Layers: Input, Hidden, Output node2vec-SkipGram A node2vec that uses the current node to predict the surrounding window of context nodes, weighing nearby context nodes more heavily than distant ones. A dimensionality reduction method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. t-SNE tSNE t-Distributed Stochastic Neighbor Embedding A dimensionality reduction method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. A machine learning that generates distributed representations of words by training a shallow neural network model, which aims to predict the context of each word within a corpus. This algorithm captures semantic meanings of words through their contextual usage in the text. W2V Layers: Input, Hidden, Output word2vec A machine learning that generates distributed representations of words by training a shallow neural network model, which aims to predict the context of each word within a corpus. This algorithm captures semantic meanings of words through their contextual usage in the text. A word2vec that predicts the current word from a window of surrounding context words, ignoring the order of context words. W2V-CBOW CBOW Layers: Input, Hidden, Output word2vec-CBOW A word2vec that predicts the current word from a window of surrounding context words, ignoring the order of context words. A word2vec that predicts surrounding context words from the current word, giving more weight to nearby context words than distant ones. W2V-SkipGram SkipGram Layers: Input, Hidden, Output word2vec-SkipGram A word2vec that predicts surrounding context words from the current word, giving more weight to nearby context words than distant ones. A core relation that holds between a whole and its part has part
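A hedged sketch of the word2vec-CBOW and word2vec-SkipGram entries above, assuming gensim 4.x is installed; the sg flag switches between CBOW (sg=0, predict the current word from its context) and skip-gram (sg=1, predict the context from the current word). The toy corpus and hyperparameters are illustrative.

from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram

print(cbow.wv["cat"].shape)                     # a 50-dimensional word vector
print(skipgram.wv.most_similar("cat", topn=2))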