This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases.
Artificial Intelligence Ontology
2024-11-11
The official definition, explaining the meaning of a class or property. Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions.
definition
Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
description
A legal document giving official permission to do something with the resource.
license
A name given to the resource.
title
subset_property
Activation Function Subset
Bias Subset
Class Subset
Function Subset
Instance Normalization Layer Subset
Layer Subset
Machine Learning Subset
Model Subset
Network Subset
Preprocessing Subset
A core relation that holds between a part and its whole.
part of
A layer representing an RNN cell that is the base class for implementing RNN cells with custom behavior.
AbstractRNNCell
A layer representing an RNN cell that is the base class for implementing RNN cells with custom behavior.
A layer that applies an activation function to an output.
Applies an activation function to an output.
Activation Layer
A layer that applies an activation function to an output.
A machine learning task focused on methods that interactively query a user or another information source to label new data points with the desired outputs.
Query Learning
Active Learning
A machine learning task focused on methods that interactively query a user or another information source to label new data points with the desired outputs.
A use and interpretation bias occurring when systems/platforms get training data from their most active users rather than less active or inactive users.
Activity Bias
A use and interpretation bias occurring when systems/platforms get training data from their most active users rather than less active or inactive users.
A regularization layer that applies an update to the cost function based on input activity.
ActivityRegularization Layer
A regularization layer that applies an update to the cost function based on input activity.
A pooling layer that applies a 1D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool1D
AdaptiveAvgPool1D Layer
A pooling layer that applies a 1D adaptive average pooling over an input signal composed of several input planes.
A pooling layer that applies a 2D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool2D
AdaptiveAvgPool2D Layer
A pooling layer that applies a 2D adaptive average pooling over an input signal composed of several input planes.
A pooling layer that applies a 3D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool3D
AdaptiveAvgPool3D Layer
A pooling layer that applies a 3D adaptive average pooling over an input signal composed of several input planes.
A pooling layer that applies a 1D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool1D
AdaptiveMaxPool1D Layer
A pooling layer that applies a 1D adaptive max pooling over an input signal composed of several input planes.
A pooling layer that applies a 2D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool2D
AdaptiveMaxPool2D Layer
A pooling layer that applies a 2D adaptive max pooling over an input signal composed of several input planes.
A pooling layer that applies a 3D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool3D
AdaptiveMaxPool3D Layer
A pooling layer that applies a 3D adaptive max pooling over an input signal composed of several input planes.
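In the adaptive pooling layers above, the caller specifies the desired output size and the layer derives the window and stride internally, so any input resolution maps to the same output shape. A minimal sketch, assuming the PyTorch torch.nn API that these class names mirror:

    import torch
    import torch.nn as nn

    x = torch.randn(8, 3, 32, 32)        # (batch, channels, height, width)
    avg = nn.AdaptiveAvgPool2d((7, 7))   # any spatial input size -> 7x7 output
    gmp = nn.AdaptiveMaxPool2d(1)        # output size 1 acts as global max pooling
    print(avg(x).shape)                  # torch.Size([8, 3, 7, 7])
    print(gmp(x).shape)                  # torch.Size([8, 3, 1, 1])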
A merging layer that adds a list of inputs taking as input a list of tensors all of the same shape.
Layer that adds a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Add Layer
A merging layer that adds a list of inputs taking as input a list of tensors all of the same shape.
A layer that adds inputs from one or more other layers to cells or neurons of a target layer.
Addition Layer
An attention layer that implements additive attention also known as Bahdanau-style attention.
Additive attention layer, a.k.a. Bahdanau-style attention.
AdditiveAttention Layer
An attention layer that implements additive attention also known as Bahdanau-style attention.
A regularization layer that applies Alpha Dropout to the input keeping mean and variance of inputs to ensure self-normalizing property.
Applies Alpha Dropout to the input. Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled Exponential Linear Units by randomly setting activations to the negative saturation value.
AlphaDropout Layer
A regularization layer that applies Alpha Dropout to the input keeping mean and variance of inputs to ensure self-normalizing property.
A processing bias arising when the distribution over prediction outputs is skewed compared to the prior distribution of the prediction target.
Amplification Bias
A processing bias arising when the distribution over prediction outputs is skewed compared to the prior distribution of the prediction target.
An individual bias characterized by the influence of a reference point or anchor on decisions leading to insufficient adjustment from that anchor point.
Anchoring Bias
An individual bias characterized by the influence of a reference point or anchor on decisions leading to insufficient adjustment from that anchor point.
An individual bias occurring when users rely on automation as a heuristic replacement for their own information seeking and processing.
Annotator Reporting Bias
An individual bias occurring when users rely on automation as a heuristic replacement for their own information seeking and processing.
A network based on a collection of connected units called artificial neurons modeled after biological neurons.
ANN
NN
An artificial neural network (ANN) is based on a collection of connected units or nodes called artificial neurons, modeled after biological neurons, with connections transmitting signals processed by non-linear functions.
Artificial Neural Network
A network based on a collection of connected units called artificial neurons modeled after biological neurons.
A supervised learning focused on a rule-based approach for discovering interesting relations between variables in large databases.
Association Rule Learning
A supervised learning focused on a rule-based approach for discovering interesting relations between variables in large databases.
A layer that implements dot-product attention also known as Luong-style attention.
Dot-product attention layer, a.k.a. Luong-style attention.
Attention Layer
A layer that implements dot-product attention also known as Luong-style attention.
An unsupervised pretrained network that learns efficient codings of unlabeled data by training to ignore insignificant data and regenerate input from encoding.
AE
Layers: Input, Hidden, Matched Output-Input
Auto Encoder Network
An unsupervised pretrained network that learns efficient codings of unlabeled data by training to ignore insignificant data and regenerate input from encoding.
An individual bias characterized by over-reliance on automated systems leading to attenuated human skills.
Automation Complacency
Over-reliance on automated systems, leading to attenuated human skills, such as with spelling and autocorrect.
Automation Complacency Bias
An individual bias characterized by over-reliance on automated systems leading to attenuated human skills.
A model that describes the variance of the current error term as a function of the previous periods' error terms, capturing volatility clustering. Used for time series data.
ARCH
Autoregressive Conditional Heteroskedasticity
A model that includes lagged values of both the dependent variable and one or more independent variables, capturing dynamic relationships over time. Used in time series analysis.
ARDL
Autoregressive Distributed Lag
A model which combines autoregression (AR), differencing (I), and moving average (MA) components. Used for analyzing and forecasting time series data.
ARIMA
Autoregressive Integrated Moving Average
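Since ARIMA combines the three named components into a single order tuple, a small forecasting sketch can make the mapping concrete. This assumes the statsmodels implementation (statsmodels.tsa.arima.model.ARIMA); order=(p, d, q) corresponds to the AR, differencing (I), and MA terms:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    y = np.cumsum(np.random.randn(200))      # toy non-stationary series (random walk)
    fit = ARIMA(y, order=(1, 1, 1)).fit()    # AR(1), one round of differencing, MA(1)
    print(fit.forecast(steps=5))             # forecasts for the next 5 time steps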
A language model that generates text sequentially predicting one token at a time based on the previously generated tokens excelling at natural language generation tasks by modeling the probability distribution over sequences of tokens.
generative language model
sequence-to-sequence model
Autoregressive Language Model
A model that combines autoregressive (AR) and moving average (MA) components to represent time series data, suitable for stationary series without the need for differencing.
ARMA
Autoregressive Moving Average
An individual bias characterized by a mental shortcut where easily recalled information is overweighted in judgment and decision-making.
Availability Bias
Availability Heuristic
Availability Heuristic Bias
An individual bias characterized by a mental shortcut where easily recalled information is overweighted in judgment and decision-making.
A merging layer that averages a list of inputs element-wise taking as input a list of tensors all of the same shape.
Layer that averages a list of inputs element-wise. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Average Layer
A merging layer that averages a list of inputs element-wise taking as input a list of tensors all of the same shape.
A pooling layer that performs average pooling for temporal data.
AvgPool1D
Average pooling for temporal data. Downsamples the input representation by taking the average value over the window defined by pool_size. The window is shifted by strides. The resulting output when using "valid" padding option has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides.
AveragePooling1D Layer
A pooling layer that performs average pooling for temporal data.
A pooling layer that performs average pooling for spatial data.
AvgPool2D
Average pooling operation for spatial data. Downsamples the input along its spatial dimensions (height and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. The resulting output when using "valid" padding option has a shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size). The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1.
AveragePooling2D Layer
A pooling layer that performs average pooling for spatial data.
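The output-shape rules quoted above can be checked directly. A minimal sketch, assuming the tf.keras.layers.AveragePooling2D API with channels_last inputs:

    import tensorflow as tf

    x = tf.random.normal((1, 32, 32, 3))
    valid = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2, padding="valid")
    same = tf.keras.layers.AveragePooling2D(pool_size=3, strides=2, padding="same")
    print(valid(x).shape)   # (1, 16, 16, 3): floor((32 - 2) / 2) + 1 = 16
    print(same(x).shape)    # (1, 16, 16, 3): floor((32 - 1) / 2) + 1 = 16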
A pooling layer that performs average pooling for 3D data (spatial or spatio-temporal).
AvgPool3D
Average pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
AveragePooling3D Layer
A pooling layer that performs average pooling for 3D data (spatial or spatio-temporal).
A pooling layer that applies a 1D average pooling over an input signal composed of several input planes.
AvgPool1D
AvgPool1D Layer
A pooling layer that applies a 1D average pooling over an input signal composed of several input planes.
A pooling layer that applies a 2D average pooling over an input signal composed of several input planes.
AvgPool2D
AvgPool2D Layer
A pooling layer that applies a 2D average pooling over an input signal composed of several input planes.
A pooling layer that applies a 3D average pooling over an input signal composed of several input planes.
AvgPool3D
AvgPool3D Layer
A pooling layer that applies a 3D average pooling over an input signal composed of several input planes.
An input layer that receives values from another layer.
Backfed Input Layer
A batch normalization layer that applies Batch Normalization over a 2D or 3D input.
BatchNorm1D
Applies Batch Normalization over a 2D or 3D input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm1D Layer
A batch normalization layer that applies Batch Normalization over a 2D or 3D input.
A batch normalization layer that applies Batch Normalization over a 4D input.
BatchNorm2D
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm2D Layer
A batch normalization layer that applies Batch Normalization over a 4D input.
A batch normalization layer that applies Batch Normalization over a 5D input.
BatchNorm3D
Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm3D Layer
A batch normalization layer that applies Batch Normalization over a 5D input.
A normalization layer that normalizes its inputs applying a transformation that maintains the mean close to 0 and the standard deviation close to 1.
BatchNorm
Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Importantly, batch normalization works differently during training and during inference. During training (i.e. when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs. That is to say, for each channel being normalized, the layer returns gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta, where: epsilon is a small constant (configurable as part of the constructor arguments), gamma is a learned scaling factor (initialized as 1), which can be disabled by passing scale=False to the constructor. beta is a learned offset factor (initialized as 0), which can be disabled by passing center=False to the constructor. During inference (i.e. when using evaluate() or predict() or when calling the layer/model with the argument training=False, which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta. self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as such: moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum) and moving_var = moving_var * momentum + var(batch) * (1 - momentum).
BatchNormalization Layer
A normalization layer that normalizes its inputs applying a transformation that maintains the mean close to 0 and the standard deviation close to 1.
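The training-mode transform quoted above reduces to a few array operations. A minimal NumPy sketch (batch_norm_train, gamma, beta, and epsilon are illustrative names, not part of any library API):

    import numpy as np

    def batch_norm_train(batch, gamma=1.0, beta=0.0, epsilon=1e-3):
        mean = batch.mean(axis=0)   # per-feature mean over the current batch
        var = batch.var(axis=0)     # per-feature variance over the current batch
        return gamma * (batch - mean) / np.sqrt(var + epsilon) + beta

    x = np.random.randn(64, 10) * 5.0 + 3.0
    y = batch_norm_train(x)
    print(y.mean(axis=0).round(2), y.std(axis=0).round(2))  # roughly 0 and 1 per feature

At inference time the layer would instead use the moving averages described above in place of the batch statistics.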
A network that is a probabilistic graphical model representing variables and their conditional dependencies via a directed acyclic graph.
Bayesian Network
A network that is a probabilistic graphical model representing variables and their conditional dependencies via a directed acyclic graph.
An individual bias characterized by systematic distortions in user behavior across platforms or contexts or across users represented in different datasets.
Systematic distortions in user behavior across platforms or contexts, or across users represented in different datasets.
Behavioral Bias
An individual bias characterized by systematic distortions in user behavior across platforms or contexts or across users represented in different datasets.
A systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others.
Bias
A systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others.
A machine learning task focused on methods that simultaneously cluster the rows and columns of a matrix to identify submatrices with coherent patterns.
Block Clustering
Co-clustering
Joint Clustering
Two-mode Clustering
Two-way Clustering
Biclustering
A machine learning task focused on methods that simultaneously cluster the rows and columns of a matrix to identify submatrices with coherent patterns.
A recurrent layer that is a bidirectional wrapper for RNNs.
Bidirectional wrapper for RNNs.
Bidirectional Layer
A recurrent layer that is a bidirectional wrapper for RNNs.
A transformer language model such as BERT that uses the transformer architecture to build deep bidirectional representations by predicting masked tokens based on their context.
BERT
Bidirectional Transformer LM
Bidirectional Transformer Language Model
A transformer language model such as BERT that uses the transformer architecture to build deep bidirectional representations by predicting masked tokens based on their context.
A classification focused on methods that classify elements into two groups based on a classification rule.
Binary Classification
A classification focused on methods that classify elements into two groups based on a classification rule.
A symmetrically connected network that is a type of stochastic recurrent neural network and Markov random field.
BM
Sherrington–Kirkpatrick model with external field
stochastic Hopfield network with hidden units
stochastic Ising-Lenz-Little model
Layers: Backfed Input, Probabilistic Hidden
Boltzmann Machine Network
A symmetrically connected network that is a type of stochastic recurrent neural network and Markov random field.
A layer that performs categorical data preprocessing operations.
Categorical Features Preprocessing Layer
A layer that performs categorical data preprocessing operations.
A categorical features preprocessing layer that encodes integer features providing options for condensing data into a categorical encoding.
A preprocessing layer which encodes integer features. This layer provides options for condensing data into a categorical encoding when the total number of tokens is known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use tf.keras.layers.IntegerLookup instead.
CategoryEncoding Layer
A categorical features preprocessing layer that encodes integer features providing options for condensing data into a categorical encoding.
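A minimal sketch of the behavior described above, assuming the tf.keras.layers.CategoryEncoding API: integer inputs in [0, num_tokens) are mapped to a dense categorical encoding.

    import tensorflow as tf

    layer = tf.keras.layers.CategoryEncoding(num_tokens=4, output_mode="one_hot")
    print(layer([3, 2, 0, 1]))   # 4x4 one-hot matrix, one row per input value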
A probabilistic graphical model used to encode assumptions about the data-generating process.
Causal Bayesian Network
Causal Graph
DAG
Directed Acyclic Graph
Path Diagram
Causal Graphical Model
A probabilistic graphical model used to encode assumptions about the data-generating process.
A large language model that only attends to previous tokens in the sequence when generating text modeling the probability distribution autoregressively from left-to-right or causally.
Causal Large Language Model
autoregressive
unidirectional
Causal LLM
An image preprocessing layer that crops the central portion of images to a target size.
A preprocessing layer which crops images. This layer crops the central portion of the images to a target size. If an image is smaller than the target size, it will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
CenterCrop Layer
An image preprocessing layer that crops the central portion of images to a target size.
A supervised learning focused on methods that distinguish and distribute kinds of "things" into different groups.
Methods that distinguish and distribute kinds of "things" into different groups.
Classification
A supervised learning focused on methods that distinguish and distribute kinds of "things" into different groups.
A data preparation that removes noise, inconsistencies, and irrelevant information from data to enhance its quality and prepare it for analysis or further processing.
Data Cleansing
Standardization
Data cleaning
Text normalization
Cleaning
A machine learning task focused on methods that group a set of objects such that objects in the same group are more similar to each other than to those in other groups.
Cluster analysis
Clustering
A machine learning task focused on methods that group a set of objects such that objects in the same group are more similar to each other than to those in other groups.
An individual bias characterized by deviations from rational judgment and decision-making including adaptive mental shortcuts known as heuristics.
Cognitive Bias
An individual bias characterized by deviations from rational judgment and decision-making including adaptive mental shortcuts known as heuristics.
A large language model that is trained to understand and recombine the underlying compositional structures in language enabling better generalization to novel combinations and out-of-distribution examples.
Compositional Generalization Large Language Model
out-of-distribution generalization
systematic generalization
Compositional Generalization LLM
A bias caused by differences between results and facts in the process of data analysis (including the source of data the estimator chose) and analysis methods.
Statistical Bias
Computational Bias
A bias caused by differences between results and facts in the process of data analysis (including the source of data the estimator chose) and analysis methods.
A merging layer that concatenates a list of inputs taking as input a list of tensors all of the same shape except for the concatenation axis.
Layer that concatenates a list of inputs. It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.
Concatenate Layer
A merging layer that concatenates a list of inputs taking as input a list of tensors all of the same shape except for the concatenation axis.
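The merging layers described above (Add, Average, Concatenate) differ mainly in their shape requirements: element-wise merges need identical shapes, while concatenation only requires matching shapes off the concatenation axis. A minimal sketch, assuming the tf.keras.layers API:

    import tensorflow as tf

    a = tf.ones((2, 3))
    b = tf.ones((2, 3)) * 2.0
    print(tf.keras.layers.Add()([a, b]))                       # element-wise sum, shape (2, 3)
    print(tf.keras.layers.Average()([a, b]))                   # element-wise mean, shape (2, 3)
    print(tf.keras.layers.Concatenate(axis=1)([a, b]).shape)   # (2, 6)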
A use and interpretation bias due to the use of a system outside its planned domain of application causing performance gaps between laboratory settings and the real world.
Concept Drift
Concept Drift Bias
A use and interpretation bias due to the use of a system outside its planned domain of application causing performance gaps between laboratory settings and the real world.
An individual bias characterized by the tendency to prefer information that confirms existing beliefs, influencing the search for, interpretation of, and recall of information.
The tendency to prefer information that confirms existing beliefs, influencing the search for, interpretation of, and recall of information.
Confirmation Bias
An individual bias characterized by the tendency to prefer information that confirms existing beliefs, influencing the search for, interpretation of, and recall of information.
An individual bias arising when an algorithm or platform provides users a venue to express their biases occurring from either side in a digital interaction.
Consumer Bias
An individual bias arising when an algorithm or platform provides users a venue to express their biases occurring from either side in a digital interaction.
A use and interpretation bias arising from structural, lexical, semantic, and syntactic differences in user-generated content.
Bias from structural, lexical, semantic, and syntactic differences in user-generated content.
Content Production Bias
A use and interpretation bias arising from structural, lexical, semantic, and syntactic differences in user-generated content.
A deep neural network that learns sequential tasks without forgetting knowledge from preceding tasks and without access to old task data during new task training.
Incremental Learning
Life-Long Learning
Learning a model for sequential tasks without forgetting knowledge from preceding tasks, with no access to old task data during new task training.
Continual Learning
A deep neural network that learns sequential tasks without forgetting knowledge from preceding tasks and without access to old task data during new task training.
A large language model that continually acquires new knowledge and skills over time without forgetting previously learned information allowing the model to adapt and expand its capabilities as new data becomes available.
CL-Large Language Model
Continual Learning Large Language Model
catastrophic forgetting
lifelong learning
Continual Learning LLM
A deep neural network self-supervised learning approach that learns to distinguish between similar and dissimilar data samples.
Contrastive learning is a self-supervised learning approach in which the model learns to distinguish between similar and dissimilar pairs of data samples. By maximizing the similarity between positive pairs (similar samples) and minimizing the similarity between negative pairs (dissimilar samples), the model learns to capture meaningful representations of the data. This method is particularly effective for representation learning and is widely used in tasks such as image classification, clustering, and retrieval. Contrastive learning techniques often employ loss functions such as the contrastive loss or the triplet loss to achieve these objectives.
Contrastive Learning
A deep neural network self-supervised learning approach that learns to distinguish between similar and dissimilar data samples.
A large language model that is trained to pull semantically similar samples closer together and push dissimilar samples apart in the representation space learning high-quality features useful for downstream tasks.
Representation learning
Contrastive Learning LLM
A large language model that allows for explicit control over certain attributes of the generated text, such as style, tone, topic, or other desired characteristics, through conditioning or specialized training objectives.
Controllable Large Language Model
conditional generation
guided generation
A controllable LLM allows for explicit control over certain attributes of the generated text, such as style, tone, topic, or other desired characteristics, through conditioning or specialized training objectives.
Controllable LLM
A convolutional layer that implements a 1D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations.
1D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
ConvLSTM1D Layer
A convolutional layer that implements a 1D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations.
A convolutional layer that implements a 2D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations.
2D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
ConvLSTM2D Layer
A convolutional layer that implements a 2D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations.
A convolutional layer that implements a 3D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations.
3D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
ConvLSTM3D Layer
A convolutional layer that implements a 3D Convolutional LSTM similar to an LSTM but with convolutional input and recurrent transformations.
A convolutional layer that implements 1D convolution (e.g. temporal convolution).
Conv1D
Conv1D Layer
Convolution1D
nn.Conv1D
Convolution1D Layer
A convolutional layer that implements 1D convolution (e.g. temporal convolution).
A convolutional layer that implements transposed 1D convolution sometimes called deconvolution.
Conv1DTranspose Layer
ConvTranspose1D
Convolution1DTranspose
nn.ConvTranspose1D
Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 3) for data with 128 time steps and 3 channels.
Convolution1DTranspose Layer
A convolutional layer that implements transposed 1D convolution sometimes called deconvolution.
A convolutional layer that implements 2D convolution (e.g. spatial convolution over images).
Conv2D
Conv2D Layer
Convolution2D
nn.Conv2D
2D convolution layer (e.g. spatial convolution over images). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size.
Convolution2D Layer
A convolutional layer that implements 2D convolution (e.g. spatial convolution over images).
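A minimal sketch of the usage convention quoted above, assuming the tf.keras API; the input_shape argument describes a single 128x128 RGB image in channels_last format:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                               input_shape=(128, 128, 3)),
    ])
    print(model.output_shape)   # (None, 126, 126, 32) with the default "valid" padding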
A convolutional layer that implements transposed 2D convolution.
Conv2DTranspose Layer
ConvTranspose2D
Convolution2DTranspose
nn.ConvTranspose2D
Transposed convolution layer (sometimes called Deconvolution).
Convolution2DTranspose Layer
A convolutional layer that implements transposed 2D convolution.
A convolutional layer that implements 3D convolution (e.g. spatial convolution over volumes).
Conv3D
Conv3D Layer
Convolution3D
nn.Conv3D
3D convolution layer (e.g. spatial convolution over volumes). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 1) for 128x128x128 volumes with a single channel, in data_format="channels_last".
Convolution3D Layer
A convolutional layer that implements 3D convolution (e.g. spatial convolution over volumes).
A convolutional layer that implements transposed 3D convolution.
Conv3DTranspose Layer
ConvTranspose3D
Convolution3DTranspose
nn.ConvTranspose3D
Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 3) for a 128x128x128 volume with 3 channels if data_format="channels_last".
Convolution3DTranspose Layer
A convolutional layer that implements transposed 3D convolution.
A layer that contains a set of filters (or kernels), parameters of which are to be learned throughout the training.
A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map.
Convolutional Layer
A layer that contains a set of filters (or kernels), parameters of which are to be learned throughout the training.
A reshaping layer that crops along the time dimension (axis 1) for 1D input.
Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1).
Cropping1D Layer
A reshaping layer that crops along the time dimension (axis 1) for 1D input.
A layer that crops along spatial dimensions (i.e. height and width) for 2D input.
Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. height and width.
Cropping2D Layer
A layer that crops along spatial dimensions (i.e. height and width) for 2D input.
A layer that crops along spatial dimensions (depth, height, and width) for 3D data.
Cropping layer for 3D data (e.g. spatial or spatio-temporal).
Cropping3D Layer
A layer that crops along spatial dimensions (depth, height, and width) for 3D data.
A large language model that performs well across a wide range of domains without significant loss in performance, facilitated by advanced domain adaptation techniques.
Domain-General LLM
cross-domain transfer
domain adaptation
Cross-Domain LLM
A training strategy in machine learning where models are trained on data in a meaningful order starting with simpler examples and gradually increasing the complexity to improve learning efficiency and model performance.
Sequential Learning
Structured Learning
Complexity grading
Sequential learning
Curriculum Learning
A large language model that is trained by presenting learning examples in a meaningful order from simple to complex mimicking the learning trajectory followed by humans.
Learning progression
Curriculum Learning LLM
A data enhancement used to increase the diversity and quantity of training data by applying various transformations such as rotation, scaling, flipping, and cropping to existing data samples, enhancing the robustness and performance of machine learning models.
Data Enrichment
Data Expansion
Paraphrasing
Synonym replacement
Data Augmentation
A use and interpretation bias where testing many hypotheses in a dataset may yield apparent statistical significance even when results are nonsignificant.
Data Dredging
Data Dredging Bias
A use and interpretation bias where testing many hypotheses in a dataset may yield apparent statistical significance even when results are nonsignificant.
A preprocessing used to improve the quality, diversity, and volume of data available for training machine learning models, such as data augmentation, synthesis, and enrichment, to enhance model robustness and accuracy.
Data Enhancement
A selection and sampling bias arising from adding synthetic or redundant data samples to a dataset.
Bias from adding synthetic or redundant data samples to a dataset.
Data Generation Bias
A selection and sampling bias arising from adding synthetic or redundant data samples to a dataset.
A machine learning task focused on methods that replace missing data with substituted values.
Methods that replace missing data with substituted values.
Data Imputation
A machine learning task focused on methods that replace missing data with substituted values.
A preprocessing that cleans, transforms and organizes raw data into a suitable format for analysis and modeling, ensuring the quality and relevance of the data for machine learning tasks.
Data Assembly
Data Curation
Data Processing
Data Preparation
A large language model that generates natural language descriptions from structured data sources like tables, graphs, and knowledge bases, requiring grounding in meaning representations.
Meaning representation
Data-to-Text LLM
A classification that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utilities.
A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utilities.
Decision Tree
A classification that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utilities.
A large language model that uses a decoder-only architecture consisting of only a decoder trained to predict the next token in a sequence given the previous tokens.
A decoder-only architecture consisting of only a decoder, trained to predict the next token in a sequence given the previous tokens. Unlike the encoder-decoder architecture, it does not have an explicit encoder and encodes information implicitly in the hidden state of the decoder, updated at each step of the generation process.
Decoder LLM
A large language model that uses a decoder-only architecture consisting of only a decoder trained to predict the next token in a sequence given the previous tokens.
A deep neural network that uses deconvolution for unsupervised construction of hierarchical image representations.
DN
Layers: Input, Kernel, Convolutional/Pool, Output
Deconvolutional Network
A deep neural network that uses deconvolution for unsupervised construction of hierarchical image representations.
A deep neural network that combines deep learning and active learning to maximize model performance while annotating the fewest samples possible.
DeepAL
Deep Active Learning
A deep neural network that combines deep learning and active learning to maximize model performance while annotating the fewest samples possible.
An unsupervised pretrained network composed of multiple layers of latent variables that learns to probabilistically reconstruct inputs and perform classification.
DBN
Layers: Backfed Input, Probabilistic Hidden, Hidden, Matched Output-Input
Deep Belief Network
An unsupervised pretrained network composed of multiple layers of latent variables that learns to probabilistically reconstruct inputs and perform classification.
An autoencoder network that learns interpretable disentangled image representations through convolution and de-convolution layers trained with the stochastic gradient variational Bayes algorithm.
DCIGN
Layers: Input, Kernel, Convolutional/Pool, Probabilistic Hidden, Convolutional/Pool, Kernel, Output
Deep Convolutional Inverse Graphics Network
A deep neural network specialized for analyzing visual imagery using shared-weight architecture and translation-equivariant feature maps.
CNN
ConvNet
Convolutional Neural Network
DCN
Layers: Input, Kernel, Convolutional/Pool, Hidden, Output
Deep Convolutional Network
A deep neural network specialized for analyzing visual imagery using shared-weight architecture and translation-equivariant feature maps.
A deep neural network that processes information in one direction—from input nodes through hidden nodes to output nodes—without cycles or loops.
DFF
MLP
Multilayer Perceptron
Layers: Input, Hidden, Output
Deep Feed-Forward Network
A deep neural network that processes information in one direction—from input nodes through hidden nodes to output nodes—without cycles or loops.
An artificial neural network characterized by multiple hidden layers between the input and output layers.
DNN
A deep neural network (DNN) is a type of artificial neural network (ANN) characterized by multiple hidden layers between the input and output layers. Each layer consists of interconnected neurons that process and transmit information. DNNs can model complex patterns and representations in data through their hierarchical structure, where each layer extracts increasingly abstract features from the input. DNNs are widely used in various applications, including image and speech recognition, natural language processing, and more, due to their ability to learn and generalize from large amounts of data.
Deep Neural Network
A deep neural network that relaxes the hypothesis that training data must be independent and identically distributed with test data to address insufficient training data.
Deep Transfer Learning
A deep neural network that relaxes the hypothesis that training data must be independent and identically distributed with test data to address insufficient training data.
An autoencoder network trained to reconstruct the original undistorted input from a partially corrupted input.
DAE
Denoising Autoencoder
Layers: Noisy Input, Hidden, Matched Output-Input
Denoising Auto Encoder
An autoencoder network trained to reconstruct the original undistorted input from a partially corrupted input.
A layer that produces a dense tensor based on given feature columns.
A layer that produces a dense Tensor based on given feature_columns. Generally a single example in training data is described with FeatureColumns. At the first layer of the model, this column oriented data should be converted to a single Tensor. This layer can be called multiple times with different features. This is the V2 version of this layer that uses name_scopes to create variables instead of variable_scopes. But this approach currently lacks support for partitioned variables. In that case, use the V1 version instead.
DenseFeatures Layer
A layer that produces a dense tensor based on given feature columns.
A layer that is a regular densely-connected neural network layer.
Just your regular densely-connected NN layer.
Dense Layer
A layer that is a regular densely-connected neural network layer.
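A minimal sketch, assuming the tf.keras.layers.Dense API: the layer computes activation(dot(inputs, kernel) + bias), so only the last input dimension changes.

    import tensorflow as tf

    layer = tf.keras.layers.Dense(units=4, activation="relu")
    x = tf.random.normal((2, 8))
    print(layer(x).shape)   # (2, 4)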
A group bias arising when systems are used as decision aids for humans since the human intermediary may act on predictions in ways that are typically not modeled in the system.
Deployment Bias
A group bias arising when systems are used as decision aids for humans since the human intermediary may act on predictions in ways that are typically not modeled in the system.
A convolutional layer that performs depthwise 1D convolution.
Depthwise 1D convolution. Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution. It is implemented via the following steps: Split the input into individual channels. Convolve each channel with an individual depthwise kernel with depth_multiplier output channels. Concatenate the convolved outputs along the channels axis. Unlike a regular 1D convolution, depthwise convolution does not mix information across different input channels. The depth_multiplier argument determines how many filters are applied to one input channel. As such, it controls the number of output channels that are generated per input channel in the depthwise step.
DepthwiseConv1D Layer
A convolutional layer that performs depthwise 1D convolution.
A convolutional layer that performs depthwise 2D convolution.
Depthwise 2D convolution.
DepthwiseConv2D Layer
A convolutional layer that performs depthwise 2D convolution.
A selection and sampling bias characterized by systematic differences between groups in how outcomes are determined potentially over- or underestimating effect size.
Systematic differences between groups in how outcomes are determined, potentially over- or underestimating effect size.
Detection Bias
A selection and sampling bias characterized by systematic differences between groups in how outcomes are determined potentially over- or underestimating effect size.
A large language model that is optimized for engaging in multi-turn conversations, understanding context and generating relevant, coherent responses continuously over many dialogue turns.
Dialogue Large Language Model
conversational AI
multi-turn dialogue
Dialogue LLM
A large language model that has an architecture amenable to full end-to-end training via backpropagation without relying on teacher forcing or unlikelihood training objectives.
Differentiable Large Language Model
end-to-end training
fully backpropagable
Differentiable LLM
An unsupervised learning focused on the process of transforming data from a high-dimensional space into a lower-dimensional space while retaining meaningful properties of the original data.
Dimension Reduction
Dimensionality Reduction
An unsupervised learning focused on the process of transforming data from a high-dimensional space into a lower-dimensional space while retaining meaningful properties of the original data.
A numerical features preprocessing layer which buckets continuous features by ranges.
Discretization Layer
A numerical features preprocessing layer which buckets continuous features by ranges.
A preprocessing that trains a smaller model to replicate the behavior of a larger model aiming to compress the knowledge into a more compact form without significant loss of performance.
Purification
Refining
Knowledge compression
Teacher-student model
Distillation
A preprocessing that trains a smaller model to replicate the behavior of a larger model aiming to compress the knowledge into a more compact form without significant loss of performance.
A large language model which is pre-trained on a broad corpus and then fine-tuned on domain-specific data to specialize its capabilities for particular domains or applications, like scientific literature or code generation.
Domain-Adapted Large Language Model
domain robustness
transfer learning
Domain-Adapted LLM
A layer that computes a dot product between samples in two tensors.
Layer that computes a dot product between samples in two tensors. E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i].
Dot Layer
A layer that computes a dot product between samples in two tensors.
A regularization layer that applies Dropout to the input.
Applies Dropout to the input. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged. Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. (This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.)
Dropout Layer
A regularization layer that applies Dropout to the input.
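The training-only behavior and the 1 / (1 - rate) rescaling described above can be observed directly. A minimal sketch, assuming the tf.keras.layers.Dropout API:

    import tensorflow as tf

    layer = tf.keras.layers.Dropout(rate=0.5)
    x = tf.ones((1, 8))
    print(layer(x, training=False))   # inference: input passes through unchanged
    print(layer(x, training=True))    # training: ~half the units zeroed, the rest scaled to 2.0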
A cognitive bias in which people with low ability in an area overestimate that ability. Often measured by comparing self-assessment with objective performance.
Dunning-Kruger Effect
Dunning-Kruger Effect Bias
A cognitive bias in which people with low ability in an area overestimate that ability. Often measured by comparing self-assessment with objective performance.
A model that allows for time-varying correlations between different time series, used in financial econometrics to model and forecast covariances.
DCC
Dynamic Conditional Correlation
A mathematical function that is x if x > 0 and alpha * (exp(x) - 1) if x < 0, where alpha controls the value to which an ELU saturates for negative net inputs.
ELU
Exponential Linear Unit
The exponential linear unit (ELU) with alpha > 0 is: x if x > 0 and alpha * (exp(x) - 1) if x < 0. The ELU hyperparameter alpha controls the value to which an ELU saturates for negative net inputs. ELUs diminish the vanishing gradient effect. ELUs have negative values, which pushes the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. ELUs saturate to a negative value when the argument gets smaller. Saturation means a small derivative which decreases the variation and the information that is propagated to the next layer.
ELU Function
A mathematical function that is x if x > 0 and alpha * (exp(x) - 1) if x < 0, where alpha controls the value to which an ELU saturates for negative net inputs.
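A minimal NumPy sketch of the piecewise definition above (elu is an illustrative helper, not a library call); note how outputs saturate toward -alpha for large negative inputs:

    import numpy as np

    def elu(x, alpha=1.0):
        # x for x > 0, alpha * (exp(x) - 1) otherwise
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    print(elu(np.array([-5.0, -1.0, 0.0, 2.0])))   # approx. [-0.993, -0.632, 0.0, 2.0]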
An activation layer that applies the Exponential Linear Unit (ELU) function element-wise.
Exponential Linear Unit.
ELU Layer
An activation layer that applies the Exponential Linear Unit (ELU) function element-wise.
A recurrent neural network with a recurrent hidden layer and sparsely connected hidden neurons that learns output weights to produce temporal patterns.
ESN
Layers: Input, Recurrent, Output
Echo State Network
A recurrent neural network with a recurrent hidden layer and sparsely connected hidden neurons that learns output weights to produce temporal patterns.
A selection and sampling bias occurring when an inference about an individual is made based on their group membership.
Ecological Fallacy
Ecological Fallacy Bias
A selection and sampling bias occurring when an inference about an individual is made based on their group membership.
A layer that turns positive integers (indexes) into dense vectors of fixed size.
Embedding Layer
A layer that turns positive integers (indexes) into dense vectors of fixed size.
A large language model that integrates language with other modalities like vision, audio, and robotics to enable grounded language understanding in real-world environments.
Embodied Large Language Model
multimodal grounding
An embodied LLM integrates language with other modalities like vision, audio, and robotics to enable grounded language understanding in real-world environments.
Embodied LLM
A use and interpretation bias resulting from the use and reliance on algorithms across new or unanticipated contexts.
Emergent Bias
A use and interpretation bias resulting from the use and reliance on algorithms across new or unanticipated contexts.
A large language model introduced in the "Attention Is All You Need" paper. The encoder processes the input sequence to generate a hidden representation summarizing the input information, while the decoder uses this hidden representation to generate the desired output sequence.
Encoder-Decoder LLM
A large language model introduced in the "Attention Is All You Need" paper. The encoder processes the input sequence to generate a hidden representation summarizing the input information, while the decoder uses this hidden representation to generate the desired output sequence.
A large language model that uses an encoder-only architecture to encode the input sequence into a fixed-length representation which is then used as input to a classifier or regressor for prediction.
An encoder-only architecture that encodes the input sequence into a fixed-length representation, which is then used as input to a classifier or regressor for prediction. The model has a pre-trained general-purpose encoder that requires fine-tuning for specific tasks.
Encoder LLM
A large language model that uses an encoder-only architecture to encode the input sequence into a fixed-length representation which is then used as input to a classifier or regressor for prediction.
A large language model which models the explicit probability density over token sequences using an energy function, rather than an autoregressive factorization. This can improve modeling of long-range dependencies and global coherence.
Energy-Based Large Language Model
energy scoring
explicit density modeling
Energy-Based LLM
A machine learning task focused on methods that use multiple learning algorithms to achieve better predictive performance than any of the constituent algorithms alone.
Ensemble Learning
A machine learning task focused on methods that use multiple learning algorithms to achieve better predictive performance than any of the constituent algorithms alone.
A processing bias characterized by the effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them.
Error Propagation
Error Propagation Bias
A processing bias characterized by the effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them.
A large language model that is trained to uphold certain ethical principles, values, or rules in its language generation to increase safety and trustworthiness.
Ethical Large Language Model
constitutional AI
value alignment
Ethical LLM
A selection and sampling bias arising when testing populations do not equally represent user populations or when inappropriate performance metrics are used.
Evaluation Bias
A selection and sampling bias arising when testing populations do not equally represent user populations or when inappropriate performance metrics are used.
A large language model that applies principles of evolutionary computation to optimize its structure and parameters evolving over time to improve performance.
Evolutionary Language Model
evolutionary algorithms
genetic programming
Evolutionary LLM
A selection and sampling bias occurring when specific groups of user populations are excluded from testing and analysis.
Exclusion Bias
A selection and sampling bias occurring when specific groups of user populations are excluded from testing and analysis.
A large language model that is designed to provide insights into its decision-making process making it easier for users to understand and trust the model's outputs by incorporating mechanisms for interpreting and explaining its predictions in human-understandable terms.
Explainable Language Model
XAI LLM
interpretability
model understanding
Explainable LLM
A mathematical function denoted by f(x) = exp(x) or e^x.
The exponential function is a mathematical function denoted by f(x) = exp(x) or e^x.
Exponential Function
A mathematical function denoted by f(x) = exp(x) or e^x.
A model that combines exponential smoothing with state space modeling, allowing for the inclusion of both trend and seasonal components. Used in forecasting.
ETS
Exponential Smoothing State Space Model
A feedback network with randomly assigned hidden nodes that are not updated during training.
ELM
Layers: Input, Hidden, Output
Extreme Learning Machine
A feedback network with randomly assigned hidden nodes that are not updated during training.
A language model that views each word as a vector of multiple factors such as part-of-speech, morphology, and semantics to improve language modeling.
Factorized Language Model
Factored Language Model
A language model that views each word as a vector of multiple factors such as part-of-speech, morphology, and semantics to improve language modeling.
A large language model that decomposes the full language modeling task into multiple sub-components or experts that each focus on a subset of the information enabling more efficient scaling.
Factorized Large Language Model
Factorized Learning Assisted with Large Language Model
Conditional masking
Product of experts
Factorized LLM
A large language model that decomposes the full language modeling task into multiple sub-components or experts that each focus on a subset of the information enabling more efficient scaling.
A data enhancement that transforms raw data into a set of measurable characteristics that can be used as input for machine learning algorithms, enhancing the ability to make accurate predictions.
Attribute Extraction
Feature Isolation
Semantic embeddings
Syntactic information
Feature Extraction
A large language model that is trained in a decentralized manner across multiple devices or silos without directly sharing private data enabling collaborative training while preserving data privacy and security.
Federated Large Language Model
decentralized training
privacy-preserving
Federated LLM
A deep neural network trained across decentralized edge devices or servers holding local data samples without exchanging them.
Training an algorithm across multiple decentralized edge devices or servers holding local data samples without exchanging them.
Federated Learning
A deep neural network trained across decentralized edge devices or servers holding local data samples without exchanging them.
A use and interpretation bias occurring when an algorithm learns from user behavior and feeds that behavior back into the model.
Feedback Loop Bias
A use and interpretation bias occurring when an algorithm learns from user behavior and feeds that behavior back into the model.
An artificial neural network that refines its representations iteratively based on feedback from previous outputs.
FBN
Layers: Input, Hidden, Output, Hidden
Feedback Network
A regression analysis in which the model parameters are fixed or non-random quantities.
FEM
Fixed Effects Model
A regression analysis in which the model parameters are fixed or non-random quantities.
A reshaping layer that flattens the input.
Flattens the input. Does not affect the batch size.
Flatten Layer
A reshaping layer that flattens the input.
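For instance (a minimal Keras sketch, assuming TensorFlow is available), flattening a (batch, 3, 4) input collapses everything except the batch dimension:

import tensorflow as tf

# A (2, 3, 4) input becomes (2, 12): all non-batch dimensions are collapsed.
x = tf.zeros((2, 3, 4))
y = tf.keras.layers.Flatten()(x)
print(y.shape)  # (2, 12)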
A pooling layer that applies a 2D fractional max pooling over an input signal composed of several input planes.
FractionalMaxPool2D
FractionalMaxPool2D Layer
A pooling layer that applies a 2D fractional max pooling over an input signal composed of several input planes.
A pooling layer that applies a 3D fractional max pooling over an input signal composed of several input planes.
FractionalMaxPool3D
FractionalMaxPool3D Layer
A pooling layer that applies a 3D fractional max pooling over an input signal composed of several input planes.
A group bias arising when biased results are reported to support or satisfy the funding agency or financial supporter of a research study.
Funding Bias
A group bias arising when biased results are reported to support or satisfy the funding agency or financial supporter of a research study.
A mathematical function that computes x * P(X <= x), where P(X) ~ N(0, 1), weighting inputs by their value rather than gating inputs by their sign as in ReLU.
GELU
Gaussian Error Linear Unit
Gaussian error linear unit (GELU) computes x * P(X <= x), where P(X) ~ N(0, 1). The (GELU) nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLU.
GELU Function
A mathematical function that computes x * P(X <= x), where P(X) ~ N(0, 1), weighting inputs by their value rather than gating inputs by their sign as in ReLU.
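A small illustrative sketch (plain Python/NumPy, not taken from any library) of the definition above, writing the standard-normal CDF via the error function:

import math
import numpy as np

def gelu(x):
    # GELU(x) = x * Phi(x), where Phi is the CDF of N(0, 1): 0.5 * (1 + erf(x / sqrt(2))).
    return np.array([0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

# Small negative inputs are damped by their (low) probability weight rather than
# being zeroed outright as in ReLU.
print(gelu(np.array([-1.0, 0.0, 1.0])))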
A layer that processes one step within the whole time sequence input for a GRU layer.
Cell class for the GRU layer. This class processes one step within the whole time sequence input, whereas tf.keras.layer.GRU processes the whole sequence.
GRUCell Layer
A layer that processes one step within the whole time sequence input for a GRU layer.
A recurrent layer that implements the Gated Recurrent Unit architecture.
Gated Recurrent Unit - Cho et al. 2014. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0, unroll is False, use_bias is True, reset_after is True. Inputs, if use masking, are strictly right-padded. Eager execution is enabled in the outermost context. There are two variants of the GRU implementation. The default one is based on v3 and has reset gate applied to hidden state before matrix multiplication. The other one is based on original and has the order reversed. The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid'.
GRU Layer
A recurrent layer that implements the Gated Recurrent Unit architecture.
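A short usage sketch (assuming TensorFlow/Keras; the batch size, sequence length, and feature count are arbitrary) showing the default configuration, which already satisfies the cuDNN requirements listed above:

import tensorflow as tf

# Defaults (tanh activation, sigmoid recurrent activation, recurrent_dropout=0,
# reset_after=True) meet the cuDNN-kernel requirements, so the fast path is used
# automatically when a GPU is present.
inputs = tf.random.normal([32, 10, 8])          # (batch, timesteps, features)
gru = tf.keras.layers.GRU(4)
print(gru(inputs).shape)                        # (32, 4): last hidden state per sequence

gru_seq = tf.keras.layers.GRU(4, return_sequences=True, return_state=True)
whole_sequence, final_state = gru_seq(inputs)
print(whole_sequence.shape, final_state.shape)  # (32, 10, 4) (32, 4)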
A long short-term memory network that is a gating mechanism in recurrent neural networks, similar to LSTMs but with fewer parameters and no output gate.
GRU
Layers: Input, Memory Cell, Output
Gated Recurrent Unit
A long short-term memory network that is a gating mechanism in recurrent neural networks, similar to LSTMs but with fewer parameters and no output gate.
A regularization layer that applies multiplicative 1-centered Gaussian noise.
Apply multiplicative 1-centered Gaussian noise. As it is a regularization layer, it is only active at training time.
GaussianDropout Layer
A regularization layer that applies multiplicative 1-centered Gaussian noise.
A regularization layer that applies additive zero-centered Gaussian noise.
Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs. As it is a regularization layer, it is only active at training time.
GaussianNoise Layer
A regularization layer that applies additive zero-centered Gaussian noise.
A model that incorporates lagged conditional variances, allowing for more flexibility in modeling time-varying volatility.
GARCH
Generalized Autoregressive Conditional Heteroskedasticity
A deep neural network that learns novel classes from few samples per class, preventing catastrophic forgetting of base classes and ensuring classifier calibration.
GFSL
Generalized Few-shot Learning
A deep neural network that learns novel classes from few samples per class, preventing catastrophic forgetting of base classes and ensuring classifier calibration.
A regression analysis that relates the linear model to the response variable via a link function and allows the variance of each measurement to be a function of its predicted value.
GLM
Generalized Linear Model
A regression analysis that relates the linear model to the response variable via a link function and allows the variance of each measurement to be a function of its predicted value.
An unsupervised pretrained network framework where two neural networks contest in a game to generate new data with the same statistics as the training set.
GAN
Layers: Backfed Input, Hidden, Matched Output-Input, Hidden, Matched Output-Input
Generative Adversarial Network
An unsupervised pretrained network framework where two neural networks contest in a game to generate new data with the same statistics as the training set.
A large language model which incorporates a generative adversarial network (GAN) into its training process, using a discriminator network to provide a signal for generating more realistic and coherent text. This adversarial training can improve the quality and diversity of generated text.
GAN-Large Language Model
Generative Adversarial Network-Augmented Large Language Model
adversarial training
text generation
Generative Adversarial Network-Augmented LLM
A large language model that is trained to understand and model basic physics, causality, and common sense about how the real world works.
Generative Commonsense Large Language Model
World Model
causal modeling
physical reasoning
Generative Commonsense LLM
A large language model that is trained to understand and model basic physics, causality, and common sense about how the real world works.
A language model that enables users to engage in an interactive dialogue with an LLM, providing feedback to guide and refine the generated outputs iteratively.
Interactive generation
Generative Language Interface
A pooling layer that performs global average pooling operation for temporal data.
GlobalAvgPool1D
Global average pooling operation for temporal data.
GlobalAveragePooling1D Layer
A pooling layer that performs global average pooling operation for temporal data.
A pooling layer that performs global average pooling operation for spatial data.
GlobalAvgPool2D
Global average pooling operation for spatial data.
GlobalAveragePooling2D Layer
A pooling layer that performs global average pooling operation for spatial data.
A pooling layer that performs global average pooling operation for 3D data.
GlobalAvgPool3D
Global Average pooling operation for 3D data.
GlobalAveragePooling3D Layer
A pooling layer that performs global average pooling operation for 3D data.
A pooling layer that performs global max pooling operation for temporal data.
GlobalMaxPool1D
Global max pooling operation for 1D temporal data.
GlobalMaxPooling1D Layer
A pooling layer that performs global max pooling operation for temporal data.
A pooling layer that performs global max pooling operation for spatial data.
GlobalMaxPool2D
Global max pooling operation for spatial data.
GlobalMaxPooling2D Layer
A pooling layer that performs global max pooling operation for spatial data.
A pooling layer that performs global max pooling operation for 3D data.
GlobalMaxPool3D
Global Max pooling operation for 3D data.
GlobalMaxPooling3D Layer
A pooling layer that performs global max pooling operation for 3D data.
A deep neural network that operates directly on graph structures utilizing structural information.
GCN
Layers: Input, Hidden, Hidden, Output
Graph Convolutional Network
A deep neural network that operates directly on graph structures utilizing structural information.
A graph convolutional network that generates goal-directed graphs using reinforcement learning and optimizing for rewards and adversarial loss.
GPCN
Layers: Input, Hidden, Hidden, Policy, Output
Graph Convolutional Policy Network
A graph convolutional network that generates goal-directed graphs using reinforcement learning and optimizing for rewards and adversarial loss.
A language model that operates over structured inputs or outputs represented as graphs, enabling reasoning over explicit relational knowledge representations during language tasks.
Graph LM
Structured representations
Graph Language Model
A language model that operates over structured inputs or outputs represented as graphs, enabling reasoning over explicit relational knowledge representations during language tasks.
A systemic bias characterized by favoring members of one's in-group over out-group members, expressed in evaluation, resource allocation, and other ways.
In-group Favoritism
In-group bias
In-group preference
In-group–out-group Bias
Intergroup bias
Favoring members of one's in-group over out-group members, expressed in evaluation, resource allocation, and other ways.
Group Bias
A systemic bias characterized by favoring members of one's in-group over out-group members, expressed in evaluation, resource allocation, and other ways.
A normalization layer that applies Group Normalization over a mini-batch of inputs.
GroupNorm
Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization
GroupNorm Layer
A normalization layer that applies Group Normalization over a mini-batch of inputs.
A group bias in which people in a group make non-optimal decisions due to a desire to conform or fear of dissent.
Groupthink
Groupthink Bias
A group bias in which people in a group make non-optimal decisions due to a desire to conform or fear of dissent.
A mathematical function that is a faster approximation of the sigmoid activation using a piecewise linear approximation.
A faster approximation of the sigmoid activation. Piecewise linear approximation of the sigmoid function. Ref: 'https://en.wikipedia.org/wiki/Hard_sigmoid'
Hard Sigmoid Function
A mathematical function that is a faster approximation of the sigmoid activation using a piecewise linear approximation.
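As a sketch of one common variant (the piecewise-linear form used by older Keras versions; some libraries use x/6 + 1/2 instead of 0.2x + 1/2, so treat the constants as an assumption):

import numpy as np

def hard_sigmoid(x):
    # 0 for x < -2.5, 1 for x > 2.5, and the straight line 0.2 * x + 0.5 in between.
    return np.clip(0.2 * np.asarray(x, dtype=float) + 0.5, 0.0, 1.0)

print(hard_sigmoid([-3.0, -1.0, 0.0, 1.0, 3.0]))  # [0.  0.3 0.5 0.7 1. ]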
A categorical features preprocessing layer which hashes and bins categorical features.
A preprocessing layer which hashes and bins categorical features. This layer transforms categorical inputs to hashed output. It element-wise converts ints or strings to ints in a fixed range. The stable hash function uses tensorflow::ops::Fingerprint to produce the same output consistently across all platforms. This layer uses FarmHash64 by default, which provides a consistent hashed output across different platforms and is stable across invocations, regardless of device and context, by mixing the input bits thoroughly. If you want to obfuscate the hashed output, you can also pass a random salt argument in the constructor. In that case, the layer will use the SipHash64 hash function, with the salt value serving as additional input to the hash function.
Hashing Layer
A categorical features preprocessing layer which hashes and bins categorical features.
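A brief usage sketch (assuming TensorFlow/Keras; the bin count and salt values are arbitrary) of the behaviour described above:

import tensorflow as tf

# Without a salt the layer uses the stable FarmHash64 function, so the same
# string always lands in the same bin across platforms and invocations.
layer = tf.keras.layers.Hashing(num_bins=3)
print(layer([["A"], ["B"], ["C"], ["D"], ["E"]]))  # integer bin ids in [0, 3)

# Passing a salt switches to SipHash64 and obfuscates the mapping.
salted = tf.keras.layers.Hashing(num_bins=3, salt=[133, 137])
print(salted([["A"], ["B"], ["C"], ["D"], ["E"]]))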
A layer located between the input and output that performs nonlinear transformations of the inputs entered into the network.
A hidden layer is located between the input and output of the algorithm, in which the function applies weights to the inputs and directs them through an activation function as the output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network. Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending on their associated weights.
Hidden Layer
A layer located between the input and output that performs nonlinear transformations of the inputs entered into the network.
A classification focused on methods that group things according to a hierarchy.
Methods that group things according to a hierarchy.
Hierarchical Classification
A classification focused on methods that group things according to a hierarchy.
A clustering that builds a hierarchy of clusters.
HCL
Methods that build a hierarchy of clusters.
Hierarchical Clustering
A clustering that builds a hierarchy of clusters.
A language model that represents language at multiple levels of granularity, learning hierarchical representations that capture both low-level patterns and high-level abstractions.
Hierarchical LM
multi-scale representations
Hierarchical Language Model
A language model that represents language at multiple levels of granularity, learning hierarchical representations that capture both low-level patterns and high-level abstractions.
A bias characterized by long-standing biases encoded in society over time, distinct from biases in historical description or interpretation.
Long-standing biases encoded in society over time, distinct from biases in historical description or the interpretation of history, such as viewing the larger world from a Western or European perspective.
Historical Bias
A bias characterized by long-standing biases encoded in society over time, distinct from biases in historical description or interpretation.
A symmetrically connected network that is a type of recurrent artificial neural network serving as a content-addressable memory system.
HN
Ising model of a neural network
Ising–Lenz–Little model
Layers: Backfed input
Hopfield Network
A symmetrically connected network that is a type of recurrent artificial neural network serving as a content-addressable memory system.
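As an illustrative aside (a toy NumPy sketch with two made-up bipolar patterns, not a full Hopfield implementation), the content-addressable behaviour looks like this: Hebbian outer-product weights store the patterns, and thresholded updates pull a corrupted cue back to the nearest stored pattern.

import numpy as np

# Store two bipolar patterns with a Hebbian outer-product rule (no self-connections).
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

# A corrupted copy of the first pattern (one bit flipped at position 3).
state = np.array([1, -1, 1, 1, 1, -1])
for _ in range(5):                           # synchronous updates until stable
    state = np.where(W @ state >= 0, 1, -1)
print(state)                                 # recovers [ 1 -1  1 -1  1 -1]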
A use and interpretation bias where individuals perceive benign or ambiguous behaviors as hostile.
Bias where individuals perceive benign or ambiguous behaviors as hostile.
Hostile Attribution Bias
A use and interpretation bias where individuals perceive benign or ambiguous behaviors as hostile.
A bias in human thought based on heuristic principles, leading to simplified judgmental operations.
Human Bias
A bias in human thought based on heuristic principles, leading to simplified judgmental operations.
An individual bias that arises when users depend on automated systems as heuristic substitutes for their own information-seeking and processing efforts.
Human Reporting Bias
An individual bias that arises when users depend on automated systems as heuristic substitutes for their own information-seeking and processing efforts.
A layer that performs image data preprocessing augmentations.
Image Augmentation Layer
A layer that performs image data preprocessing augmentations.
A layer that performs image data preprocessing operations.
Image Preprocessing Layer
A layer that performs image data preprocessing operations.
An individual bias characterized by unconscious beliefs, attitudes, feelings, associations, or stereotypes that affect information processing, decision-making, and actions.
Confirmatory Bias
Unconscious beliefs, attitudes, feelings, associations, or stereotypes that affect information processing, decision-making, and actions.
Implicit Bias
An individual bias characterized by unconscious beliefs, attitudes, feelings, associations, or stereotypes that affect information processing, decision-making, and actions.
A language model that uses an energy function to score entire sequences instead of factorizing probabilities autoregressively, better capturing global properties and long-range dependencies.
Implicit LM
Energy-based models
Token-level scoring
Implicit Language Model
A language model that uses an energy function to score entire sequences instead of factorizing probabilities autoregressively, better capturing global properties and long-range dependencies.
A deep neural network trained on a base set of classes and then presented with novel classes, each with few labeled examples.
IFSL
Incremental Few-shot Learning
A deep neural network trained on a base set of classes and then presented with novel classes, each with few labeled examples.
A bias characterized by a persistent point of view or limited list of such points of view, applied by an individual.
Individual Bias
A bias characterized by a persistent point of view or limited list of such points of view, applied by an individual.
A processing bias arising when machine learning applications generate inputs for other machine learning algorithms, passing on any existing bias.
Inherited Bias
A processing bias arising when machine learning applications generate inputs for other machine learning algorithms, passing on any existing bias.
A layer composed of artificial input neurons that brings the initial data into the system for further processing by subsequent layers.
The input layer of a neural network is composed of artificial input neurons, and brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow for the artificial neural network.
Input Layer
A layer composed of artificial input neurons that brings the initial data into the system for further processing by subsequent layers.
A layer to be used as an entry point into a Network (a graph of layers).
InputLayer Layer
A layer to be used as an entry point into a Network (a graph of layers).
A layer that specifies the rank, dtype, and shape of every input to a layer.
Specifies the rank, dtype and shape of every input to a layer. Layers can expose (if appropriate) an input_spec attribute: an instance of InputSpec, or a nested structure of InputSpec instances (one per input tensor). These objects enable the layer to run input compatibility checks for input structure, input rank, input shape, and input dtype. A None entry in a shape is compatible with any dimension, a None shape is compatible with any shape.
InputSpec Layer
A layer that specifies the rank, dtype, and shape of every input to a layer.
A normalization layer that applies Instance Normalization over a 2D (unbatched) or 3D (batched) input.
InstanceNorm1D
Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
InstanceNorm1D Layer
A normalization layer that applies Instance Normalization over a 2D (unbatched) or 3D (batched) input.
A normalization layer that applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
InstanceNorm2D
A normalization layer that applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
A normalization layer that applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension).
InstanceNorm3D
Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
InstanceNorm3D Layer
A normalization layer that applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension).
A systemic bias exhibited at the level of entire institutions where practices or norms result in the favoring or disadvantaging of certain social groups.
Bias exhibited at the level of entire institutions, where practices or norms result in the favoring or disadvantaging of certain social groups, such as institutional racism or sexism.
Institutional Bias
A systemic bias exhibited at the level of entire institutions where practices or norms result in the favoring or disadvantaging of certain social groups.
A large language model which is fine-tuned to follow natural language instructions accurately and safely, learning to map from instructions to desired model behavior in a more controlled and principled way.
Instruction-Tuned Large Language Model
constitutional AI
natural language instructions
Instruction-Tuned LLM
A categorical features preprocessing layer that maps integer features to contiguous ranges.
IntegerLookup Layer
A categorical features preprocessing layer that maps integer features to contiguous ranges.
An individual bias where users interpret algorithmic outputs according to their internalized biases and views.
Interpretation Bias
An individual bias where users interpret algorithmic outputs according to their internalized biases and views.
A layer that obtains the dot product of input values or subsets of input values.
Kernel Layer
A machine learning task that groups objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors.
K-NN
KNN
K-nearest Neighbor Algorithm
A machine learning task that groups objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors.
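A minimal sketch of the plurality-vote idea (plain NumPy, with made-up 2-D points and labels):

import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class 0 cluster
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])  # class 1 cluster
y = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(query, k=3):
    dists = np.linalg.norm(X - query, axis=1)        # Euclidean distances to all points
    nearest = y[np.argsort(dists)[:k]]               # labels of the k closest points
    return np.bincount(nearest).argmax()             # plurality vote

print(knn_predict(np.array([0.2, 0.3])))  # -> 0
print(knn_predict(np.array([0.8, 1.0])))  # -> 1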
A classification and clustering that classifies objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors.
K-NN Classification
KNN Classification
K-nearest Neighbor Classification Algorithm
A classification and clustering that classifies objects by a plurality vote of its neighbors, assigning each object to the class most common among its k nearest neighbors.
A regression analysis that assigns the average of the values of k nearest neighbors to objects.
K-NN Regression
KNN Regression
K-nearest Neighbor Regression Algorithm
A regression analysis that assigns the average of the values of k nearest neighbors to objects.
A large language model which incorporates external knowledge sources or knowledge bases into the model architecture, enabling it to generate more factually accurate and knowledge-aware text.
Knowledge-Grounded Large Language Model
factual grounding
knowledge integration
Knowledge-Grounded LLM
A training strategy in which knowledge is passed from one entity, such as a person, organization, or system, to another, facilitating learning and adaptation in the receiving entity through various methods such as teaching, training, or data exchange.
Inductive Transfer
Adaptation
Pretrained models
Knowledge Transfer
A training strategy in which knowledge is passed from one entity, such as a person, organization, or system, to another, facilitating learning and adaptation in the receiving entity through various methods such as teaching, training, or data exchange.
A network that is an unsupervised technique producing a low-dimensional representation of high-dimensional data, preserving topological structure.
KN
SOFM
SOM
Self-Organizing Feature Map
Self-Organizing Map
Layers: Input, Hidden
Kohonen Network
A network that is an unsupervised technique producing a low-dimensional representation of high-dimensional data, preserving topological structure.
A pooling layer that applies 1D power-average pooling over an input signal composed of several input planes.
LPPool1D
LPPool1D Layer
A pooling layer that applies 1D power-average pooling over an input signal composed of several input planes.
A pooling layer that applies 2D power-average pooling over an input signal composed of several input planes.
LPPool2D
LPPool2D Layer
A pooling layer that applies 2D power-average pooling over an input signal composed of several input planes.
A layer that processes one step within the whole time sequence input for an LSTM layer.
Cell class for the LSTM layer.
LSTMCell Layer
A layer that processes one step within the whole time sequence input for an LSTM layer.
A recurrent layer that implements the Long Short-Term Memory architecture.
Long Short-Term Memory layer - Hochreiter 1997. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: 1. activation == tanh, 2. recurrent_activation == sigmoid, 3. recurrent_dropout == 0, 4. unroll is False, 5. use_bias is True, 6. Inputs, if use masking, are strictly right-padded, 7. Eager execution is enabled in the outermost context.
LSTM Layer
A recurrent layer that implements the Long Short-Term Memory architecture.
A layer that wraps arbitrary expressions as a Layer object.
Wraps arbitrary expressions as a Layer object.
Lambda Layer
A layer that wraps arbitrary expressions as a Layer object.
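For example (a Keras sketch, assuming TensorFlow; best reserved for simple stateless expressions):

import tensorflow as tf

# Wrap an arbitrary expression as a layer; here an element-wise square.
square = tf.keras.layers.Lambda(lambda x: x ** 2)
print(square(tf.constant([1.0, 2.0, 3.0])))  # [1. 4. 9.]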
A large language model that supports interactive semantic parsing, enabling users to provide feedback and corrections to dynamically refine and update the language model.
Interactive learning
Language Interface LLM
A model designed to predict the next word in a sequence or assign probabilities to sequences of words in natural language.
Language Model
A model designed to predict the next word in a sequence or assign probabilities to sequences of words in natural language.
A language model consisting of a neural network with many parameters (typically billions of weights or more) trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.
LLM
Large Language Model
A language model consisting of a neural network with many parameters (typically billions of weights or more) trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.
A regression analysis that performs both variable selection and regularization to enhance prediction accuracy and interpretability.
Lasso Regression
A regression analysis that performs both variable selection and regularization to enhance prediction accuracy and interpretability.
A structure or network topology in a deep learning model that takes information from previous layers and passes it to the next layer.
Layer
A structure or network topology in a deep learning model that takes information from previous layers and passes it to the next layer.
A layer from which all other layers inherit.
This is the class from which all layers inherit. A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves computation, defined in the call() method, and a state (weight variables). State can be created in various places, at the convenience of the subclass implementer: in __init__(); in the optional build() method, which is invoked by the first __call__() to the layer, and supplies the shape(s) of the input(s), which may not have been known at initialization time; in the first invocation of call(), with some caveats discussed below. Users will just instantiate a layer and then treat it as a callable.
Layer Layer
A layer from which all other layers inherit.
A normalization layer that applies Layer Normalization over a mini-batch of inputs.
LayerNorm
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
LayerNorm Layer
A normalization layer that applies Layer Normalization over a mini-batch of inputs.
A normalization layer that applies Layer Normalization over the inputs.
Layer normalization layer (Ba et al., 2016). Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. i.e. applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis.
LayerNormalization Layer
A normalization layer that applies Layer Normalization over the inputs.
A batch normalization layer that lazily initializes the num_features argument from the input size for 1D data.
LazyBatchNorm1D
A torch.nn.BatchNorm1D module with lazy initialization of the num_features argument of the BatchNorm1D that is inferred from the input.size(1).
LazyBatchNorm1D Layer
A batch normalization layer that lazily initializes the num_features argument from the input size for 1D data.
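A short PyTorch sketch (note the class is spelled torch.nn.LazyBatchNorm1d in the library) showing num_features being inferred from the first batch:

import torch

bn = torch.nn.LazyBatchNorm1d()        # num_features left unspecified
x = torch.randn(4, 8)                  # 4 samples, 8 features each
out = bn(x)                            # first call infers num_features from input.size(1)
print(out.shape, bn.num_features)      # torch.Size([4, 8]) 8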
A batch normalization layer that lazily initializes the num_features argument from the input size for 2D data.
LazyBatchNorm2D
A torch.nn.BatchNorm2D module with lazy initialization of the num_features argument of the BatchNorm2D that is inferred from the input.size(1).
LazyBatchNorm2D Layer
A batch normalization layer that lazily initializes the num_features argument from the input size for 2D data.
A batch normalization layer that lazily initializes the num_features argument from the input size for 3D data.
LazyBatchNorm3D
A torch.nn.BatchNorm3D module with lazy initialization of the num_features argument of the BatchNorm3D that is inferred from the input.size(1).
LazyBatchNorm3D Layer
A batch normalization layer that lazily initializes the num_features argument from the input size for 3D data.
A normalization layer that lazily initializes the num_features argument from the input size for 1D data.
LazyInstanceNorm1D
A torch.nn.InstanceNorm1D module with lazy initialization of the num_features argument of the InstanceNorm1D that is inferred from the input.size(1).
LazyInstanceNorm1D Layer
A normalization layer that lazily initializes the num_features argument from the input size for 1D data.
A normalization layer that lazily initializes the num_features argument from the input size for 2D data.
LazyInstanceNorm2D
A torch.nn.InstanceNorm2D module with lazy initialization of the num_features argument of the InstanceNorm2D that is inferred from the input.size(1).
LazyInstanceNorm2D Layer
A normalization layer that lazily initializes the num_features argument from the input size for 2D data.
A normalization that lazily initializes the num_features argument from the input size for 3D data.
LazyInstanceNorm3D
A torch.nn.InstanceNorm3D module with lazy initialization of the num_features argument of the InstanceNorm3D that is inferred from the input.size(1).
LazyInstanceNorm3D Layer
A normalization that lazily initializes the num_features argument from the input size for 3D data.
An activation layer that applies the leaky rectified linear unit function element-wise.
Leaky version of a Rectified Linear Unit.
LeakyReLU Layer
An activation layer that applies the leaky rectified linear unit function element-wise.
A regression analysis which approximates the solution of overdetermined systems by minimizing the sum of the squares of the residuals.
Least-squares Analysis
A regression analysis which approximates the solution of overdetermined systems by minimizing the sum of the squares of the residuals.
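As a small worked sketch (NumPy, with made-up observations), solving an overdetermined linear system in the least-squares sense:

import numpy as np

# More equations (rows) than unknowns, so we minimize the sum of squared residuals.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # design matrix
b = np.array([1.1, 1.9, 3.2, 3.9])                              # observations
coef, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(coef)  # intercept and slope minimizing ||A @ coef - b||^2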
A large language model that continually acquires new knowledge over time without forgetting previously learned information, maintaining a balance between plasticity and stability.
Continual Learning LLM
Forever Learning
Catastrophic forgetting
Plasticity-Stability balance
Lifelong Learning LLM
A mathematical function that has the form f(x) = a + bx.
Linear Function
A mathematical function that has the form f(x) = a + bx.
A regression analysis that is a linear approach for modeling the relationship between a scalar response and one or more explanatory variables.
Linear Regression
A regression analysis that is a linear approach for modeling the relationship between a scalar response and one or more explanatory variables.
A use and interpretation bias arising when network attributes obtained from user connections, activities, or interactions misrepresent true user behavior.
Bias arising when network attributes obtained from user connections, activities, or interactions misrepresent true user behavior.
Linking Bias
A use and interpretation bias arising when network attributes obtained from user connections, activities, or interactions misrepresent true user behavior.
A network that is a type of reservoir computer, turning time-varying input into spatio-temporal activation patterns.
LSM
Layers: Input, Spiking Hidden, Output
Liquid State Machine Network
A network that is a type of reservoir computer, turning time-varying input into spatio-temporal activation patterns.
A normalization layer that applies local response normalization over an input signal composed of several input planes.
LocalResponseNorm
Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.
LocalResponseNorm Layer
A normalization layer that applies local response normalization over an input signal composed of several input planes.
A locally-connected layer for 1D inputs where each patch of the input is convolved with a different set of filters.
Locally-connected layer for 1D inputs. The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
LocallyConnected1D Layer
A locally-connected layer for 1D inputs where each patch of the input is convolved with a different set of filters.
A locally-connected layer for 2D inputs where each patch of the input is convolved with a different set of filters.
Locally-connected layer for 2D inputs. The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
LocallyConnected2D Layer
A locally-connected layer for 2D inputs where each patch of the input is convolved with a different set of filters.
A layer that works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
Locally-connected Layer
A layer that works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
A regression analysis that estimates the probability of an event occurring by modeling the log-odds of the event as a linear combination of one or more independent variables.
Logistic Regression
A regression analysis that estimates the probability of an event occurring by modeling the log-odds of the event as a linear combination of one or more independent variables.
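A minimal sketch of the model form (plain NumPy; the weights and inputs are made up rather than fitted): the log-odds are linear in the covariates and the logistic function converts them into a probability.

import numpy as np

def predict_proba(X, weights, bias):
    log_odds = X @ weights + bias            # linear combination of covariates
    return 1.0 / (1.0 + np.exp(-log_odds))   # logistic (sigmoid) link

# Hypothetical parameters for two explanatory variables.
w, b = np.array([1.5, -0.8]), 0.2
print(predict_proba(np.array([[0.0, 0.0], [2.0, 1.0]]), w, b))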
A recurrent neural network with feedback connections that processes entire sequences of data.
LSTM
Layers: Input, Memory Cell, Output
Long Short Term Memory
A recurrent neural network with feedback connections that processes entire sequences of data.
An individual bias occurring when automation leads to humans being unaware of their situation, making them unprepared to assume control in cooperative systems.
Loss Of Situational Awareness Bias
An individual bias occurring when automation leads to humans being unaware of their situation, making them unprepared to assume control in cooperative systems.
A large language model which is optimized for performance in scenarios with limited data, computational resources, or for languages with sparse datasets.
Low-Resource Language Model
low-resource languages
resource-efficient
Low-Resource LLM
A process intended to build methods that learn from data.
Machine Learning
Machine Learning Task
A process intended to build methods that learn from data.
A dimensionality reduction method based on the assumption that observed data lie on a low-dimensional manifold embedded in a higher-dimensional space.
Manifold Learning
A dimensionality reduction method based on the assumption that observed data lie on a low-dimensional manifold embedded in a higher-dimensional space.
A network that is a stochastic model describing a sequence of possible events where the probability of each event depends only on the previous event's state.
MC
MP
Markov Process
Layers: Probabilistic Hidden
Markov Chain
A network that is a stochastic model describing a sequence of possible events where the probability of each event depends only on the previous event's state.
A language model that is trained to predict randomly masked tokens in a sequence based on the remaining unmasked tokens, allowing it to build deep bidirectional representations that can be effectively transferred to various NLP tasks via fine-tuning.
bidirectional encoder
denoising autoencoder
Masked Language Model
A layer that masks a sequence by using a mask value to skip timesteps.
Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking). If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.
Masking Layer
A layer that masks a sequence by using a mask value to skip timesteps.
An input layer with a shape corresponding to that of the output layer.
Matched Input-Output Layer
A mathematical rule that gives the value of a dependent variable corresponding to specified values of one or more independent variables.
Mathematical Function
A mathematical rule that gives the value of a dependent variable corresponding to specified values of one or more independent variables.
A pooling layer that performs max pooling operation for temporal data.
MaxPool1D
MaxPooling1D
Max pooling operation for 1D temporal data. Downsamples the input representation by taking the maximum value over a spatial window of size pool_size. The window is shifted by strides. The resulting output, when using the "valid" padding option, has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides.
MaxPooling1D Layer
A pooling layer that performs max pooling operation for temporal data.
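A quick shape check (assuming TensorFlow/Keras) of the "valid"-padding formula quoted above:

import tensorflow as tf

# Length 5, pool_size 2, strides 1, "valid" padding -> output length (5 - 2 + 1) / 1 = 4.
x = tf.reshape(tf.constant([1.0, 2.0, 3.0, 4.0, 5.0]), (1, 5, 1))
pooled = tf.keras.layers.MaxPooling1D(pool_size=2, strides=1, padding="valid")(x)
print(pooled.shape)          # (1, 4, 1)
print(tf.squeeze(pooled))    # [2. 3. 4. 5.]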
A pooling layer that performs max pooling operation for spatial data.
MaxPool2D
MaxPooling2D
Max pooling operation for 2D spatial data.
MaxPooling2D Layer
A pooling layer that performs max pooling operation for spatial data.
A pooling layer that performs max pooling operation for 3D data (spatial or spatio-temporal).
MaxPool3D
MaxPooling3D
Max pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
MaxPooling3D Layer
A pooling layer that performs max pooling operation for 3D data (spatial or spatio-temporal).
A pooling layer that computes a partial inverse of MaxPool1D.
MaxUnpool1D
Computes a partial inverse of MaxPool1D.
MaxUnpool1D Layer
A pooling layer that computes a partial inverse of MaxPool1D.
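A brief PyTorch sketch of the partial inverse: MaxPool1d must be run with return_indices=True so the unpooling layer knows where to put each value back; every other position is filled with zeros.

import torch
from torch import nn

pool = nn.MaxPool1d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool1d(2, stride=2)
x = torch.tensor([[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]])
pooled, indices = pool(x)                 # pooled values and their original positions
print(unpool(pooled, indices))            # [[[0., 2., 0., 4., 0., 6., 0., 8.]]]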
A pooling layer that computes a partial inverse of MaxPool2D.
MaxUnpool2D
Computes a partial inverse of MaxPool2D.
MaxUnpool2D Layer
A pooling layer that computes a partial inverse of MaxPool2D.
A pooling layer that computes a partial inverse of MaxPool3D.
MaxUnpool3D
Computes a partial inverse of MaxPool3D.
MaxUnpool3D Layer
A pooling layer that computes a partial inverse of MaxPool3D.
A merging layer that computes the maximum (element-wise) of a list of inputs.
Layer that computes the maximum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Maximum Layer
A merging layer that computes the maximum (element-wise) of a list of inputs.
A selection and sampling bias arising when features and labels are proxies for desired quantities, potentially leading to differential performance.
Measurement Bias
A selection and sampling bias arising when features and labels are proxies for desired quantities, potentially leading to differential performance.
A large language model which incorporates external writable and readable memory components, allowing it to store and retrieve information over long contexts.
Memory-Augmented Large Language Model
external memory
Memory-Augmented LLM
A large language model which incorporates external writable and readable memory components, allowing it to store and retrieve information over long contexts.
A layer of cells, each with an internal state or weights.
Memory Cell Layer
A layer of cells, each with an internal state or weights.
A layer used to merge a list of inputs.
Merging Layer
A layer used to merge a list of inputs.
A machine learning task that automatically learns from metadata about machine learning experiments.
Meta-Learning
A machine learning task that automatically learns from metadata about machine learning experiments.
A large language model which is trained in a way that allows it to quickly adapt to new tasks or datasets through only a few examples or fine-tuning steps, leveraging meta-learned priors about how to efficiently learn.
Meta-Learning Large Language Model
few-shot adaptation
learning to learn
Meta-Learning LLM
A deep neural network that learns a representation function mapping objects into an embedded space.
Distance Metric Learning
Learning a representation function that maps objects into an embedded space.
Metric Learning
A deep neural network that learns a representation function mapping objects into an embedded space.
A merging layer that computes the minimum (element-wise) of a list of inputs.
Layer that computes the minimum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Minimum Layer
A merging layer that computes the minimum (element-wise) of a list of inputs.
A large language model which dynamically selects and combines outputs from multiple expert submodels, allowing for efficient scaling by conditionally activating only a subset of model components for each input.
Mixture-of-Experts Large Language Model
MoE Large Language Model
conditional computation
model parallelism
Mixture-of-Experts LLM
A large language model which dynamically selects and combines outputs from multiple expert submodels, allowing for efficient scaling by conditionally activating only a subset of model components for each input.
An individual bias occurring when modal interfaces confuse human operators, causing actions appropriate for a different mode but incorrect for the current situation.
Mode Confusion Bias
An individual bias occurring when modal interfaces confuse human operators, causing actions appropriate for a different mode but incorrect for the current situation.
An abstract representation of a complex system, generally assembled as a set of logical, mathematical, or conceptual properties to simulate or understand the system's behavior.
Model
An abstract representation of a complex system, generally assembled as a set of logical, mathematical, or conceptual properties to simulate or understand the system's behavior.
A processing bias introduced when using data to select a single "best" model from many or when an explanatory variable has a weak relationship with the response variable.
Model Selection Bias
A processing bias introduced when using data to select a single "best" model from many or when an explanatory variable has a weak relationship with the response variable.
A modular language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks, mimicking the modular structure of human cognition.
Modular Large Language Model
component skills
skill composition
Modular LLM
A modular language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks, mimicking the modular structure of human cognition.
A language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks, mimicking the modular structure of human cognition.
Modular LM
Modular Language Model
A language model that consists of multiple specialized components or skills that can be dynamically composed and recombined to solve complex tasks, mimicking the modular structure of human cognition.
An attention layer that allows the model to attend to information from different representation subspaces.
MultiHeadAttention layer. This is an implementation of multi-headed attention as described in the paper "Attention is all you Need" (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector. This layer first projects query, key and value. These are (effectively) a list of tensors of length num_attention_heads, where the corresponding shapes are (batch_size, <query dimensions>, key_dim), (batch_size, <key/value dimensions>, key_dim), (batch_size, <key/value dimensions>, value_dim). Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor. Finally, the result tensor with the last dimension as value_dim can take a linear projection and return. When using MultiHeadAttention inside a custom Layer, the custom Layer must implement build() and call MultiHeadAttention's _build_from_signature(). This enables weights to be restored correctly when the model is loaded.
MultiHeadAttention Layer
An attention layer that allows the model to attend to information from different representation subspaces.
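A short cross-attention sketch (assuming TensorFlow/Keras; the batch size and sequence lengths are arbitrary):

import tensorflow as tf

# Cross-attention between a length-8 query sequence and a length-4 key/value sequence;
# returning the attention scores gives one weight map per head.
layer = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=2)
query = tf.random.normal((3, 8, 16))   # (batch, target_len, features)
value = tf.random.normal((3, 4, 16))   # (batch, source_len, features)
output, scores = layer(query, value, return_attention_scores=True)
print(output.shape, scores.shape)      # (3, 8, 16) (3, 2, 8, 4)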
A large language model which is trained jointly on multiple language tasks simultaneously, learning shared representations that transfer across tasks.
Multi-Task Large Language Model
transfer learning
Multi-Task LLM
A classification focused on methods that classify instances into one of three or more classes.
Multinomial Classification
Methods that classify instances into one of three or more classes.
Multiclass Classification
A classification focused on methods that classify instances into one of three or more classes.
A dimensionality reduction method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space.
MDS
Multidimensional Scaling
A dimensionality reduction method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space.
A large language model that is trained on text from multiple languages, learning shared representations that enable zero-shot or few-shot transfer to new languages.
Multilingual Large Language Model
cross-lingual transfer
Multilingual LLM
A deep neural network that processes and links information using various modalities.
Creating models that process and link information using various modalities.
Multimodal Deep Learning
A deep neural network that processes and links information using various modalities.
A large language model that learns joint representations across different modalities like text, vision, and audio in an end-to-end fashion for better cross-modal understanding and generation.
cross-modal grounding
Multimodal Fusion LLM
A multimodal large language model that learns joint representations across different modalities like text, vision, and audio in an end-to-end fashion for better cross-modal understanding and generation.
Multimodal Large Language Model
cross-modal grounding
Multimodal LLM
A multimodal large language model that learns joint representations across different modalities like text, vision, and audio in an end-to-end fashion for better cross-modal understanding and generation.
A language model that learns joint representations across different modalities like text, vision, and audio in an end-to-end fashion for better cross-modal understanding and generation.
Multimodal LM
Multimodal Language Model
A language model that learns joint representations across different modalities like text, vision, and audio in an end-to-end fashion for better cross-modal understanding and generation.
A machine learning task that uses multiple modalities of data, such as text, audio, and images, to improve learning outcomes.
A type of deep learning that uses multiple modalities of data, such as text, audio, and images, to improve learning outcomes.
Multimodal Learning
A machine learning task that uses multiple modalities of data, such as text, audio, and images, to improve learning outcomes.
A multimodal large language model which processes prompts that include multiple modalities, such as both text and images, to generate relevant responses.
Multimodal Prompt-based Language Model
A multimodal large language model which processes prompts that include multiple modalities, such as both text and images, to generate relevant responses.
A transformer network that processes and relates information from different modalities, such as text, images, and audio, using a shared embedding space and attention mechanism to learn joint representations across modalities.
unified encoder
vision-language model
Multimodal Transformer
A merging layer that multiplies (element-wise) a list of inputs.
Layer that multiplies (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Multiply Layer
A merging layer that multiplies (element-wise) a list of inputs.
A machine learning task focused on the interactions between computers and human language, including programming computers to process and analyze large amounts of natural language data.
NLP
Natural Language Processing
A machine learning task focused on the interactions between computers and human language, including programming computers to process and analyze large amounts of natural language data.
A system of interconnected nodes or entities for communication, computation or data exchange.
Network
A deep feedforward network that combines neural network pattern matching with the algorithmic power of programmable computers.
NTM
Layers: Input, Hidden, Spiking Hidden, Output
Neural Turing Machine Network
A deep feedforward network that combines neural network pattern matching with the algorithmic power of programmable computers.
A large language model which combines neural language modeling with symbolic reasoning components, leveraging structured knowledge representations and logical inferences to improve reasoning capabilities.
Neuro-Symbolic Large Language Model
knowledge reasoning
symbolic grounding
Neuro-Symbolic LLM
A layer that is a densely-connected neural network layer with added noise for regularization.
Noisy dense layer that injects random noise into the weights of a dense layer. Noisy dense layers are fully connected layers whose weights and biases are augmented by factorised Gaussian noise. The factorised Gaussian noise is controlled through gradient descent by a second weights layer. A NoisyDense layer implements the operation: $$\mathrm{NoisyDense}(x) = \mathrm{activation}(\mathrm{dot}(x, \mu + (\sigma \cdot \epsilon)) + \mathrm{bias})$$ where $\mu$ is the standard weights layer, $\epsilon$ is the factorised Gaussian noise, and $\sigma$ is a second weights layer which controls $\epsilon$.
Noise Dense Layer
A layer that is a densely-connected neural network layer with added noise for regularization.
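As an illustrative aside (a plain NumPy sketch of the operation in the formula above, not the actual layer implementation; in a real layer both mu and sigma are learned by gradient descent):

import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim = 4, 3
mu = rng.normal(scale=0.1, size=(in_dim, out_dim))      # standard weights
sigma = np.full((in_dim, out_dim), 0.017)               # noise-scale weights
bias = np.zeros(out_dim)

def noisy_dense(x, training=True):
    # Effective weights are mu + sigma * epsilon, with fresh Gaussian noise per call.
    eps = rng.normal(size=(in_dim, out_dim)) if training else 0.0
    return np.maximum(x @ (mu + sigma * eps) + bias, 0.0)  # ReLU activation

print(noisy_dense(rng.normal(size=(2, in_dim))))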
An input layer that adds noise to each value.
Noisy Input Layer
An input layer that adds noise to each value.
A data preparation that transforms data into a standard format or scale typically to reduce redundancy and improve consistency often involving the adjustment of values measured on different scales to a common scale.
Normalization
A numerical features preprocessing layer that normalizes continuous features.
Normalization Layer
A numerical features preprocessing layer that normalizes continuous features.
A layer that performs numerical data preprocessing operations.
Numerical Features Preprocessing Layer
A layer that performs numerical data preprocessing operations.
A deep neural network that classifies objects from one or only a few examples.
OSL
One-shot Learning
A deep neural network that classifies objects from one or only a few examples.
A large language model that is trained to model ordinal relationships and rank outputs rather than model probability distributions over text sequences directly.
Ordinal Large Language Model
preference modeling
ranking
Ordinal LLM
A layer containing the last neurons in the network that produces given outputs for the program.
The output layer in an artificial neural network is the last layer of neurons that produces given outputs for the program. Though they are made much like other artificial neurons in the neural network, output layer neurons may be built or observed in a different way, given that they are the last “actor” nodes on the network.
Output Layer
A layer containing the last neurons in the network that produces given outputs for the program.
An activation layer that applies parametric rectified linear unit function element-wise.
Parametric Rectified Linear Unit.
PReLU Layer
An activation layer that applies parametric rectified linear unit function element-wise.
An artificial neural network with a supervised learning algorithm for binary classification using a linear predictor function.
FFN
Feed-Forward Network
SLP
Single Layer Perceptron
Layers: Input, Output
Perceptron
A reshaping layer that permutes the dimensions of the input according to a given pattern.
Permutes the dimensions of the input according to a given pattern. Useful e.g. connecting RNNs and convnets.
Permute Layer
A reshaping layer that permutes the dimensions of the input according to a given pattern.
A large language model that adapts its language modeling and generation to the preferences, style, and persona of individual users or audiences.
Personalized Large Language Model
user adaptation LLM
Personalized LLM
A layer that, after taking a set of states or values as input, predicts a probability distribution of actions to take.
Policy Layer
A layer that serves to mitigate the sensitivity of convolutional layers to location and spatially downsample representations.
Pooling layers serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations.
Pooling Layer
A layer that serves to mitigate the sensitivity of convolutional layers to location and spatially downsample representations.
A selection and sampling bias where more popular items are more exposed, under-representing less popular items.
Selection bias where more popular items are more exposed, under-representing less popular items.
Popularity Bias
A selection and sampling bias where more popular items are more exposed, under-representing less popular items.
A selection and sampling bias characterized by systematic distortions in demographics or other user characteristics between represented users and the target population.
Population Bias
A selection and sampling bias characterized by systematic distortions in demographics or other user characteristics between represented users and the target population.
A process applied to raw data before it is used in a machine learning model, including tasks such as normalization, scaling, encoding, and transformation, to ensure the data is in an appropriate format and quality for analysis.
Preprocessing
A process applied to raw data before it is used in a machine learning model, including tasks such as normalization, scaling, encoding, and transformation, to ensure the data is in an appropriate format and quality for analysis.
A layer that performs data preprocessing operations.
Preprocessing Layer
A layer that performs data preprocessing operations.
An individual bias arising from how information is presented on the Web via a user interface, due to rating or ranking of output, or through users' self-selected biased interaction.
Presentation Bias
An individual bias arising from how information is presented on the Web via a user interface, due to rating or ranking of output, or through users' self-selected biased interaction.
A dimensionality reduction method for analyzing large datasets with high-dimensional features per observation, increasing data interpretability while preserving maximum information and enabling visualization.
PCA
Principal Component Analysis
A dimensionality reduction method for analyzing large datasets with high-dimensional features per observation, increasing data interpretability while preserving maximum information and enabling visualization.
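A brief scikit-learn sketch of principal component analysis as defined above; the toy data and the choice of two components are assumptions for illustration only.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))           # 100 observations, 10 features

    pca = PCA(n_components=2)                # keep the two directions of greatest variance
    X_reduced = pca.fit_transform(X)         # shape (100, 2), suitable for visualization
    print(pca.explained_variance_ratio_)     # fraction of variance preserved per component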
A machine learning task in which a graph expresses the conditional dependence structure between random variables.
Graphical Model
PGM
Structure Probabilistic Model
Probabilistic Graphical Model
A machine learning task in which a graph expresses the conditional dependence structure between random variables.
A hidden layer that estimates the probability of a sample being within a certain category.
Probabilistic Hidden Layer
A probabilistic graphical model that uses statistical techniques to analyze the words in each text to discover common themes, their connections, and their changes over time.
Probabilistic Topic Model
A probabilistic graphical model that uses statistical techniques to analyze the words in each text to discover common themes, their connections, and their changes over time.
A computational bias resulting from judgment modulated by affect, influenced by the level of efficacy and efficiency in information processing.
Validation Bias
Judgment modulated by affect, influenced by the level of efficacy and efficiency in information processing; often referred to as aesthetic judgment in cognitive sciences.
Processing Bias
A computational bias resulting from judgment modulated by affect, influenced by the level of efficacy and efficiency in information processing.
A large language model which is fine-tuned on a small number of examples or prompts, rather than full task datasets. This allows for rapid adaptation to new tasks with limited data, leveraging the model's few-shot learning capabilities.
Prompt-based Fine-Tuning Large Language Model
Prompt-tuned Large Language Model
few-shot learning
in-context learning
Prompt-based Fine-Tuning LLM
A regression analysis for survival analysis where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate.
Proportional Hazards Model
A regression analysis for survival analysis where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate.
The base class for recurrent layers.
Base class for recurrent layers.
RNN Layer
The base class for recurrent layers.
A deep feedforward network that uses radial basis functions as activation functions for pattern recognition and interpolation.
RBFN
RBN
Radial Basis Function Network
Layers: Input, Hidden, Output
Radial Basis Network
A deep feedforward network that uses radial basis functions as activation functions for pattern recognition and interpolation.
An image preprocessing layer that randomly adjusts brightness during training.
A preprocessing layer which randomly adjusts brightness during training. This layer will randomly increase/reduce the brightness for the input RGB images. At inference time, the output will be identical to the input. Call the layer with training=True to adjust the brightness of the input. Note that a different brightness adjustment factor will be applied to each image in the batch.
RandomBrightness Layer
An image preprocessing layer that randomly adjusts brightness during training.
An image preprocessing layer that randomly adjusts contrast during training.
A preprocessing layer which randomly adjusts contrast during training. This layer will randomly adjust the contrast of an image or images by a random factor. Contrast is adjusted independently for each channel of each image during training. For each channel, this layer computes the mean of the image pixels in the channel and then adjusts each component x of each pixel to (x - mean) * contrast_factor + mean. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and in integer or floating point dtype. By default, the layer will output floats. The output value will be clipped to the range [0, 255], the valid range of RGB colors.
RandomContrast Layer
An image preprocessing layer that randomly adjusts contrast during training.
An image preprocessing layer that randomly crops images during training.
A preprocessing layer which randomly crops images during training. During training, this layer will randomly choose a location to crop images down to a target size. The layer will crop all the images in the same batch to the same cropping location. At inference time, and during training if an input image is smaller than the target size, the input will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. If you need to apply random cropping at inference time, set training to True when calling the layer. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomCrop Layer
An image preprocessing layer that randomly crops images during training.
A regression analysis where the model parameters are random variables.
REM
Random Effects Model
A regression analysis where the model parameters are random variables.
An image preprocessing layer that randomly flips images during training.
A preprocessing layer which randomly flips images during training. This layer will flip the images horizontally and/or vertically based on the mode attribute. During inference time, the output will be identical to input. Call the layer with training=True to flip the input. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomFlip Layer
An image preprocessing layer that randomly flips images during training.
An ensemble learning method for classification, regression, and other tasks that constructs a multitude of decision trees during training.
Random Forest
An ensemble learning method for classification, regression, and other tasks that constructs a multitude of decision trees during training.
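A hedged scikit-learn sketch of the random forest classifier described above; the synthetic dataset and hyperparameters are illustrative.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(n_estimators=100, random_state=0)  # an ensemble of 100 decision trees
    forest.fit(X_train, y_train)
    print(forest.score(X_test, y_test))      # mean accuracy on held-out data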
An image preprocessing layer that randomly varies image height during training.
A preprocessing layer which randomly varies image height during training. This layer adjusts the height of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference.
RandomHeight Layer
An image preprocessing layer that randomly varies image height during training.
An image preprocessing layer that randomly rotates images during training.
A preprocessing layer which randomly rotates images during training.
RandomRotation Layer
An image preprocessing layer that randomly rotates images during training.
An image preprocessing layer that randomly translates images during training.
A preprocessing layer which randomly translates images during training. This layer will apply random translations to each image during training, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomTranslation Layer
An image preprocessing layer that randomly translates images during training.
An image preprocessing layer that randomly varies image width during training.
A preprocessing layer which randomly varies image width during training. This layer randomly adjusts the width of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference.
RandomWidth Layer
An image preprocessing layer that randomly varies image width during training.
An image preprocessing layer that randomly zooms in or out on images during training.
A preprocessing layer which randomly zooms images during training. This layer will randomly zoom in or out on each axis of an image independently, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomZoom Layer
An image preprocessing layer that randomly zooms in or out on images during training.
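The random image-augmentation layers above (RandomFlip, RandomRotation, RandomZoom, and the rest) are typically composed into a single pipeline; a minimal Keras sketch, assuming a recent TensorFlow/Keras version, with factors chosen only for illustration.
    import tensorflow as tf

    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),    # flip images left-right at random
        tf.keras.layers.RandomRotation(0.1),         # rotate by up to +/-10% of a full turn
        tf.keras.layers.RandomZoom(0.2),             # zoom in/out by up to 20%
    ])

    images = tf.random.uniform((8, 128, 128, 3))     # a dummy batch of RGB images
    augmented = data_augmentation(images, training=True)  # augmentation is active only when training=True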
An anchoring bias characterized by the idea that top-ranked results are the most relevant and important, leading to more clicks than other results.
Ranking Bias
An anchoring bias characterized by the idea that top-ranked results are the most relevant and important, leading to more clicks than other results.
An individual bias characterized by differences in perspective, memory, recall, interpretation, and reporting of the same event by multiple persons or witnesses.
Rashomon Effect
Rashomon Principle
Differences in perspective, memory, recall, interpretation, and reporting of the same event by multiple persons or witnesses.
Rashomon Effect Bias
An individual bias characterized by differences in perspective, memory, recall, interpretation, and reporting of the same event by multiple persons or witnesses.
A mathematical function that returns max(x, 0), the element-wise maximum of 0 and the input tensor.
ReLU
Rectified Linear Unit
The ReLU activation function returns: max(x, 0), the element-wise maximum of 0 and the input tensor.
ReLU Function
A mathematical function that returns max(x, 0), the element-wise maximum of 0 and the input tensor.
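A one-line NumPy sketch of the ReLU formula max(x, 0) given above.
    import numpy as np

    def relu(x):
        # element-wise maximum of 0 and the input
        return np.maximum(x, 0)

    print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))   # [0. 0. 0. 3.]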
An activation layer that applies the rectified linear unit function element-wise.
Rectified Linear Unit activation function. With default values, it returns element-wise max(x, 0).
ReLU Layer
An activation layer that applies the rectified linear unit function element-wise.
A large language model that incorporates explicit reasoning capabilities, leveraging logical rules, axioms, or external knowledge to make deductive inferences during language tasks.
Rational Large Language Model
Reasoning Large Language Model
logical inferences
reasoning
Reasoning LLM
A large language model that incorporates explicit reasoning capabilities, leveraging logical rules, axioms, or external knowledge to make deductive inferences during language tasks.
A layer composed of recurrent units with the number equal to the hidden size of the layer.
Recurrent Layer
A layer composed of recurrent units with the number equal to the hidden size of the layer.
A network with connections forming a directed graph along a temporal sequence enabling dynamic behavior.
RN
RecNN
Recurrent Network
Recurrent Neural Network
A language model that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities.
Recursive Large Language Model
Self-Attending Large Language Model
iterative refinement
self-attention
Recursive LLM
A language model that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities.
A deep neural network that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities.
RLM
Compositional generalization
Layers: Input, Memory Cell, Output
Recursive Language Model
A deep neural network that uses recursive neural network architectures like TreeLSTMs to learn syntactic composition functions improving systematic generalization abilities.
A deep neural network that recursively applies weights over structured input to generate structured or scalar predictions.
RecuNN
RvNN
Recursive Neural Network
A deep neural network that recursively applies weights over structured input to generate structured or scalar predictions.
A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables.
Regression analysis
Regression model
Regression Analysis
A set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables.
A layer that applies penalties on layer parameters or layer activity during optimization, summed into the loss function that the network optimizes.
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis.
Regularization Layer
A layer that applies penalties on layer parameters or layer activity during optimization, summed into the loss function that the network optimizes.
A machine learning task focused on methods that do not require labeled input/output pairs or explicit correction of sub-optimal actions, focusing instead on balancing exploration and exploitation to optimize performance over time.
Reinforcement Learning
A machine learning task focused on methods that do not require labeled input/output pairs or explicit correction of sub-optimal actions, focusing instead on balancing exploration and exploitation to optimize performance over time.
A large language model that is fine-tuned using reinforcement learning, where the model receives rewards for generating text that satisfies certain desired properties or objectives, improving the quality, safety, or alignment of generated text.
RL-Large Language Model
Reinforcement Learning Large Language Model
decision transformers
reward modeling
An RL-LLM is a language model fine-tuned using reinforcement learning, where the model receives rewards for generating text that satisfies certain desired properties or objectives. This can improve the quality, safety, or alignment of generated text.
Reinforcement Learning LLM
A reshaping layer that repeats the input n times.
Repeats the input n times.
RepeatVector Layer
A reshaping layer that repeats the input n times.
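A short Keras sketch of RepeatVector, which tiles a 2D input n times along a new time axis; n=3 is an arbitrary example.
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(4,))                  # (batch, 4)
    repeated = tf.keras.layers.RepeatVector(3)(inputs)   # (batch, 3, 4)
    print(repeated.shape)                                # (None, 3, 4)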
A selection and sampling bias due to non-random sampling of subgroups, making trends non-generalizable to new populations.
Bias due to non-random sampling of subgroups, making trends non-generalizable to new populations.
Representation Bias
A selection and sampling bias due to non-random sampling of subgroups, making trends non-generalizable to new populations.
A deep neural network that discovers representations required for feature detection or classification from raw data.
Feature Learning
Discovering representations required for feature detection or classification from raw data.
Representation Learning
A deep neural network that discovers representations required for feature detection or classification from raw data.
An image preprocessing layer that rescales input values to a new range.
Rescaling Layer
An image preprocessing layer that rescales input values to a new range.
A reshaping layer that reshapes the inputs into the given shape.
Reshape Layer
A reshaping layer that reshapes the inputs into the given shape.
A layer that is used to change the shape of the input.
Reshape Layer
Reshape layers are used to change the shape of the input.
Reshaping Layer
A layer that is used to change the shape of the input.
A deep neural network that employs skip connections to bypass layers facilitating learning of residual functions.
DRN
Deep Residual Network
ResNN
ResNet
Layers: Input, Weight, BN, ReLU, Weight, BN, Addition, ReLU
Residual Neural Network
A deep neural network that employs skip connections to bypass layers facilitating learning of residual functions.
An image preprocessing layer that resizes images to a target size.
A preprocessing layer which resizes images. This layer resizes an image input to a target height and width. The input should be a 4D (batched) or 3D (unbatched) tensor in "channels_last" format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. This layer can be called on tf.RaggedTensor batches of input images of distinct sizes, and will resize the outputs to dense tensors of uniform size.
Resizing Layer
An image preprocessing layer that resizes images to a target size.
A Boltzmann machine network that learns the probability distribution of its input data.
RBM
Layers: Backfed Input, Probabilistic Hidden
Restricted Boltzmann Machine
A Boltzmann machine network that learns the probability distribution of its input data.
A large language model which combines a pre-trained language model with a retrieval system that can access external knowledge sources. This allows the model to condition its generation on relevant retrieved knowledge, improving factual accuracy and knowledge grounding.
Retrieval-Augmented Large Language Model
knowledge grounding
open-book question answering
Retrieval-Augmented LLM
A regression analysis that estimates the coefficients of multiple regression models in scenarios where the independent variables are highly correlated.
Ridge Regression
A regression analysis that estimates the coefficients of multiple regression models in scenarios where the independent variables are highly correlated.
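A hedged scikit-learn sketch of ridge regression as defined above; the synthetic data and the regularization strength alpha are illustrative values.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, 0.5, 0.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=200)

    model = Ridge(alpha=1.0)     # the L2 penalty shrinks coefficients of correlated features
    model.fit(X, y)
    print(model.coef_)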
A mathematical function that multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs.
SELU
Scaled Exponential Linear Unit
SELU Function
A mathematical function that multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs.
A model that extends ARIMA, explicitly supporting univariate time series data with a seasonal component, combining seasonal differencing with ARIMA modeling.
SARIMA
Seasonal Autoregressive Integrated Moving-Average
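A hedged statsmodels sketch of a seasonal ARIMA fit as described above; the synthetic monthly series and the (1,1,1)x(1,1,1,12) orders are assumptions for illustration.
    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(0)
    t = np.arange(120)
    y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=120)

    model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))  # seasonal differencing at lag 12
    result = model.fit(disp=False)
    print(result.forecast(steps=12))          # forecast one seasonal cycle ahead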
A computational bias introduced by non-random selection of individuals, groups, or data, failing to ensure representativeness.
Sampling Bias
Selection Bias
Selection Effect
Bias introduced by non-random selection of individuals, groups, or data, failing to ensure representativeness.
Selection And Sampling Bias
A computational bias introduced by non-random selection of individuals, groups, or data, failing to ensure representativeness.
An individual bias characterized by the tendency to selectively adopt algorithmic advice that matches pre-existing beliefs and stereotypes.
Selective Adherence Bias
An individual bias characterized by the tendency to selectively adopt algorithmic advice that matches pre-existing beliefs and stereotypes.
A large language model which learns rich representations by solving pretext tasks that involve predicting parts of the input from other observed parts of the data, without relying on human-annotated labels.
Pretext tasks
Self-Supervised LLM
A machine learning task that is intermediate between supervised and unsupervised learning and predicts parts of the input data from other observed parts without relying on human-annotated labels.
Self-supervised Learning
A machine learning task that is intermediate between supervised and unsupervised learning and predicts parts of the input data from other observed parts without relying on human-annotated labels.
A large language model which combines self-supervised pretraining on unlabeled data with supervised fine-tuning on labeled task data.
Semi-Supervised Large Language Model
self-training
Semi-Supervised LLM
A convolutional layer that performs depthwise separable 1D convolution.
SeparableConv1D Layer
Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output.
SeparableConvolution1D Layer
A convolutional layer that performs depthwise separable 1D convolution.
A convolutional layer that performs depthwise separable 2D convolution.
SeparableConv2D Layer
Depthwise separable 2D convolution. Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block.
SeparableConvolution2D Layer
A convolutional layer that performs depthwise separable 2D convolution.
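A minimal Keras sketch of the depthwise-separable 2D convolution described above; the filter count and kernel size are arbitrary.
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(32, 32, 3))
    # Depthwise spatial convolution per channel, followed by a 1x1 pointwise convolution that mixes channels.
    x = tf.keras.layers.SeparableConv2D(filters=16, kernel_size=3, padding="same",
                                        depth_multiplier=1, activation="relu")(inputs)
    model = tf.keras.Model(inputs, x)
    model.summary()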
A mathematical function that applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)), always returning a value between 0 and 1.
Applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)). For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a 2-element Softmax, where the second element is assumed to be zero. The sigmoid function always returns a value between 0 and 1.
Sigmoid Function
A mathematical function that applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)), always returning a value between 0 and 1.
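A NumPy sketch of the sigmoid formula sigmoid(x) = 1 / (1 + exp(-x)) given above.
    import numpy as np

    def sigmoid(x):
        # maps any real input into the open interval (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(np.array([-6.0, 0.0, 6.0])))   # approx [0.0025, 0.5, 0.9975]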
A layer that processes one step within the whole time sequence input for a SimpleRNN layer.
Cell class for SimpleRNN. This class processes one step within the whole time sequence input, whereas tf.keras.layer.SimpleRNN processes the whole sequence.
SimpleRNNCell Layer
A layer that processes one step within the whole time sequence input for a SimpleRNN layer.
A recurrent layer that implements a fully-connected RNN where the output is to be fed back to input.
Fully-connected RNN where the output is to be fed back to input.
SimpleRNN Layer
A recurrent layer that implements a fully-connected RNN where the output is to be fed back to input.
A selection and sampling bias where the association between two variables changes when controlling for another variable.
Simpson's Paradox
Simpson's Paradox Bias
A selection and sampling bias where the association between two variables changes when controlling for another variable.
A systemic bias characterized by being for or against groups or individuals based on social identities, demographic factors, or immutable physical characteristics, often manifesting as stereotypes.
Societal Bias
A systemic bias characterized by being for or against groups or individuals based on social identities, demographic factors, or immutable physical characteristics, often manifesting as stereotypes.
A mathematical function where the elements of the output vector are in range (0, 1) and sum to 1, and each vector is handled independently.
The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. The axis argument sets which axis of the input the function is applied along. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution. The softmax of each vector x is computed as exp(x) / tf.reduce_sum(exp(x)). The input values are the log-odds of the resulting probability.
Softmax Function
A mathematical function where the elements of the output vector are in range (0, 1) and sum to 1, and each vector is handled independently.
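A NumPy sketch of the softmax computation exp(x) / sum(exp(x)) described above, with the usual max-subtraction for numerical stability (an implementation detail, not part of the definition).
    import numpy as np

    def softmax(x, axis=-1):
        # subtracting the max does not change the result mathematically but avoids overflow
        z = x - np.max(x, axis=axis, keepdims=True)
        e = np.exp(z)
        return e / np.sum(e, axis=axis, keepdims=True)

    p = softmax(np.array([1.0, 2.0, 3.0]))
    print(p, p.sum())    # values in (0, 1) that sum to 1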
An activation layer that applies the softmax function to the inputs.
Softmax activation function.
Softmax Layer
An activation layer that applies the softmax function to the inputs.
A mathematical function that is softplus(x) = log(exp(x) + 1).
softplus(x) = log(exp(x) + 1)
Softplus Function
A mathematical function that is softplus(x) = log(exp(x) + 1).
A mathematical function that is softsign(x) = x / (abs(x) + 1).
softsign(x) = x / (abs(x) + 1)
Softsign Function
A mathematical function that is softsign(x) = x / (abs(x) + 1).
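NumPy sketches of the two formulas above, softplus(x) = log(exp(x) + 1) and softsign(x) = x / (abs(x) + 1).
    import numpy as np

    def softplus(x):
        # smooth approximation of ReLU
        return np.log(np.exp(x) + 1.0)

    def softsign(x):
        # like tanh but approaches its asymptotes polynomially rather than exponentially
        return x / (np.abs(x) + 1.0)

    x = np.array([-2.0, 0.0, 2.0])
    print(softplus(x), softsign(x))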
An autoencoder network with more hidden units than inputs that constrains only a few hidden units to be active at once.
SAE
Sparse AE
Sparse Autoencoder
Layers: Input, Hidden, Matched Output-Input
Sparse Auto Encoder
A large language model that uses techniques like pruning or quantization to reduce the number of non-zero parameters in the model making it more parameter-efficient and easier to deploy on resource-constrained devices.
Sparse Large Language Model
model compression
parameter efficiency
Sparse LLM
A representation learning network that finds sparse representations of input data as a linear combination of basic elements and identifies those elements.
Sparse coding
Sparse dictionary Learning
Sparse Learning
A representation learning network that finds sparse representations of input data as a linear combination of basic elements and identifies those elements.
A regularization layer that performs the same function as Dropout but drops entire 1D feature maps instead of individual elements.
Spatial 1D version of Dropout. This version performs the same function as Dropout, however, it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead.
SpatialDropout1D Layer
A regularization layer that performs the same function as Dropout but drops entire 1D feature maps instead of individual elements.
A regularization layer that performs the same function as Dropout but drops entire 2D feature maps instead of individual elements.
Spatial 2D version of Dropout. This version performs the same function as Dropout, however, it drops entire 2D feature maps instead of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead.
SpatialDropout2D Layer
A regularization layer that performs the same function as Dropout but drops entire 2D feature maps instead of individual elements.
A regularization layer that performs the same function as Dropout but drops entire 3D feature maps instead of individual elements.
Spatial 3D version of Dropout. This version performs the same function as Dropout, however, it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead.
SpatialDropout3D Layer
A regularization layer that performs the same function as Dropout but drops entire 3D feature maps instead of individual elements.
A regression analysis used to model spatial relationships.
Spatial Regression
A regression analysis used to model spatial relationships.
A hidden layer that makes connections to an additional, heterogeneous hidden layer; modeled after biological neural networks.
Spiking Hidden Layer
A hidden layer that makes connections to an additional, heterogeneous hidden layer; modeled after biological neural networks.
A layer that allows a stack of RNN cells to behave as a single cell.
Wrapper allowing a stack of RNN cells to behave as a single cell. Used to implement efficient stacked RNNs.
StackedRNNCells Layer
A layer that allows a stack of RNN cells to behave as a single cell.
An individual bias where people search only where it is easiest to look.
Streetlight Effect
Streetlight Effect Bias
An individual bias where people search only where it is easiest to look.
A categorical features preprocessing layer that maps string features to integer indices.
StringLookup Layer
A categorical features preprocessing layer that maps string features to integer indices.
A merging layer that subtracts two inputs.
Layer that subtracts two inputs. It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape.
Subtract Layer
A merging layer that subtracts two inputs.
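A short Keras sketch of the Subtract merging layer; the two 8-dimensional inputs are arbitrary.
    import tensorflow as tf

    a = tf.keras.Input(shape=(8,))
    b = tf.keras.Input(shape=(8,))
    diff = tf.keras.layers.Subtract()([a, b])    # computes inputs[0] - inputs[1], same shape as the inputs
    model = tf.keras.Model([a, b], diff)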
A data preparation that divides text into subword units which are smaller than words but larger than individual characters to improve the efficiency and effectiveness of natural language processing models by capturing meaningful subunits of words.
Fragmentation
Part-word Division
Byte Pair Encoding
SentencePiece
Subword Segmentation
A bias characterized by the tendency to continue an endeavor due to previously invested resources despite costs outweighing benefits.
Sunk Cost Fallacy
The tendency to continue an endeavor due to previously invested resources, despite costs outweighing benefits.
Sunk Cost Fallacy Bias
A bias characterized by the tendency to continue an endeavor due to previously invested resources despite costs outweighing benefits.
A biclustering task focused on methods that simultaneously cluster the rows and columns of a labeled matrix considering data labels to enhance cluster coherence.
Supervised Block Clustering
Supervised Co-clustering
Supervised Joint Clustering
Supervised Two-mode Clustering
Supervised Two-way Clustering
Supervised Biclustering
A biclustering task focused on methods that simultaneously cluster the rows and columns of a labeled matrix considering data labels to enhance cluster coherence.
A clustering focused on methods that group labeled objects such that objects in the same group have similar labels relative to those in other groups.
Supervised Cluster Analysis
Supervised Clustering
A clustering focused on methods that group labeled objects such that objects in the same group have similar labels relative to those in other groups.
A machine learning task focused on methods that learn a function mapping input to output based on example input-output pairs.
Supervised Learning
A machine learning task focused on methods that learn a function mapping input to output based on example input-output pairs.
A network with supervised learning models for classification and regression that maps training examples to points in space maximizing the gap between categories.
SVM
SVN
Support Vector Network
Layers: Input, Hidden, Output
Support Vector Machine
A network with supervised learning models for classification and regression that maps training examples to points in space maximizing the gap between categories.
A machine learning task focused on methods for analyzing the expected duration of time until one or more events occur such as death in biological organisms or failure in mechanical systems.
Survival Analysis
A machine learning task focused on methods for analyzing the expected duration of time until one or more events occur such as death in biological organisms or failure in mechanical systems.
A processing bias characterized by the tendency to focus on items, observations, or people that "survive" a selection process, overlooking those that did not.
The tendency to focus on items, observations, or people that "survive" a selection process, overlooking those that did not.
Survivorship Bias
A processing bias characterized by the tendency to focus on items, observations, or people that "survive" a selection process, overlooking those that did not.
A mathematical function that is x*sigmoid(x), a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks.
x*sigmoid(x). It is a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks; it is unbounded above and bounded below.
Swish Function
A mathematical function that is x*sigmoid(x), a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks.
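A NumPy sketch of the swish formula x*sigmoid(x) given above.
    import numpy as np

    def swish(x):
        # x * sigmoid(x): smooth, non-monotonic, unbounded above and bounded below
        return x / (1.0 + np.exp(-x))

    print(swish(np.array([-4.0, 0.0, 4.0])))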
A network that is a type of recurrent neural network where connections between units are symmetrical with equal weights in both directions.
SCN
Symmetrically connected networks are a type of recurrent neural network where connections between units are symmetrical, meaning they have equal weights in both directions. This structure allows the network to maintain consistent information flow and equilibrium.
Symmetrically Connected Network
A network that is a type of recurrent neural network where connections between units are symmetrical with equal weights in both directions.
A batch normalization layer that applies synchronous Batch Normalization across multiple devices.
SyncBatchNorm
Applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with an additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
SyncBatchNorm Layer
A batch normalization layer that applies synchronous Batch Normalization across multiple devices.
A bias resulting from procedures and practices of systems that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued.
Systemic Bias
A bias resulting from procedures and practices of systems that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued.
A mathematical function that is the hyperbolic tangent activation function.
hyperbolic tangent
Hyperbolic tangent activation function.
Tanh Function
A mathematical function that is the hyperbolic tangent activation function.
A selection and sampling bias arising from differences in populations and behaviors over time.
Temporal Bias
A selection and sampling bias arising from differences in populations and behaviors over time.
A layer that performs text data preprocessing operations.
Text Preprocessing Layer
A layer that performs text data preprocessing operations.
A text preprocessing layer that maps text features to integer sequences.
TextVectorization Layer
A text preprocessing layer that maps text features to integer sequences.
A model that allows for different autoregressive processes depending on the regime or state of the time series, enabling the capture of nonlinear behaviors.
TAR
Threshold Autoregressive
A model that allows for different autoregressive processes depending on the regime or state of the time series, enabling the capture of nonlinear behaviors.
An activation layer that applies the thresholded rectified linear unit function element-wise.
Thresholded Rectified Linear Unit.
ThresholdedReLU Layer
An activation layer that applies the thresholded rectified linear unit function element-wise.
A recurrent layer that applies a layer to every temporal slice of an input.
This wrapper allows you to apply a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension. Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3). You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps independently, as in the sketch after this entry.
TimeDistributed Layer
A recurrent layer that applies a layer to every temporal slice of an input.
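A hedged Keras sketch matching the video-batch example in the entry above: the same Conv2D is applied independently to each of the 10 timesteps.
    import tensorflow as tf

    # Batch of videos: 10 timesteps of 128x128 RGB frames.
    inputs = tf.keras.Input(shape=(10, 128, 128, 3))
    x = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu")
    )(inputs)
    print(x.shape)    # (None, 10, 128, 128, 64): the time axis is preserved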
A machine learning task focused on methods for analyzing time series data to extract meaningful statistics and characteristics.
Methods for analyzing time series data to extract meaningful statistics and characteristics.
Time Series Analysis
A machine learning task focused on methods for analyzing time series data to extract meaningful statistics and characteristics.
A machine learning task focused on methods that predict future values based on previously observed values.
Methods that predict future values based on previously observed values.
Time Series Forecasting
A machine learning task focused on methods that predict future values based on previously observed values.
A data preparation that converts a sequence of text into smaller meaningful units called tokens, typically words or subwords, for the purpose of analysis or processing by language models.
Lexical Analysis
Text Segmentation
Tokenization
A preprocessing used to train machine learning models, including techniques such as supervised learning, unsupervised learning, reinforcement learning, and transfer learning, aimed at optimizing model performance.
Instructional Methods
Learning Techniques
TrainingStrategy
A machine learning task focused on methods that reuse or transfer information from previously learned tasks to facilitate the learning of new tasks.
Transfer Learning
A machine learning task focused on methods that reuse or transfer information from previously learned tasks to facilitate the learning of new tasks.
A large language model that leverages knowledge acquired during training on one task to improve performance on different but related tasks facilitating more efficient learning and adaptation.
Transfer LLM
transfer learning
Transfer Learning LLM
A transformer language model with large training corpuses and sets of parameters that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation.
Transformer Large Language Model
Transformer LLM
A transformer language model with large training corpuses and sets of parameters that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation.
A language model that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation.
Transformer LM
Transformer Language Model
A language model that uses the transformer architecture based on multi-head attention mechanisms allowing it to contextualize tokens within a context window for effective language understanding and generation.
A deep neural network that utilizes attention mechanisms to weigh the significance of input data.
A transformer network utilizes attention mechanisms to weigh the significance of each part of the input data, widely used in natural language processing (NLP) and computer vision (CV).
Transformer Network
A deep neural network that utilizes attention mechanisms to weigh the significance of input data.
A selection and sampling bias favoring groups better represented in training data due to less prediction uncertainty.
Bias favoring groups better represented in training data, due to less prediction uncertainty.
Uncertainty Bias
A selection and sampling bias favoring groups better represented in training data due to less prediction uncertainty.
A normalization layer that normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1.
Unit normalization layer. Normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1 (across the axes specified in axis).
UnitNormalization Layer
A normalization layer that normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1.
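A short Keras sketch of UnitNormalization, assuming a recent TensorFlow/Keras version; after the layer, each input in the batch has an L2 norm of 1.
    import tensorflow as tf

    layer = tf.keras.layers.UnitNormalization()     # normalizes along the last axis by default
    x = tf.constant([[3.0, 4.0], [0.0, 5.0]])
    y = layer(x)
    print(tf.norm(y, axis=-1))                      # each norm is approximately 1.0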
A biclustering task focused on methods that simultaneously cluster the rows and columns of an unlabeled input matrix to identify submatrices with coherent patterns.
Unsupervised Block Clustering
Unsupervised Co-clustering
Unsupervised Joint Clustering
Unsupervised Two-mode Clustering
Unsupervised Two-way Clustering
Unsupervised Biclustering
A biclustering task focused on methods that simultaneously cluster the rows and columns of an unlabeled input matrix to identify submatrices with coherent patterns.
A clustering focused on methods that group a set of unlabeled objects such that objects in the same group are more similar to each other than to those in other groups.
Unsupervised Cluster Analysis
Unsupervised Clustering
A clustering focused on methods that group a set of unlabeled objects such that objects in the same group are more similar to each other than to those in other groups.
A large language model that is trained solely on unlabeled data using self-supervised objectives like masked language modeling without any supervised fine-tuning.
Unsupervised Large Language Model
self-supervised
Unsupervised LLM
A machine learning task focused on algorithms that learn patterns from unlabeled data.
Algorithms that learn patterns from unlabeled data.
Unsupervised Learning
A machine learning task focused on algorithms that learn patterns from unlabeled data.
A network that initializes a discriminative neural net from one trained using an unsupervised criterion.
UPN
Unsupervised pre-training initializes a discriminative neural net from one trained using an unsupervised criterion, aiding in optimization and overfitting issues.
Unsupervised Pretrained Network
A network that initializes a discriminative neural net from one trained using an unsupervised criterion.
A reshaping layer that upsamples the input by repeating each temporal step size times along the time axis.
Upsampling layer for 1D inputs. Repeats each temporal step size times along the time axis.
UpSampling1D Layer
A reshaping layer that upsamples the input by repeating each temporal step size times along the time axis.
A layer that upsamples the input by repeating each row and column size times.
Upsampling layer for 2D inputs. Repeats the rows and columns of the data by size[0] and size[1] respectively.
UpSampling2D Layer
A layer that upsamples the input by repeating each row and column size times.
A layer that upsamples the input by repeating each depth, row, and column size times.
Upsampling layer for 3D inputs.
UpSampling3D Layer
A layer that upsamples the input by repeating each depth, row, and column size times.
A computational bias characterized by inappropriately analyzing ambiguous stimuli, scenarios, and events.
Interpretive Bias
Bias inappropriately analyzing ambiguous stimuli, scenarios, and events.
Use And Interpretation Bias
A computational bias characterized by inappropriately analyzing ambiguous stimuli, scenarios, and events.
An individual bias arising when a user imposes their own biases during interaction with data, output, results, etc.
Bias arising when a user imposes their own biases during interaction with data, output, results, etc.
User Interaction Bias
An individual bias arising when a user imposes their own biases during interaction with data, output, results, etc.
An autoencoder network that imposes a probabilistic structure on the latent space for unsupervised learning.
VAE
Layers: Input, Probabilistic Hidden, Matched Output-Input
Variational Auto Encoder
A model that captures the linear interdependencies among multiple time series, where each variable is modeled as a linear function of its own past values and the past values of all other variables in the system.
VAR
Vector Autoregression
A data preparation that limits the number of unique tokens in a language model's vocabulary by merging or eliminating less frequent tokens thereby optimizing computational efficiency and resource usage.
Lexical Simplification
Lexicon Pruning
Vocabulary Condensation
Vocabulary Reduction
A layer of values to be applied to other cells or neurons in a network.
Weighted Layer
A layer that augments the functionality of another layer.
Abstract wrapper base class. Wrappers take another layer and augment it in various ways. Do not use this class as a layer; it is only an abstract base class. Two usable wrappers are the TimeDistributed and Bidirectional wrappers.
Wrapper Layer
A layer that augments the functionality of another layer.
A reshaping layer that zero-pads the input along the time axis.
Zero-padding layer for 1D input (e.g. temporal sequence).
ZeroPadding1D Layer
A reshaping layer that zero-pads the input along the time axis.
A reshaping layer that zero-pads the input along the height and width dimensions.
Zero-padding layer for 2D input (e.g. picture). This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor.
ZeroPadding2D Layer
A reshaping layer that zero-pads the input along the height and width dimensions.
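A minimal Keras sketch of ZeroPadding2D; padding of one pixel on every side is an arbitrary choice.
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(28, 28, 1))
    padded = tf.keras.layers.ZeroPadding2D(padding=1)(inputs)   # adds one row/column of zeros on each side
    print(padded.shape)    # (None, 30, 30, 1)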
A reshaping layer that zero-pads the input along the depth, height, and width dimensions.
Zero-padding layer for 3D data (spatial or spatio-temporal).
ZeroPadding3D Layer
A reshaping layer that zero-pads the input along the depth, height, and width dimensions.
A large language model which performs tasks or understands concepts it has not explicitly been trained on, demonstrating a high degree of generalization and understanding.
Zero-Shot LLM
zero-shot learning
Zero-Shot Learning LLM
A deep neural network that predicts classes at test time from classes not observed during training.
ZSL
Zero-shot Learning
A deep neural network that predicts classes at test time from classes not observed during training.
A machine learning task designed to learn continuous feature representations for nodes in a graph by optimizing a neighborhood-preserving objective.
N2V
Layers: Input, Hidden, Output
node2vec
A machine learning task designed to learn continuous feature representations for nodes in a graph by optimizing a neighborhood-preserving objective.
A node2vec that predicts the current node from a window of surrounding context nodes, with the order of context nodes not influencing prediction.
N2V-CBOW
CBOW
Layers: Input, Hidden, Output
node2vec-CBOW
A node2vec that predicts the current node from a window of surrounding context nodes, with the order of context nodes not influencing prediction.
A node2vec that uses the current node to predict the surrounding window of context nodes, weighing nearby context nodes more heavily than distant ones.
N2V-SkipGram
SkipGram
Layers: Input, Hidden, Output
node2vec-SkipGram
A node2vec that uses the current node to predict the surrounding window of context nodes, weighing nearby context nodes more heavily than distant ones.
A dimensionality reduction method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.
t-SNE
tSNE
t-Distributed Stochastic Neighbor Embedding
A dimensionality reduction method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.
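A hedged scikit-learn sketch of t-SNE for 2D visualization as described above; the perplexity and the toy data are illustrative.
    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))       # 200 points in 50 dimensions

    embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    print(embedding.shape)               # (200, 2): one 2D location per datapoint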
A machine learning task that generates distributed representations of words by training a shallow neural network model, which aims to predict the context of each word within a corpus. This algorithm captures semantic meanings of words through their contextual usage in the text.
W2V
Layers: Input, Hidden, Output
word2vec
A machine learning task that generates distributed representations of words by training a shallow neural network model, which aims to predict the context of each word within a corpus. This algorithm captures semantic meanings of words through their contextual usage in the text.
A word2vec that predicts the current word from a window of surrounding context words, ignoring the order of context words.
W2V-CBOW
CBOW
Layers: Input, Hidden, Output
word2vec-CBOW
A word2vec that predicts the current word from a window of surrounding context words, ignoring the order of context words.
A word2vec that predicts surrounding context words from the current word, giving more weight to nearby context words than distant ones.
W2V-SkipGram
SkipGram
Layers: Input, Hidden, Output
word2vec-SkipGram
A word2vec that predicts surrounding context words from the current word, giving more weight to nearby context words than distant ones.
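A hedged gensim sketch contrasting the CBOW and skip-gram word2vec variants described above; the toy corpus and vector dimensions are illustrative, and gensim itself is an assumed external library, not part of the ontology.
    from gensim.models import Word2Vec

    sentences = [["graph", "node", "embedding"],
                 ["word", "context", "embedding"],
                 ["node", "context", "window"]]

    cbow = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=0)      # CBOW: context -> word
    skipgram = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1)  # skip-gram: word -> context
    print(skipgram.wv["embedding"].shape)    # (32,): one dense vector per vocabulary word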
A core relation that holds between a whole and its part
has part