This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases.
Artificial Intelligence Ontology
2023-09-08
Abstract object representing an RNN cell. This is the base class for implementing RNN cells with custom behavior.
AbstractRNNCell
Abstract object representing an RNN cell. This is the base class for implementing RNN cells with custom behavior.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AbstractRNNCell
Applies an activation function to an output.
Activation Layer
Applies an activation function to an output.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Activation
Methods which can interactively query a user (or some other information source) to label new data points with the desired outputs.
Query Learning
Active Learning
Methods which can interactively query a user (or some other information source) to label new data points with the desired outputs.
https://en.wikipedia.org/wiki/Active_learning_(machine_learning)
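As a minimal sketch of the query step described above, the following picks which unlabeled point to send to the labeler. Uncertainty sampling is used purely as an illustration, since the definition does not name a selection strategy, and `select_query`, `toy_model`, and `unlabeled` are hypothetical names.

```python
from typing import Callable, Sequence


def select_query(pool: Sequence[float], predict_proba: Callable[[float], float]) -> int:
    """Return the index of the pool item whose predicted positive-class
    probability is closest to 0.5, i.e. the most uncertain one."""
    return min(range(len(pool)), key=lambda i: abs(predict_proba(pool[i]) - 0.5))


# Hypothetical stand-in for a trained binary classifier.
def toy_model(x: float) -> float:
    return min(1.0, max(0.0, x))


unlabeled = [0.05, 0.45, 0.9]
query_index = select_query(unlabeled, toy_model)  # item to hand to the labeler
```

In a full loop the returned item would be labeled, added to the training set, and the model retrained before the next query.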
Layer that applies an update to the cost function based on input activity.
ActivityRegularization Layer
Layer that applies an update to the cost function based on input activity.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ActivityRegularization
A type of selection bias that occurs when systems/platforms get their training data from their most active users, rather than those less active (or inactive).
Activity Bias
A type of selection bias that occurs when systems/platforms get their training data from their most active users, rather than those less active (or inactive).
https://doi.org/10.6028/NIST.SP.1270
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool1D
AdaptiveAvgPool1d
AdaptiveAvgPool1D Layer
Applies a 1D adaptive average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
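A rough sketch of 1D adaptive average pooling on a single input plane follows. The caller fixes the output length and the windows are derived from it. The floor/ceil rule for window boundaries is an assumption chosen for illustration, and `adaptive_avg_pool1d` is a hypothetical helper, not the library function.

```python
def adaptive_avg_pool1d(signal, output_size):
    """Average `signal` into exactly `output_size` bins.

    Assumed window rule: bin i covers indices
    floor(i * n / output_size) .. ceil((i + 1) * n / output_size) - 1.
    """
    n = len(signal)
    pooled = []
    for i in range(output_size):
        start = (i * n) // output_size
        end = ((i + 1) * n + output_size - 1) // output_size  # integer ceil
        window = signal[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


even = adaptive_avg_pool1d([1, 2, 3, 4, 5, 6], 3)
uneven = adaptive_avg_pool1d([1, 2, 3, 4, 5], 2)  # windows overlap by one element
```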
Applies a 2D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool2D
AdaptiveAvgPool2d
AdaptiveAvgPool2D Layer
Applies a 2D adaptive average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 3D adaptive average pooling over an input signal composed of several input planes.
AdaptiveAvgPool3D
AdaptiveAvgPool3d
AdaptiveAvgPool3D Layer
Applies a 3D adaptive average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 1D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool1D
AdaptiveMaxPool1d
AdaptiveMaxPool1D Layer
Applies a 1D adaptive max pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 2D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool2D
AdaptiveMaxPool2d
AdaptiveMaxPool2D Layer
Applies a 2D adaptive max pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 3D adaptive max pooling over an input signal composed of several input planes.
AdaptiveMaxPool3D
AdaptiveMaxPool3d
AdaptiveMaxPool3D Layer
Applies a 3D adaptive max pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Layer that adds a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Add Layer
Layer that adds a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add
Additive attention layer, a.k.a. Bahdanau-style attention.
AdditiveAttention Layer
Additive attention layer, a.k.a. Bahdanau-style attention.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AdditiveAttention
Applies Alpha Dropout to the input. Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled Exponential Linear Units by randomly setting activations to the negative saturation value.
AlphaDropout Layer
Applies Alpha Dropout to the input. Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled Exponential Linear Units by randomly setting activations to the negative saturation value.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AlphaDropout
Arises when the distribution over prediction outputs is skewed in comparison to the prior distribution of the prediction target.
Amplification Bias
Arises when the distribution over prediction outputs is skewed in comparison to the prior distribution of the prediction target.
https://doi.org/10.6028/NIST.SP.1270
A cognitive bias, the influence of a particular reference point or anchor on people’s decisions. Often more fully referred to as anchoring-and-adjustment, or anchoring-and-adjusting: after an anchor is set, people adjust insufficiently from that anchor point to arrive at a final answer. Decision makers are biased towards an initially presented value.
Anchoring Bias
A cognitive bias, the influence of a particular reference point or anchor on people’s decisions. Often more fully referred to as anchoring-and-adjustment, or anchoring-and-adjusting: after an anchor is set, people adjust insufficiently from that anchor point to arrive at a final answer. Decision makers are biased towards an initially presented value.
https://doi.org/10.6028/NIST.SP.1270
When users rely on automation as a heuristic replacement for their own information seeking and processing. A form of individual bias but often discussed as a group bias, or the larger effects on natural language processing models.
Annotator Reporting Bias
When users rely on automation as a heuristic replacement for their own information seeking and processing. A form of individual bias but often discussed as a group bias, or the larger effects on natural language processing models.
https://doi.org/10.6028/NIST.SP.1270
An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron receives a signal then processes it and can signal neurons connected to it. The "signal" at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
ANN
NN
Artificial Neural Network
An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron receives a signal then processes it and can signal neurons connected to it. The "signal" at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
https://en.wikipedia.org/wiki/Artificial_neural_network
A rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.
Association Rule Learning
A rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.
https://en.wikipedia.org/wiki/Association_rule_learning
Dot-product attention layer, a.k.a. Luong-style attention.
Attention Layer
Dot-product attention layer, a.k.a. Luong-style attention.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention
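To make the term concrete, here is a minimal sketch of dot-product attention in plain Python: scores are query-key dot products, a softmax turns them into weights, and the output is the weighted sum of the values. It omits scaling, masking, and dropout, and `dot_product_attention` is a hypothetical helper rather than the Keras layer itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(queries, keys, values):
    """For each query vector, return the attention-weighted sum of `values`."""
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs


# A zero query scores every key equally, so the values are simply averaged.
attended = dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```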
An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”).
AE
Input, Hidden, Matched Output-Input
Auto Encoder Network
An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”).
https://en.wikipedia.org/wiki/Autoencoder
When humans over-rely on automated systems or have their skills attenuated by such over-reliance (e.g., spelling and autocorrect or spellcheckers).
Automation Complacency
Automation Complacency Bias
When humans over-rely on automated systems or have their skills attenuated by such over-reliance (e.g., spelling and autocorrect or spellcheckers).
https://doi.org/10.6028/NIST.SP.1270
A mental shortcut whereby people tend to overweight what comes easily or quickly to mind, meaning that what is easier to recall—e.g., more “available”—receives greater emphasis in judgement and decision-making.
Availability Bias
Availability Heuristic
Availability Heuristic Bias
A mental shortcut whereby people tend to overweight what comes easily or quickly to mind, meaning that what is easier to recall—e.g., more “available”—receives greater emphasis in judgement and decision-making.
https://doi.org/10.6028/NIST.SP.1270
Average pooling for temporal data. Downsamples the input representation by taking the average value over the window defined by pool_size. The window is shifted by strides. The resulting output when using the "valid" padding option has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides.
AvgPool1D
AvgPool1d
AveragePooling1D Layer
Average pooling for temporal data. Downsamples the input representation by taking the average value over the window defined by pool_size. The window is shifted by strides. The resulting output when using the "valid" padding option has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling1D
Average pooling operation for spatial data. Downsamples the input along its spatial dimensions (height and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. The resulting output when using the "valid" padding option has a shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size). The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1.
AvgPool2D
AvgPool2d
AveragePooling2D Layer
Average pooling operation for spatial data. Downsamples the input along its spatial dimensions (height and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. The resulting output when using the "valid" padding option has a shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size). The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D
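The two output-shape formulas quoted in the AveragePooling2D definition can be checked with a short helper. This is only a sketch for one spatial dimension, and `pooled_size` is a hypothetical name.

```python
def pooled_size(input_size: int, pool_size: int, stride: int, padding: str = "valid") -> int:
    """Number of output rows (or columns) along one spatial dimension."""
    if padding == "valid":
        if input_size < pool_size:
            raise ValueError("'valid' padding needs input_size >= pool_size")
        # floor((input_shape - pool_size) / strides) + 1
        return (input_size - pool_size) // stride + 1
    if padding == "same":
        # floor((input_shape - 1) / strides) + 1
        return (input_size - 1) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


valid_rows = pooled_size(5, pool_size=2, stride=2, padding="valid")
same_rows = pooled_size(5, pool_size=2, stride=2, padding="same")
```

With an odd input size of 5, "valid" drops the last incomplete window while "same" keeps it, which is why the two results differ by one.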
Average pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
AvgPool3D
AvgPool3d
AveragePooling3D Layer
Average pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling3D
Layer that averages a list of inputs element-wise. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Average Layer
Layer that averages a list of inputs element-wise. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Average
Applies a 1D average pooling over an input signal composed of several input planes.
AvgPool1D
AvgPool1d
AvgPool1D Layer
Applies a 1D average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 2D average pooling over an input signal composed of several input planes.
AvgPool2D
AvgPool2d
AvgPool2D Layer
Applies a 2D average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 3D average pooling over an input signal composed of several input planes.
AvgPool3D
AvgPool3d
AvgPool3D Layer
Applies a 3D average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies Batch Normalization over a 2D or 3D input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm1D
BatchNorm1d
nn.BatchNorm1d
BatchNorm1D Layer
Applies Batch Normalization over a 2D or 3D input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
https://pytorch.org/docs/stable/nn.html#normalization-layers
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm2D
BatchNorm2d
nn.BatchNorm2d
BatchNorm2D Layer
Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
https://pytorch.org/docs/stable/nn.html#normalization-layers
Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
BatchNorm3D
BatchNorm3d
nn.BatchNorm3d
BatchNorm3D Layer
Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
https://pytorch.org/docs/stable/nn.html#normalization-layers
Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Importantly, batch normalization works differently during training and during inference. During training (i.e. when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs. That is to say, for each channel being normalized, the layer returns gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta, where: epsilon is a small constant (configurable as part of the constructor arguments); gamma is a learned scaling factor (initialized as 1), which can be disabled by passing scale=False to the constructor; and beta is a learned offset factor (initialized as 0), which can be disabled by passing center=False to the constructor. During inference (i.e. when using evaluate() or predict(), or when calling the layer/model with the argument training=False, which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta. self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as such: moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum) and moving_var = moving_var * momentum + var(batch) * (1 - momentum).
BatchNormalization Layer
Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Importantly, batch normalization works differently during training and during inference. During training (i.e. when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs. That is to say, for each channel being normalized, the layer returns gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta, where: epsilon is a small constant (configurable as part of the constructor arguments); gamma is a learned scaling factor (initialized as 1), which can be disabled by passing scale=False to the constructor; and beta is a learned offset factor (initialized as 0), which can be disabled by passing center=False to the constructor. During inference (i.e. when using evaluate() or predict(), or when calling the layer/model with the argument training=False, which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta. self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as such: moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum) and moving_var = moving_var * momentum + var(batch) * (1 - momentum).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization
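The training and inference formulas in the BatchNormalization definition can be traced with a single-channel sketch in plain Python. `BatchNormChannel` is a hypothetical class, gamma and beta are held fixed rather than learned, and the starting values for the moving statistics (0 and 1) and the default `momentum` and `epsilon` are assumptions made for the example.

```python
import math


class BatchNormChannel:
    """Batch normalization for one channel, following the formulas above."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.gamma = 1.0        # scaling factor (fixed here, learned in practice)
        self.beta = 0.0         # offset factor (fixed here, learned in practice)
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0  # assumed initial value
        self.moving_var = 1.0   # assumed initial value

    def __call__(self, batch, training=False):
        if training:
            # Normalize with the statistics of the current batch...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ...and fold them into the moving averages.
            self.moving_mean = self.moving_mean * self.momentum + mean * (1 - self.momentum)
            self.moving_var = self.moving_var * self.momentum + var * (1 - self.momentum)
        else:
            # Inference uses the moving averages collected during training.
            mean, var = self.moving_mean, self.moving_var
        scale = self.gamma / math.sqrt(var + self.epsilon)
        return [scale * (x - mean) + self.beta for x in batch]


bn = BatchNormChannel(momentum=0.9, epsilon=0.0)
train_out = bn([1.0, 2.0, 3.0], training=True)  # uses batch mean 2.0
infer_out = bn([1.0, 2.0, 3.0])                 # uses moving_mean and moving_var
```

The same input gives different outputs in the two modes because inference relies on the accumulated moving statistics, not on the batch itself.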
A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).
Bayesian Network
A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).
https://en.wikipedia.org/wiki/Bayesian_network
Systematic distortions in user behavior across platforms or contexts, or across users represented in different datasets.
Behavioral Bias
Systematic distortions in user behavior across platforms or contexts, or across users represented in different datasets.
https://doi.org/10.6028/NIST.SP.1270
Systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others.
Bias
Systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others.
https://www.merriam-webster.com/dictionary/bias
Methods that simultaneously cluster the rows and columns of a matrix.
Block Clustering
Co-clustering
Joint Clustering
Two-mode Clustering
Two-way Clustering
Biclustering
Methods that simultaneously cluster the rows and columns of a matrix.
https://en.wikipedia.org/wiki/Biclustering
Bidirectional wrapper for RNNs.
Bidirectional Layer
Bidirectional wrapper for RNNs.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional
Methods that classify the elements of a set into two groups (each called a class) on the basis of a classification rule.
Binary Classification
Methods that classify the elements of a set into two groups (each called a class) on the basis of a classification rule.
https://en.wikipedia.org/wiki/Binary_classification
A Boltzmann machine is a type of stochastic recurrent neural network. It is a Markov random field. It was translated from statistical physics for use in cognitive science. The Boltzmann machine is based on a stochastic spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model that is a stochastic Ising model, applied to machine learning.
BM
Sherrington–Kirkpatrick model with external field
stochastic Hopfield network with hidden units
stochastic Ising-Lenz-Little model
Backfed Input, Probabilistic Hidden
Boltzmann Machine Network
A Boltzmann machine is a type of stochastic recurrent neural network. It is a Markov random field. It was translated from statistical physics for use in cognitive science. The Boltzmann machine is based on a stochastic spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model that is a stochastic Ising model, applied to machine learning.
https://en.wikipedia.org/wiki/Boltzmann_machine
A layer that performs categorical data preprocessing operations.
Categorical Features Preprocessing Layer
A layer that performs categorical data preprocessing operations.
https://keras.io/guides/preprocessing_layers/
A preprocessing layer which encodes integer features. This layer provides options for condensing data into a categorical encoding when the total number of tokens is known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use tf.keras.layers.IntegerLookup instead.
CategoryEncoding Layer
A preprocessing layer which encodes integer features. This layer provides options for condensing data into a categorical encoding when the total number of tokens is known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use tf.keras.layers.IntegerLookup instead.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/CategoryEncoding
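The kind of dense encoding this layer produces can be sketched in plain Python. `category_encode` is a hypothetical helper, and the three mode names used here ("one_hot", "multi_hot", "count") are assumptions for illustration; check the layer documentation for the options it actually accepts.

```python
def category_encode(tokens, num_tokens, mode="multi_hot"):
    """Encode integer tokens in [0, num_tokens) as a dense vector of length num_tokens."""
    if any(t < 0 or t >= num_tokens for t in tokens):
        raise ValueError("every token must lie in [0, num_tokens)")
    counts = [0] * num_tokens
    for t in tokens:
        counts[t] += 1
    if mode == "count":       # how many times each token occurs
        return counts
    if mode == "multi_hot":   # 1 wherever a token occurs at least once
        return [1 if c else 0 for c in counts]
    if mode == "one_hot":     # exactly one token per sample
        if len(tokens) != 1:
            raise ValueError("one_hot expects a single token")
        return counts
    raise ValueError(f"unknown mode: {mode!r}")


one_hot = category_encode([1], num_tokens=4, mode="one_hot")
multi_hot = category_encode([0, 2, 2], num_tokens=4, mode="multi_hot")
counted = category_encode([0, 2, 2], num_tokens=4, mode="count")
```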
Probabilistic graphical models used to encode assumptions about the data-generating process.
Causal Bayesian Network
Causal Graph
DAG
Directed Acyclic Graph
Path Diagram
Causal Graphical Model
Probabilistic graphical models used to encode assumptions about the data-generating process.
https://en.wikipedia.org/wiki/Causal_graph
A preprocessing layer which crops images. This layer crops the central portion of the images to a target size. If an image is smaller than the target size, it will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
CenterCrop Layer
A preprocessing layer which crops images. This layer crops the central portion of the images to a target size. If an image is smaller than the target size, it will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/CenterCrop
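The central-crop case can be sketched on a nested-list image. The hypothetical `center_crop` helper only covers images at least as large as the target, and it rounds the offsets down (an assumption). The resize-then-crop path for smaller images is not shown.

```python
def center_crop(image, target_height, target_width):
    """Return the central target_height x target_width window of a 2D image."""
    height, width = len(image), len(image[0])
    if height < target_height or width < target_width:
        raise ValueError("this sketch does not resize images smaller than the target")
    top = (height - target_height) // 2
    left = (width - target_width) // 2
    return [row[left:left + target_width] for row in image[top:top + target_height]]


# 4x4 image whose pixel value encodes its position: row * 4 + column.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
cropped = center_crop(image, 2, 2)
```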
Methods that distinguish and distribute kinds of "things" into different groups.
Classification
Methods that distinguish and distribute kinds of "things" into different groups.
https://en.wikipedia.org/wiki/Classification_(general_theory)
Methods that group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
Cluster analysis
Clustering
Methods that group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
https://en.wikipedia.org/wiki/Cluster_analysis
A broad term referring generally to a systematic pattern of deviation from rational judgement and decision-making. A large variety of cognitive biases have been identified over many decades of research in judgement and decision-making, some of which are adaptive mental shortcuts known as heuristics.
Cognitive Bias
A broad term referring generally to a systematic pattern of deviation from rational judgement and decision-making. A large variety of cognitive biases have been identified over many decades of research in judgement and decision-making, some of which are adaptive mental shortcuts known as heuristics.
https://doi.org/10.6028/NIST.SP.1270
A systematic tendency which causes differences between results and facts. The bias can arise at numerous stages of data analysis, including the source of the data, the estimator chosen, and the way the data were analyzed.
Statistical Bias
Computational Bias
A systematic tendency which causes differences between results and facts. The bias can arise at numerous stages of data analysis, including the source of the data, the estimator chosen, and the way the data were analyzed.
https://en.wikipedia.org/wiki/Bias_(statistics)
Layer that concatenates a list of inputs. It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.
Concatenate Layer
Layer that concatenates a list of inputs. It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate
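The shape rule in the Concatenate definition (all dimensions equal except along the concatenation axis) can be expressed as a small shape calculator. `concatenated_shape` is a hypothetical helper, and the default of concatenating along the last axis is an assumption of this sketch.

```python
def concatenated_shape(shapes, axis=-1):
    """Return the shape obtained by concatenating tensors with the given shapes."""
    first = shapes[0]
    rank = len(first)
    ax = axis % rank  # normalize negative axes
    for shape in shapes:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same rank")
        for dim, (a, b) in enumerate(zip(shape, first)):
            if dim != ax and a != b:
                raise ValueError(f"shapes differ on non-concatenation axis {dim}")
    result = list(first)
    result[ax] = sum(shape[ax] for shape in shapes)  # sizes add up along the axis
    return tuple(result)


joined = concatenated_shape([(2, 3), (2, 5)], axis=1)
stacked = concatenated_shape([(1, 4), (3, 4)], axis=0)
```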
Use of a system outside the planned domain of application, and a common cause of performance gaps between laboratory settings and the real world.
Concept Drift
Concept Drift Bias
Use of a system outside the planned domain of application, and a common cause of performance gaps between laboratory settings and the real world.
https://doi.org/10.6028/NIST.SP.1270
A cognitive bias where people tend to prefer information that aligns with, or confirms, their existing beliefs. People can exhibit confirmation bias in the search for, interpretation of, and recall of information. In the famous Wason selection task experiments, participants repeatedly showed a preference for confirmation over falsification. They were tasked with identifying an underlying rule that applied to number triples they were shown, and they overwhelmingly tested triples that confirmed rather than falsified their hypothesized rule.
Confirmation Bias
A cognitive bias where people tend to prefer information that aligns with, or confirms, their existing beliefs. People can exhibit confirmation bias in the search for, interpretation of, and recall of information. In the famous Wason selection task experiments, participants repeatedly showed a preference for confirmation over falsification. They were tasked with identifying an underlying rule that applied to number triples they were shown, and they overwhelmingly tested triples that confirmed rather than falsified their hypothesized rule.
https://doi.org/10.6028/NIST.SP.1270
Arises when an algorithm or platform provides users with a new venue within which to express their biases, and may occur from either side, or party, in a digital interaction.
Consumer Bias
Arises when an algorithm or platform provides users with a new venue within which to express their biases, and may occur from either side, or party, in a digital interaction.
https://doi.org/10.6028/NIST.SP.1270
Arises from structural, lexical, semantic, and syntactic differences in the contents generated by users.
Content Production Bias
Arises from structural, lexical, semantic, and syntactic differences in the contents generated by users.
https://doi.org/10.6028/NIST.SP.1270
A concept to learn a model for a large number of tasks sequentially without forgetting knowledge obtained from the preceding tasks, where the data from the old tasks are no longer available when training on new ones.
Incremental Learning
Life-Long Learning
Continual Learning
A concept to learn a model for a large number of tasks sequentially without forgetting knowledge obtained from the preceding tasks, where the data from the old tasks are no longer available when training on new ones.
https://paperswithcode.com/task/continual-learning
Learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.
Contrastive Learning
Learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.
https://arxiv.org/abs/2202.14037
1D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
ConvLSTM1D Layer
1D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM1D
2D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
ConvLSTM2D Layer
2D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM2D
3D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
ConvLSTM3D Layer
3D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM3D
Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 3) for data with 128 time steps and 3 channels.
Conv1DTranspose Layer
ConvTranspose1d
Convolution1DTranspose
Convolution1dTranspose
nn.ConvTranspose1d
Convolution1DTranspose Layer
Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 3) for data with 128 time steps and 3 channels.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1DTranspose
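The "opposite direction" relationship described in the Conv1DTranspose entry can be illustrated with a minimal pure-Python sketch. It assumes a single channel, no padding, and no bias. The function names `conv1d` and `conv1d_transpose` are hypothetical and are not the Keras API. The sketch shows only that the transposed operation maps an output-shaped sequence back to an input-shaped one with the same connectivity pattern. It does not recover the original values.

```python
def conv1d(x, kernel, stride=1):
    """'Valid' 1D convolution (cross-correlation) of a single-channel sequence."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return [
        sum(x[i * stride + j] * kernel[j] for j in range(k))
        for i in range(out_len)
    ]


def conv1d_transpose(y, kernel, stride=1):
    """Transposed counterpart: scatter each input value through the kernel."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        for j in range(k):
            # Position i of y touches the same positions that produced it in conv1d.
            out[i * stride + j] += value * kernel[j]
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]
y = conv1d(x, kernel, stride=2)                 # 7 steps -> 3 steps
x_like = conv1d_transpose(y, kernel, stride=2)  # 3 steps -> 7 steps
```

In this sketch the lengths round-trip exactly only because `len(x) - len(kernel)` is divisible by the stride. Library implementations expose extra padding options to handle the other cases.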
1D convolution layer (e.g. temporal convolution).
Conv1D Layer
Conv1d
Convolution1D
Convolution1d
nn.Conv1d
Convolution1D Layer
1D convolution layer (e.g. temporal convolution).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1D
Transposed convolution layer (sometimes called Deconvolution).
Conv2DTranspose Layer
ConvTranspose2d
Convolution2DTranspose
Convolution2dTranspose
nn.ConvTranspose2d
Convolution2DTranspose Layer
Transposed convolution layer (sometimes called Deconvolution).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose
2D convolution layer (e.g. spatial convolution over images). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size.
Conv2D Layer
Conv2d
Convolution2D
Convolution2d
nn.Conv2d
Convolution2D Layer
2D convolution layer (e.g. spatial convolution over images). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D
Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 3) for a 128x128x128 volume with 3 channels if data_format="channels_last".
Conv3DTranspose Layer
ConvTranspose3d
Convolution3DTranspose
Convolution3dTranspose
nn.ConvTranspose3d
Convolution3DTranspose Layer
Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 3) for a 128x128x128 volume with 3 channels if data_format="channels_last".
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv3DTranspose
3D convolution layer (e.g. spatial convolution over volumes). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 1) for 128x128x128 volumes with a single channel, in data_format="channels_last".
Conv3D Layer
Conv3d
Convolution3D
Convolution3d
nn.Conv3d
Convolution3D Layer
3D convolution layer (e.g. spatial convolution over volumes). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 1) for 128x128x128 volumes with a single channel, in data_format="channels_last".
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv3D
A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map.
Convolutional Layer
A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map.
https://www.sciencedirect.com/topics/engineering/convolutional-layer#:~:text=A%20convolutional%20layer%20is%20the,and%20creates%20an%20activation%20map.
Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1).
Cropping1D Layer
Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Cropping1D
Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. height and width.
Cropping2D Layer
Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. height and width.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Cropping2D
Cropping layer for 3D data (e.g. spatial or spatio-temporal).
Cropping3D Layer
Cropping layer for 3D data (e.g. spatial or spatio-temporal).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Cropping3D
A statistical bias in which testing huge numbers of hypotheses of a dataset may appear to yield statistical significance even when the results are statistically nonsignificant.
Data Dredging
Data Dredging Bias
A statistical bias in which testing huge numbers of hypotheses of a dataset may appear to yield statistical significance even when the results are statistically nonsignificant.
https://doi.org/10.6028/NIST.SP.1270
Arises from the addition of synthetic or redundant data samples to a dataset.
Data Generation Bias
Arises from the addition of synthetic or redundant data samples to a dataset.
https://doi.org/10.6028/NIST.SP.1270
Methods that replace missing data with substituted values.
Data Imputation
Methods that replace missing data with substituted values.
https://en.wikipedia.org/wiki/Imputation_(statistics)
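As one simple instance of the Data Imputation entry, the sketch below replaces missing values with the mean of the observed values. Representing missing entries as `None` and the helper name `impute_mean` are assumptions made for illustration. Mean substitution is only one of many imputation methods.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [20, None, 40, None, 30]
filled = impute_mean(ages)
```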
A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Decision Tree
A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
https://en.wikipedia.org/wiki/Decision_tree
In the decoder-only architecture, the model consists of only a decoder, which is trained to predict the next token in a sequence given the previous tokens. The critical difference between the Decoder-only architecture and the Encoder-Decoder architecture is that the Decoder-only architecture does not have an explicit encoder to summarize the input information. Instead, the information is encoded implicitly in the hidden state of the decoder, which is updated at each step of the generation process.
LLM
Decoder LLM
In the decoder-only architecture, the model consists of only a decoder, which is trained to predict the next token in a sequence given the previous tokens. The critical difference between the Decoder-only architecture and the Encoder-Decoder architecture is that the Decoder-only architecture does not have an explicit encoder to summarize the input information. Instead, the information is encoded implicitly in the hidden state of the decoder, which is updated at each step of the generation process.
https://www.practicalai.io/understanding-transformer-model-architectures/#:~:text=Encoder%2Donly&text=These%20models%20have%20a%20pre,Named%20entity%20recognition
Deconvolutional Networks are a framework that permits the unsupervised construction of hierarchical image representations. These representations can be used both for low-level tasks such as denoising and for providing features for object recognition. Each level of the hierarchy groups information from the level beneath to form more complex features that exist over a larger scale in the image. (https://ieeexplore.ieee.org/document/5539957)
DN
Input, Kernel, Convolutional/Pool, Output
Deconvolutional Network
Deconvolutional Networks are a framework that permits the unsupervised construction of hierarchical image representations. These representations can be used both for low-level tasks such as denoising and for providing features for object recognition. Each level of the hierarchy groups information from the level beneath to form more complex features that exist over a larger scale in the image. (https://ieeexplore.ieee.org/document/5539957)
https://ieeexplore.ieee.org/document/5539957
The combination of deep learning and active learning, where active learning attempts to maximize a model’s performance gain while annotating the fewest samples possible.
DeepAL
Deep Active Learning
The combination of deep learning and active learning, where active learning attempts to maximize a model’s performance gain while annotating the fewest samples possible.
https://arxiv.org/pdf/2009.00236.pdf
In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer. When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors. After this learning step, a DBN can be further trained with supervision to perform classification. DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves as the visible layer for the next. An RBM is an undirected, generative energy-based model with a "visible" input layer and a hidden layer and connections between but not within layers. This composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the "lowest" pair of layers (the lowest visible layer is a training set). The observation that DBNs can be trained greedily, one layer at a time, led to one of the first effective deep learning algorithms. (https://en.wikipedia.org/wiki/Deep_belief_network)
DBN
Backfed Input, Probabilistic Hidden, Hidden, Matched Output-Input
Deep Belief Network
In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer. When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors. After this learning step, a DBN can be further trained with supervision to perform classification. DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves as the visible layer for the next. An RBM is an undirected, generative energy-based model with a "visible" input layer and a hidden layer and connections between but not within layers. This composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the "lowest" pair of layers (the lowest visible layer is a training set). The observation that DBNs can be trained greedily, one layer at a time, led to one of the first effective deep learning algorithms. (https://en.wikipedia.org/wiki/Deep_belief_network)
https://en.wikipedia.org/wiki/Deep_belief_network
A Deep Convolutional Inverse Graphics Network (DC-IGN) is a model that learns an interpretable representation of images. This representation is disentangled with respect to transformations such as out-of-plane rotations and lighting variations. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. (https://arxiv.org/abs/1503.03167)
DCIGN
Input, Kernel, Convolutional/Pool, Probabilistic Hidden, Convolutional/Pool, Kernel, Output
Deep Convolutional Inverse Graphics Network
A convolutional neural network (CNN, or ConvNet) is a class of artificial neural network, most commonly applied to analyze visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation equivariant responses known as feature maps. CNNs are regularized versions of multilayer perceptrons. (https://en.wikipedia.org/wiki/Convolutional_neural_network)
CNN
ConvNet
Convolutional Neural Network
DCN
Input, Kernel, Convolutional/Pool, Hidden, Output
Deep Convolutional Network
A convolutional neural network (CNN, or ConvNet) is a class of artificial neural network, most commonly applied to analyze visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation equivariant responses known as feature maps. CNNs are regularized versions of multilayer perceptrons. (https://en.wikipedia.org/wiki/Convolutional_neural_network)
https://en.wikipedia.org/wiki/Convolutional_neural_network
The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction—forward—from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.
DFF
FFN
Feedforward Network
MLP
Multilayer Perceptron
Input, Hidden, Output
Deep FeedForward
The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction—forward—from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.
https://en.wikipedia.org/wiki/Feedforward_neural_network
A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. There are different types of neural networks but they always consist of the same components: neurons, synapses, weights, biases, and functions. (https://en.wikipedia.org/wiki/Deep_Learning#:~:text=A%20deep%20neural%20network%20(DNN,weights%2C%20biases%2C%20and%20functions.)
DNN
Deep Neural Network
Deep transfer learning methods relax the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which makes transfer learning a way to address the problem of insufficient training data.
Deep Transfer Learning
Deep transfer learning methods relax the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which makes transfer learning a way to address the problem of insufficient training data.
https://arxiv.org/abs/1808.01974
Denoising Auto Encoders (DAEs) take a partially corrupted input and are trained to recover the original undistorted input. In practice, the objective of denoising autoencoders is that of cleaning the corrupted input, or denoising. (https://en.wikipedia.org/wiki/Autoencoder)
DAE
Noisy Input, Hidden, Matched Output-Input
Denoising Auto Encoder
A layer that produces a dense Tensor based on given feature_columns. Generally a single example in training data is described with FeatureColumns. At the first layer of the model, this column-oriented data should be converted to a single Tensor. This layer can be called multiple times with different features. This is the V2 version of this layer, which uses name_scopes to create variables instead of variable_scopes. This approach currently lacks support for partitioned variables; if partitioned variables are needed, use the V1 version instead.
DenseFeatures Layer
A layer that produces a dense Tensor based on given feature_columns. Generally a single example in training data is described with FeatureColumns. At the first layer of the model, this column-oriented data should be converted to a single Tensor. This layer can be called multiple times with different features. This is the V2 version of this layer, which uses name_scopes to create variables instead of variable_scopes. This approach currently lacks support for partitioned variables; if partitioned variables are needed, use the V1 version instead.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/DenseFeatures
A regular densely connected (fully connected) neural network layer.
Dense Layer
A regular densely connected (fully connected) neural network layer.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
Arises when systems are used as decision aids for humans, since the human intermediary may act on predictions in ways that are typically not modeled in the system. However, it is still individuals using the deployed system.
Deployment Bias
Arises when systems are used as decision aids for humans, since the human intermediary may act on predictions in ways that are typically not modeled in the system. However, it is still individuals using the deployed system.
https://doi.org/10.6028/NIST.SP.1270
Depthwise 1D convolution. Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution. It is implemented via the following steps: (1) split the input into individual channels; (2) convolve each channel with an individual depthwise kernel with depth_multiplier output channels; (3) concatenate the convolved outputs along the channels axis. Unlike a regular 1D convolution, depthwise convolution does not mix information across different input channels. The depth_multiplier argument determines how many filters are applied to one input channel. As such, it controls the number of output channels that are generated per input channel in the depthwise step.
DepthwiseConv1D Layer
Depthwise 1D convolution. Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution. It is implemented via the following steps: (1) split the input into individual channels; (2) convolve each channel with an individual depthwise kernel with depth_multiplier output channels; (3) concatenate the convolved outputs along the channels axis. Unlike a regular 1D convolution, depthwise convolution does not mix information across different input channels. The depth_multiplier argument determines how many filters are applied to one input channel. As such, it controls the number of output channels that are generated per input channel in the depthwise step.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/DepthwiseConv1D
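The split, convolve, and concatenate steps named in the DepthwiseConv1D entry can be sketched in plain Python. The channels-first list layout, the absence of padding and stride, and the helper name `depthwise_conv1d` are assumptions chosen for brevity. This is not the Keras implementation.

```python
def depthwise_conv1d(x, kernels):
    """Depthwise 1D convolution without padding.

    x: list of channels, each a list of numbers (channels-first layout).
    kernels[c]: list of 1D kernels for input channel c, so
    depth_multiplier == len(kernels[c]).
    """
    outputs = []
    for channel, channel_kernels in zip(x, kernels):  # step 1: split channels
        for kernel in channel_kernels:                # step 2: convolve each
            k = len(kernel)
            outputs.append([
                sum(channel[i + j] * kernel[j] for j in range(k))
                for i in range(len(channel) - k + 1)
            ])
    return outputs                                    # step 3: concatenate


x = [[1, 2, 3, 4], [10, 20, 30, 40]]  # 2 input channels
kernels = [
    [[1, 1], [1, 0]],    # depth_multiplier = 2 kernels for channel 0
    [[1, -1], [0, 1]],   # depth_multiplier = 2 kernels for channel 1
]
out = depthwise_conv1d(x, kernels)    # 2 * 2 = 4 output channels
```

Each output channel depends on exactly one input channel, which is the "does not mix information across channels" property stated in the entry.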
Depthwise 2D convolution.
DepthwiseConv2D Layer
Depthwise 2D convolution.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/DepthwiseConv2D
Systematic differences between groups in how outcomes are determined, which may cause an over- or underestimation of the size of the effect.
Detection Bias
Systematic differences between groups in how outcomes are determined, which may cause an over- or underestimation of the size of the effect.
https://doi.org/10.6028/NIST.SP.1270
The transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.
Dimension Reduction
Dimensionality Reduction
The transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.
https://en.wikipedia.org/wiki/Dimensionality_reduction
A preprocessing layer which buckets continuous features by ranges.
Discretization Layer
A preprocessing layer which buckets continuous features by ranges.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Discretization
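Bucketing by ranges, as in the Discretization entry, can be sketched with the standard-library `bisect` module. The helper name `discretize` and the convention that a value equal to a boundary falls into the higher bucket are assumptions of this sketch.

```python
from bisect import bisect_right


def discretize(values, bin_boundaries):
    """Map each value to the index of the range it falls into.

    With sorted boundaries [b0, b1, ...], bucket 0 is (-inf, b0),
    bucket 1 is [b0, b1), and the last bucket is [b_last, +inf).
    """
    return [bisect_right(bin_boundaries, v) for v in values]


buckets = discretize([-1.5, 0.0, 0.7, 3.2], bin_boundaries=[0.0, 1.0, 2.0])
```

Three boundaries produce four buckets, so the continuous inputs become integer indices in the range 0 to 3.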
Layer that computes a dot product between samples in two tensors. E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i].
Dot Layer
Layer that computes a dot product between samples in two tensors. E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i].
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dot
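The shape behaviour described in the Dot entry, (batch_size, n) inputs producing a (batch_size, 1) output, can be checked with a small pure-Python sketch. Nested lists stand in for tensors and the helper name `batched_dot` is hypothetical.

```python
def batched_dot(a, b):
    """Row-wise dot product of two (batch_size, n) nested lists.

    Returns a (batch_size, 1) nested list whose entry i is dot(a[i], b[i]).
    """
    return [
        [sum(x * y for x, y in zip(row_a, row_b))]
        for row_a, row_b in zip(a, b)
    ]


a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 0, 1], [2, 2, 2]]
out = batched_dot(a, b)
```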
Applies Dropout to the input. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged. Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. (This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.)
Dropout Layer
Applies Dropout to the input. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged. Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. (This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.)
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout
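The training-time behaviour in the Dropout entry can be sketched as follows. The function name `dropout` and the explicit `rng` argument are assumptions for illustration, and `rate` is assumed to be strictly less than 1. Note that the 1/(1 - rate) scaling keeps the sum of the inputs unchanged in expectation, not for every individual draw.

```python
import random


def dropout(inputs, rate, training, rng):
    """Zero each input with probability `rate` and rescale the survivors.

    At inference time (training=False) the inputs pass through unchanged.
    """
    if not training or rate == 0.0:
        return list(inputs)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * scale for x in inputs]


rng = random.Random(0)  # seeded only to make the sketch repeatable
x = [1.0] * 10
train_out = dropout(x, rate=0.5, training=True, rng=rng)
infer_out = dropout(x, rate=0.5, training=False, rng=rng)
```

With rate=0.5 every surviving unit is doubled, so each element of `train_out` is either 0.0 or 2.0, while `infer_out` equals the input.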
The tendency of people with low ability in a given area or task to overestimate their self-assessed ability. Typically measured by comparing self-assessment with objective performance, often called subjective ability and objective ability, respectively.
Dunning-Kruger Effect
Dunning-Kruger Effect Bias
The tendency of people with low ability in a given area or task to overestimate their self-assessed ability. Typically measured by comparing self-assessment with objective performance, often called subjective ability and objective ability, respectively.
https://doi.org/10.6028/NIST.SP.1270
Exponential Linear Unit.
ELU Layer
Exponential Linear Unit.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ELU
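The ELU entry gives only the name of the function. As commonly defined, ELU returns x for x > 0 and alpha * (exp(x) - 1) otherwise. The scalar sketch below assumes that definition, with a default of alpha = 1.0.

```python
import math


def elu(x, alpha=1.0):
    """Exponential Linear Unit for a single scalar input."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


positive = elu(2.0)    # identity for positive inputs
negative = elu(-1.0)   # smoothly saturates toward -alpha for negative inputs
```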
The echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (with typically 1% connectivity). The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behaviour is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system.
ESN
Input, Recurrent, Output
Echo State Network
The echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (with typically 1% connectivity). The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behaviour is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system.
https://en.wikipedia.org/wiki/Echo_state_network#:~:text=The%20echo%20state%20network%20(ESN,are%20fixed%20and%20randomly%20assigned
Occurs when an inference is made about an individual based on their membership within a group.
Ecological Fallacy
Ecological Fallacy Bias
Occurs when an inference is made about an individual based on their membership within a group.
https://doi.org/10.6028/NIST.SP.1270
Turns positive integers (indexes) into dense vectors of fixed size.
Embedding Layer
Turns positive integers (indexes) into dense vectors of fixed size.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding
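An embedding layer can be thought of as a lookup table from integer indices to fixed-size vectors. The sketch below assumes a randomly initialised table and uses hypothetical helper names. In a real layer the table entries are trainable weights.

```python
import random


def make_embedding(vocab_size, dim, seed=0):
    """Create a (vocab_size, dim) table of small random values."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        for _ in range(vocab_size)
    ]


def embed(table, indices):
    """Replace each integer index with its row of the table."""
    return [table[i] for i in indices]


table = make_embedding(vocab_size=10, dim=4)
vectors = embed(table, [3, 3, 7])
```

The same index always maps to the same vector, and every vector has the same length regardless of the index value.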
Emergent bias is the result of the use and reliance on algorithms across new or unanticipated contexts.
Emergent Bias
Emergent bias is the result of the use and reliance on algorithms across new or unanticipated contexts.
https://doi.org/10.6028/NIST.SP.1270
The Encoder-Decoder architecture was the original transformer architecture introduced in the Attention Is All You Need (https://arxiv.org/abs/1706.03762) paper. The encoder processes the input sequence and generates a hidden representation that summarizes the input information. The decoder uses this hidden representation to generate the desired output sequence. The encoder and decoder are trained end-to-end to maximize the likelihood of the correct output sequence given the input sequence.
LLM
Encoder-Decoder LLM
The Encoder-Decoder architecture was the original transformer architecture introduced in the Attention Is All You Need (https://arxiv.org/abs/1706.03762) paper. The encoder processes the input sequence and generates a hidden representation that summarizes the input information. The decoder uses this hidden representation to generate the desired output sequence. The encoder and decoder are trained end-to-end to maximize the likelihood of the correct output sequence given the input sequence.
https://www.practicalai.io/understanding-transformer-model-architectures/#:~:text=Encoder%2Donly&text=These%20models%20have%20a%20pre,Named%20entity%20recognition
The Encoder-only architecture is used when only encoding the input sequence is required and the decoder is not necessary. The input sequence is encoded into a fixed-length representation and then used as input to a classifier or a regressor to make a prediction. These models have a pre-trained general-purpose encoder but will require fine-tuning of the final classifier or regressor.
LLM
Encoder LLM
The Encoder-only architecture is used when only encoding the input sequence is required and the decoder is not necessary. The input sequence is encoded into a fixed-length representation and then used as input to a classifier or a regressor to make a prediction. These models have a pre-trained general-purpose encoder but will require fine-tuning of the final classifier or regressor.
https://www.practicalai.io/understanding-transformer-model-architectures/#:~:text=Encoder%2Donly&text=These%20models%20have%20a%20pre,Named%20entity%20recognition
Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
Ensemble Learning
Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
https://en.wikipedia.org/wiki/Ensemble_learning
The effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them.
Error Propagation
Error Propagation Bias
The effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them.
https://doi.org/10.6028/NIST.SP.1270
Arises when the testing or external benchmark populations do not equally represent the various parts of the user population or from the use of performance metrics that are not appropriate for the way in which the model will be used.
Evaluation Bias
Arises when the testing or external benchmark populations do not equally represent the various parts of the user population or from the use of performance metrics that are not appropriate for the way in which the model will be used.
https://doi.org/10.6028/NIST.SP.1270
When specific groups of user populations are excluded from testing and subsequent analyses.
Exclusion Bias
When specific groups of user populations are excluded from testing and subsequent analyses.
https://doi.org/10.6028/NIST.SP.1270
The exponential function is a mathematical function denoted by f(x)=exp(x) or e^{x}.
Exponential Function
The exponential function is a mathematical function denoted by f(x)=exp(x) or e^{x}.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/exponential
Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e. they are random projections but with nonlinear transforms), or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model. (https://en.wikipedia.org/wiki/Extreme_Learning_machine)
ELM
Input, Hidden, Output
Extreme Learning Machine
Extreme Learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature Learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e. they are random projection but with nonlinear transforms), or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are usually learned in a single step, which essentially amounts to Learning a linear model. (https://en.wikipedia.org/wiki/Extreme_Learning_machine)
https://en.wikipedia.org/wiki/Extreme_Learning_machine
A technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them.
Federated Learning
A technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them.
https://en.wikipedia.org/wiki/Federated_learning
Effects that may occur when an algorithm learns from user behavior and feeds that behavior back into the model.
Feedback Loop Bias
Effects that may occur when an algorithm learns from user behavior and feeds that behavior back into the model.
https://doi.org/10.6028/NIST.SP.1270
A feedback-based approach in which the representation is formed iteratively, based on feedback received from the previous iteration's output. (https://arxiv.org/abs/1612.09508)
FBN
Input, Hidden, Output, Hidden
Feedback Network
A statistical model in which the model parameters are fixed or non-random quantities.
FEM
Fixed Effects Model
A statistical model in which the model parameters are fixed or non-random quantities.
https://en.wikipedia.org/wiki/Fixed_effects_model
Flattens the input. Does not affect the batch size.
Flatten Layer
Flattens the input. Does not affect the batch size.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten
Applies a 2D fractional max pooling over an input signal composed of several input planes.
FractionalMaxPool2D
FractionalMaxPool2d
FractionalMaxPool2D Layer
Applies a 2D fractional max pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Applies a 3D fractional max pooling over an input signal composed of several input planes.
FractionalMaxPool3D
FractionalMaxPool3d
FractionalMaxPool3D Layer
Applies a 3D fractional max pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Arises when biased results are reported in order to support or satisfy the funding agency or financial supporter of the research study, but it can also be the individual researcher.
Funding Bias
Arises when biased results are reported in order to support or satisfy the funding agency or financial supporter of the research study, but it can also be the individual researcher.
https://doi.org/10.6028/NIST.SP.1270
Cell class for the GRU layer. This class processes one step within the whole time sequence input, whereas tf.keras.layers.GRU processes the whole sequence.
GRUCell Layer
Cell class for the GRU layer. This class processes one step within the whole time sequence input, whereas tf.keras.layers.GRU processes the whole sequence.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRUCell
Gated Recurrent Unit - Cho et al. 2014. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel, the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0, unroll is False, use_bias is True, reset_after is True, inputs (if masking is used) are strictly right-padded, and eager execution is enabled in the outermost context. There are two variants of the GRU implementation. The default one is based on v3 of the Cho et al. paper and has the reset gate applied to the hidden state before matrix multiplication. The other one is based on the original version and has the order reversed. The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid'.
GRU Layer
Gated Recurrent Unit - Cho et al. 2014. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel, the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0, unroll is False, use_bias is True, reset_after is True, inputs (if masking is used) are strictly right-padded, and eager execution is enabled in the outermost context. There are two variants of the GRU implementation. The default one is based on v3 of the Cho et al. paper and has the reset gate applied to the hidden state before matrix multiplication. The other one is based on the original version and has the order reversed. The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid'.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a forget gate, but has fewer parameters than LSTM, as it lacks an output gate. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.
GRU
Input, Memory Cell, Output
Gated Recurrent Unit
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a forget gate, but has fewer parameters than LSTM, as it lacks an output gate. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.
https://en.wikipedia.org/wiki/Gated_recurrent_unit
Apply multiplicative 1-centered Gaussian noise. As it is a regularization layer, it is only active at training time.
GaussianDropout Layer
Apply multiplicative 1-centered Gaussian noise. As it is a regularization layer, it is only active at training time.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianDropout
Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs. As it is a regularization layer, it is only active at training time.
GaussianNoise Layer
Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs. As it is a regularization layer, it is only active at training time.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise
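The difference between the two Gaussian regularization layers above (multiplicative 1-centered noise versus additive zero-centered noise, both inactive outside training) can be illustrated with a minimal pure-Python sketch. The function names and the `training` flag are hypothetical, and the standard deviation sqrt(rate / (1 - rate)) used for the multiplicative case is an assumption modeled on the Keras convention; this is not the library implementation.

```python
import math
import random


def gaussian_noise(values, stddev, training, rng=random):
    """Additive zero-centered Gaussian noise; identity outside training."""
    if not training:
        return list(values)
    return [v + rng.gauss(0.0, stddev) for v in values]


def gaussian_dropout(values, rate, training, rng=random):
    """Multiplicative 1-centered Gaussian noise; identity outside training."""
    if not training:
        return list(values)
    # Assumption: Keras-style scaling of the noise by the drop rate.
    stddev = math.sqrt(rate / (1.0 - rate))
    return [v * rng.gauss(1.0, stddev) for v in values]
```

Passing a seeded `random.Random` instance as `rng` makes the noise reproducible for experiments.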
Gaussian error linear unit (GELU) computes x * P(X <= x), where X ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU.
GELU
Gaussian Error Linear Unit
GELU Function
Gaussian error linear unit (GELU) computes x * P(X <= x), where X ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/gelu
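As a small illustration of the GELU definition above, the following sketch evaluates x * P(X <= x) exactly, using the error function to compute the standard normal CDF. The function names are illustrative; libraries often also offer a tanh-based approximation that is not shown here.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """P(X <= x) for X ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """Exact GELU: the input weighted by its standard normal CDF value."""
    return x * standard_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU for comparison: gates the input purely by its sign."""
    return max(0.0, x)
```

Unlike ReLU, this form lets small negative inputs through with a reduced weight instead of zeroing them.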
Methods that can learn novel classes from only few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes.
GFSL
Generalized Few-shot Learning
Methods that can learn novel classes from only few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes.
https://paperswithcode.com/paper/generalized-and-incremental-few-shot-learning/review/
This model generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
GLM
Generalized Linear Model
This model generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
https://en.wikipedia.org/wiki/Generalized_linear_model
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. The core idea of a GAN is based on the "indirect" training through the discriminator, which itself is also being updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.
GAN
Backfed Input, Hidden, Matched Output-Input, Hidden, Matched Output-Input
Generative Adversarial Network
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. The core idea of a GAN is based on the "indirect" training through the discriminator, which itself is also being updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.
https://en.wikipedia.org/wiki/Generative_adversarial_network
Global average pooling operation for temporal data.
GlobalAvgPool1D
GlobalAvgPool1d
GlobalAveragePooling1D Layer
Global average pooling operation for temporal data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling1D
Global average pooling operation for spatial data.
GlobalAvgPool2D
GlobalAvgPool2d
GlobalAveragePooling2D Layer
Global average pooling operation for spatial data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling2D
Global Average pooling operation for 3D data.
GlobalAvgPool3D
GlobalAvgPool3d
GlobalAveragePooling3D Layer
Global Average pooling operation for 3D data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling3D
Global max pooling operation for 1D temporal data.
GlobalMaxPool1D
GlobalMaxPool1d
GlobalMaxPooling1D Layer
Global max pooling operation for 1D temporal data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPool1D
Global max pooling operation for spatial data.
GlobalMaxPool2D
GlobalMaxPool2d
GlobalMaxPooling2D Layer
Global max pooling operation for spatial data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPool2D
Global Max pooling operation for 3D data.
GlobalMaxPool3D
GlobalMaxPool3d
GlobalMaxPooling3D Layer
Global Max pooling operation for 3D data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPool3D
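The global pooling operations above collapse every non-feature axis into a single value per feature. A minimal sketch for the 1D (temporal) case follows, assuming the input is one unbatched sequence given as a list of time steps, each a list of feature values; the function names are illustrative and not a library API.

```python
def global_avg_pool_1d(sequence):
    """Average each feature over all time steps of a (steps x features) input."""
    steps = len(sequence)
    return [sum(column) / steps for column in zip(*sequence)]


def global_max_pool_1d(sequence):
    """Take the maximum of each feature over all time steps."""
    return [max(column) for column in zip(*sequence)]
```

The 2D and 3D variants follow the same idea, reducing over the spatial axes instead of a single time axis.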
GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information. (https://arxiv.org/abs/1609.02907)
GCN
Input, Hidden, Hidden, Output
Graph Convolutional Network
GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information. (https://arxiv.org/abs/1609.02907)
https://arxiv.org/abs/1609.02907
Graph Convolutional Policy Network (GCPN) is a general graph convolutional network-based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules.
GCPN
Input, Hidden, Hidden, Policy, Output
Graph Convolutional Policy Network
Graph Convolutional Policy Network (GCPN) is a general graph convolutional network-based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules.
https://arxiv.org/abs/1806.02473
Applies Group Normalization over a mini-batch of inputs, as described in the paper "Group Normalization".
GroupNorm
nn.GroupNorm
GroupNorm Layer
Applies Group Normalization over a mini-batch of inputs, as described in the paper "Group Normalization".
https://pytorch.org/docs/stable/nn.html#normalization-layers
A pattern of favoring members of one's in-group over out-group members. This can be expressed in evaluation of others, in allocation of resources, and in many other ways.
In-group Favoritism
In-group bias
In-group preference
In-group–out-group Bias
Intergroup bias
Group Bias
A pattern of favoring members of one's in-group over out-group members. This can be expressed in evaluation of others, in allocation of resources, and in many other ways.
https://en.wikipedia.org/wiki/In-group_favoritism
A psychological phenomenon that occurs when people in a group tend to make non-optimal decisions based on their desire to conform to the group, or fear of dissenting with the group. In groupthink, individuals often refrain from expressing their personal disagreement with the group, hesitating to voice opinions that do not align with the group.
Groupthink
Groupthink Bias
A psychological phenomenon that occurs when people in a group tend to make non-optimal decisions based on their desire to conform to the group, or fear of dissenting with the group. In groupthink, individuals often refrain from expressing their personal disagreement with the group, hesitating to voice opinions that do not align with the group.
https://doi.org/10.6028/NIST.SP.1270
A faster approximation of the sigmoid activation. Piecewise linear approximation of the sigmoid function. Ref: 'https://en.wikipedia.org/wiki/Hard_sigmoid'
Hard Sigmoid Function
A faster approximation of the sigmoid activation. Piecewise linear approximation of the sigmoid function. Ref: 'https://en.wikipedia.org/wiki/Hard_sigmoid'
https://www.tensorflow.org/api_docs/python/tf/keras/activations/hard_sigmoid
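A piecewise linear approximation of the sigmoid can be sketched as a clipped line. The slope 0.2 and offset 0.5 below are an assumption matching one commonly used form (0 for x < -2.5, 1 for x > 2.5, linear in between); other libraries and versions use different constants, such as x / 6 + 0.5, so treat the parameters as illustrative.

```python
import math


def hard_sigmoid(x: float, slope: float = 0.2, offset: float = 0.5) -> float:
    """Clipped linear approximation of the sigmoid (constants are an assumption)."""
    return min(1.0, max(0.0, slope * x + offset))


def sigmoid(x: float) -> float:
    """Logistic sigmoid, shown for comparison."""
    return 1.0 / (1.0 + math.exp(-x))
```

The approximation avoids the exponential entirely, which is where the speed advantage comes from.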
A preprocessing layer which hashes and bins categorical features. This layer transforms categorical inputs to hashed output. It converts ints or strings element-wise to ints in a fixed range. The stable hash function uses tensorflow::ops::Fingerprint to produce the same output consistently across all platforms. This layer uses FarmHash64 by default, which provides a consistent hashed output across different platforms and is stable across invocations, regardless of device and context, by mixing the input bits thoroughly. If you want to obfuscate the hashed output, you can also pass a random salt argument in the constructor. In that case, the layer will use the SipHash64 hash function, with the salt value serving as additional input to the hash function.
Hashing Layer
A preprocessing layer which hashes and bins categorical features. This layer transforms categorical inputs to hashed output. It converts ints or strings element-wise to ints in a fixed range. The stable hash function uses tensorflow::ops::Fingerprint to produce the same output consistently across all platforms. This layer uses FarmHash64 by default, which provides a consistent hashed output across different platforms and is stable across invocations, regardless of device and context, by mixing the input bits thoroughly. If you want to obfuscate the hashed output, you can also pass a random salt argument in the constructor. In that case, the layer will use the SipHash64 hash function, with the salt value serving as additional input to the hash function.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Hashing
A hidden layer is located between the input and output of the algorithm, in which the function applies weights to the inputs and directs them through an activation function as the output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network. Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending on their associated weights.
Hidden Layer
A hidden layer is located between the input and output of the algorithm, in which the function applies weights to the inputs and directs them through an activation function as the output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network. Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending on their associated weights.
https://deepai.org/machine-learning-glossary-and-terms/hidden-layer-machine-learning
Methods that group things according to a hierarchy.
Hierarchical Classification
Methods that group things according to a hierarchy.
https://en.wikipedia.org/wiki/Hierarchical_classification
Methods that seek to build a hierarchy of clusters.
HCL
Hierarchical Clustering
Methods that seek to build a hierarchy of clusters.
https://en.wikipedia.org/wiki/Hierarchical_clustering
Referring to the long-standing biases encoded in society over time. Related to, but distinct from, biases in historical description, or the interpretation, analysis, and explanation of history. A common example of historical bias is the tendency to view the larger world from a Western or European view.
Historical Bias
Referring to the long-standing biases encoded in society over time. Related to, but distinct from, biases in historical description, or the interpretation, analysis, and explanation of history. A common example of historical bias is the tendency to view the larger world from a Western or European view.
https://doi.org/10.6028/NIST.SP.1270
A Hopfield network is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described earlier by Little in 1974 based on Ernst Ising's work with Wilhelm Lenz on the Ising model. Hopfield networks serve as content-addressable ("associative") memory systems with binary threshold nodes, or with continuous variables. Hopfield networks also provide a model for understanding human memory. (https://en.wikipedia.org/wiki/Hopfield_network)
HN
Ising model of a neural network
Ising–Lenz–Little model
Backfed input
Hopfield Network
A Hopfield network is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described earlier by Little in 1974 based on Ernst Ising's work with Wilhelm Lenz on the Ising model. Hopfield networks serve as content-addressable ("associative") memory systems with binary threshold nodes, or with continuous variables. Hopfield networks also provide a model for understanding human memory. (https://en.wikipedia.org/wiki/Hopfield_network)
https://en.wikipedia.org/wiki/Hopfield_network
A bias wherein individuals perceive benign or ambiguous behaviors as hostile.
Hostile Attribution Bias
A bias wherein individuals perceive benign or ambiguous behaviors as hostile.
https://en.wikipedia.org/wiki/Interpretive_bias
Systematic errors in human thought based on a limited number of heuristic principles and predicting values to simpler judgmental operations.
Human Bias
Systematic errors in human thought based on a limited number of heuristic principles and predicting values to simpler judgmental operations.
https://doi.org/10.6028/NIST.SP.1270
When users rely on automation as a heuristic replacement for their own information seeking and processing.
Human Reporting Bias
When users rely on automation as a heuristic replacement for their own information seeking and processing.
https://doi.org/10.6028/NIST.SP.1270
A layer that performs image data preprocessing augmentations.
Image Augmentation Layer
A layer that performs image data preprocessing augmentations.
https://keras.io/guides/preprocessing_layers/
A layer that performs image data preprocessing operations.
Image Preprocessing Layer
A layer that performs image data preprocessing operations.
https://keras.io/guides/preprocessing_layers/
An unconscious belief, attitude, feeling, association, or stereotype that can affect the way in which humans process information, make decisions, and take actions.
Confirmatory Bias
Implicit Bias
An unconscious belief, attitude, feeling, association, or stereotype that can affect the way in which humans process information, make decisions, and take actions.
https://doi.org/10.6028/NIST.SP.1270
Methods that train a network on a base set of classes and then present it with several novel classes, each with only a few labeled examples.
IFSL
Incremental Few-shot Learning
Methods that train a network on a base set of classes and then present it with several novel classes, each with only a few labeled examples.
https://arxiv.org/abs/1810.07218
Individual bias is a persistent point of view, or a limited list of such points of view, that one applies ("parent", "academic", "professional", etc.).
Individual Bias
Individual bias is a persistent point of view, or a limited list of such points of view, that one applies ("parent", "academic", "professional", etc.).
https://develop.consumerium.org/wiki/Individual_bias
Arises when applications that are built with machine learning are used to generate inputs for other machine learning algorithms. If the output is biased in any way, this bias may be inherited by systems using the output as input to learn other models.
Inherited Bias
Arises when applications that are built with machine learning are used to generate inputs for other machine learning algorithms. If the output is biased in any way, this bias may be inherited by systems using the output as input to learn other models.
https://doi.org/10.6028/NIST.SP.1270
Layer to be used as an entry point into a Network (a graph of layers).
InputLayer Layer
Layer to be used as an entry point into a Network (a graph of layers).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputLayer
Specifies the rank, dtype and shape of every input to a layer. Layers can expose (if appropriate) an input_spec attribute: an instance of InputSpec, or a nested structure of InputSpec instances (one per input tensor). These objects enable the layer to run input compatibility checks for input structure, input rank, input shape, and input dtype. A None entry in a shape is compatible with any dimension, a None shape is compatible with any shape.
InputSpec Layer
Specifies the rank, dtype and shape of every input to a layer. Layers can expose (if appropriate) an input_spec attribute: an instance of InputSpec, or a nested structure of InputSpec instances (one per input tensor). These objects enable the layer to run input compatibility checks for input structure, input rank, input shape, and input dtype. A None entry in a shape is compatible with any dimension, a None shape is compatible with any shape.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputSpec
The input layer of a neural network is composed of artificial input neurons, and brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow for the artificial neural network.
Input Layer
The input layer of a neural network is composed of artificial input neurons, and brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow for the artificial neural network.
https://www.techopedia.com/definition/33262/input-layer-neural-networks
Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
InstanceNorm1D
InstanceNorm1d
nn.InstanceNorm1d
InstanceNorm1d Layer
Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
https://pytorch.org/docs/stable/nn.html#normalization-layers
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
InstanceNorm2D
InstanceNorm2d
nn.InstanceNorm2d
InstanceNorm2d Layer
Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
https://pytorch.org/docs/stable/nn.html#normalization-layers
Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
InstanceNorm3D
InstanceNorm3d
nn.InstanceNorm3d
InstanceNorm3d Layer
Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.
https://pytorch.org/docs/stable/nn.html#normalization-layers
In contrast to biases exhibited at the level of individual persons, institutional bias refers to a tendency exhibited at the level of entire institutions, where practices or norms result in the favoring or disadvantaging of certain social groups. Common examples include institutional racism and institutional sexism.
Institutional Bias
In contrast to biases exhibited at the level of individual persons, institutional bias refers to a tendency exhibited at the level of entire institutions, where practices or norms result in the favoring or disadvantaging of certain social groups. Common examples include institutional racism and institutional sexism.
https://doi.org/10.6028/NIST.SP.1270
A preprocessing layer which maps integer features to contiguous ranges.
IntegerLookup Layer
A preprocessing layer which maps integer features to contiguous ranges.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/IntegerLookup
A form of information processing bias that can occur when users interpret algorithmic outputs according to their internalized biases and views.
Interpretation Bias
A form of information processing bias that can occur when users interpret algorithmic outputs according to their internalized biases and views.
https://doi.org/10.6028/NIST.SP.1270
An algorithm to group objects by a plurality vote of their neighbors, with each object being assigned to the class most common among its k nearest neighbors.
K-NN
KNN
K-nearest Neighbor Algorithm
An algorithm to group objects by a plurality vote of their neighbors, with each object being assigned to the class most common among its k nearest neighbors.
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
An algorithm to classify objects by a plurality vote of their neighbors, with each object being assigned to the class most common among its k nearest neighbors.
K-NN
KNN
K-nearest Neighbor Classification Algorithm
An algorithm to classify objects by a plurality vote of their neighbors, with each object being assigned to the class most common among its k nearest neighbors.
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
An algorithm to assign the average of the values of k nearest neighbors to objects.
K-NN
KNN
K-nearest Neighbor Regression Algorithm
An algorithm to assign the average of the values of k nearest neighbors to objects.
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
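The classification and regression variants of k-nearest neighbors described above share the same neighbor search and differ only in how the neighbors' targets are combined. The following is a minimal brute-force sketch assuming Euclidean distance and training data given as (point, target) pairs; all names are illustrative.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (point, target) pairs closest to the query point."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train, query, k=3):
    """Assign the class most common among the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Assign the average target value of the k nearest neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)
```

Sorting the whole training set costs O(n log n) per query; practical implementations typically use spatial indexes such as k-d trees instead.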
A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the topological structure of the data. For example, a data set with p variables measured in n observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze. An SOM is a type of artificial neural network but is trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and therefore is sometimes called a Kohonen map or Kohonen network. The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s.
KN
SOFM
SOM
Self-Organizing Feature Map
Self-Organizing Map
Input, Hidden
Kohonen Network
A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the topological structure of the data. For example, a data set with p variables measured in n observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze. An SOM is a type of artificial neural network but is trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and therefore is sometimes called a Kohonen map or Kohonen network. The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s.
https://en.wikipedia.org/wiki/Self-organizing_map
Applies a 1D power-average pooling over an input signal composed of several input planes.
LPPool1D
LPPool1d
LPPool1D Layer
Applies a 1D power-average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
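As a rough illustration of power-average pooling, the sketch below reduces each window of a single-channel signal to (sum of x^p)^(1/p). The function name lp_pool_1d and its parameters are hypothetical and chosen for this example only; it is not the PyTorch implementation and assumes non-negative inputs.

```python
def lp_pool_1d(signal, norm_type, kernel_size, stride=None):
    """Power-average pooling over one channel (illustrative sketch only)."""
    if stride is None:
        stride = kernel_size  # assumption: non-overlapping windows by default
    pooled = []
    for start in range(0, len(signal) - kernel_size + 1, stride):
        window = signal[start:start + kernel_size]
        # Sum the p-th powers in the window, then take the p-th root.
        total = sum(x ** norm_type for x in window)
        pooled.append(total ** (1.0 / norm_type))
    return pooled


# With norm_type=1 this reduces to sum pooling over each window.
sum_pooled = lp_pool_1d([1.0, 2.0, 3.0, 4.0], norm_type=1, kernel_size=2)
```

Larger values of norm_type weight the largest element of each window more heavily, so the result moves toward max pooling.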
Applies a 2D power-average pooling over an input signal composed of several input planes.
LPPool2D
LPPool2d
LPPool2D Layer
Applies a 2D power-average pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Cell class for the LSTM layer.
LSTMCell Layer
Cell class for the LSTM layer.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell
Long Short-Term Memory layer - Hochreiter 1997. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: 1. activation == tanh, 2. recurrent_activation == sigmoid, 3. recurrent_dropout == 0, 4. unroll is False, 5. use_bias is True, 6. inputs, if masking is used, are strictly right-padded, 7. eager execution is enabled in the outermost context.
LSTM Layer
Long Short-Term Memory layer - Hochreiter 1997. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: 1. activation == tanh, 2. recurrent_activation == sigmoid, 3. recurrent_dropout == 0, 4. unroll is False, 5. use_bias is True, 6. inputs, if masking is used, are strictly right-padded, 7. eager execution is enabled in the outermost context.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM
Wraps arbitrary expressions as a Layer object.
Lambda Layer
Wraps arbitrary expressions as a Layer object.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.
LLM
Large Language Model
A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning.
https://en.wikipedia.org/wiki/Large_language_model
A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model.
Lasso Regression
A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model.
https://en.wikipedia.org/wiki/Lasso_(statistics)
Network layer parent class
Layer
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
LayerNorm
nn.LayerNorm
LayerNorm Layer
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
https://pytorch.org/docs/stable/nn.html#normalization-layers
Layer normalization layer (Ba et al., 2016). Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. That is, it applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis.
LayerNormalization Layer
Layer normalization layer (Ba et al., 2016). Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. That is, it applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization
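The per-example normalization described above can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the helper name layer_norm and the epsilon value are assumptions, and the learnable scale and offset parameters are omitted.

```python
import math


def layer_norm(example, epsilon=1e-3):
    """Normalize one example to roughly zero mean and unit variance."""
    mean = sum(example) / len(example)
    variance = sum((x - mean) ** 2 for x in example) / len(example)
    # epsilon (an assumed small constant) guards against division by zero.
    return [(x - mean) / math.sqrt(variance + epsilon) for x in example]


# Each row of the batch is normalized on its own, so the different
# scales of the two rows do not influence each other.
batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
normalized = [layer_norm(row) for row in batch]
```

Batch normalization would instead compute the mean and variance down each column of the batch, across examples.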
This is the class from which all layers inherit. A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves computation, defined in the call() method, and a state (weight variables). State can be created in various places, at the convenience of the subclass implementer: in __init__(); in the optional build() method, which is invoked by the first __call__() to the layer, and supplies the shape(s) of the input(s), which may not have been known at initialization time; in the first invocation of call(), with some caveats discussed below. Users will just instantiate a layer and then treat it as a callable.
Layer Layer
This is the class from which all layers inherit. A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves computation, defined in the call() method, and a state (weight variables). State can be created in various places, at the convenience of the subclass implementer: in __init__(); in the optional build() method, which is invoked by the first __call__() to the layer, and supplies the shape(s) of the input(s), which may not have been known at initialization time; in the first invocation of call(), with some caveats discussed below. Users will just instantiate a layer and then treat it as a callable.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer
A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument of the BatchNorm1d that is inferred from the input.size(1).
LazyBatchNorm1D
LazyBatchNorm1d
nn.LazyBatchNorm1d
LazyBatchNorm1D Layer
A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument of the BatchNorm1d that is inferred from the input.size(1).
https://pytorch.org/docs/stable/nn.html#normalization-layers
A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument of the BatchNorm2d that is inferred from the input.size(1).
LazyBatchNorm2D
LazyBatchNorm2d
nn.LazyBatchNorm2d
LazyBatchNorm2D Layer
A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument of the BatchNorm2d that is inferred from the input.size(1).
https://pytorch.org/docs/stable/nn.html#normalization-layers
A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument of the BatchNorm3d that is inferred from the input.size(1).
LazyBatchNorm3D
LazyBatchNorm3d
nn.LazyBatchNorm3d
LazyBatchNorm3D Layer
A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument of the BatchNorm3d that is inferred from the input.size(1).
https://pytorch.org/docs/stable/nn.html#normalization-layers
A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument of the InstanceNorm1d that is inferred from the input.size(1).
LazyInstanceNorm1D
LazyInstanceNorm1d
nn.LazyInstanceNorm1d
LazyInstanceNorm1d Layer
A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument of the InstanceNorm1d that is inferred from the input.size(1).
https://pytorch.org/docs/stable/nn.html#normalization-layers
A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument of the InstanceNorm2d that is inferred from the input.size(1).
LazyInstanceNorm2D
LazyInstanceNorm2d
nn.LazyInstanceNorm2d
LazyInstanceNorm2d Layer
A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument of the InstanceNorm2d that is inferred from the input.size(1).
https://pytorch.org/docs/stable/nn.html#normalization-layers
A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument of the InstanceNorm3d that is inferred from the input.size(1).
LazyInstanceNorm3D
LazyInstanceNorm3d
nn.LazyInstanceNorm3d
LazyInstanceNorm3d Layer
A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument of the InstanceNorm3d that is inferred from the input.size(1).
https://pytorch.org/docs/stable/nn.html#normalization-layers
Leaky version of a Rectified Linear Unit.
LeakyReLU Layer
Leaky version of a Rectified Linear Unit.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LeakyReLU
A standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals (a residual being the difference between an observed value and the fitted value provided by a model) made in the results of each individual equation.
Least-squares Analysis
A standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals (a residual being the difference between an observed value and the fitted value provided by a model) made in the results of each individual equation.
https://en.wikipedia.org/wiki/Least_squares
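For the simplest case, fitting a straight line y = a + b*x to more points than unknowns, the least-squares solution has a closed form. The sketch below is illustrative only; the function names fit_line and sum_squared_residuals are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Four equations, two unknowns: an overdetermined system.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = fit_line(xs, ys)
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the fitted one.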
A linear function has the form f(x) = a + bx.
Linear Function
A linear function has the form f(x) = a + bx.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/linear
A linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
Linear Regression
A linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
https://en.wikipedia.org/wiki/Linear_regression
Arises when network attributes obtained from user connections, activities, or interactions differ and misrepresent the true behavior of the users.
Linking Bias
Arises when network attributes obtained from user connections, activities, or interactions differ and misrepresent the true behavior of the users.
https://doi.org/10.6028/NIST.SP.1270
A liquid state machine (LSM) is a type of reservoir computer that uses a spiking neural network. An LSM consists of a large collection of units (called nodes, or neurons). Each node receives time varying input from external sources (the inputs) as well as from other nodes. Nodes are randomly connected to each other. The recurrent nature of the connections turns the time varying input into a spatio-temporal pattern of activations in the network nodes. The spatio-temporal patterns of activation are read out by linear discriminant units. The soup of recurrently connected nodes will end up computing a large variety of nonlinear functions on the input. Given a large enough variety of such nonlinear functions, it is theoretically possible to obtain linear combinations (using the read out units) to perform whatever mathematical operation is needed to perform a certain task, such as speech recognition or computer vision. The word liquid in the name comes from the analogy drawn to dropping a stone into a still body of water or other liquid. The falling stone will generate ripples in the liquid. The input (motion of the falling stone) has been converted into a spatio-temporal pattern of liquid displacement (ripples). (https://en.wikipedia.org/wiki/Liquid_state_machine)
LSM
Input, Spiking Hidden, Output
Liquid State Machine Network
A liquid state machine (LSM) is a type of reservoir computer that uses a spiking neural network. An LSM consists of a large collection of units (called nodes, or neurons). Each node receives time varying input from external sources (the inputs) as well as from other nodes. Nodes are randomly connected to each other. The recurrent nature of the connections turns the time varying input into a spatio-temporal pattern of activations in the network nodes. The spatio-temporal patterns of activation are read out by linear discriminant units. The soup of recurrently connected nodes will end up computing a large variety of nonlinear functions on the input. Given a large enough variety of such nonlinear functions, it is theoretically possible to obtain linear combinations (using the read out units) to perform whatever mathematical operation is needed to perform a certain task, such as speech recognition or computer vision. The word liquid in the name comes from the analogy drawn to dropping a stone into a still body of water or other liquid. The falling stone will generate ripples in the liquid. The input (motion of the falling stone) has been converted into a spatio-temporal pattern of liquid displacement (ripples). (https://en.wikipedia.org/wiki/Liquid_state_machine)
https://en.wikipedia.org/wiki/Liquid_state_machine
Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.
LocalResponseNorm
nn.LocalResponseNorm
LocalResponseNorm Layer
Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.
https://pytorch.org/docs/stable/nn.html#normalization-layers
The LocallyConnected1D layer works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
Locally-connected Layer
The LocallyConnected1D layer works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
https://faroit.com/keras-docs/1.2.2/layers/local/
Locally-connected layer for 1D inputs. The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
LocallyConnected1D Layer
Locally-connected layer for 1D inputs. The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LocallyConnected1D
Locally-connected layer for 2D inputs. The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
LocallyConnected2D Layer
Locally-connected layer for 2D inputs. The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LocallyConnected2D
A statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables.
Logistic Regression
A statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables.
https://en.wikipedia.org/wiki/Logistic_regression
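To make the log-odds relationship concrete, the sketch below computes a probability from a linear combination of features. The function name and the example weights are hypothetical; fitting the weights from data is not shown.

```python
import math


def predict_probability(features, weights, intercept):
    """Map a linear combination of features (the log-odds) to a probability."""
    log_odds = intercept + sum(w * x for w, x in zip(weights, features))
    # The logistic (sigmoid) function inverts the log-odds transform.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative weights only: log-odds = 0.5*1.0 - 0.25*2.0 = 0.0
p = predict_probability([1.0, 2.0], weights=[0.5, -0.25], intercept=0.0)
```

Log-odds of zero correspond to a probability of one half, and taking log(p / (1 - p)) of the output recovers the linear combination.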
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and anomaly detection in network traffic or IDSs (intrusion detection systems). A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.
LSTM
Input, Memory Cell, Output
Long Short Term Memory
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and anomaly detection in network traffic or IDSs (intrusion detection systems). A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.
https://en.wikipedia.org/wiki/Long_short-term_memory
When automation leads to humans being unaware of their situation such that, when control of a system is given back to them in a situation where humans and machines cooperate, they are unprepared to assume their duties. This can be a loss of awareness over what automation is and isn’t taking care of.
Loss Of Situational Awareness Bias
When automation leads to humans being unaware of their situation such that, when control of a system is given back to them in a situation where humans and machines cooperate, they are unprepared to assume their duties. This can be a loss of awareness over what automation is and isn’t taking care of.
https://doi.org/10.6028/NIST.SP.1270
A field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.
Machine Learning
A field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks.
https://en.wikipedia.org/wiki/Machine_learning
Methods based on the assumption that one's observed data lie on a low-dimensional manifold embedded in a higher-dimensional space.
Manifold Learning
Methods based on the assumption that one's observed data lie on a low-dimensional manifold embedded in a higher-dimensional space.
https://arxiv.org/abs/2011.01307
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.[1][2][3] A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.
MC
MP
Markov Process
Probabilistic Hidden
Markov Chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.[1][2][3] A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.
https://en.wikipedia.org/wiki/Markov_chain
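A discrete-time Markov chain can be simulated with a table of transition probabilities, where the next state is drawn using only the current state. The sketch below is illustrative; the function name simulate_chain and the two-state weather table are made up for this example.

```python
import random


def simulate_chain(transitions, start, steps, rng):
    """Sample a path; each step depends only on the current state."""
    state = start
    path = [state]
    for _ in range(steps):
        next_states = list(transitions[state])
        probabilities = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=probabilities, k=1)[0]
        path.append(state)
    return path


# Hypothetical transition table: each row of probabilities sums to 1.
weather = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
path = simulate_chain(weather, "sunny", steps=10, rng=random.Random(0))
```

Passing a seeded random.Random instance keeps the sampled path reproducible between runs.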
Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking). If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.
Masking Layer
Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking). If any downstream layer does not support masking yet receives such an input mask, an exception will be raised.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Masking
Max pooling operation for 1D temporal data. Downsamples the input representation by taking the maximum value over a spatial window of size pool_size. The window is shifted by strides. The resulting output, when using the "valid" padding option, has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides.
MaxPool1D
MaxPool1d
MaxPooling1D
MaxPooling1d
MaxPooling1D Layer
Max pooling operation for 1D temporal data. Downsamples the input representation by taking the maximum value over a spatial window of size pool_size. The window is shifted by strides. The resulting output, when using the "valid" padding option, has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool1D
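The output-shape rules above can be checked with a small sketch. It assumes the divisions are rounded up to whole window counts (an assumption about rounding that the definition does not state), and the helper names output_length and max_pool_1d_valid are hypothetical. Only "valid" pooling is implemented.

```python
import math


def output_length(input_length, pool_size, strides, padding="valid"):
    """Number of pooled values for a 1D input (rounding is an assumption)."""
    if padding == "valid":
        return math.ceil((input_length - pool_size + 1) / strides)
    if padding == "same":
        return math.ceil(input_length / strides)
    raise ValueError("padding must be 'valid' or 'same'")


def max_pool_1d_valid(values, pool_size, strides):
    """Max over each full window; windows never run past the input."""
    return [
        max(values[start:start + pool_size])
        for start in range(0, len(values) - pool_size + 1, strides)
    ]


pooled = max_pool_1d_valid([1.0, 2.0, 3.0, 4.0, 5.0], pool_size=2, strides=2)
```

Here the five-element input yields two windows under "valid" padding, matching output_length(5, 2, 2), while "same" padding would produce three outputs.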
Max pooling operation for 2D spatial data.
MaxPool2D
MaxPool2d
MaxPooling2D
MaxPooling2d
MaxPooling2D Layer
Max pooling operation for 2D spatial data.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D
Max pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
MaxPool3D
MaxPool3d
MaxPooling3D
MaxPooling3d
MaxPooling3D Layer
Max pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool3D
Computes a partial inverse of MaxPool1d.
MaxUnpool1D
MaxUnpool1d
MaxUnpool1D Layer
Computes a partial inverse of MaxPool1d.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Computes a partial inverse of MaxPool2d.
MaxUnpool2D
MaxUnpool2d
MaxUnpool2D Layer
Computes a partial inverse of MaxPool2d.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Computes a partial inverse of MaxPool3d.
MaxUnpool3D
MaxUnpool3d
MaxUnpool3D Layer
Computes a partial inverse of MaxPool3d.
https://pytorch.org/docs/stable/nn.html#pooling-layers
Layer that computes the maximum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Maximum Layer
Layer that computes the maximum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Maximum
Arises when features and labels are proxies for desired quantities, potentially leaving out important factors or introducing group or input-dependent noise that leads to differential performance.
Measurement Bias
Arises when features and labels are proxies for desired quantities, potentially leaving out important factors or introducing group or input-dependent noise that leads to differential performance.
https://doi.org/10.6028/NIST.SP.1270
A layer used to merge a list of inputs.
Merging Layer
A layer used to merge a list of inputs.
https://www.tutorialspoint.com/keras/keras_merge_layer.htm
Automatic learning algorithms applied to metadata about machine learning experiments.
Meta-Learning
Automatic learning algorithms applied to metadata about machine learning experiments.
https://en.wikipedia.org/wiki/Meta_learning_(computer_science)
Method parent class.
Method
Methods which can learn a representation function that maps objects into an embedded space.
Distance Metric Learning
Metric Learning
Methods which can learn a representation function that maps objects into an embedded space.
https://paperswithcode.com/task/metric-learning
Layer that computes the minimum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Minimum Layer
Layer that computes the minimum (element-wise) of a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Minimum
When modal interfaces confuse human operators, who misunderstand which mode the system is using, taking actions which are correct for a different mode but incorrect for their current situation. This is the cause of many deadly accidents, but also a source of confusion in everyday life.
Mode Confusion Bias
When modal interfaces confuse human operators, who misunderstand which mode the system is using, taking actions which are correct for a different mode but incorrect for their current situation. This is the cause of many deadly accidents, but also a source of confusion in everyday life.
https://doi.org/10.6028/NIST.SP.1270
The bias introduced while using the data to select a single seemingly “best” model from a large set of models employing many predictor variables. Model selection bias also occurs when an explanatory variable has a weak relationship with the response variable.
Model Selection Bias
The bias introduced while using the data to select a single seemingly “best” model from a large set of models employing many predictor variables. Model selection bias also occurs when an explanatory variable has a weak relationship with the response variable.
https://doi.org/10.6028/NIST.SP.1270
MultiHeadAttention layer. This is an implementation of multi-headed attention as described in the paper "Attention is all you Need" (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector. This layer first projects query, key and value. These are (effectively) a list of tensors of length num_attention_heads, where the corresponding shapes are (batch_size, <query dimensions>, key_dim), (batch_size, <key/value dimensions>, key_dim), (batch_size, <key/value dimensions>, value_dim). Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor. Finally, the result tensor, with value_dim as its last dimension, can be passed through a linear projection and returned. When using MultiHeadAttention inside a custom Layer, the custom Layer must implement build() and call MultiHeadAttention's _build_from_signature(). This enables weights to be restored correctly when the model is loaded.
MultiHeadAttention Layer
MultiHeadAttention layer. This is an implementation of multi-headed attention as described in the paper "Attention is all you Need" (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector. This layer first projects query, key and value. These are (effectively) a list of tensors of length num_attention_heads, where the corresponding shapes are (batch_size, <query dimensions>, key_dim), (batch_size, <key/value dimensions>, key_dim), (batch_size, <key/value dimensions>, value_dim). Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor. Finally, the result tensor, with value_dim as its last dimension, can be passed through a linear projection and returned. When using MultiHeadAttention inside a custom Layer, the custom Layer must implement build() and call MultiHeadAttention's _build_from_signature(). This enables weights to be restored correctly when the model is loaded.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention
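The dot-product, scaling, softmax, and interpolation steps described above can be sketched for a single head in plain Python. This is a simplified illustration: it omits the learned projections, the multiple heads, and the final concatenation, and the function names are hypothetical rather than part of the Keras API.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors (no projections)."""
    key_dim = len(keys[0])
    outputs = []
    for q in queries:
        # Dot each key with the query, scaled by sqrt(key_dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(key_dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Interpolate the value vectors by the attention probabilities.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Identical keys receive equal attention, so the output is the mean value.
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0], [2.0]],
)
```

Passing the same sequence as queries, keys, and values gives the self-attention case mentioned in the definition.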
Methods that classify instances into one of three or more classes (classifying instances into one of two classes is called binary classification).
Multinomial Classification
Multiclass Classification
Methods that classify instances into one of three or more classes (classifying instances into one of two classes is called binary classification).
https://en.wikipedia.org/wiki/Multiclass_classification
A method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space.
MDS
Multidimensional Scaling
A method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space.
https://en.wikipedia.org/wiki/Multidimensional_scaling
Methods which can create models that can process and link information using various modalities.
Multimodal Deep Learning
Methods which can create models that can process and link information using various modalities.
https://arxiv.org/abs/2105.11087
Methods which can learn joint representations of different modalities.
Multimodal Learning
Layer that multiplies (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Multiply Layer
Layer that multiplies (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Multiply
A subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
NLP
Natural Language Processing
A subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
https://en.wikipedia.org/wiki/Natural_language_processing
Network parent class
Network
A Neural Turing machine (NTM) is a recurrent neural network model. The approach was published by Alex Graves et al. in 2014. NTMs combine the fuzzy pattern matching capabilities of neural networks with the algorithmic power of programmable computers. An NTM has a neural network controller coupled to external memory resources, which it interacts with through attentional mechanisms. The memory interactions are differentiable end-to-end, making it possible to optimize them using gradient descent. An NTM with a long short-term memory (LSTM) network controller can infer simple algorithms such as copying, sorting, and associative recall from examples alone.
NTM
Input, Hidden, Spiking Hidden, Output
Neural Turing Machine Network
A Neural Turing machine (NTM) is a recurrent neural network model. The approach was published by Alex Graves et al. in 2014. NTMs combine the fuzzy pattern matching capabilities of neural networks with the algorithmic power of programmable computers. An NTM has a neural network controller coupled to external memory resources, which it interacts with through attentional mechanisms. The memory interactions are differentiable end-to-end, making it possible to optimize them using gradient descent. An NTM with a long short-term memory (LSTM) network controller can infer simple algorithms such as copying, sorting, and associative recall from examples alone.
https://en.wikipedia.org/wiki/Neural_Turing_machine
Noisy dense layer that injects random noise into the weights of a dense layer. Noisy dense layers are fully connected layers whose weights and biases are augmented by factorised Gaussian noise. The factorised Gaussian noise is controlled through gradient descent by a second weights layer. A NoisyDense layer implements the operation: $$ \mathrm{NoisyDense}(x) = \mathrm{activation}(\mathrm{dot}(x, \mu + (\sigma \cdot \epsilon)) + \mathrm{bias}) $$ where $\mu$ is the standard weights layer, $\epsilon$ is the factorised Gaussian noise, and $\sigma$ is a second weights layer which controls $\epsilon$.
Noise Dense Layer
Noisy dense layer that injects random noise into the weights of a dense layer. Noisy dense layers are fully connected layers whose weights and biases are augmented by factorised Gaussian noise. The factorised Gaussian noise is controlled through gradient descent by a second weights layer. A NoisyDense layer implements the operation: $$ \mathrm{NoisyDense}(x) = \mathrm{activation}(\mathrm{dot}(x, \mu + (\sigma \cdot \epsilon)) + \mathrm{bias}) $$ where $\mu$ is the standard weights layer, $\epsilon$ is the factorised Gaussian noise, and $\sigma$ is a second weights layer which controls $\epsilon$.
https://www.tensorflow.org/addons/api_docs/python/tfa/layers/NoisyDense
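The NoisyDense operation above can be sketched in plain Python. This is a minimal illustration, not the TensorFlow Addons implementation: the function names are hypothetical, only the weights (not the bias) are perturbed, and the factorised noise is assumed to be built as an outer product of per-input and per-output samples passed through f(x) = sign(x) * sqrt(|x|).

```python
import math
import random


def _scale_noise(x):
    """f(x) = sign(x) * sqrt(|x|), applied to each raw Gaussian sample."""
    return math.copysign(math.sqrt(abs(x)), x)


def factorised_noise(n_in, n_out, rng):
    """Build an n_in x n_out noise matrix from n_in + n_out Gaussian samples."""
    eps_in = [_scale_noise(rng.gauss(0.0, 1.0)) for _ in range(n_in)]
    eps_out = [_scale_noise(rng.gauss(0.0, 1.0)) for _ in range(n_out)]
    return [[e_i * e_o for e_o in eps_out] for e_i in eps_in]


def noisy_dense(x, mu, sigma, bias, epsilon, activation=lambda v: v):
    """Compute activation(dot(x, mu + sigma * epsilon) + bias)."""
    outputs = []
    for j in range(len(bias)):
        total = bias[j]
        for i in range(len(x)):
            # Effective weight: mean weight plus scaled noise (assumed elementwise).
            total += x[i] * (mu[i][j] + sigma[i][j] * epsilon[i][j])
        outputs.append(activation(total))
    return outputs
```

With every entry of the second weights layer set to zero, the layer reduces to an ordinary dense layer, which is a convenient sanity check.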
A preprocessing layer which normalizes continuous features.
Normalization Layer
A preprocessing layer which normalizes continuous features.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization
A layer that performs numerical data preprocessing operations.
Numerical Features Preprocessing Layer
A layer that performs numerical data preprocessing operations.
https://keras.io/guides/preprocessing_layers/
A method which aims to classify objects from one, or only a few, examples.
OSL
One-shot Learning
A method which aims to classify objects from one, or only a few, examples.
https://en.wikipedia.org/wiki/One-shot_learning
The output layer in an artificial neural network is the last layer of neurons that produces given outputs for the program. Though they are made much like other artificial neurons in the neural network, output layer neurons may be built or observed in a different way, given that they are the last “actor” nodes on the network.
Output Layer
The output layer in an artificial neural network is the last layer of neurons that produces given outputs for the program. Though they are made much like other artificial neurons in the neural network, output layer neurons may be built or observed in a different way, given that they are the last “actor” nodes on the network.
https://www.techopedia.com/definition/33263/output-layer-neural-networks
Parametric Rectified Linear Unit.
PReLU Layer
Parametric Rectified Linear Unit.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/PReLU
The perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. (https://en.wikipedia.org/wiki/Perceptron)
SLP
Single Layer Perceptron
Input, Output
Perceptron
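The linear predictor and its training loop can be illustrated with a short sketch. The classic perceptron update rule is assumed here (the definition above does not spell it out), and the function names, learning rate, and AND-gate data are illustrative choices.

```python
def predict(weights, bias, x):
    """Return 1 if the linear predictor is positive, otherwise 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge weights by lr * error * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the rule converges on it.
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]
weights, bias = train_perceptron(AND_INPUTS, AND_LABELS)
predictions = [predict(weights, bias, x) for x in AND_INPUTS]
```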
Permutes the dimensions of the input according to a given pattern. Useful, e.g., for connecting RNNs and convnets.
Permute Layer
Permutes the dimensions of the input according to a given pattern. Useful, e.g., for connecting RNNs and convnets.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Permute
Pooling layers serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations.
Pooling Layer
Pooling layers serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations.
https://d2l.ai/chapter_convolutional-neural-networks/pooling.html
A form of selection bias that occurs when items that are more popular are more exposed and less popular items are under-represented.
Popularity Bias
A form of selection bias that occurs when items that are more popular are more exposed and less popular items are under-represented.
https://doi.org/10.6028/NIST.SP.1270
Systematic distortions in demographics or other user characteristics between a population of users represented in a dataset or on a platform and some target population.
Population Bias
Systematic distortions in demographics or other user characteristics between a population of users represented in a dataset or on a platform and some target population.
https://doi.org/10.6028/NIST.SP.1270
A layer that performs data preprocessing operations.
Preprocessing Layer
A layer that performs data preprocessing operations.
https://www.tensorflow.org/guide/keras/preprocessing_layers
Biases arising from how information is presented on the Web, via a user interface, due to rating or ranking of output, or through users’ own self-selected, biased interaction.
Presentation Bias
Biases arising from how information is presented on the Web, via a user interface, due to rating or ranking of output, or through users’ own self-selected, biased interaction.
https://doi.org/10.6028/NIST.SP.1270
A method for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data.
PCA
Principal Component Analysis
A method for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data.
https://en.wikipedia.org/wiki/Principal_component_analysis
A probabilistic model for which a graph expresses the conditional dependence structure between random variables.
Graphical Model
PGM
Structure Probabilistic Model
Probabilistic Graphical Model
A probabilistic model for which a graph expresses the conditional dependence structure between random variables.
https://en.wikipedia.org/wiki/Graphical_model
Methods that use statistical techniques to analyze the words in each text to discover common themes, how those themes are connected to each other, and how they change over time.
Probabilistic Topic Model
Methods that use statistical techniques to analyze the words in each text to discover common themes, how those themes are connected to each other, and how they change over time.
https://pyro.ai/examples/prodlda.html
Judgement modulated by affect, which is influenced by the level of efficacy and efficiency in information processing; in cognitive sciences, processing bias is often referred to as an aesthetic judgement.
Validation Bias
Processing Bias
Judgement modulated by affect, which is influenced by the level of efficacy and efficiency in information processing; in cognitive sciences, processing bias is often referred to as an aesthetic judgement.
https://royalsocietypublishing.org/doi/10.1098/rspb.2019.0165#d1e5237
A survival modeling method where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate.
Proportional Hazards Model
A survival modeling method where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate.
https://en.wikipedia.org/wiki/Proportional_hazards_model
Base class for recurrent layers.
RNN Layer
Base class for recurrent layers.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN
A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.
RBFN
RBN
Radial Basis Function Network
Input, Hidden, Output
Radial Basis Network
A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.
https://en.wikipedia.org/wiki/Radial_basis_function_network
A preprocessing layer which randomly adjusts brightness during training. This layer will randomly increase/reduce the brightness for the input RGB images. At inference time, the output will be identical to the input. Call the layer with training=True to adjust the brightness of the input. Note that different brightness adjustment factors will be applied to each of the images in the batch.
RandomBrightness Layer
A preprocessing layer which randomly adjusts brightness during training. This layer will randomly increase/reduce the brightness for the input RGB images. At inference time, the output will be identical to the input. Call the layer with training=True to adjust the brightness of the input. Note that different brightness adjustment factors will be applied to each of the images in the batch.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomBrightness
A preprocessing layer which randomly adjusts contrast during training. This layer will randomly adjust the contrast of an image or images by a random factor. Contrast is adjusted independently for each channel of each image during training. For each channel, this layer computes the mean of the image pixels in the channel and then adjusts each component x of each pixel to (x - mean) * contrast_factor + mean. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and in integer or floating point dtype. By default, the layer will output floats. The output value will be clipped to the range [0, 255], the valid range of RGB colors.
RandomContrast Layer
A preprocessing layer which randomly adjusts contrast during training. This layer will randomly adjust the contrast of an image or images by a random factor. Contrast is adjusted independently for each channel of each image during training. For each channel, this layer computes the mean of the image pixels in the channel and then adjusts each component x of each pixel to (x - mean) * contrast_factor + mean. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and in integer or floating point dtype. By default, the layer will output floats. The output value will be clipped to the range [0, 255], the valid range of RGB colors.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomContrast
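The per-channel contrast formula above, (x - mean) * contrast_factor + mean followed by clipping to [0, 255], can be sketched as follows. This is a plain-Python illustration on a flat list of pixel values for one channel, not the Keras implementation. The helper names are hypothetical, and drawing the factor uniformly from [1 - factor, 1 + factor] is an assumption.

```python
import random


def adjust_contrast(channel, contrast_factor):
    """Apply (x - mean) * contrast_factor + mean to one channel, then clip."""
    mean = sum(channel) / len(channel)
    adjusted = [(x - mean) * contrast_factor + mean for x in channel]
    # Clip to the valid RGB range [0, 255].
    return [min(255.0, max(0.0, value)) for value in adjusted]


def random_contrast(channel, factor, rng):
    """Assumed sampling scheme: contrast_factor ~ U(1 - factor, 1 + factor)."""
    contrast_factor = rng.uniform(1.0 - factor, 1.0 + factor)
    return adjust_contrast(channel, contrast_factor)


# A factor of 2.0 doubles each pixel's distance from the channel mean (100).
stretched = adjust_contrast([0, 100, 200], 2.0)
```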
A preprocessing layer which randomly crops images during training. During training, this layer will randomly choose a location to crop images down to a target size. The layer will crop all the images in the same batch to the same cropping location. At inference time, and during training if an input image is smaller than the target size, the input will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. If you need to apply random cropping at inference time, set training to True when calling the layer. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomCrop Layer
A preprocessing layer which randomly crops images during training. During training, this layer will randomly choose a location to crop images down to a target size. The layer will crop all the images in the same batch to the same cropping location. At inference time, and during training if an input image is smaller than the target size, the input will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. If you need to apply random cropping at inference time, set training to True when calling the layer. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomCrop
A preprocessing layer which randomly flips images during training. This layer will flip the images horizontally and/or vertically based on the mode attribute. At inference time, the output will be identical to the input. Call the layer with training=True to flip the input. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomFlip Layer
A preprocessing layer which randomly flips images during training. This layer will flip the images horizontally and/or vertically based on the mode attribute. At inference time, the output will be identical to the input. Call the layer with training=True to flip the input. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip
A preprocessing layer which randomly varies image height during training. This layer adjusts the height of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference.
RandomHeight Layer
A preprocessing layer which randomly varies image height during training. This layer adjusts the height of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomHeight
A preprocessing layer which randomly rotates images during training.
RandomRotation Layer
A preprocessing layer which randomly rotates images during training.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomRotation
A preprocessing layer which randomly translates images during training. This layer will apply random translations to each image during training, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomTranslation Layer
A preprocessing layer which randomly translates images during training. This layer will apply random translations to each image during training, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomTranslation
A preprocessing layer which randomly varies image width during training. This layer randomly adjusts the width of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference.
RandomWidth Layer
A preprocessing layer which randomly varies image width during training. This layer randomly adjusts the width of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomWidth
A preprocessing layer which randomly zooms images during training. This layer will randomly zoom in or out on each axis of an image independently, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
RandomZoom Layer
A preprocessing layer which randomly zooms images during training. This layer will randomly zoom in or out on each axis of an image independently, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomZoom
A statistical model where the model parameters are random variables.
REM
Random Effects Model
A statistical model where the model parameters are random variables.
https://en.wikipedia.org/wiki/Random_effects_model
An ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time.
Random Forest
An ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time.
https://en.wikipedia.org/wiki/Random_forest
The idea that top-ranked results are the most relevant and important and will result in more clicks than other results.
Ranking Bias
The idea that top-ranked results are the most relevant and important and will result in more clicks than other results.
https://doi.org/10.6028/NIST.SP.1270
Refers to differences in perspective, memory and recall, interpretation, and reporting on the same event from multiple persons or witnesses.
Rashomon Effect
Rashomon Principle
Rashomon Effect Bias
Refers to differences in perspective, memory and recall, interpretation, and reporting on the same event from multiple persons or witnesses.
https://doi.org/10.6028/NIST.SP.1270
The ReLU activation function returns: max(x, 0), the element-wise maximum of 0 and the input tensor.
ReLU
Rectified Linear Unit
ReLU Function
The ReLU activation function returns: max(x, 0), the element-wise maximum of 0 and the input tensor.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu
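As a quick illustration of the element-wise maximum, here is a plain-Python sketch that operates on a flat list rather than a tensor; the function name is chosen for this example only.

```python
def relu(values):
    """Element-wise max(x, 0) over a list of numbers."""
    return [max(x, 0.0) for x in values]


# Negative entries are zeroed; non-negative entries pass through unchanged.
activated = relu([-2.0, 0.0, 3.5])
```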
Rectified Linear Unit activation function. With default values, it returns element-wise max(x, 0).
ReLU Layer
Rectified Linear Unit activation function. With default values, it returns element-wise max(x, 0).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU
A layer of an RNN, composed of recurrent units, the number of which is the hidden size of the layer.
Recurrent Layer
A layer of an RNN, composed of recurrent units, the number of which is the hidden size of the layer.
https://docs.nvidia.com/deeplearning/performance/dl-performance-recurrent/index.html#recurrent-layer
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs.
RN
RecNN
Recurrent Network
Input, Memory Cell, Output
Recurrent Neural Network
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs.
https://en.wikipedia.org/wiki/Recurrent_neural_network
A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order. Recursive neural networks, sometimes abbreviated as RvNNs, have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding.
RecuNN
RvNN
Recursive Neural Network
A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order. Recursive neural networks, sometimes abbreviated as RvNNs, have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding.
https://en.wikipedia.org/wiki/Recursive_neural_network
A set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features').
Regression analysis
Regression model
Regression Analysis
A set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features').
https://en.wikipedia.org/wiki/Regression_analysis
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis.
Regularization Layer
Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis.
https://keras.io/api/layers/regularizers/
Methods that do not need labelled input/output pairs to be presented, and do not need sub-optimal actions to be explicitly corrected. Instead, they focus on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
Reinforcement Learning
Methods that do not need labelled input/output pairs to be presented, and do not need sub-optimal actions to be explicitly corrected. Instead, they focus on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
https://en.wikipedia.org/wiki/Reinforcement_learning
Repeats the input n times.
RepeatVector Layer
Repeats the input n times.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RepeatVector
Arises due to non-random sampling of subgroups, causing trends estimated for one population to not be generalizable to data collected from a new population.
Representation Bias
Arises due to non-random sampling of subgroups, causing trends estimated for one population to not be generalizable to data collected from a new population.
https://doi.org/10.6028/NIST.SP.1270
Methods that allow a system to discover the representations required for feature detection or classification from raw data.
Feature Learning
Representation Learning
Methods that allow a system to discover the representations required for feature detection or classification from raw data.
https://en.wikipedia.org/wiki/Feature_learning
A preprocessing layer which rescales input values to a new range.
Rescaling Layer
A preprocessing layer which rescales input values to a new range.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Rescaling
Layer that reshapes inputs into the given shape.
Reshape Layer
Layer that reshapes inputs into the given shape.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape
Reshape layers are used to change the shape of the input.
Reshaping Layer
A residual neural network (ResNet) is an artificial neural network (ANN) of a kind that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing skip connections, or shortcuts to jump over some layers. Typical ResNet models are implemented with double- or triple- layer skips that contain nonlinearities (ReLU) and batch normalization in between. An additional weight matrix may be used to learn the skip weights; these models are known as HighwayNets. Models with several parallel skips are referred to as DenseNets. In the context of residual neural networks, a non-residual network may be described as a 'plain network'.
DRN
Deep Residual Network
ResNN
ResNet
Input, Weight, BN, ReLU, Weight, BN, Addition, ReLU
Residual Neural Network
A residual neural network (ResNet) is an artificial neural network (ANN) of a kind that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing skip connections, or shortcuts to jump over some layers. Typical ResNet models are implemented with double- or triple- layer skips that contain nonlinearities (ReLU) and batch normalization in between. An additional weight matrix may be used to learn the skip weights; these models are known as HighwayNets. Models with several parallel skips are referred to as DenseNets. In the context of residual neural networks, a non-residual network may be described as a 'plain network'.
https://en.wikipedia.org/wiki/Residual_neural_network
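The skip connection at the heart of a residual block can be sketched in a few lines. This is a simplified plain-Python illustration following the Weight, ReLU, Weight, Addition, ReLU sequence listed above; batch normalization is deliberately omitted, and the helper names and weight layout (one row per output unit) are assumptions made for the example.

```python
def relu(values):
    """Element-wise max(x, 0)."""
    return [max(x, 0.0) for x in values]


def dense(values, weights):
    """Multiply a vector by a weight matrix stored as one row per output unit."""
    return [sum(w * x for w, x in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Two weight layers whose output is added back onto the input (the skip)."""
    hidden = relu(dense(x, w1))
    transformed = dense(hidden, w2)
    # Skip connection: add the untouched input before the final nonlinearity.
    return relu([t + xi for t, xi in zip(transformed, x)])


# With all-zero weights the block falls back to relu(x): the identity path
# carries the signal even when the weight layers contribute nothing.
zeros = [[0.0, 0.0], [0.0, 0.0]]
passthrough = residual_block([1.0, -2.0], zeros, zeros)
```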
A preprocessing layer which resizes images. This layer resizes an image input to a target height and width. The input should be a 4D (batched) or 3D (unbatched) tensor in "channels_last" format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. This layer can be called on tf.RaggedTensor batches of input images of distinct sizes, and will resize the outputs to dense tensors of uniform size.
Resizing Layer
A preprocessing layer which resizes images. This layer resizes an image input to a target height and width. The input should be a 4D (batched) or 3D (unbatched) tensor in "channels_last" format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. This layer can be called on tf.RaggedTensor batches of input images of distinct sizes, and will resize the outputs to dense tensors of uniform size.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Resizing
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.
RBM
Backfed Input, Probabilistic Hidden
Restricted Boltzmann Machine
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.
https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
A method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering.
Ridge Regression
A method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering.
https://en.wikipedia.org/wiki/Ridge_regression
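To see why the penalty helps with highly correlated predictors, consider a minimal two-feature sketch of the standard closed-form ridge estimate w = (XᵀX + λI)⁻¹ Xᵀy, with no intercept. The function name and the toy data are illustrative assumptions, and the 2x2 inverse is written out by hand to keep the example dependency-free.

```python
def ridge_fit_2d(rows, targets, lam):
    """Closed-form ridge estimate for two features and no intercept."""
    # Entries of X^T X + lam * I, a symmetric 2x2 matrix [[a, b], [b, d]].
    a = sum(r[0] * r[0] for r in rows) + lam
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows) + lam
    # Entries of X^T y.
    g0 = sum(r[0] * t for r, t in zip(rows, targets))
    g1 = sum(r[1] * t for r, t in zip(rows, targets))
    det = a * d - b * b
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


# Two perfectly collinear features: ordinary least squares (lam = 0) has no
# unique solution because the determinant is zero, but ridge still works.
ROWS = [(1, 1), (2, 2), (3, 3)]
TARGETS = [2, 4, 6]
w0, w1 = ridge_fit_2d(ROWS, TARGETS, lam=1.0)
```

Here the penalty splits the effect evenly between the two identical features instead of failing on a singular matrix.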
The SELU activation function multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs.
SELU
Scaled Exponential Linear Unit
SELU Function
The SELU activation function multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/selu
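A scalar sketch of SELU follows. The constants are the commonly cited fixed values for alpha and scale; they do not appear in the definition above, so treat them as assumptions of this example.

```python
import math

# Commonly cited SELU constants (assumed here, not stated in the definition).
SELU_ALPHA = 1.67326324
SELU_SCALE = 1.05070098


def selu(x):
    """scale * x for positive x, scale * alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)
```

For positive inputs the slope equals the scale, which is slightly larger than one; for very negative inputs the output saturates near -scale * alpha.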
Bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed.
Sampling Bias
Selection Bias
Selection Effect
Selection And Sampling Bias
Bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed.
https://en.wikipedia.org/wiki/Selection_bias
Decision-makers’ inclination to selectively adopt algorithmic advice when it matches their pre-existing beliefs and stereotypes.
Selective Adherence Bias
Decision-makers’ inclination to selectively adopt algorithmic advice when it matches their pre-existing beliefs and stereotypes.
https://doi.org/10.6028/NIST.SP.1270
Regarded as an intermediate form between supervised and unsupervised learning.
Self-supervised Learning
Regarded as an intermediate form between supervised and unsupervised learning.
https://en.wikipedia.org/wiki/Self-supervised_learning
Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output.
SeparableConv1D Layer
SeparableConvolution1D Layer
Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SeparableConv1D
Depthwise separable 2D convolution. Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block.
SeparableConv2D Layer
SeparableConvolution2D Layer
Depthwise separable 2D convolution. Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SeparableConv2D
Applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)). For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a 2-element Softmax, where the second element is assumed to be zero. The sigmoid function always returns a value between 0 and 1.
Sigmoid Function
Applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)). For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a 2-element Softmax, where the second element is assumed to be zero. The sigmoid function always returns a value between 0 and 1.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/sigmoid
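As a minimal sketch of the sigmoid definition above, the following standard-library Python evaluates 1 / (1 + exp(-x)) and compares it with the first component of a 2-element softmax over (x, 0). The helper names `sigmoid` and `softmax2_first` are illustrative assumptions, not part of any library.

```python
import math

def sigmoid(x):
    """Return 1 / (1 + exp(-x)), branching on the sign of x to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softmax2_first(x):
    """First component of a softmax over the pair (x, 0). Hypothetical helper."""
    return math.exp(x) / (math.exp(x) + math.exp(0.0))

for x in (-5.0, 0.0, 5.0):
    # The two columns agree, and the values stay strictly between 0 and 1.
    print(x, round(sigmoid(x), 4), round(softmax2_first(x), 4))
```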
Cell class for SimpleRNN. This class processes one step within the whole time sequence input, whereas tf.keras.layers.SimpleRNN processes the whole sequence.
SimpleRNNCell Layer
Cell class for SimpleRNN. This class processes one step within the whole time sequence input, whereas tf.keras.layers.SimpleRNN processes the whole sequence.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNNCell
Fully-connected RNN where the output is to be fed back to input.
SimpleRNN Layer
Fully-connected RNN where the output is to be fed back to input.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN
Can be positive or negative, and take a number of different forms, but is typically characterized as being for or against groups or individuals based on social identities, demographic factors, or immutable physical characteristics. Societal or social biases are often stereotypes. Common examples of societal or social biases are based on concepts like race, ethnicity, gender, sexual orientation, socioeconomic status, education, and more. Societal bias is often recognized and discussed in the context of NLP (Natural Language Processing) models.
Social Bias
Societal Bias
Can be positive or negative, and take a number of different forms, but is typically characterized as being for or against groups or individuals based on social identities, demographic factors, or immutable physical characteristics. Societal or social biases are often stereotypes. Common examples of societal or social biases are based on concepts like race, ethnicity, gender, sexual orientation, socioeconomic status, education, and more. Societal bias is often recognized and discussed in the context of NLP (Natural Language Processing) models.
https://doi.org/10.6028/NIST.SP.1270
The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. The axis argument sets which axis of the input the function is applied along. Softmax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution. The softmax of each vector x is computed as exp(x) / tf.reduce_sum(exp(x)). The input values are the log-odds of the resulting probability.
Softmax Function
The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. The axis argument sets which axis of the input the function is applied along. Softmax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution. The softmax of each vector x is computed as exp(x) / tf.reduce_sum(exp(x)). The input values are the log-odds of the resulting probability.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/softmax
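The softmax formula above can be sketched for a single vector in plain Python. This is an illustration under stated assumptions rather than the TensorFlow implementation: the function name `softmax` and the max-subtraction step (a common numerical-stability trick that leaves the result unchanged) are choices made here.

```python
import math

def softmax(xs):
    """Return exp(x) / sum(exp(x)) for one vector, given as a list of floats."""
    m = max(xs)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three values in (0, 1)
print(sum(probs))  # sums to 1 up to floating-point rounding
```

The log-odds reading can be checked directly: the log of the ratio of two output probabilities equals the difference of the corresponding inputs.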
Softmax activation function.
Softmax Layer
Softmax activation function.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Softmax
softplus(x) = log(exp(x) + 1)
Softplus Function
softplus(x) = log(exp(x) + 1)
https://www.tensorflow.org/api_docs/python/tf/keras/activations/softplus
softsign(x) = x / (abs(x) + 1)
Softsign Function
softsign(x) = x / (abs(x) + 1)
https://www.tensorflow.org/api_docs/python/tf/keras/activations/softsign
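The softplus and softsign formulas above are short enough to write out directly. The sketch below uses only the Python standard library; rewriting softplus as max(x, 0) + log1p(exp(-|x|)) is an assumption made here to avoid overflow for large inputs, and is algebraically equal to log(exp(x) + 1).

```python
import math

def softplus(x):
    """log(exp(x) + 1), computed in an overflow-safe but equivalent form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softsign(x):
    """x / (abs(x) + 1); the output lies strictly between -1 and 1."""
    return x / (abs(x) + 1.0)

for x in (-3.0, 0.0, 1.0):
    print(x, softplus(x), softsign(x))
```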
Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time (thus, sparse). This constraint forces the model to respond to the unique statistical features of the training data. (https://en.wikipedia.org/wiki/Autoencoder)
SAE
Input, Hidden, Matched Output-Input
Sparse AE
Methods which aim to find sparse representations of the input data in the form of a linear combination of basic elements as well as those basic elements themselves.
Sparse coding
Sparse dictionary Learning
Sparse Learning
Methods which aim to find sparse representations of the input data in the form of a linear combination of basic elements as well as those basic elements themselves.
https://en.wikipedia.org/wiki/Sparse_dictionary_learning
Spatial 1D version of Dropout. This version performs the same function as Dropout; however, it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead.
SpatialDropout1D Layer
Spatial 1D version of Dropout. This version performs the same function as Dropout; however, it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout1D
Spatial 2D version of Dropout. This version performs the same function as Dropout; however, it drops entire 2D feature maps instead of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead.
SpatialDropout2D Layer
Spatial 2D version of Dropout. This version performs the same function as Dropout; however, it drops entire 2D feature maps instead of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout2D
Spatial 3D version of Dropout. This version performs the same function as Dropout; however, it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead.
SpatialDropout3D Layer
Spatial 3D version of Dropout. This version performs the same function as Dropout; however, it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout3D
Regression method used to model spatial relationships.
Spatial Regression
Regression method used to model spatial relationships.
https://gisgeography.com/spatial-regression-models-arcgis/
Wrapper allowing a stack of RNN cells to behave as a single cell. Used to implement efficient stacked RNNs.
StackedRNNCells Layer
Wrapper allowing a stack of RNN cells to behave as a single cell. Used to implement efficient stacked RNNs.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/StackedRNNCells
A bias whereby people tend to search only where it is easiest to look.
Streetlight Effect
Streetlight Effect Bias
A bias whereby people tend to search only where it is easiest to look.
https://doi.org/10.6028/NIST.SP.1270
A preprocessing layer which maps string features to integer indices.
StringLookup Layer
A preprocessing layer which maps string features to integer indices.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup
Layer that subtracts two inputs. It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape.
Subtract Layer
Layer that subtracts two inputs. It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Subtract
A human tendency where people opt to continue with an endeavor or behavior due to previously spent or invested resources, such as money, time, and effort, regardless of whether costs outweigh benefits. For example, in AI, the sunk cost fallacy could lead development teams and organizations to feel that because they have already invested so much time and money into a particular AI application, they must pursue it to market rather than deciding to end the effort, even in the face of significant technical debt and/or ethical debt.
Sunk Cost Fallacy
Sunk Cost Fallacy Bias
A human tendency where people opt to continue with an endeavor or behavior due to previously spent or invested resources, such as money, time, and effort, regardless of whether costs outweigh benefits. For example, in AI, the sunk cost fallacy could lead development teams and organizations to feel that because they have already invested so much time and money into a particular AI application, they must pursue it to market rather than deciding to end the effort, even in the face of significant technical debt and/or ethical debt.
https://doi.org/10.6028/NIST.SP.1270
Methods that simultaneously cluster the rows and columns of a labeled matrix, also taking into account the data label contributions to cluster coherence.
Supervised Block Clustering
Supervised Co-clustering
Supervised Joint Clustering
Supervised Two-mode Clustering
Supervised Two-way Clustering
Supervised Biclustering
Methods that simultaneously cluster the rows and columns of a labeled matrix, also taking into account the data label contributions to cluster coherence.
https://en.wikipedia.org/wiki/Biclustering
Methods that group a set of labeled objects in such a way that objects in the same group (called a cluster) are more similarly labeled (in some sense) relative to those in other groups (clusters).
Cluster analysis
Supervised Clustering
Methods that group a set of labeled objects in such a way that objects in the same group (called a cluster) are more similarly labeled (in some sense) relative to those in other groups (clusters).
https://en.wikipedia.org/wiki/Cluster_analysis
Methods that can learn a function that maps an input to an output based on example input-output pairs.
Supervised Learning
Methods that can learn a function that maps an input to an output based on example input-output pairs.
https://en.wikipedia.org/wiki/Supervised_learning
In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vladimir Vapnik with colleagues (Boser et al., 1992, Guyon et al., 1993, Vapnik et al., 1997), SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory proposed by Vapnik (1982, 1995) and Chervonenkis (1974). Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). SVM maps training examples to points in space so as to maximise the width of the gap between the two categories. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
SVM
SVN
Support Vector Network
Input, Hidden, Output
Support Vector Machine
In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vladimir Vapnik with colleagues (Boser et al., 1992, Guyon et al., 1993, Vapnik et al., 1997), SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory proposed by Vapnik (1982, 1995) and Chervonenkis (1974). Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). SVM maps training examples to points in space so as to maximise the width of the gap between the two categories. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
https://en.wikipedia.org/wiki/Support-vector_machine
Methods for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems.
Survival Analysis
Methods for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems.
https://en.wikipedia.org/wiki/Survival_analysis
Tendency for people to focus on the items, observations, or people that “survive” or make it past a selection process, while overlooking those that did not.
Survivorship Bias
Tendency for people to focus on the items, observations, or people that “survive” or make it past a selection process, while overlooking those that did not.
https://doi.org/10.6028/NIST.SP.1270
swish(x) = x * sigmoid(x). It is a smooth, non-monotonic function that has been reported to match or outperform ReLU on deep networks. It is unbounded above and bounded below.
Swish Function
swish(x) = x * sigmoid(x). It is a smooth, non-monotonic function that has been reported to match or outperform ReLU on deep networks. It is unbounded above and bounded below.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/swish
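A small standard-library sketch of x * sigmoid(x) makes the shape properties above concrete: the function dips below zero for negative inputs and then rises back toward zero (non-monotonic, bounded below), while growing roughly like x for large positive inputs (unbounded above). The helper names are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), branching on sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x):
    """x * sigmoid(x)."""
    return x * sigmoid(x)

for x in (-5.0, -1.0, 0.0, 1.0, 10.0):
    print(x, swish(x))
```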
Like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions).
SCN
Symmetrically Connected Network
Like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions).
https://ieeexplore.ieee.org/document/287176
Applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
SyncBatchNorm
nn.SyncBatchNorm
SyncBatchNorm Layer
Applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
https://pytorch.org/docs/stable/nn.html#normalization-layers
Biases that result from procedures and practices of particular institutions that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued.
Institutional Bias
Societal Bias
Systemic Bias
Biases that result from procedures and practices of particular institutions that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued.
https://doi.org/10.6028/NIST.SP.1270
Hyperbolic tangent activation function.
hyperbolic tangent
Tanh Function
Hyperbolic tangent activation function.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/tanh
Bias that arises from differences in populations and behaviors over time.
Temporal Bias
Bias that arises from differences in populations and behaviors over time.
https://doi.org/10.6028/NIST.SP.1270
A preprocessing layer which maps text features to integer sequences.
TextVectorization Layer
A preprocessing layer which maps text features to integer sequences.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization
A layer that performs text data preprocessing operations.
Text Preprocessing Layer
A layer that performs text data preprocessing operations.
https://keras.io/guides/preprocessing_layers/
Thresholded Rectified Linear Unit.
ThresholdedReLU Layer
Thresholded Rectified Linear Unit.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ThresholdedReLU
This wrapper applies a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension. Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3). You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps, independently.
TimeDistributed Layer
This wrapper applies a layer to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension. Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3). You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps, independently.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed
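The idea of applying one layer to every temporal slice can be sketched without any framework. In the plain-Python illustration below, nested lists stand in for a tensor of shape (batch, time, features), and `time_distributed` and `frame_mean` are hypothetical helpers invented for this sketch, not the Keras API.

```python
def time_distributed(layer_fn, batch):
    """Apply the same function to each timestep of each sample, independently."""
    return [[layer_fn(frame) for frame in sample] for sample in batch]

def frame_mean(frame):
    """Stand-in for a layer: reduce one timestep's features to their mean."""
    return sum(frame) / len(frame)

# 2 samples, 3 timesteps each, 2 features per timestep.
batch = [
    [[1.0, 3.0], [2.0, 4.0], [0.0, 0.0]],
    [[5.0, 5.0], [1.0, 2.0], [9.0, 1.0]],
]
outputs = time_distributed(frame_mean, batch)
print(outputs)  # one output per (sample, timestep)
```

Axis 1 of the nested list plays the role of the temporal dimension, and the batch and time axes are preserved in the output.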
Methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
Time Series Analysis
Methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
https://en.wikipedia.org/wiki/Time_series
Methods that predict future values based on previously observed values.
Time Series Forecasting
Methods that predict future values based on previously observed values.
https://en.wikipedia.org/wiki/Time_series
Methods which can reuse or transfer information from previously learned tasks for the learning of new tasks.
Transfer Learning
Methods which can reuse or transfer information from previously learned tasks for the learning of new tasks.
https://en.wikipedia.org/wiki/Transfer_learning
A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data. It is used primarily in the field of natural language processing (NLP) and in computer vision (CV). (https://en.wikipedia.org/wiki/Transformer_(machine_learning_model))
Transformer Network
A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data. It is used primarily in the field of natural language processing (NLP) and in computer vision (CV). (https://en.wikipedia.org/wiki/Transformer_(machine_learning_model))
https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
Arises when predictive algorithms favor groups that are better represented in the training data, since there will be less uncertainty associated with those predictions.
Uncertainty Bias
Arises when predictive algorithms favor groups that are better represented in the training data, since there will be less uncertainty associated with those predictions.
https://doi.org/10.6028/NIST.SP.1270
Unit normalization layer. Normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1 (across the axes specified in axis).
UnitNormalization Layer
Unit normalization layer. Normalizes a batch of inputs so that each input in the batch has an L2 norm equal to 1 (across the axes specified in axis).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/UnitNormalization
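To illustrate what an L2 norm equal to 1 means for each input, the following plain-Python sketch normalizes every row of a small batch. The function name `unit_normalize` and the choice to return an all-zero vector unchanged are assumptions made for this example, not a description of the Keras layer's internals.

```python
import math

def unit_normalize(vec):
    """Scale a vector so that its L2 norm is 1 (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)  # assumption: leave an all-zero input unchanged
    return [v / norm for v in vec]

batch = [[3.0, 4.0], [1.0, 0.0]]
normalized = [unit_normalize(row) for row in batch]
print(normalized)
```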
Methods that simultaneously cluster the rows and columns of an unlabeled input matrix.
Block Clustering
Co-clustering
Joint Clustering
Two-mode Clustering
Two-way Clustering
Unsupervised Biclustering
Methods that simultaneously cluster the rows and columns of an unlabeled input matrix.
https://en.wikipedia.org/wiki/Biclustering
Methods that group a set of unlabeled objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
Cluster analysis
Unsupervised Clustering
Methods that group a set of unlabeled objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
https://en.wikipedia.org/wiki/Cluster_analysis
Algorithms that learn patterns from unlabeled data.
Unsupervised Learning
Algorithms that learn patterns from unlabeled data.
https://en.wikipedia.org/wiki/Unsupervised_learning
Unsupervised pre-training initializes a discriminative neural net from one which was trained using an unsupervised criterion, such as a deep belief network or a deep autoencoder. This method can sometimes help with both the optimization and the overfitting issues.
UPN
Unsupervised Pretrained Network
Unsupervised pre-training initializes a discriminative neural net from one which was trained using an unsupervised criterion, such as a deep belief network or a deep autoencoder. This method can sometimes help with both the optimization and the overfitting issues.
https://metacademy.org/graphs/concepts/unsupervised_pre_training
Upsampling layer for 1D inputs. Repeats each temporal step size times along the time axis.
UpSampling1D Layer
Upsampling layer for 1D inputs. Repeats each temporal step size times along the time axis.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling1D
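The repetition described above can be sketched in plain Python, with a list of timesteps standing in for one sample of shape (steps, features). The helper `upsample_1d` is a hypothetical name used only for this illustration.

```python
def upsample_1d(steps, size):
    """Repeat each temporal step `size` times along the time axis."""
    return [list(step) for step in steps for _ in range(size)]

sample = [[0, 1], [2, 3]]      # 2 timesteps, 2 features each
print(upsample_1d(sample, 2))  # [[0, 1], [0, 1], [2, 3], [2, 3]]
```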
Upsampling layer for 2D inputs. Repeats the rows and columns of the data by size[0] and size[1] respectively.
UpSampling2D Layer
Upsampling layer for 2D inputs. Repeats the rows and columns of the data by size[0] and size[1] respectively.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling2D
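For the 2D case, a comparable sketch repeats rows by size[0] and columns by size[1]. A nested list stands in for a single-channel image, and `upsample_2d` is an illustrative helper name assumed here; this mirrors simple repetition only and does not cover other interpolation modes.

```python
def upsample_2d(grid, size):
    """Repeat rows size[0] times and columns size[1] times."""
    row_factor, col_factor = size
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(col_factor)]
        out.extend(list(wide) for _ in range(row_factor))
    return out

image = [[1, 2], [3, 4]]
for row in upsample_2d(image, (2, 3)):
    print(row)
```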
Upsampling layer for 3D inputs.
UpSampling3D Layer
Upsampling layer for 3D inputs.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling3D
An information-processing bias, the tendency to inappropriately analyze ambiguous stimuli, scenarios and events.
Interpretive Bias
Use And Interpretation Bias
An information-processing bias, the tendency to inappropriately analyze ambiguous stimuli, scenarios and events.
https://en.wikipedia.org/wiki/Interpretive_bias
Arises when a user imposes their own self-selected biases and behavior during interaction with data, output, results, etc.
User Interaction Bias
Arises when a user imposes their own self-selected biases and behavior during interaction with data, output, results, etc.
https://doi.org/10.6028/NIST.SP.1270
Variational autoencoders are meant to compress the input information into a constrained multivariate latent distribution (encoding) to reconstruct it as accurately as possible (decoding). (https://en.wikipedia.org/wiki/Variational_autoencoder)
VAE
Input, Probabilistic Hidden, Matched Output-Input
Variational Auto Encoder
Abstract wrapper base class. Wrappers take another layer and augment it in various ways. Do not use this class as a layer, it is only an abstract base class. Two usable wrappers are the TimeDistributed and Bidirectional wrappers.
Wrapper Layer
Abstract wrapper base class. Wrappers take another layer and augment it in various ways. Do not use this class as a layer, it is only an abstract base class. Two usable wrappers are the TimeDistributed and Bidirectional wrappers.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Wrapper
Methods where, at test time, a learner observes samples from classes that were not observed during training and needs to predict the class that they belong to.
ZSL
Zero-shot Learning
Methods where, at test time, a learner observes samples from classes that were not observed during training and needs to predict the class that they belong to.
https://en.wikipedia.org/wiki/Zero-shot_learning
Zero-padding layer for 1D input (e.g. temporal sequence).
ZeroPadding1D Layer
Zero-padding layer for 1D input (e.g. temporal sequence).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding1D
Zero-padding layer for 2D input (e.g. picture). This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor.
ZeroPadding2D Layer
Zero-padding layer for 2D input (e.g. picture). This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding2D
Zero-padding layer for 3D data (spatial or spatio-temporal).
ZeroPadding3D Layer
Zero-padding layer for 3D data (spatial or spatio-temporal).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding3D
The exponential linear unit (ELU) with alpha > 0 is: x if x > 0 and alpha * (exp(x) - 1) if x < 0. The ELU hyperparameter alpha controls the value to which an ELU saturates for negative net inputs. ELUs diminish the vanishing gradient effect. ELUs have negative values, which push the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. ELUs saturate to a negative value when the argument gets smaller. Saturation means a small derivative, which decreases the variation and the information that is propagated to the next layer.
ELU
Exponential Linear Unit
ELU Function
The exponential linear unit (ELU) with alpha > 0 is: x if x > 0 and alpha * (exp(x) - 1) if x < 0. The ELU hyperparameter alpha controls the value to which an ELU saturates for negative net inputs. ELUs diminish the vanishing gradient effect. ELUs have negative values, which push the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. ELUs saturate to a negative value when the argument gets smaller. Saturation means a small derivative, which decreases the variation and the information that is propagated to the next layer.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/elu
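The piecewise ELU definition above can be sketched with the Python standard library. The function name `elu` and the default alpha of 1.0 are assumptions for this example; `math.expm1` computes exp(x) - 1 accurately for small x. At x = 0 both branches give 0, so the choice of branch there does not matter.

```python
import math

def elu(x, alpha=1.0):
    """x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)

for x in (-50.0, -1.0, 0.0, 2.0):
    # For very negative x the output saturates near -alpha.
    print(x, elu(x), elu(x, alpha=2.0))
```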
In the continuous bag-of-words architecture, the model predicts the current node from a window of surrounding context nodes. The order of context nodes does not influence prediction (bag-of-words assumption).
N2V-CBOW
CBOW
Input, Hidden, Output
node2vec-CBOW
In the continuous bag-of-words architecture, the model predicts the current node from a window of surrounding context nodes. The order of context nodes does not influence prediction (bag-of-words assumption).
https://en.wikipedia.org/wiki/Word2vec
In the continuous skip-gram architecture, the model uses the current node to predict the surrounding window of context nodes. The skip-gram architecture weighs nearby context nodes more heavily than more distant context nodes. (https://en.wikipedia.org/wiki/Word2vec)
N2V-SkipGram
SkipGram
Input, Hidden, Output
node2vec-SkipGram
In the continuous skip-gram architecture, the model uses the current node to predict the surrounding window of context nodes. The skip-gram architecture weighs nearby context nodes more heavily than more distant context nodes. (https://en.wikipedia.org/wiki/Word2vec)
https://en.wikipedia.org/wiki/Word2vec
A statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.
t-SNE
tSNE
t-Distributed Stochastic Neighbor Embedding
A statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.
https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding
In the continuous bag-of-words architecture, the model predicts the current word from a window of surrounding context words. The order of context words does not influence prediction (bag-of-words assumption). (https://en.wikipedia.org/wiki/Word2vec)
W2V-CBOW
CBOW
Input, Hidden, Output
word2vec-CBOW
In the continuous bag-of-words architecture, the model predicts the current word from a window of surrounding context words. The order of context words does not influence prediction (bag-of-words assumption). (https://en.wikipedia.org/wiki/Word2vec)
https://en.wikipedia.org/wiki/Word2vec
In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words.
W2V-SkipGram
SkipGram
Input, Hidden, Output
word2vec-SkipGram
In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words.
https://en.wikipedia.org/wiki/Word2vec
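The difference between the two architectures shows up in how training examples are built from a token sequence. The plain-Python sketch below is an illustration only: `cbow_examples` and `skipgram_pairs` are hypothetical helper names, the window size is an arbitrary choice, and the heavier weighting of nearby context words in skip-gram is not modeled.

```python
def cbow_examples(tokens, window):
    """CBOW: (context words, target word). Context order is not used by the model."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples

def skipgram_pairs(tokens, window):
    """Skip-gram: (current word, one context word) for each word in the window."""
    return [(target, ctx)
            for context, target in cbow_examples(tokens, window)
            for ctx in context]

tokens = ["the", "quick", "brown", "fox"]
print(cbow_examples(tokens, window=1))
print(skipgram_pairs(tokens, window=1))
```

The same construction applies to the node2vec variants if the token sequence is replaced by a sequence of nodes visited on a random walk.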
A statistical phenomenon where the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables. For example, the statistical association or correlation that has been detected between two variables for an entire population disappears or reverses when the population is divided into subgroups.
Simpson's Paradox
Simpson's Paradox Bias
A statistical phenomenon where the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables. For example, the statistical association or correlation that has been detected between two variables for an entire population disappears or reverses when the population is divided into subgroups.
https://doi.org/10.6028/NIST.SP.1270
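A small worked example may make the reversal concrete. The counts below are hypothetical, invented purely for illustration, and the names `counts`, `rate`, and `pooled_rate` are assumptions of this sketch. Option A has the higher success rate within each subgroup, yet option B has the higher rate once the subgroups are pooled, because the two options are distributed very unevenly across the subgroups.

```python
from fractions import Fraction

# Hypothetical (successes, trials) per subgroup and option; not real data.
counts = {
    "group_1": {"A": (8, 10), "B": (70, 100)},
    "group_2": {"A": (20, 100), "B": (1, 10)},
}

def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)

def pooled_rate(option):
    """Success rate for one option after merging all subgroups."""
    successes = sum(counts[group][option][0] for group in counts)
    trials = sum(counts[group][option][1] for group in counts)
    return Fraction(successes, trials)

for group, by_option in counts.items():
    print(group, {opt: float(rate(*pair)) for opt, pair in by_option.items()})
print("pooled", {opt: float(pooled_rate(opt)) for opt in ("A", "B")})
```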