This ontology models classes and relationships describing deep learning networks, their component layers and activation functions, as well as potential biases. Artificial Intelligence Ontology 2023-09-08 Abstract object representing an RNN cell. This is the base class for implementing RNN cells with custom behavior. AbstractRNNCell Abstract object representing an RNN cell. This is the base class for implementing RNN cells with custom behavior. https://www.tensorflow.org/api_docs/python/tf/keras/layers/AbstractRNNCell Applies an activation function to an output. Activation Layer Applies an activation function to an output. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Activation Methods which can interactively query a user (or some other information source) to label new data points with the desired outputs. Query Learning Active Learning Methods which can interactively query a user (or some other information source) to label new data points with the desired outputs. https://en.wikipedia.org/wiki/Active_learning_(machine_learning) Layer that applies an update to the cost function based on input activity. ActivityRegularization Layer Layer that applies an update to the cost function based on input activity. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ActivityRegularization A type of selection bias that occurs when systems/platforms get their training data from their most active users, rather than those less active (or inactive). Activity Bias A type of selection bias that occurs when systems/platforms get their training data from their most active users, rather than those less active (or inactive). https://doi.org/10.6028/NIST.SP.1270 Applies a 1D adaptive average pooling over an input signal composed of several input planes. AdaptiveAvgPool1D AdaptiveAvgPool1d AdaptiveAvgPool1D Layer Applies a 1D adaptive average pooling over an input signal composed of several input planes. 
https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 2D adaptive average pooling over an input signal composed of several input planes. AdaptiveAvgPool2D AdaptiveAvgPool2d AdaptiveAvgPool2D Layer Applies a 2D adaptive average pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 3D adaptive average pooling over an input signal composed of several input planes. AdaptiveAvgPool3D AdaptiveAvgPool3d AdaptiveAvgPool3D Layer Applies a 3D adaptive average pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 1D adaptive max pooling over an input signal composed of several input planes. AdaptiveMaxPool1D AdaptiveMaxPool1d AdaptiveMaxPool1D Layer Applies a 1D adaptive max pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 2D adaptive max pooling over an input signal composed of several input planes. AdaptiveMaxPool2D AdaptiveMaxPool2d AdaptiveMaxPool2D Layer Applies a 2D adaptive max pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 3D adaptive max pooling over an input signal composed of several input planes. AdaptiveMaxPool3D AdaptiveMaxPool3d AdaptiveMaxPool3D Layer Applies a 3D adaptive max pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Layer that adds a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Add Layer Layer that adds a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add Additive attention layer, a.k.a. Bahdanau-style attention. 
AdditiveAttention Layer Additive attention layer, a.k.a. Bahdanau-style attention. https://www.tensorflow.org/api_docs/python/tf/keras/layers/AdditiveAttention Applies Alpha Dropout to the input. Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled Exponential Linear Units by randomly setting activations to the negative saturation value. AlphaDropout Layer Applies Alpha Dropout to the input. Alpha Dropout is a Dropout that keeps mean and variance of inputs to their original values, in order to ensure the self-normalizing property even after this dropout. Alpha Dropout fits well to Scaled Exponential Linear Units by randomly setting activations to the negative saturation value. https://www.tensorflow.org/api_docs/python/tf/keras/layers/AlphaDropout Arises when the distribution over prediction outputs is skewed in comparison to the prior distribution of the prediction target. Amplification Bias Arises when the distribution over prediction outputs is skewed in comparison to the prior distribution of the prediction target. https://doi.org/10.6028/NIST.SP.1270 A cognitive bias, the influence of a particular reference point or anchor on people’s decisions. Often more fully referred to as anchoring-and-adjustment, or anchoring-and-adjusting: after an anchor is set, people adjust insufficiently from that anchor point to arrive at a final answer. Decision makers are biased towards an initially presented value. Anchoring Bias A cognitive bias, the influence of a particular reference point or anchor on people’s decisions. Often more fully referred to as anchoring-and-adjustment, or anchoring-and-adjusting: after an anchor is set, people adjust insufficiently from that anchor point to arrive at a final answer. Decision makers are biased towards an initially presented value. 
https://doi.org/10.6028/NIST.SP.1270 When users rely on automation as a heuristic replacement for their own information seeking and processing. A form of individual bias but often discussed as a group bias, or the larger effects on natural language processing models. Annotator Reporting Bias When users rely on automation as a heuristic replacement for their own information seeking and processing. A form of individual bias but often discussed as a group bias, or the larger effects on natural language processing models. https://doi.org/10.6028/NIST.SP.1270 An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron receives a signal then processes it and can signal neurons connected to it. The "signal" at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts as Learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times. ANN NN Artificial Neural Network An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron receives a signal then processes it and can signal neurons connected to it. 
The "signal" at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts as Learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times. https://en.wikipedia.org/wiki/Artificial_neural_network A rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness. Association Rule Learning A rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness. https://en.wikipedia.org/wiki/Association_rule_learning Dot-product attention layer, a.k.a. Luong-style attention. Attention Layer Dot-product attention layer, a.k.a. Luong-style attention. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised Learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”). 
(https://en.wikipedia.org/wiki/Autoencoder) AE Input, Hidden, Matched Output-Input Auto Encoder Network An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised Learning). The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”). (https://en.wikipedia.org/wiki/Autoencoder) https://en.wikipedia.org/wiki/Autoencoder When humans over-rely on automated systems or have their skills attenuated by such over-reliance (e.g., spelling and autocorrect or spellcheckers). Automation Complacency Automation Complacency Bias When humans over-rely on automated systems or have their skills attenuated by such over-reliance (e.g., spelling and autocorrect or spellcheckers). https://doi.org/10.6028/NIST.SP.1270 A mental shortcut whereby people tend to overweight what comes easily or quickly to mind, meaning that what is easier to recall (e.g., more “available”) receives greater emphasis in judgement and decision-making. Availability Bias Availability Heuristic Availability Heuristic Bias A mental shortcut whereby people tend to overweight what comes easily or quickly to mind, meaning that what is easier to recall (e.g., more “available”) receives greater emphasis in judgement and decision-making. https://doi.org/10.6028/NIST.SP.1270 Average pooling for temporal data. Downsamples the input representation by taking the average value over the window defined by pool_size. The window is shifted by strides. The resulting output when using the "valid" padding option has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides. AvgPool1D AvgPool1d AveragePooling1D Layer Average pooling for temporal data. 
Downsamples the input representation by taking the average value over the window defined by pool_size. The window is shifted by strides. The resulting output when using the "valid" padding option has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides. https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling1D Average pooling operation for spatial data. Downsamples the input along its spatial dimensions (height and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. The resulting output when using the "valid" padding option has a shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size). The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1. AvgPool2D AvgPool2d AveragePooling2D Layer Average pooling operation for spatial data. Downsamples the input along its spatial dimensions (height and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. The resulting output when using the "valid" padding option has a shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size). The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1. https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D Average pooling operation for 3D data (spatial or spatio-temporal). 
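The output-shape expressions quoted above for AveragePooling1D and AveragePooling2D can be compared with a small sketch. It is not the Keras implementation: the function names are hypothetical, and reading the 1D expressions as rounded up to the next integer is an assumption made here so that they yield whole numbers. Under that assumption the 1D and 2D expressions appear to agree for every valid combination of sizes.

```python
import math


def pooled_length(input_length: int, pool_size: int, strides: int,
                  padding: str = "valid") -> int:
    """Output length along one pooled dimension, using the floor-based
    expressions quoted for AveragePooling2D."""
    if padding == "valid":
        if input_length < pool_size:
            raise ValueError("'valid' padding requires input_length >= pool_size")
        return (input_length - pool_size) // strides + 1
    if padding == "same":
        return (input_length - 1) // strides + 1
    raise ValueError(f"unknown padding: {padding!r}")


def pooled_length_1d_style(input_length: int, pool_size: int, strides: int,
                           padding: str = "valid") -> int:
    """The expressions quoted for AveragePooling1D, rounded up.
    The rounding is an assumption of this sketch, not stated in the text."""
    if padding == "valid":
        return math.ceil((input_length - pool_size + 1) / strides)
    if padding == "same":
        return math.ceil(input_length / strides)
    raise ValueError(f"unknown padding: {padding!r}")


# Example: 10 time steps, window of 3, stride of 2.
valid_steps = pooled_length(10, 3, 2, "valid")
same_steps = pooled_length(10, 3, 2, "same")
```

With "valid" padding only windows that fit entirely inside the input are counted, while "same" padding pads the input so that the output length depends only on the input length and the stride.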
Downsamples the input along its spatial dimensions (depth, height, and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. AvgPool3D AvgPool3d AveragePooling3D Layer Average pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the average value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling3D Layer that averages a list of inputs element-wise. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Average Layer Layer that averages a list of inputs element-wise. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Average Applies a 1D average pooling over an input signal composed of several input planes. AvgPool1D AvgPool1d AvgPool1D Layer Applies a 1D average pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 2D average pooling over an input signal composed of several input planes. AvgPool2D AvgPool2d AvgPool2D Layer Applies a 2D average pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 3D average pooling over an input signal composed of several input planes. AvgPool3D AvgPool3d AvgPool3D Layer Applies a 3D average pooling over an input signal composed of several input planes. 
https://pytorch.org/docs/stable/nn.html#pooling-layers Applies Batch Normalization over a 2D or 3D input as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". BatchNorm1D BatchNorm1d nn.BatchNorm1d BatchNorm1D Layer Applies Batch Normalization over a 2D or 3D input as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". https://pytorch.org/docs/stable/nn.html#normalization-layers Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". BatchNorm2D BatchNorm2d nn.BatchNorm2d BatchNorm2D Layer Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". https://pytorch.org/docs/stable/nn.html#normalization-layers Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". BatchNorm3D BatchNorm3d nn.BatchNorm3d BatchNorm3D Layer Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". https://pytorch.org/docs/stable/nn.html#normalization-layers Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Importantly, batch normalization works differently during training and during inference. During training (i.e. 
when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs. That is to say, for each channel being normalized, the layer returns gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta, where epsilon is a small constant (configurable as part of the constructor arguments); gamma is a learned scaling factor (initialized as 1), which can be disabled by passing scale=False to the constructor; and beta is a learned offset factor (initialized as 0), which can be disabled by passing center=False to the constructor. During inference (i.e. when using evaluate() or predict(), or when calling the layer/model with the argument training=False, which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta. self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as follows: moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum) and moving_var = moving_var * momentum + var(batch) * (1 - momentum). BatchNormalization Layer Layer that normalizes its inputs. Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Importantly, batch normalization works differently during training and during inference. During training (i.e. when using fit() or when calling the layer/model with the argument training=True), the layer normalizes its output using the mean and standard deviation of the current batch of inputs. 
That is to say, for each channel being normalized, the layer returns gamma * (batch - mean(batch)) / sqrt(var(batch) + epsilon) + beta, where epsilon is a small constant (configurable as part of the constructor arguments); gamma is a learned scaling factor (initialized as 1), which can be disabled by passing scale=False to the constructor; and beta is a learned offset factor (initialized as 0), which can be disabled by passing center=False to the constructor. During inference (i.e. when using evaluate() or predict(), or when calling the layer/model with the argument training=False, which is the default), the layer normalizes its output using a moving average of the mean and standard deviation of the batches it has seen during training. That is to say, it returns gamma * (batch - self.moving_mean) / sqrt(self.moving_var + epsilon) + beta. self.moving_mean and self.moving_var are non-trainable variables that are updated each time the layer is called in training mode, as follows: moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum) and moving_var = moving_var * momentum + var(batch) * (1 - momentum). https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian Network A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). https://en.wikipedia.org/wiki/Bayesian_network Systematic distortions in user behavior across platforms or contexts, or across users represented in different datasets. Behavioral Bias Systematic distortions in user behavior across platforms or contexts, or across users represented in different datasets. https://doi.org/10.6028/NIST.SP.1270 Systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others. 
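The training and inference expressions quoted above for the BatchNormalization layer can be traced with a minimal single-channel sketch in plain Python. It is an illustration under stated assumptions, not the Keras implementation: the class name BatchNormSketch is hypothetical, the variance is taken as the population variance of the batch, and the starting values (moving mean 0, moving variance 1) and default momentum and epsilon are choices made for this sketch.

```python
import math


class BatchNormSketch:
    """Single-channel batch normalization following the quoted formulas."""

    def __init__(self, momentum=0.99, epsilon=1e-3, gamma=1.0, beta=0.0):
        self.momentum = momentum
        self.epsilon = epsilon
        self.gamma = gamma        # scaling factor (learned in a real layer)
        self.beta = beta          # offset factor (learned in a real layer)
        self.moving_mean = 0.0    # assumed starting value
        self.moving_var = 1.0     # assumed starting value

    def _normalize(self, batch, mean, var):
        scale = self.gamma / math.sqrt(var + self.epsilon)
        return [scale * (x - mean) + self.beta for x in batch]

    def __call__(self, batch, training=False):
        if not training:
            # Inference: use the moving statistics gathered during training.
            return self._normalize(batch, self.moving_mean, self.moving_var)
        # Training: use the statistics of the current batch ...
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        # ... and update the moving averages as in the quoted update rule.
        m = self.momentum
        self.moving_mean = self.moving_mean * m + mean * (1 - m)
        self.moving_var = self.moving_var * m + var * (1 - m)
        return self._normalize(batch, mean, var)


layer = BatchNormSketch(momentum=0.9, epsilon=0.0)
train_out = layer([1.0, 2.0, 3.0], training=True)   # normalized with batch statistics
infer_out = layer([0.2], training=False)            # normalized with moving statistics
```

The two code paths make the distinction in the text concrete: the same input can produce different outputs depending on whether the batch statistics or the accumulated moving statistics are used.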
Bias Systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others. https://www.merriam-webster.com/dictionary/bias Methods that simultaneously cluster the rows and columns of a matrix. Block Clustering Co-clustering Joint Clustering Two-mode Clustering Two-way Clustering Biclustering Methods that simultaneously cluster the rows and columns of a matrix. https://en.wikipedia.org/wiki/Biclustering Bidirectional wrapper for RNNs. Bidirectional Layer Bidirectional wrapper for RNNs. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional Methods that classify the elements of a set into two groups (each called a class) on the basis of a classification rule. Binary Classification Methods that classify the elements of a set into two groups (each called a class) on the basis of a classification rule. https://en.wikipedia.org/wiki/Binary_classification A Boltzmann machine is a type of stochastic recurrent neural network. It is a Markov random field. It was translated from statistical physics for use in cognitive science. The Boltzmann machine is based on a stochastic spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model that is a stochastic Ising model, applied to machine learning. BM Sherrington–Kirkpatrick model with external field stochastic Hopfield network with hidden units stochastic Ising-Lenz-Little model Backfed Input, Probabilistic Hidden Boltzmann Machine Network A Boltzmann machine is a type of stochastic recurrent neural network. It is a Markov random field. It was translated from statistical physics for use in cognitive science. The Boltzmann machine is based on a stochastic spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model that is a stochastic Ising model, applied to machine learning. https://en.wikipedia.org/wiki/Boltzmann_machine A layer that performs categorical data preprocessing operations. 
Categorical Features Preprocessing Layer A layer that performs categorical data preprocessing operations. https://keras.io/guides/preprocessing_layers/ A preprocessing layer which encodes integer features. This layer provides options for condensing data into a categorical encoding when the total number of tokens is known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use tf.keras.layers.IntegerLookup instead. CategoryEncoding Layer A preprocessing layer which encodes integer features. This layer provides options for condensing data into a categorical encoding when the total number of tokens is known in advance. It accepts integer values as inputs, and it outputs a dense or sparse representation of those inputs. For integer inputs where the total number of tokens is not known, use tf.keras.layers.IntegerLookup instead. https://www.tensorflow.org/api_docs/python/tf/keras/layers/CategoryEncoding Probabilistic graphical models used to encode assumptions about the data-generating process. Causal Bayesian Network Causal Graph DAG Directed Acyclic Graph Path Diagram Causal Graphical Model Probabilistic graphical models used to encode assumptions about the data-generating process. https://en.wikipedia.org/wiki/Causal_graph A preprocessing layer which crops images. This layer crops the central portion of the images to a target size. If an image is smaller than the target size, it will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. CenterCrop Layer A preprocessing layer which crops images. This layer crops the central portion of the images to a target size. 
If an image is smaller than the target size, it will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. https://www.tensorflow.org/api_docs/python/tf/keras/layers/CenterCrop Methods that distinguish and distribute kinds of "things" into different groups. Classification Methods that distinguish and distribute kinds of "things" into different groups. https://en.wikipedia.org/wiki/Classification_(general_theory) Methods that group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). Cluster analysis Clustering Methods that group a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). https://en.wikipedia.org/wiki/Cluster_analysis A broad term referring generally to a systematic pattern of deviation from rational judgement and decision-making. A large variety of cognitive biases have been identified over many decades of research in judgement and decision-making, some of which are adaptive mental shortcuts known as heuristics. Cognitive Bias A broad term referring generally to a systematic pattern of deviation from rational judgement and decision-making. A large variety of cognitive biases have been identified over many decades of research in judgement and decision-making, some of which are adaptive mental shortcuts known as heuristics. https://doi.org/10.6028/NIST.SP.1270 A systematic tendency which causes differences between results and facts. The bias can arise at several stages of data analysis, including the source of the data, the estimator chosen, and the way the data were analyzed. 
Statistical Bias Computational Bias A systematic tendency which causes differences between results and facts. The bias can arise at several stages of data analysis, including the source of the data, the estimator chosen, and the way the data were analyzed. https://en.wikipedia.org/wiki/Bias_(statistics) Layer that concatenates a list of inputs. It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs. Concatenate Layer Layer that concatenates a list of inputs. It takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate Use of a system outside the planned domain of application, and a common cause of performance gaps between laboratory settings and the real world. Concept Drift Concept Drift Bias Use of a system outside the planned domain of application, and a common cause of performance gaps between laboratory settings and the real world. https://doi.org/10.6028/NIST.SP.1270 A cognitive bias where people tend to prefer information that aligns with, or confirms, their existing beliefs. People can exhibit confirmation bias in the search for, interpretation of, and recall of information. In the famous Wason selection task experiments, participants repeatedly showed a preference for confirmation over falsification. They were tasked with identifying an underlying rule that applied to number triples they were shown, and they overwhelmingly tested triples that confirmed rather than falsified their hypothesized rule. Confirmation Bias A cognitive bias where people tend to prefer information that aligns with, or confirms, their existing beliefs. People can exhibit confirmation bias in the search for, interpretation of, and recall of information. 
In the famous Wason selection task experiments, participants repeatedly showed a preference for confirmation over falsification. They were tasked with identifying an underlying rule that applied to number triples they were shown, and they overwhelmingly tested triples that confirmed rather than falsified their hypothesized rule. https://doi.org/10.6028/NIST.SP.1270 Arises when an algorithm or platform provides users with a new venue within which to express their biases, and may occur from either side, or party, in a digital interaction. Consumer Bias Arises when an algorithm or platform provides users with a new venue within which to express their biases, and may occur from either side, or party, in a digital interaction. https://doi.org/10.6028/NIST.SP.1270 Arises from structural, lexical, semantic, and syntactic differences in the contents generated by users. Content Production Bias Arises from structural, lexical, semantic, and syntactic differences in the contents generated by users. https://doi.org/10.6028/NIST.SP.1270 A concept to learn a model for a large number of tasks sequentially without forgetting knowledge obtained from the preceding tasks, where the data from the old tasks are no longer available while training new ones. Incremental Learning Life-Long Learning Continual Learning A concept to learn a model for a large number of tasks sequentially without forgetting knowledge obtained from the preceding tasks, where the data from the old tasks are no longer available while training new ones. https://paperswithcode.com/task/continual-learning Learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. Contrastive Learning Learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. https://arxiv.org/abs/2202.14037 1D Convolutional LSTM. 
Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. ConvLSTM1D Layer 1D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM1D 2D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. ConvLSTM2D Layer 2D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM2D 3D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. ConvLSTM3D Layer 3D Convolutional LSTM. Similar to an LSTM layer, but the input transformations and recurrent transformations are both convolutional. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM3D Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 3) for data with 128 time steps and 3 channels. Conv1DTranspose Layer ConvTranspose1d Convolution1DTranspose Convolution1dTranspose nn.ConvTranspose1d Convolution1DTranspose Layer Transposed convolution layer (sometimes called Deconvolution). 
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 3) for data with 128 time steps and 3 channels. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1DTranspose 1D convolution layer (e.g. temporal convolution). Conv1D Layer Conv1d Convolution1D Convolution1d nn.Conv1d Convolution1D Layer 1D convolution layer (e.g. temporal convolution). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1D Transposed convolution layer (sometimes called Deconvolution). Conv2DTranspose Layer ConvTranspose2d Convolution2DTranspose Convolution2dTranspose nn.ConvTranspose2d Convolution2DTranspose Layer Transposed convolution layer (sometimes called Deconvolution). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose 2D convolution layer (e.g. spatial convolution over images). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size. Conv2D Layer Conv2d Convolution2D Convolution2d nn.Conv2d Convolution2D Layer 2D convolution layer (e.g. spatial convolution over images). 
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 3) for a 128x128x128 volume with 3 channels if data_format="channels_last". Conv3DTranspose Layer ConvTranspose3d Convolution3DTranspose Convolution3dTranspose nn.ConvTranspose3d Convolution3DTranspose Layer Transposed convolution layer (sometimes called Deconvolution). The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. 
When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 3) for a 128x128x128 volume with 3 channels if data_format="channels_last". https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv3DTranspose 3D convolution layer (e.g. spatial convolution over volumes). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 1) for 128x128x128 volumes with a single channel, in data_format="channels_last". Conv3D Layer Conv3d Convolution3D Convolution3d nn.Conv3d Convolution3D Layer 3D convolution layer (e.g. spatial convolution over volumes). This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 128, 1) for 128x128x128 volumes with a single channel, in data_format="channels_last". https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv3D A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map. 
Convolutional Layer A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map. https://www.sciencedirect.com/topics/engineering/convolutional-layer#:~:text=A%20convolutional%20layer%20is%20the,and%20creates%20an%20activation%20map. Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1). Cropping1D Layer Cropping layer for 1D input (e.g. temporal sequence). It crops along the time dimension (axis 1). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Cropping1D Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. height and width. Cropping2D Layer Cropping layer for 2D input (e.g. picture). It crops along spatial dimensions, i.e. height and width. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Cropping2D Cropping layer for 3D data (e.g. spatial or spatio-temporal). Cropping3D Layer Cropping layer for 3D data (e.g. spatial or spatio-temporal). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Cropping3D A statistical bias in which testing huge numbers of hypotheses of a dataset may appear to yield statistical significance even when the results are statistically nonsignificant. Data Dredging Data Dredging Bias A statistical bias in which testing huge numbers of hypotheses of a dataset may appear to yield statistical significance even when the results are statistically nonsignificant. https://doi.org/10.6028/NIST.SP.1270 Arises from the addition of synthetic or redundant data samples to a dataset. Data Generation Bias Arises from the addition of synthetic or redundant data samples to a dataset. https://doi.org/10.6028/NIST.SP.1270 Methods that replace missing data with substituted values. 
Data Imputation Methods that replace missing data with substituted values. https://en.wikipedia.org/wiki/Imputation_(statistics) A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Decision Tree A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. https://en.wikipedia.org/wiki/Decision_tree In the decoder-only architecture, the model consists of only a decoder, which is trained to predict the next token in a sequence given the previous tokens. The critical difference between the Decoder-only architecture and the Encoder-Decoder architecture is that the Decoder-only architecture does not have an explicit encoder to summarize the input information. Instead, the information is encoded implicitly in the hidden state of the decoder, which is updated at each step of the generation process. LLM Decoder LLM In the decoder-only architecture, the model consists of only a decoder, which is trained to predict the next token in a sequence given the previous tokens. The critical difference between the Decoder-only architecture and the Encoder-Decoder architecture is that the Decoder-only architecture does not have an explicit encoder to summarize the input information. Instead, the information is encoded implicitly in the hidden state of the decoder, which is updated at each step of the generation process. https://www.practicalai.io/understanding-transformer-model-architectures/#:~:text=Encoder%2Donly&text=These%20models%20have%20a%20pre,Named%20entity%20recognition Deconvolutional Networks, a framework that permits the unsupervised construction of hierarchical image representations. These representations can be used for both low-level tasks such as denoising, as well as providing features for object recognition. 
Each level of the hierarchy groups information from the level beneath to form more complex features that exist over a larger scale in the image. (https://ieeexplore.ieee.org/document/5539957) DN Input, Kernel, Convolutional/Pool, Output Deconvolutional Network Deconvolutional Networks, a framework that permits the unsupervised construction of hierarchical image representations. These representations can be used for both low-level tasks such as denoising, as well as providing features for object recognition. Each level of the hierarchy groups information from the level beneath to form more complex features that exist over a larger scale in the image. (https://ieeexplore.ieee.org/document/5539957) https://ieeexplore.ieee.org/document/5539957 The combination of deep learning and active learning, where active learning attempts to maximize a model’s performance gain while annotating the fewest samples possible. DeepAL Deep Active Learning The combination of deep learning and active learning, where active learning attempts to maximize a model’s performance gain while annotating the fewest samples possible. https://arxiv.org/pdf/2009.00236.pdf In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer. When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors. After this learning step, a DBN can be further trained with supervision to perform classification. DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves as the visible layer for the next.
An RBM is an undirected, generative energy-based model with a "visible" input layer and a hidden layer and connections between but not within layers. This composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the "lowest" pair of layers (the lowest visible layer is a training set). The observation that DBNs can be trained greedily, one layer at a time, led to one of the first effective deep learning algorithms. (https://en.wikipedia.org/wiki/Deep_belief_network) DBN Backfed Input, Probabilistic Hidden, Hidden, Matched Output-Input Deep Belief Network In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer. When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors. After this learning step, a DBN can be further trained with supervision to perform classification. DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves as the visible layer for the next. An RBM is an undirected, generative energy-based model with a "visible" input layer and a hidden layer and connections between but not within layers. This composition leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the "lowest" pair of layers (the lowest visible layer is a training set). The observation that DBNs can be trained greedily, one layer at a time, led to one of the first effective deep learning algorithms.
(https://en.wikipedia.org/wiki/Deep_belief_network) https://en.wikipedia.org/wiki/Deep_belief_network A Deep Convolution Inverse Graphics Network (DC-IGN) is a model that learns an interpretable representation of images. This representation is disentangled with respect to transformations such as out-of-plane rotations and lighting variations. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. (https://arxiv.org/abs/1503.03167) DCIGN Input, Kernel, Convolutional/Pool, Probabilistic Hidden, Convolutional/Pool, Kernel, Output Deep Convolutional Inverse Graphics Network A convolutional neural network (CNN, or ConvNet) is a class of artificial neural network, most commonly applied to analyze visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation equivariant responses known as feature maps. CNNs are regularized versions of multilayer perceptrons. (https://en.wikipedia.org/wiki/Convolutional_neural_network) CNN ConvNet Convolutional Neural Network DCN Input, Kernel, Convolutional/Pool, Hidden, Output Deep Convolutional Network A convolutional neural network (CNN, or ConvNet) is a class of artificial neural network, most commonly applied to analyze visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation equivariant responses known as feature maps. CNNs are regularized versions of multilayer perceptrons. 
(https://en.wikipedia.org/wiki/Convolutional_neural_network) https://en.wikipedia.org/wiki/Convolutional_neural_network The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), and to the output nodes. There are no cycles or loops in the network. DFF FFN Feedforward Network MLP Multilayer Perceptron Input, Hidden, Output Deep FeedForward The feedforward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), and to the output nodes. There are no cycles or loops in the network. https://en.wikipedia.org/wiki/Feedforward_neural_network A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. There are different types of neural networks but they always consist of the same components: neurons, synapses, weights, biases, and functions. (https://en.wikipedia.org/wiki/Deep_Learning#:~:text=A%20deep%20neural%20network%20(DNN,weights%2C%20biases%2C%20and%20functions.) DNN Deep Neural Network Deep transfer learning methods relax the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. Deep Transfer Learning Deep transfer learning methods relax the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. https://arxiv.org/abs/1808.01974 Denoising Auto Encoders (DAEs) take a partially corrupted input and are trained to recover the original undistorted input.
In practice, the objective of denoising autoencoders is that of cleaning the corrupted input, or denoising. (https://en.wikipedia.org/wiki/Autoencoder) DAE Noisy Input, Hidden, Matched Output-Input Denoising Auto Encoder A layer that produces a dense Tensor based on given feature_columns. Generally a single example in training data is described with FeatureColumns. At the first layer of the model, this column oriented data should be converted to a single Tensor. This layer can be called multiple times with different features. This is the V2 version of this layer that uses name_scopes to create variables instead of variable_scopes. But this approach currently lacks support for partitioned variables. In that case, use the V1 version instead. DenseFeatures Layer A layer that produces a dense Tensor based on given feature_columns. Generally a single example in training data is described with FeatureColumns. At the first layer of the model, this column oriented data should be converted to a single Tensor. This layer can be called multiple times with different features. This is the V2 version of this layer that uses name_scopes to create variables instead of variable_scopes. But this approach currently lacks support for partitioned variables. In that case, use the V1 version instead. https://www.tensorflow.org/api_docs/python/tf/keras/layers/DenseFeatures Just your regular densely-connected NN layer. Dense Layer Just your regular densely-connected NN layer. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense Arises when systems are used as decision aids for humans, since the human intermediary may act on predictions in ways that are typically not modeled in the system. However, it is still individuals using the deployed system. Deployment Bias Arises when systems are used as decision aids for humans, since the human intermediary may act on predictions in ways that are typically not modeled in the system. 
However, it is still individuals using the deployed system. https://doi.org/10.6028/NIST.SP.1270 Depthwise 1D convolution. Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution. It is implemented via the following steps: (1) split the input into individual channels; (2) convolve each channel with an individual depthwise kernel with depth_multiplier output channels; (3) concatenate the convolved outputs along the channels axis. Unlike a regular 1D convolution, depthwise convolution does not mix information across different input channels. The depth_multiplier argument determines how many filters are applied to one input channel. As such, it controls the number of output channels that are generated per input channel in the depthwise step. DepthwiseConv1D Layer Depthwise 1D convolution. Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution. It is implemented via the following steps: (1) split the input into individual channels; (2) convolve each channel with an individual depthwise kernel with depth_multiplier output channels; (3) concatenate the convolved outputs along the channels axis. Unlike a regular 1D convolution, depthwise convolution does not mix information across different input channels. The depth_multiplier argument determines how many filters are applied to one input channel. As such, it controls the number of output channels that are generated per input channel in the depthwise step. https://www.tensorflow.org/api_docs/python/tf/keras/layers/DepthwiseConv1D Depthwise 2D convolution. DepthwiseConv2D Layer Depthwise 2D convolution.
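The three depthwise steps described above (split, convolve per channel, concatenate) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Keras implementation: the function name `depthwise_conv1d`, the nested-list data layout (channels last), and the restriction to stride 1 with no padding are all choices made here for brevity. As in most deep learning libraries, "convolution" is computed without flipping the kernel.

```python
def depthwise_conv1d(x, kernels):
    """Depthwise 1D convolution, stride 1, no padding (illustrative sketch).

    x:       list of time steps, each a list of channel values (channels last).
    kernels: kernels[c] is a list of depth_multiplier kernels for input
             channel c; each kernel is a list of taps.
    Returns a list of output steps, each with
    n_channels * depth_multiplier values.
    """
    n_steps = len(x)
    n_channels = len(x[0])
    out_channels = []
    for c in range(n_channels):
        # Step 1: split the input into individual channels.
        signal = [x[t][c] for t in range(n_steps)]
        # Step 2: convolve this channel with each of its own kernels only,
        # so no information is mixed across input channels.
        for taps in kernels[c]:
            k = len(taps)
            out_channels.append([
                sum(signal[t + i] * taps[i] for i in range(k))
                for t in range(n_steps - k + 1)
            ])
    # Step 3: concatenate along the channel axis (back to channels last).
    return [list(step) for step in zip(*out_channels)]


# Two input channels, four time steps.
x = [[1, 10], [2, 20], [3, 30], [4, 40]]

# depth_multiplier = 1: one kernel per input channel.
out_single = depthwise_conv1d(x, [[[1, 1]], [[1, -1]]])

# depth_multiplier = 2: two kernels per input channel, so 4 output channels.
out_double = depthwise_conv1d(x, [[[1, 1], [1, 0]], [[1, -1], [0, 1]]])
```

With `depth_multiplier = 2` each input channel contributes two output channels, which is why the second result has four values per time step.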
https://www.tensorflow.org/api_docs/python/tf/keras/layers/DepthwiseConv2D Systematic differences between groups in how outcomes are determined, which may cause an over- or underestimation of the size of the effect. Detection Bias Systematic differences between groups in how outcomes are determined, which may cause an over- or underestimation of the size of the effect. https://doi.org/10.6028/NIST.SP.1270 The transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Dimension Reduction Dimensionality Reduction The transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. https://en.wikipedia.org/wiki/Dimensionality_reduction A preprocessing layer which buckets continuous features by ranges. Discretization Layer A preprocessing layer which buckets continuous features by ranges. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Discretization Layer that computes a dot product between samples in two tensors. E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i]. Dot Layer Layer that computes a dot product between samples in two tensors. E.g. if applied to a list of two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i]. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dot Applies Dropout to the input. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting.
Inputs not set to 0 are scaled up by 1/(1 - rate) so that the expected sum over all inputs is unchanged. Note that the Dropout layer only applies when training is set to True, so no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. (This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.) Dropout Layer Applies Dropout to the input. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) so that the expected sum over all inputs is unchanged. Note that the Dropout layer only applies when training is set to True, so no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. (This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.) https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout The tendency of people with low ability in a given area or task to overestimate their self-assessed ability. Typically measured by comparing self-assessment with objective performance, often called subjective ability and objective ability, respectively. Dunning-Kruger Effect Dunning-Kruger Effect Bias The tendency of people with low ability in a given area or task to overestimate their self-assessed ability. Typically measured by comparing self-assessment with objective performance, often called subjective ability and objective ability, respectively.
https://doi.org/10.6028/NIST.SP.1270 Exponential Linear Unit. ELU Layer Exponential Linear Unit. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ELU The echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (with typically 1% connectivity). The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behaviour is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system. ESN Input, Recurrent, Output Echo State Network The echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer (with typically 1% connectivity). The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behaviour is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system. https://en.wikipedia.org/wiki/Echo_state_network#:~:text=The%20echo%20state%20network%20(ESN,are%20fixed%20and%20randomly%20assigned Occurs when an inference is made about an individual based on their membership within a group. Ecological Fallacy Ecological Fallacy Bias Occurs when an inference is made about an individual based on their membership within a group. 
https://doi.org/10.6028/NIST.SP.1270 Turns positive integers (indexes) into dense vectors of fixed size. Embedding Layer Turns positive integers (indexes) into dense vectors of fixed size. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding Emergent bias is the result of the use and reliance on algorithms across new or unanticipated contexts. Emergent Bias Emergent bias is the result of the use and reliance on algorithms across new or unanticipated contexts. https://doi.org/10.6028/NIST.SP.1270 The Encoder-Decoder architecture was the original transformer architecture introduced in the Attention Is All You Need (https://arxiv.org/abs/1706.03762) paper. The encoder processes the input sequence and generates a hidden representation that summarizes the input information. The decoder uses this hidden representation to generate the desired output sequence. The encoder and decoder are trained end-to-end to maximize the likelihood of the correct output sequence given the input sequence. LLM Encoder-Decoder LLM The Encoder-Decoder architecture was the original transformer architecture introduced in the Attention Is All You Need (https://arxiv.org/abs/1706.03762) paper. The encoder processes the input sequence and generates a hidden representation that summarizes the input information. The decoder uses this hidden representation to generate the desired output sequence. The encoder and decoder are trained end-to-end to maximize the likelihood of the correct output sequence given the input sequence. https://www.practicalai.io/understanding-transformer-model-architectures/#:~:text=Encoder%2Donly&text=These%20models%20have%20a%20pre,Named%20entity%20recognition The Encoder-only architecture is used when only encoding the input sequence is required and the decoder is not necessary. The input sequence is encoded into a fixed-length representation and then used as input to a classifier or a regressor to make a prediction. 
These models have a pre-trained general-purpose encoder but will require fine-tuning of the final classifier or regressor. LLM Encoder LLM The Encoder-only architecture is used when only encoding the input sequence is required and the decoder is not necessary. The input sequence is encoded into a fixed-length representation and then used as input to a classifier or a regressor to make a prediction. These models have a pre-trained general-purpose encoder but will require fine-tuning of the final classifier or regressor. https://www.practicalai.io/understanding-transformer-model-architectures/#:~:text=Encoder%2Donly&text=These%20models%20have%20a%20pre,Named%20entity%20recognition Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Ensemble Learning Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. https://en.wikipedia.org/wiki/Ensemble_learning The effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them. Error Propagation Error Propagation Bias The effect of variables' uncertainties (or errors, more specifically random errors) on the uncertainty of a function based on them. https://doi.org/10.6028/NIST.SP.1270 Arises when the testing or external benchmark populations do not equally represent the various parts of the user population or from the use of performance metrics that are not appropriate for the way in which the model will be used. Evaluation Bias Arises when the testing or external benchmark populations do not equally represent the various parts of the user population or from the use of performance metrics that are not appropriate for the way in which the model will be used. 
https://doi.org/10.6028/NIST.SP.1270 When specific groups of user populations are excluded from testing and subsequent analyses. Exclusion Bias When specific groups of user populations are excluded from testing and subsequent analyses. https://doi.org/10.6028/NIST.SP.1270 The exponential function is a mathematical function denoted by f(x) = exp(x) or e^{x}. Exponential Function The exponential function is a mathematical function denoted by f(x) = exp(x) or e^{x}. https://www.tensorflow.org/api_docs/python/tf/keras/activations/exponential Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e. they are random projection but with nonlinear transforms), or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model. (https://en.wikipedia.org/wiki/Extreme_Learning_machine) ELM Input, Hidden, Output Extreme Learning Machine Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. These hidden nodes can be randomly assigned and never updated (i.e. they are random projection but with nonlinear transforms), or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model.
(https://en.wikipedia.org/wiki/Extreme_Learning_machine) https://en.wikipedia.org/wiki/Extreme_Learning_machine A technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them. Federated Learning A technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them. https://en.wikipedia.org/wiki/Federated_learning Effects that may occur when an algorithm learns from user behavior and feeds that behavior back into the model. Feedback Loop Bias Effects that may occur when an algorithm learns from user behavior and feeds that behavior back into the model. https://doi.org/10.6028/NIST.SP.1270 A feedback-based approach in which the representation is formed iteratively, based on feedback received from the previous iteration's output. (https://arxiv.org/abs/1612.09508) FBN Input, Hidden, Output, Hidden Feedback Network A statistical model in which the model parameters are fixed or non-random quantities. FEM Fixed Effects Model A statistical model in which the model parameters are fixed or non-random quantities. https://en.wikipedia.org/wiki/Fixed_effects_model Flattens the input. Does not affect the batch size. Flatten Layer Flattens the input. Does not affect the batch size. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten Applies a 2D fractional max pooling over an input signal composed of several input planes. FractionalMaxPool2D FractionalMaxPool2d FractionalMaxPool2D Layer Applies a 2D fractional max pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 3D fractional max pooling over an input signal composed of several input planes. FractionalMaxPool3D FractionalMaxPool3d FractionalMaxPool3D Layer Applies a 3D fractional max pooling over an input signal composed of several input planes.
https://pytorch.org/docs/stable/nn.html#pooling-layers Arises when biased results are reported in order to support or satisfy the funding agency or financial supporter of the research study, but it can also be the individual researcher. Funding Bias Arises when biased results are reported in order to support or satisfy the funding agency or financial supporter of the research study, but it can also be the individual researcher. https://doi.org/10.6028/NIST.SP.1270 Cell class for the GRU layer. This class processes one step within the whole time sequence input, whereas tf.keras.layers.GRU processes the whole sequence. GRUCell Layer Cell class for the GRU layer. This class processes one step within the whole time sequence input, whereas tf.keras.layers.GRU processes the whole sequence. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRUCell Gated Recurrent Unit - Cho et al. 2014. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0, unroll is False, use_bias is True, reset_after is True, inputs (if masking is used) are strictly right-padded, and eager execution is enabled in the outermost context. There are two variants of the GRU implementation. The default one is based on v3 of the Cho et al. paper and has the reset gate applied to the hidden state before matrix multiplication. The other one is based on the original version and has the order reversed. The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid'.
GRU Layer Gated Recurrent Unit - Cho et al. 2014. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: activation == tanh, recurrent_activation == sigmoid, recurrent_dropout == 0, unroll is False, use_bias is True, reset_after is True, inputs (if masking is used) are strictly right-padded, and eager execution is enabled in the outermost context. There are two variants of the GRU implementation. The default one is based on v3 of the Cho et al. paper and has the reset gate applied to the hidden state before matrix multiplication. The other one is based on the original version and has the order reversed. The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. Thus it has separate biases for kernel and recurrent_kernel. To use this variant, set reset_after=True and recurrent_activation='sigmoid'. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The GRU is like a long short-term memory (LSTM) with a forget gate, but has fewer parameters than LSTM, as it lacks an output gate. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets. GRU Input, Memory Cell, Output Gated Recurrent Unit Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.
The GRU is like a long short-term memory (LSTM) with a forget gate, but has fewer parameters than LSTM, as it lacks an output gate. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling and natural language processing was found to be similar to that of LSTM. GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets. https://en.wikipedia.org/wiki/Gated_recurrent_unit Apply multiplicative 1-centered Gaussian noise. As it is a regularization layer, it is only active at training time. GaussianDropout Layer Apply multiplicative 1-centered Gaussian noise. As it is a regularization layer, it is only active at training time. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianDropout Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs. As it is a regularization layer, it is only active at training time. GaussianNoise Layer Apply additive zero-centered Gaussian noise. This is useful to mitigate overfitting (you could see it as a form of random data augmentation). Gaussian Noise (GS) is a natural choice as corruption process for real valued inputs. As it is a regularization layer, it is only active at training time. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise Gaussian error linear unit (GELU) computes x * P(X <= x), where X ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU. GELU Gaussian Error Linear Unit GELU Function Gaussian error linear unit (GELU) computes x * P(X <= x), where X ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/gelu Methods that can learn novel classes from only few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes. GFSL Generalized Few-shot Learning Methods that can learn novel classes from only few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes. https://paperswithcode.com/paper/generalized-and-incremental-few-shot-learning/review/ This model generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. GLM Generalized Linear Model This model generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. https://en.wikipedia.org/wiki/Generalized_linear_model A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning.
The core idea of a GAN is based on the "indirect" training through the discriminator, which itself is also being updated dynamically. This basically means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner. GAN Backfed Input, Hidden, Matched Output-Input, Hidden, Matched Output-Input Generative Adversarial Network A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. The core idea of a GAN is based on the "indirect" training through the discriminator, which itself is also being updated dynamically. This basically means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner. https://en.wikipedia.org/wiki/Generative_adversarial_network Global average pooling operation for temporal data. GlobalAvgPool1D GlobalAvgPool1d GlobalAveragePooling1D Layer Global average pooling operation for temporal data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling1D Global average pooling operation for spatial data.
GlobalAvgPool2D GlobalAvgPool2d GlobalAveragePooling2D Layer Global average pooling operation for spatial data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling2D Global average pooling operation for 3D data. GlobalAvgPool3D GlobalAvgPool3d GlobalAveragePooling3D Layer Global average pooling operation for 3D data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling3D Global max pooling operation for 1D temporal data. GlobalMaxPool1D GlobalMaxPool1d GlobalMaxPooling1D Layer Global max pooling operation for 1D temporal data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPool1D Global max pooling operation for spatial data. GlobalMaxPool2D GlobalMaxPool2d GlobalMaxPooling2D Layer Global max pooling operation for spatial data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPool2D Global max pooling operation for 3D data. GlobalMaxPool3D GlobalMaxPool3d GlobalMaxPooling3D Layer Global max pooling operation for 3D data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPool3D GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information. (https://arxiv.org/abs/1609.02907) GCN Input, Hidden, Hidden, Output Graph Convolutional Network GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information. (https://arxiv.org/abs/1609.02907) https://arxiv.org/abs/1609.02907 Graph Convolutional Policy Network (GCPN), a general graph convolutional network-based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules.
GCPN Input, Hidden, Hidden, Policy, Output Graph Convolutional Policy Network Graph Convolutional Policy Network (GCPN), a general graph convolutional network-based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules. https://arxiv.org/abs/1806.02473 Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization GroupNorm nn.GroupNorm GroupNorm Layer Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization https://pytorch.org/docs/stable/nn.html#normalization-layers A pattern of favoring members of one's in-group over out-group members. This can be expressed in evaluation of others, in allocation of resources, and in many other ways. In-group Favoritism In-group bias In-group preference In-group–out-group Bias Intergroup bias Group Bias A pattern of favoring members of one's in-group over out-group members. This can be expressed in evaluation of others, in allocation of resources, and in many other ways. https://en.wikipedia.org/wiki/In-group_favoritism A psychological phenomenon that occurs when people in a group tend to make non-optimal decisions based on their desire to conform to the group, or fear of dissenting with the group. In groupthink, individuals often refrain from expressing their personal disagreement with the group, hesitating to voice opinions that do not align with the group. Groupthink Groupthink Bias A psychological phenomenon that occurs when people in a group tend to make non-optimal decisions based on their desire to conform to the group, or fear of dissenting with the group. In groupthink, individuals often refrain from expressing their personal disagreement with the group, hesitating to voice opinions that do not align with the group.
https://doi.org/10.6028/NIST.SP.1270 A faster approximation of the sigmoid activation. Piecewise linear approximation of the sigmoid function. Ref: 'https://en.wikipedia.org/wiki/Hard_sigmoid' Hard Sigmoid Function A faster approximation of the sigmoid activation. Piecewise linear approximation of the sigmoid function. Ref: 'https://en.wikipedia.org/wiki/Hard_sigmoid' https://www.tensorflow.org/api_docs/python/tf/keras/activations/hard_sigmoid A preprocessing layer which hashes and bins categorical features. This layer transforms categorical inputs to hashed output. It element-wise converts ints or strings to ints in a fixed range. The stable hash function uses tensorflow::ops::Fingerprint to produce the same output consistently across all platforms. This layer uses FarmHash64 by default, which provides a consistent hashed output across different platforms and is stable across invocations, regardless of device and context, by mixing the input bits thoroughly. If you want to obfuscate the hashed output, you can also pass a random salt argument in the constructor. In that case, the layer will use the SipHash64 hash function, with the salt value serving as additional input to the hash function. Hashing Layer A preprocessing layer which hashes and bins categorical features. This layer transforms categorical inputs to hashed output. It element-wise converts ints or strings to ints in a fixed range. The stable hash function uses tensorflow::ops::Fingerprint to produce the same output consistently across all platforms. This layer uses FarmHash64 by default, which provides a consistent hashed output across different platforms and is stable across invocations, regardless of device and context, by mixing the input bits thoroughly. If you want to obfuscate the hashed output, you can also pass a random salt argument in the constructor. In that case, the layer will use the SipHash64 hash function, with the salt value serving as additional input to the hash function.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Hashing A hidden layer is located between the input and output of the algorithm, in which the function applies weights to the inputs and directs them through an activation function as the output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network. Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending on their associated weights. Hidden Layer A hidden layer is located between the input and output of the algorithm, in which the function applies weights to the inputs and directs them through an activation function as the output. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network. Hidden layers vary depending on the function of the neural network, and similarly, the layers may vary depending on their associated weights. https://deepai.org/machine-Learning-glossary-and-terms/hidden-layer-machine-Learning Methods that group things according to a hierarchy. Hierarchical Classification Methods that group things according to a hierarchy. https://en.wikipedia.org/wiki/Hierarchical_classification Methods that seek to build a hierarchy of clusters. HCL Hierarchical Clustering Methods that seek to build a hierarchy of clusters. https://en.wikipedia.org/wiki/Hierarchical_clustering Referring to the long-standing biases encoded in society over time. Related to, but distinct from, biases in historical description, or the interpretation, analysis, and explanation of history. A common example of historical bias is the tendency to view the larger world from a Western or European view. Historical Bias Referring to the long-standing biases encoded in society over time. Related to, but distinct from, biases in historical description, or the interpretation, analysis, and explanation of history. 
A common example of historical bias is the tendency to view the larger world from a Western or European view. https://doi.org/10.6028/NIST.SP.1270 A Hopfield network is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described earlier by Little in 1974 based on Ernst Ising's work with Wilhelm Lenz on the Ising model. Hopfield networks serve as content-addressable ("associative") memory systems with binary threshold nodes, or with continuous variables. Hopfield networks also provide a model for understanding human memory. (https://en.wikipedia.org/wiki/Hopfield_network) HN Ising model of a neural network Ising–Lenz–Little model Backfed input Hopfield Network A Hopfield network is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described earlier by Little in 1974 based on Ernst Ising's work with Wilhelm Lenz on the Ising model. Hopfield networks serve as content-addressable ("associative") memory systems with binary threshold nodes, or with continuous variables. Hopfield networks also provide a model for understanding human memory. (https://en.wikipedia.org/wiki/Hopfield_network) https://en.wikipedia.org/wiki/Hopfield_network A bias wherein individuals perceive benign or ambiguous behaviors as hostile. Hostile Attribution Bias A bias wherein individuals perceive benign or ambiguous behaviors as hostile. https://en.wikipedia.org/wiki/Interpretive_bias Systematic errors in human thought based on a limited number of heuristic principles and predicting values to simpler judgmental operations. Human Bias Systematic errors in human thought based on a limited number of heuristic principles and predicting values to simpler judgmental operations. https://doi.org/10.6028/NIST.SP.1270 When users rely on automation as a heuristic replacement for their own information seeking and processing. 
Human Reporting Bias When users rely on automation as a heuristic replacement for their own information seeking and processing. https://doi.org/10.6028/NIST.SP.1270 A layer that performs image data preprocessing augmentations. Image Augmentation Layer A layer that performs image data preprocessing augmentations. https://keras.io/guides/preprocessing_layers/ A layer that performs image data preprocessing operations. Image Preprocessing Layer A layer that performs image data preprocessing operations. https://keras.io/guides/preprocessing_layers/ An unconscious belief, attitude, feeling, association, or stereotype that can affect the way in which humans process information, make decisions, and take actions. Confirmatory Bias Implicit Bias An unconscious belief, attitude, feeling, association, or stereotype that can affect the way in which humans process information, make decisions, and take actions. https://doi.org/10.6028/NIST.SP.1270 Methods that train a network on a base set of classes and then present it with several novel classes, each with only a few labeled examples. IFSL Incremental Few-shot Learning Methods that train a network on a base set of classes and then present it with several novel classes, each with only a few labeled examples. https://arxiv.org/abs/1810.07218 Individual bias is a persistent point of view or limited list of such points of view that one applies ("parent", "academic", "professional", etc.). Individual Bias Individual bias is a persistent point of view or limited list of such points of view that one applies ("parent", "academic", "professional", etc.). https://develop.consumerium.org/wiki/Individual_bias Arises when applications that are built with machine learning are used to generate inputs for other machine learning algorithms. If the output is biased in any way, this bias may be inherited by systems using the output as input to learn other models.
Inherited Bias Arises when applications that are built with machine learning are used to generate inputs for other machine learning algorithms. If the output is biased in any way, this bias may be inherited by systems using the output as input to learn other models. https://doi.org/10.6028/NIST.SP.1270 Layer to be used as an entry point into a Network (a graph of layers). InputLayer Layer Layer to be used as an entry point into a Network (a graph of layers). https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputLayer Specifies the rank, dtype and shape of every input to a layer. Layers can expose (if appropriate) an input_spec attribute: an instance of InputSpec, or a nested structure of InputSpec instances (one per input tensor). These objects enable the layer to run input compatibility checks for input structure, input rank, input shape, and input dtype. A None entry in a shape is compatible with any dimension, a None shape is compatible with any shape. InputSpec Layer Specifies the rank, dtype and shape of every input to a layer. Layers can expose (if appropriate) an input_spec attribute: an instance of InputSpec, or a nested structure of InputSpec instances (one per input tensor). These objects enable the layer to run input compatibility checks for input structure, input rank, input shape, and input dtype. A None entry in a shape is compatible with any dimension, a None shape is compatible with any shape. https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputSpec The input layer of a neural network is composed of artificial input neurons, and brings the initial data into the system for further processing by subsequent layers of artificial neurons. The input layer is the very beginning of the workflow for the artificial neural network. Input Layer The input layer of a neural network is composed of artificial input neurons, and brings the initial data into the system for further processing by subsequent layers of artificial neurons.
The input layer is the very beginning of the workflow for the artificial neural network. https://www.techopedia.com/definition/33262/input-layer-neural-networks#:~:text=Explains%20Input%20Layer-,What%20Does%20Input%20Layer%20Mean%3F,for%20the%20artificial%20neural%20network. Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. InstanceNorm1D InstanceNorm1d nn.InstanceNorm1d InstanceNorm1d Layer Applies Instance Normalization over a 2D (unbatched) or 3D (batched) input as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. https://pytorch.org/docs/stable/nn.html#normalization-layers Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. InstanceNorm2D InstanceNorm2d nn.InstanceNorm2d InstanceNorm2d Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. https://pytorch.org/docs/stable/nn.html#normalization-layers Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. InstanceNorm3D InstanceNorm3d nn.InstanceNorm3d InstanceNorm3d Layer Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization. 
https://pytorch.org/docs/stable/nn.html#normalization-layers In contrast to biases exhibited at the level of individual persons, institutional bias refers to a tendency exhibited at the level of entire institutions, where practices or norms result in the favoring or disadvantaging of certain social groups. Common examples include institutional racism and institutional sexism. Institutional Bias In contrast to biases exhibited at the level of individual persons, institutional bias refers to a tendency exhibited at the level of entire institutions, where practices or norms result in the favoring or disadvantaging of certain social groups. Common examples include institutional racism and institutional sexism. https://doi.org/10.6028/NIST.SP.1270 A preprocessing layer which maps integer features to contiguous ranges. IntegerLookup Layer A preprocessing layer which maps integer features to contiguous ranges. https://www.tensorflow.org/api_docs/python/tf/keras/layers/IntegerLookup A form of information processing bias that can occur when users interpret algorithmic outputs according to their internalized biases and views. Interpretation Bias A form of information processing bias that can occur when users interpret algorithmic outputs according to their internalized biases and views. 
https://doi.org/10.6028/NIST.SP.1270 An algorithm to group objects by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. K-NN KNN K-nearest Neighbor Algorithm An algorithm to group objects by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm An algorithm to classify objects by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. K-NN KNN K-nearest Neighbor Classification Algorithm An algorithm to classify objects by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm An algorithm to assign the average of the values of k nearest neighbors to objects. K-NN KNN K-nearest Neighbor Regression Algorithm An algorithm to assign the average of the values of k nearest neighbors to objects. https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the topological structure of the data. For example, a data set with p variables measured in n observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.
An SOM is a type of artificial neural network but is trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and therefore is sometimes called a Kohonen map or Kohonen network. The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s. KN SOFM SOM Self-Organizing Feature Map Self-Organizing Map Input, Hidden Kohonen Network A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher dimensional data set while preserving the topological structure of the data. For example, a data set with p variables measured in n observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze. An SOM is a type of artificial neural network but is trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and therefore is sometimes called a Kohonen map or Kohonen network. The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s.
https://en.wikipedia.org/wiki/Self-organizing_map Applies a 1D power-average pooling over an input signal composed of several input planes. LPPool1D LPPool1d LPPool1D Layer Applies a 1D power-average pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Applies a 2D power-average pooling over an input signal composed of several input planes. LPPool2D LPPool2d LPPool2D Layer Applies a 2D power-average pooling over an input signal composed of several input planes. https://pytorch.org/docs/stable/nn.html#pooling-layers Cell class for the LSTM layer. LSTMCell Layer Cell class for the LSTM layer. https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell Long Short-Term Memory layer - Hochreiter 1997. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: 1. activation == tanh, 2. recurrent_activation == sigmoid, 3. recurrent_dropout == 0, 4. unroll is False, 5. use_bias is True, 6. Inputs, if use masking, are strictly right-padded, 7. Eager execution is enabled in the outermost context. LSTM Layer Long Short-Term Memory layer - Hochreiter 1997. Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation. The requirements to use the cuDNN implementation are: 1. activation == tanh, 2. recurrent_activation == sigmoid, 3. recurrent_dropout == 0, 4. unroll is False, 5. use_bias is True, 6. 
Inputs, if use masking, are strictly right-padded, 7. Eager execution is enabled in the outermost context. https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM Wraps arbitrary expressions as a Layer object. Lambda Layer Wraps arbitrary expressions as a Layer object. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. LLM Large Language Model A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using self-supervised learning or semi-supervised learning. https://en.wikipedia.org/wiki/Large_language_model A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. Lasso Regression A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. https://en.wikipedia.org/wiki/Lasso_(statistics) Network layer parent class Layer Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization LayerNorm nn.LayerNorm LayerNorm Layer Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization https://pytorch.org/docs/stable/nn.html#normalization-layers Layer normalization layer (Ba et al., 2016). Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. i.e.
applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis. LayerNormalization Layer Layer normalization layer (Ba et al., 2016). Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. i.e. applies a transformation that maintains the mean activation within each example close to 0 and the activation standard deviation close to 1. Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis. https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization This is the class from which all layers inherit. A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves computation, defined in the call() method, and a state (weight variables). State can be created in various places, at the convenience of the subclass implementer: in __init__(); in the optional build() method, which is invoked by the first __call__() to the layer, and supplies the shape(s) of the input(s), which may not have been known at initialization time; in the first invocation of call(), with some caveats discussed below. Users will just instantiate a layer and then treat it as a callable. Layer Layer This is the class from which all layers inherit. A layer is a callable object that takes as input one or more tensors and that outputs one or more tensors. It involves computation, defined in the call() method, and a state (weight variables). 
State can be created in various places, at the convenience of the subclass implementer: in __init__(); in the optional build() method, which is invoked by the first __call__() to the layer, and supplies the shape(s) of the input(s), which may not have been known at initialization time; in the first invocation of call(), with some caveats discussed below. Users will just instantiate a layer and then treat it as a callable. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument of the BatchNorm1d that is inferred from the input.size(1). LazyBatchNorm1D LazyBatchNorm1d nn.LazyBatchNorm1d LazyBatchNorm1D Layer A torch.nn.BatchNorm1d module with lazy initialization of the num_features argument of the BatchNorm1d that is inferred from the input.size(1). https://pytorch.org/docs/stable/nn.html#normalization-layers A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument of the BatchNorm2d that is inferred from the input.size(1). LazyBatchNorm2D LazyBatchNorm2d nn.LazyBatchNorm2d LazyBatchNorm2D Layer A torch.nn.BatchNorm2d module with lazy initialization of the num_features argument of the BatchNorm2d that is inferred from the input.size(1). https://pytorch.org/docs/stable/nn.html#normalization-layers A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument of the BatchNorm3d that is inferred from the input.size(1). LazyBatchNorm3D LazyBatchNorm3d nn.LazyBatchNorm3d LazyBatchNorm3D Layer A torch.nn.BatchNorm3d module with lazy initialization of the num_features argument of the BatchNorm3d that is inferred from the input.size(1). https://pytorch.org/docs/stable/nn.html#normalization-layers A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument of the InstanceNorm1d that is inferred from the input.size(1). 
LazyInstanceNorm1D LazyInstanceNorm1d nn.LazyInstanceNorm1d LazyInstanceNorm1d Layer A torch.nn.InstanceNorm1d module with lazy initialization of the num_features argument of the InstanceNorm1d that is inferred from the input.size(1). https://pytorch.org/docs/stable/nn.html#normalization-layers A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument of the InstanceNorm2d that is inferred from the input.size(1). LazyInstanceNorm2D LazyInstanceNorm2d nn.LazyInstanceNorm2d LazyInstanceNorm2d Layer A torch.nn.InstanceNorm2d module with lazy initialization of the num_features argument of the InstanceNorm2d that is inferred from the input.size(1). https://pytorch.org/docs/stable/nn.html#normalization-layers A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument of the InstanceNorm3d that is inferred from the input.size(1). LazyInstanceNorm3D LazyInstanceNorm3d nn.LazyInstanceNorm3d LazyInstanceNorm3d Layer A torch.nn.InstanceNorm3d module with lazy initialization of the num_features argument of the InstanceNorm3d that is inferred from the input.size(1). https://pytorch.org/docs/stable/nn.html#normalization-layers Leaky version of a Rectified Linear Unit. LeakyReLU Layer Leaky version of a Rectified Linear Unit. https://www.tensorflow.org/api_docs/python/tf/keras/layers/LeakyReLU A standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals (a residual being the difference between an observed value and the fitted value provided by a model) made in the results of each individual equation.
Least-squares Analysis A standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals (a residual being the difference between an observed value and the fitted value provided by a model) made in the results of each individual equation. https://en.wikipedia.org/wiki/Least_squares A linear function has the form f(x) = a + bx. Linear Function A linear function has the form f(x) = a + bx. https://www.tensorflow.org/api_docs/python/tf/keras/activations/linear A linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). Linear Regression A linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). https://en.wikipedia.org/wiki/Linear_regression Arises when network attributes obtained from user connections, activities, or interactions differ and misrepresent the true behavior of the users. Linking Bias Arises when network attributes obtained from user connections, activities, or interactions differ and misrepresent the true behavior of the users. https://doi.org/10.6028/NIST.SP.1270 A liquid state machine (LSM) is a type of reservoir computer that uses a spiking neural network. An LSM consists of a large collection of units (called nodes, or neurons). Each node receives time varying input from external sources (the inputs) as well as from other nodes. Nodes are randomly connected to each other. The recurrent nature of the connections turns the time varying input into a spatio-temporal pattern of activations in the network nodes. The spatio-temporal patterns of activation are read out by linear discriminant units.
The soup of recurrently connected nodes will end up computing a large variety of nonlinear functions on the input. Given a large enough variety of such nonlinear functions, it is theoretically possible to obtain linear combinations (using the read out units) to perform whatever mathematical operation is needed to perform a certain task, such as speech recognition or computer vision. The word liquid in the name comes from the analogy drawn to dropping a stone into a still body of water or other liquid. The falling stone will generate ripples in the liquid. The input (motion of the falling stone) has been converted into a spatio-temporal pattern of liquid displacement (ripples). (https://en.wikipedia.org/wiki/Liquid_state_machine) LSM Input, Spiking Hidden, Output Liquid State Machine Network A liquid state machine (LSM) is a type of reservoir computer that uses a spiking neural network. An LSM consists of a large collection of units (called nodes, or neurons). Each node receives time varying input from external sources (the inputs) as well as from other nodes. Nodes are randomly connected to each other. The recurrent nature of the connections turns the time varying input into a spatio-temporal pattern of activations in the network nodes. The spatio-temporal patterns of activation are read out by linear discriminant units. The soup of recurrently connected nodes will end up computing a large variety of nonlinear functions on the input. Given a large enough variety of such nonlinear functions, it is theoretically possible to obtain linear combinations (using the read out units) to perform whatever mathematical operation is needed to perform a certain task, such as speech recognition or computer vision. The word liquid in the name comes from the analogy drawn to dropping a stone into a still body of water or other liquid. The falling stone will generate ripples in the liquid. 
The input (motion of the falling stone) has been converted into a spatio-temporal pattern of liquid displacement (ripples). (https://en.wikipedia.org/wiki/Liquid_state_machine) https://en.wikipedia.org/wiki/Liquid_state_machine Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. LocalResponseNorm nn.LocalResponseNorm LocalResponseNorm Layer Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. https://pytorch.org/docs/stable/nn.html#normalization-layers The LocallyConnected1D layer works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. Locally-connected Layer The LocallyConnected1D layer works similarly to the Convolution1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. https://faroit.com/keras-docs/1.2.2/layers/local/ Locally-connected layer for 1D inputs. The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. LocallyConnected1D Layer Locally-connected layer for 1D inputs. The LocallyConnected1D layer works similarly to the Conv1D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. https://www.tensorflow.org/api_docs/python/tf/keras/layers/LocallyConnected1D Locally-connected layer for 2D inputs. The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. LocallyConnected2D Layer Locally-connected layer for 2D inputs. 
The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input. https://www.tensorflow.org/api_docs/python/tf/keras/layers/LocallyConnected2D A statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. Logistic Regression A statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. https://en.wikipedia.org/wiki/Logistic_regression Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep Learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and anomaly detection in network traffic or IDSs (intrusion detection systems). A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell. LSTM Input, Memory Cell, Output Long Short Term Memory Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep Learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition and anomaly detection in network traffic or IDSs (intrusion detection systems). 
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell. https://en.wikipedia.org/wiki/Long_short-term_memory When automation leads to humans being unaware of their situation such that, when control of a system is given back to them in a situation where humans and machines cooperate, they are unprepared to assume their duties. This can be a loss of awareness over what automation is and isn’t taking care of. Loss Of Situational Awareness Bias When automation leads to humans being unaware of their situation such that, when control of a system is given back to them in a situation where humans and machines cooperate, they are unprepared to assume their duties. This can be a loss of awareness over what automation is and isn’t taking care of. https://doi.org/10.6028/NIST.SP.1270 A field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. Machine Learning A field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. https://en.wikipedia.org/wiki/Machine_learning Methods based on the assumption that one's observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. Manifold Learning Methods based on the assumption that one's observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. https://arxiv.org/abs/2011.01307 A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.[1][2][3] A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). 
A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov. MC MP Markov Process Probalistic Hidden Markov Chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.[1][2][3] A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov. https://en.wikipedia.org/wiki/Markov_chain Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking). If any downstream layer does not support masking yet receives such an input mask, an exception will be raised. Masking Layer Masks a sequence by using a mask value to skip timesteps. For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking). If any downstream layer does not support masking yet receives such an input mask, an exception will be raised. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Masking Max pooling operation for 1D temporal data. Downsamples the input representation by taking the maximum value over a spatial window of size pool_size. The window is shifted by strides. 
The resulting output, when using the "valid" padding option, has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides. MaxPool1D MaxPool1d MaxPooling1D MaxPooling1d MaxPooling1D Layer Max pooling operation for 1D temporal data. Downsamples the input representation by taking the maximum value over a spatial window of size pool_size. The window is shifted by strides. The resulting output, when using the "valid" padding option, has a shape of: output_shape = (input_shape - pool_size + 1) / strides. The resulting output shape when using the "same" padding option is: output_shape = input_shape / strides. https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool1D Max pooling operation for 2D spatial data. MaxPool2D MaxPool2d MaxPooling2D MaxPooling2d MaxPooling2D Layer Max pooling operation for 2D spatial data. https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D Max pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. MaxPool3D MaxPool3d MaxPooling3D MaxPooling3d MaxPooling3D Layer Max pooling operation for 3D data (spatial or spatio-temporal). Downsamples the input along its spatial dimensions (depth, height, and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension. https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool3D Computes a partial inverse of MaxPool1d. MaxUnpool1D MaxUnpool1d MaxUnpool1D Layer Computes a partial inverse of MaxPool1d. https://pytorch.org/docs/stable/nn.html#pooling-layers Computes a partial inverse of MaxPool2d.
MaxUnpool2D MaxUnpool2d MaxUnpool2D Layer Computes a partial inverse of MaxPool2d. https://pytorch.org/docs/stable/nn.html#pooling-layers Computes a partial inverse of MaxPool3d. MaxUnpool3D MaxUnpool3d MaxUnpool3D Layer Computes a partial inverse of MaxPool3d. https://pytorch.org/docs/stable/nn.html#pooling-layers Layer that computes the maximum (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Maximum Layer Layer that computes the maximum (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Maximum Arises when features and labels are proxies for desired quantities, potentially leaving out important factors or introducing group or input-dependent noise that leads to differential performance. Measurement Bias Arises when features and labels are proxies for desired quantities, potentially leaving out important factors or introducing group or input-dependent noise that leads to differential performance. https://doi.org/10.6028/NIST.SP.1270 A layer used to merge a list of inputs. Merging Layer A layer used to merge a list of inputs. https://www.tutorialspoint.com/keras/keras_merge_layer.htm Automatic learning algorithms applied to metadata about machine Learning experiments. Meta-Learning Automatic learning algorithms applied to metadata about machine Learning experiments. https://en.wikipedia.org/wiki/Meta_learning_(computer_science) Method parent class. Method Methods which can learn a representation function that maps objects into an embedded space. Distance Metric Learning Metric Learning Methods which can learn a representation function that maps objects into an embedded space. https://paperswithcode.com/task/metric-learning Layer that computes the minimum (element-wise) a list of inputs. 
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Minimum Layer Layer that computes the minimum (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Minimum When modal interfaces confuse human operators, who misunderstand which mode the system is using, taking actions which are correct for a different mode but incorrect for their current situation. This is the cause of many deadly accidents, but also a source of confusion in everyday life. Mode Confusion Bias When modal interfaces confuse human operators, who misunderstand which mode the system is using, taking actions which are correct for a different mode but incorrect for their current situation. This is the cause of many deadly accidents, but also a source of confusion in everyday life. https://doi.org/10.6028/NIST.SP.1270 The bias introduced while using the data to select a single seemingly “best” model from a large set of models employing many predictor variables. Model selection bias also occurs when an explanatory variable has a weak relationship with the response variable. Model Selection Bias The bias introduced while using the data to select a single seemingly “best” model from a large set of models employing many predictor variables. Model selection bias also occurs when an explanatory variable has a weak relationship with the response variable. https://doi.org/10.6028/NIST.SP.1270 MultiHeadAttention layer. This is an implementation of multi-headed attention as described in the paper "Attention is all you Need" (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector. This layer first projects query, key and value.
These are (effectively) a list of tensors of length num_attention_heads, where the corresponding shapes are (batch_size, <query dimensions>, key_dim), (batch_size, <key/value dimensions>, key_dim), (batch_size, <key/value dimensions>, value_dim). Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor. Finally, the result tensor with the last dimension as value_dim can take a linear projection and return. When using MultiHeadAttention inside a custom Layer, the custom Layer must implement build() and call MultiHeadAttention's _build_from_signature(). This enables weights to be restored correctly when the model is loaded. MultiHeadAttention Layer MultiHeadAttention layer. This is an implementation of multi-headed attention as described in the paper "Attention is all you Need" (Vaswani et al., 2017). If query, key, value are the same, then this is self-attention. Each timestep in query attends to the corresponding sequence in key, and returns a fixed-width vector. This layer first projects query, key and value. These are (effectively) a list of tensors of length num_attention_heads, where the corresponding shapes are (batch_size, <query dimensions>, key_dim), (batch_size, <key/value dimensions>, key_dim), (batch_size, <key/value dimensions>, value_dim). Then, the query and key tensors are dot-producted and scaled. These are softmaxed to obtain attention probabilities. The value tensors are then interpolated by these probabilities, then concatenated back to a single tensor. Finally, the result tensor with the last dimension as value_dim can take a linear projection and return. When using MultiHeadAttention inside a custom Layer, the custom Layer must implement build() and call MultiHeadAttention's _build_from_signature(). This enables weights to be restored correctly when the model is loaded.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention Methods that classify instances into one of three or more classes (classifying instances into one of two classes is called binary classification). Multinomial Classification Multiclass Classification Methods that classify instances into one of three or more classes (classifying instances into one of two classes is called binary classification). https://en.wikipedia.org/wiki/Multiclass_classification A method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space. MDS Multidimensional Scaling A method that translates information about the pairwise distances among a set of objects or individuals into a configuration of points mapped into an abstract Cartesian space. https://en.wikipedia.org/wiki/Multidimensional_scaling Methods which can create models that can process and link information using various modalities. Multimodal Deep Learning Methods which can create models that can process and link information using various modalities. https://arxiv.org/abs/2105.11087 Methods which can represent the joint representations of different modalities. Multimodal Learning Layer that multiplies (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). Multiply Layer Layer that multiplies (element-wise) a list of inputs. It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape). https://www.tensorflow.org/api_docs/python/tf/keras/layers/Multiply A subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
NLP Natural Language Processing A subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. https://en.wikipedia.org/wiki/Natural_language_processing Network parent class Network A Neural Turing machine (NTM) is a recurrent neural network model. The approach was published by Alex Graves et al. in 2014. NTMs combine the fuzzy pattern matching capabilities of neural networks with the algorithmic power of programmable computers. An NTM has a neural network controller coupled to external memory resources, which it interacts with through attentional mechanisms. The memory interactions are differentiable end-to-end, making it possible to optimize them using gradient descent. An NTM with a long short-term memory (LSTM) network controller can infer simple algorithms such as copying, sorting, and associative recall from examples alone. NTM Input, Hidden, Spiking Hidden, Output Neural Turing Machine Network A Neural Turing machine (NTM) is a recurrent neural network model. The approach was published by Alex Graves et al. in 2014. NTMs combine the fuzzy pattern matching capabilities of neural networks with the algorithmic power of programmable computers. An NTM has a neural network controller coupled to external memory resources, which it interacts with through attentional mechanisms. The memory interactions are differentiable end-to-end, making it possible to optimize them using gradient descent. An NTM with a long short-term memory (LSTM) network controller can infer simple algorithms such as copying, sorting, and associative recall from examples alone. https://en.wikipedia.org/wiki/Neural_Turing_machine Noisy dense layer that injects random noise to the weights of dense layer. Noisy dense layers are fully connected layers whose weights and biases are augmented by factorised Gaussian noise.
The factorised Gaussian noise is controlled through gradient descent by a second weights layer. A NoisyDense layer implements the operation: $$ \mathrm{NoisyDense}(x) = \mathrm{activation}(\mathrm{dot}(x, \mu + (\sigma \cdot \epsilon)) + \mathrm{bias}) $$ where $\mu$ is the standard weights layer, $\epsilon$ is the factorised Gaussian noise, and $\sigma$ is a second weights layer which controls $\epsilon$. Noise Dense Layer Noisy dense layer that injects random noise to the weights of dense layer. Noisy dense layers are fully connected layers whose weights and biases are augmented by factorised Gaussian noise. The factorised Gaussian noise is controlled through gradient descent by a second weights layer. A NoisyDense layer implements the operation: $$ \mathrm{NoisyDense}(x) = \mathrm{activation}(\mathrm{dot}(x, \mu + (\sigma \cdot \epsilon)) + \mathrm{bias}) $$ where $\mu$ is the standard weights layer, $\epsilon$ is the factorised Gaussian noise, and $\sigma$ is a second weights layer which controls $\epsilon$. https://www.tensorflow.org/addons/api_docs/python/tfa/layers/NoisyDense A preprocessing layer which normalizes continuous features. Normalization Layer A preprocessing layer which normalizes continuous features. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization A layer that performs numerical data preprocessing operations. Numerical Features Preprocessing Layer A layer that performs numerical data preprocessing operations. https://keras.io/guides/preprocessing_layers/ A method which aims to classify objects from one, or only a few, examples. OSL One-shot Learning A method which aims to classify objects from one, or only a few, examples. https://en.wikipedia.org/wiki/One-shot_learning The output layer in an artificial neural network is the last layer of neurons that produces given outputs for the program.
Though they are made much like other artificial neurons in the neural network, output layer neurons may be built or observed in a different way, given that they are the last “actor” nodes on the network. Output Layer The output layer in an artificial neural network is the last layer of neurons that produces given outputs for the program. Though they are made much like other artificial neurons in the neural network, output layer neurons may be built or observed in a different way, given that they are the last “actor” nodes on the network. https://www.techopedia.com/definition/33263/output-layer-neural-networks Parametric Rectified Linear Unit. PReLU Layer Parametric Rectified Linear Unit. https://www.tensorflow.org/api_docs/python/tf/keras/layers/PReLU The perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. (https://en.wikipedia.org/wiki/Perceptron) SLP Single Layer Perceptron Input, Output Perceptron Permutes the dimensions of the input according to a given pattern. Useful, e.g., for connecting RNNs and convnets. Permute Layer Permutes the dimensions of the input according to a given pattern. Useful, e.g., for connecting RNNs and convnets. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Permute Pooling layers serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations. Pooling Layer Pooling layers serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations.
https://d2l.ai/chapter_convolutional-neural-networks/pooling.html A form of selection bias that occurs when items that are more popular are more exposed and less popular items are under-represented. Popularity Bias A form of selection bias that occurs when items that are more popular are more exposed and less popular items are under-represented. https://doi.org/10.6028/NIST.SP.1270 Systematic distortions in demographics or other user characteristics between a population of users represented in a dataset or on a platform and some target population. Population Bias Systematic distortions in demographics or other user characteristics between a population of users represented in a dataset or on a platform and some target population. https://doi.org/10.6028/NIST.SP.1270 A layer that performs data preprocessing operations. Preprocessing Layer A layer that performs data preprocessing operations. https://www.tensorflow.org/guide/keras/preprocessing_layers Biases arising from how information is presented on the Web, via a user interface, due to rating or ranking of output, or through users’ own self-selected, biased interaction. Presentation Bias Biases arising from how information is presented on the Web, via a user interface, due to rating or ranking of output, or through users’ own self-selected, biased interaction. https://doi.org/10.6028/NIST.SP.1270 A method for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data.
PCA Principal Component Analysis A method for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data. https://en.wikipedia.org/wiki/Principal_component_analysis A probabilistic model for which a graph expresses the conditional dependence structure between random variables. Graphical Model PGM Structure Probabilistic Model Probabilistic Graphical Model A probabilistic model for which a graph expresses the conditional dependence structure between random variables. https://en.wikipedia.org/wiki/Graphical_model Methods that use statistical methods to analyze the words in each text to discover common themes, how those themes are connected to each other, and how they change over time. Probabilistic Topic Model Methods that use statistical methods to analyze the words in each text to discover common themes, how those themes are connected to each other, and how they change over time. https://pyro.ai/examples/prodlda.html Judgement modulated by affect, which is influenced by the level of efficacy and efficiency in information processing; in cognitive sciences, processing bias is often referred to as an aesthetic judgement. Validation Bias Processing Bias Judgement modulated by affect, which is influenced by the level of efficacy and efficiency in information processing; in cognitive sciences, processing bias is often referred to as an aesthetic judgement. https://royalsocietypublishing.org/doi/10.1098/rspb.2019.0165#d1e5237 A survival modeling method where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. Proportional Hazards Model A survival modeling method where the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate.
https://en.wikipedia.org/wiki/Proportional_hazards_model Proportional Hazards Model Base class for recurrent layers. RNN Layer Base class for recurrent layers. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. RBFN RBN Radial Basis Function Network Input, Hidden, Output Radial Basis Network A radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. https://en.wikipedia.org/wiki/Radial_basis_function_network A preprocessing layer which randomly adjusts brightness during training. This layer will randomly increase/reduce the brightness for the input RGB images. At inference time, the output will be identical to the input. Call the layer with training=True to adjust the brightness of the input. Note that different brightness adjustment factors will be applied to each of the images in the batch. RandomBrightness Layer A preprocessing layer which randomly adjusts brightness during training. This layer will randomly increase/reduce the brightness for the input RGB images. At inference time, the output will be identical to the input. Call the layer with training=True to adjust the brightness of the input. Note that different brightness adjustment factors will be applied to each of the images in the batch.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomBrightness A preprocessing layer which randomly adjusts contrast during training. This layer will randomly adjust the contrast of an image or images by a random factor. Contrast is adjusted independently for each channel of each image during training. For each channel, this layer computes the mean of the image pixels in the channel and then adjusts each component x of each pixel to (x - mean) * contrast_factor + mean. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and in integer or floating point dtype. By default, the layer will output floats. The output value will be clipped to the range [0, 255], the valid range of RGB colors. RandomContrast Layer A preprocessing layer which randomly adjusts contrast during training. This layer will randomly adjust the contrast of an image or images by a random factor. Contrast is adjusted independently for each channel of each image during training. For each channel, this layer computes the mean of the image pixels in the channel and then adjusts each component x of each pixel to (x - mean) * contrast_factor + mean. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and in integer or floating point dtype. By default, the layer will output floats. The output value will be clipped to the range [0, 255], the valid range of RGB colors. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomContrast A preprocessing layer which randomly crops images during training. During training, this layer will randomly choose a location to crop images down to a target size. The layer will crop all the images in the same batch to the same cropping location. At inference time, and during training if an input image is smaller than the target size, the input will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. 
If you need to apply random cropping at inference time, set training to True when calling the layer. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomCrop Layer A preprocessing layer which randomly crops images during training. During training, this layer will randomly choose a location to crop images down to a target size. The layer will crop all the images in the same batch to the same cropping location. At inference time, and during training if an input image is smaller than the target size, the input will be resized and cropped so as to return the largest possible window in the image that matches the target aspect ratio. If you need to apply random cropping at inference time, set training to True when calling the layer. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomCrop A preprocessing layer which randomly flips images during training. This layer will flip the images horizontally and/or vertically based on the mode attribute. During inference time, the output will be identical to input. Call the layer with training=True to flip the input. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomFlip Layer A preprocessing layer which randomly flips images during training. This layer will flip the images horizontally and/or vertically based on the mode attribute. During inference time, the output will be identical to input. Call the layer with training=True to flip the input. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip A preprocessing layer which randomly varies image height during training. This layer adjusts the height of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference. RandomHeight Layer A preprocessing layer which randomly varies image height during training. This layer adjusts the height of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomHeight A preprocessing layer which randomly rotates images during training. RandomRotation Layer A preprocessing layer which randomly rotates images during training. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomRotation A preprocessing layer which randomly translates images during training. This layer will apply random translations to each image during training, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomTranslation Layer A preprocessing layer which randomly translates images during training. This layer will apply random translations to each image during training, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomTranslation A preprocessing layer which randomly varies image width during training. This layer will randomly adjust the width of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference. RandomWidth Layer A preprocessing layer which randomly varies image width during training. This layer will randomly adjust the width of a batch of images by a random factor. The input should be a 3D (unbatched) or 4D (batched) tensor in the "channels_last" image data format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. By default, this layer is inactive during inference. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomWidth A preprocessing layer which randomly zooms images during training. This layer will randomly zoom in or out on each axis of an image independently, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. RandomZoom Layer A preprocessing layer which randomly zooms images during training. This layer will randomly zoom in or out on each axis of an image independently, filling empty space according to fill_mode. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomZoom A statistical model where the model parameters are random variables.
REM Random Effects Model A statistical model where the model parameters are random variables. https://en.wikipedia.org/wiki/Random_effects_model An ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. Random Forest An ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. https://en.wikipedia.org/wiki/Random_forest The idea that top-ranked results are the most relevant and important and will result in more clicks than other results. Ranking Bias The idea that top-ranked results are the most relevant and important and will result in more clicks than other results. https://doi.org/10.6028/NIST.SP.1270 Refers to differences in perspective, memory and recall, interpretation, and reporting on the same event from multiple persons or witnesses. Rashomon Effect Rashomon Principle Rashomon Effect Bias Refers to differences in perspective, memory and recall, interpretation, and reporting on the same event from multiple persons or witnesses. https://doi.org/10.6028/NIST.SP.1270 The ReLU activation function returns: max(x, 0), the element-wise maximum of 0 and the input tensor. ReLU Rectified Linear Unit ReLU Function The ReLU activation function returns: max(x, 0), the element-wise maximum of 0 and the input tensor. https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu Rectified Linear Unit activation function. With default values, it returns element-wise max(x, 0). ReLU Layer Rectified Linear Unit activation function. With default values, it returns element-wise max(x, 0). https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU A layer of an RNN, composed of recurrent units, the number of which is the hidden size of the layer. Recurrent Layer A layer of an RNN, composed of recurrent units, the number of which is the hidden size of the layer.
https://docs.nvidia.com/deeplearning/performance/dl-performance-recurrent/index.html#recurrent-layer A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. RN RecNN Recurrent Network Input, Memory Cell, Output Recurrent Neural Network A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. https://en.wikipedia.org/wiki/Recurrent_neural_network A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order. Recursive neural networks, sometimes abbreviated as RvNNs, have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding. RecuNN RvNN Recursive Neural Network A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order.
Recursive neural networks, sometimes abbreviated as RvNNs, have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding. https://en.wikipedia.org/wiki/Recursive_neural_network A set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). Regression analysis Regression model Regression Analysis A set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). https://en.wikipedia.org/wiki/Regression_analysis Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis. Regularization Layer Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are summed into the loss function that the network optimizes. Regularization penalties are applied on a per-layer basis. https://keras.io/api/layers/regularizers/ Methods that do not need labelled input/output pairs to be presented, and do not need sub-optimal actions to be explicitly corrected. Instead they focus on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). Reinforcement Learning Methods that do not need labelled input/output pairs to be presented, and do not need sub-optimal actions to be explicitly corrected.
Instead they focus on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). https://en.wikipedia.org/wiki/Reinforcement_learning Repeats the input n times. RepeatVector Layer Repeats the input n times. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RepeatVector Arises due to non-random sampling of subgroups, causing trends estimated for one population to not be generalizable to data collected from a new population. Representation Bias Arises due to non-random sampling of subgroups, causing trends estimated for one population to not be generalizable to data collected from a new population. https://doi.org/10.6028/NIST.SP.1270 Methods that allow a system to discover the representations required for feature detection or classification from raw data. Feature Learning Representation Learning Methods that allow a system to discover the representations required for feature detection or classification from raw data. https://en.wikipedia.org/wiki/Feature_learning A preprocessing layer which rescales input values to a new range. Rescaling Layer A preprocessing layer which rescales input values to a new range. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Rescaling Layer that reshapes inputs into the given shape. Reshape Layer Layer that reshapes inputs into the given shape. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape Reshape layers are used to change the shape of the input. Reshaping Layer A residual neural network (ResNet) is an artificial neural network (ANN) of a kind that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing skip connections, or shortcuts to jump over some layers. Typical ResNet models are implemented with double- or triple- layer skips that contain nonlinearities (ReLU) and batch normalization in between. 
An additional weight matrix may be used to learn the skip weights; these models are known as HighwayNets. Models with several parallel skips are referred to as DenseNets. In the context of residual neural networks, a non-residual network may be described as a 'plain network'. DRN Deep Residual Network ResNN ResNet Input, Weight, BN, ReLU, Weight, BN, Addition, ReLU Residual Neural Network A residual neural network (ResNet) is an artificial neural network (ANN) of a kind that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing skip connections, or shortcuts to jump over some layers. Typical ResNet models are implemented with double- or triple- layer skips that contain nonlinearities (ReLU) and batch normalization in between. An additional weight matrix may be used to learn the skip weights; these models are known as HighwayNets. Models with several parallel skips are referred to as DenseNets. In the context of residual neural networks, a non-residual network may be described as a 'plain network'. https://en.wikipedia.org/wiki/Residual_neural_network A preprocessing layer which resizes images. This layer resizes an image input to a target height and width. The input should be a 4D (batched) or 3D (unbatched) tensor in "channels_last" format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats. This layer can be called on tf.RaggedTensor batches of input images of distinct sizes, and will resize the outputs to dense tensors of uniform size. Resizing Layer A preprocessing layer which resizes images. This layer resizes an image input to a target height and width. The input should be a 4D (batched) or 3D (unbatched) tensor in "channels_last" format. Input pixel values can be of any range (e.g. [0., 1.) or [0, 255]) and of integer or floating point dtype. By default, the layer will output floats.
This layer can be called on tf.RaggedTensor batches of input images of distinct sizes, and will resize the outputs to dense tensors of uniform size. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Resizing A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBM Backfed Input, Probabilistic Hidden Restricted Boltzmann Machine A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine A method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. Ridge Regression A method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. https://en.wikipedia.org/wiki/Ridge_regression The SELU activation function multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs. SELU Scaled Exponential Linear Unit SELU Function The SELU activation function multiplies scale (> 1) with the output of the ELU function to ensure a slope larger than one for positive inputs. https://www.tensorflow.org/api_docs/python/tf/keras/activations/selu Bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed.
Sampling Bias Selection Bias Selection Effect Selection And Sampling Bias Bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed. https://en.wikipedia.org/wiki/Selection_bias Decision-makers’ inclination to selectively adopt algorithmic advice when it matches their pre-existing beliefs and stereotypes. Selective Adherence Bias Decision-makers’ inclination to selectively adopt algorithmic advice when it matches their pre-existing beliefs and stereotypes. https://doi.org/10.6028/NIST.SP.1270 Regarded as an intermediate form between supervised and unsupervised learning. Self-supervised Learning Regarded as an intermediate form between supervised and unsupervised learning. https://en.wikipedia.org/wiki/Self-supervised_learning Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output. SeparableConv1D Layer SeparableConvolution1D Layer Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts separately on channels, followed by a pointwise convolution that mixes channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SeparableConv1D Depthwise separable 2D convolution. Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels.
The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block. SeparableConv2D Layer SeparableConvolution2D Layer Depthwise separable 2D convolution. Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step. Intuitively, separable convolutions can be understood as a way to factorize a convolution kernel into two smaller kernels, or as an extreme version of an Inception block. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SeparableConv2D Applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)). For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a 2-element Softmax, where the second element is assumed to be zero. The sigmoid function always returns a value between 0 and 1. Sigmoid Function Applies the sigmoid activation function sigmoid(x) = 1 / (1 + exp(-x)). For small values (<-5), sigmoid returns a value close to zero, and for large values (>5) the result of the function gets close to 1. Sigmoid is equivalent to a 2-element Softmax, where the second element is assumed to be zero. The sigmoid function always returns a value between 0 and 1. https://www.tensorflow.org/api_docs/python/tf/keras/activations/sigmoid Cell class for SimpleRNN. This class processes one step within the whole time sequence input, whereas tf.keras.layers.SimpleRNN processes the whole sequence. SimpleRNNCell Layer Cell class for SimpleRNN.
This class processes one step within the whole time sequence input, whereas tf.keras.layers.SimpleRNN processes the whole sequence. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNNCell Fully-connected RNN where the output is to be fed back to input. SimpleRNN Layer Fully-connected RNN where the output is to be fed back to input. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN Can be positive or negative, and take a number of different forms, but is typically characterized as being for or against groups or individuals based on social identities, demographic factors, or immutable physical characteristics. Societal or social biases are often stereotypes. Common examples of societal or social biases are based on concepts like race, ethnicity, gender, sexual orientation, socioeconomic status, education, and more. Societal bias is often recognized and discussed in the context of NLP (Natural Language Processing) models. Social Bias Societal Bias Can be positive or negative, and take a number of different forms, but is typically characterized as being for or against groups or individuals based on social identities, demographic factors, or immutable physical characteristics. Societal or social biases are often stereotypes. Common examples of societal or social biases are based on concepts like race, ethnicity, gender, sexual orientation, socioeconomic status, education, and more. Societal bias is often recognized and discussed in the context of NLP (Natural Language Processing) models. https://doi.org/10.6028/NIST.SP.1270 The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. The axis argument sets which axis of the input the function is applied along. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution. The softmax of each vector x is computed as exp(x) / tf.reduce_sum(exp(x)).
The input values are the log-odds of the resulting probability. Softmax Function The elements of the output vector are in range (0, 1) and sum to 1. Each vector is handled independently. The axis argument sets which axis of the input the function is applied along. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution. The softmax of each vector x is computed as exp(x) / tf.reduce_sum(exp(x)). The input values are the log-odds of the resulting probability. https://www.tensorflow.org/api_docs/python/tf/keras/activations/softmax Softmax activation function. Softmax Layer Softmax activation function. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Softmax softplus(x) = log(exp(x) + 1) Softplus Function softplus(x) = log(exp(x) + 1) https://www.tensorflow.org/api_docs/python/tf/keras/activations/softplus softsign(x) = x / (abs(x) + 1) Softsign Function softsign(x) = x / (abs(x) + 1) https://www.tensorflow.org/api_docs/python/tf/keras/activations/softsign Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time (thus, sparse). This constraint forces the model to respond to the unique statistical features of the training data. (https://en.wikipedia.org/wiki/Autoencoder) SAE Input, Hidden, Matched Output-Input Sparse AE Methods which aim to find sparse representations of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. Sparse coding Sparse dictionary learning Sparse Learning Methods which aim to find sparse representations of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. https://en.wikipedia.org/wiki/Sparse_dictionary_learning Spatial 1D version of Dropout.
This version performs the same function as Dropout; however, it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead. SpatialDropout1D Layer Spatial 1D version of Dropout. This version performs the same function as Dropout; however, it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout1D Spatial 2D version of Dropout. This version performs the same function as Dropout; however, it drops entire 2D feature maps instead of individual elements. If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead. SpatialDropout2D Layer Spatial 2D version of Dropout. This version performs the same function as Dropout; however, it drops entire 2D feature maps instead of individual elements.
If adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout2D will help promote independence between feature maps and should be used instead. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout2D Spatial 3D version of Dropout. This version performs the same function as Dropout; however, it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead. SpatialDropout3D Layer Spatial 3D version of Dropout. This version performs the same function as Dropout; however, it drops entire 3D feature maps instead of individual elements. If adjacent voxels within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout3D will help promote independence between feature maps and should be used instead. https://www.tensorflow.org/api_docs/python/tf/keras/layers/SpatialDropout3D Regression method used to model spatial relationships. Spatial Regression Regression method used to model spatial relationships. https://gisgeography.com/spatial-regression-models-arcgis/ Wrapper allowing a stack of RNN cells to behave as a single cell. Used to implement efficient stacked RNNs. StackedRNNCells Layer Wrapper allowing a stack of RNN cells to behave as a single cell. Used to implement efficient stacked RNNs.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/StackedRNNCells A bias whereby people tend to search only where it is easiest to look. Streetlight Effect Streetlight Effect Bias A bias whereby people tend to search only where it is easiest to look. https://doi.org/10.6028/NIST.SP.1270 A preprocessing layer which maps string features to integer indices. StringLookup Layer A preprocessing layer which maps string features to integer indices. https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup Layer that subtracts two inputs. It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape. Subtract Layer Layer that subtracts two inputs. It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Subtract A human tendency where people opt to continue with an endeavor or behavior due to previously spent or invested resources, such as money, time, and effort, regardless of whether costs outweigh benefits. For example, in AI, the sunk cost fallacy could lead development teams and organizations to feel that because they have already invested so much time and money into a particular AI application, they must pursue it to market rather than deciding to end the effort, even in the face of significant technical debt and/or ethical debt. Sunk Cost Fallacy Sunk Cost Fallacy Bias A human tendency where people opt to continue with an endeavor or behavior due to previously spent or invested resources, such as money, time, and effort, regardless of whether costs outweigh benefits. 
For example, in AI, the sunk cost fallacy could lead development teams and organizations to feel that because they have already invested so much time and money into a particular AI application, they must pursue it to market rather than deciding to end the effort, even in the face of significant technical debt and/or ethical debt. https://doi.org/10.6028/NIST.SP.1270 Methods that simultaneously cluster the rows and columns of a labeled matrix, also taking into account the data label contributions to cluster coherence. Supervised Block Clustering Supervised Co-clustering Supervised Joint Clustering Supervised Two-mode Clustering Supervised Two-way Clustering Supervised Biclustering Methods that simultaneously cluster the rows and columns of a labeled matrix, also taking into account the data label contributions to cluster coherence. https://en.wikipedia.org/wiki/Biclustering Methods that group a set of labeled objects in such a way that objects in the same group (called a cluster) are more similarly labeled (in some sense) relative to those in other groups (clusters). Cluster analysis Supervised Clustering Methods that group a set of labeled objects in such a way that objects in the same group (called a cluster) are more similarly labeled (in some sense) relative to those in other groups (clusters). https://en.wikipedia.org/wiki/Cluster_analysis Methods that can learn a function that maps an input to an output based on example input-output pairs. Supervised Learning Methods that can learn a function that maps an input to an output based on example input-output pairs. https://en.wikipedia.org/wiki/Supervised_learning In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
Developed at AT&T Bell Laboratories by Vladimir Vapnik with colleagues (Boser et al., 1992, Guyon et al., 1993, Vapnik et al., 1997), SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory proposed by Vapnik (1982, 1995) and Chervonenkis (1974). Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). SVM maps training examples to points in space so as to maximise the width of the gap between the two categories. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. SVM SVN Support Vector Network Input, Hidden, Output Support Vector Machine In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vladimir Vapnik with colleagues (Boser et al., 1992, Guyon et al., 1993, Vapnik et al., 1997), SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory proposed by Vapnik (1982, 1995) and Chervonenkis (1974). Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). SVM maps training examples to points in space so as to maximise the width of the gap between the two categories.
New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. https://en.wikipedia.org/wiki/Support-vector_machine Methods for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems. Survival Analysis Methods for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems. https://en.wikipedia.org/wiki/Survival_analysis Tendency for people to focus on the items, observations, or people that “survive” or make it past a selection process, while overlooking those that did not. Survivorship Bias Tendency for people to focus on the items, observations, or people that “survive” or make it past a selection process, while overlooking those that did not. https://doi.org/10.6028/NIST.SP.1270 x*sigmoid(x). It is a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks; it is unbounded above and bounded below. Swish Function x*sigmoid(x). It is a smooth, non-monotonic function that consistently matches or outperforms ReLU on deep networks; it is unbounded above and bounded below. https://www.tensorflow.org/api_docs/python/tf/keras/activations/swish Like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions). SCN Symmetrically Connected Network Like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions). https://ieeexplore.ieee.org/document/287176 Applies Batch Normalization over an N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
SyncBatchNorm nn.SyncBatchNorm SyncBatchNorm Layer Applies Batch Normalization over an N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://pytorch.org/docs/stable/nn.html#normalization-layers Biases that result from procedures and practices of particular institutions that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued. Institutional Bias Societal Bias Systemic Bias Biases that result from procedures and practices of particular institutions that operate in ways which result in certain social groups being advantaged or favored and others being disadvantaged or devalued. https://doi.org/10.6028/NIST.SP.1270 Hyperbolic tangent activation function. hyperbolic tangent Tanh Function Hyperbolic tangent activation function. https://www.tensorflow.org/api_docs/python/tf/keras/activations/tanh Bias that arises from differences in populations and behaviors over time. Temporal Bias Bias that arises from differences in populations and behaviors over time. https://doi.org/10.6028/NIST.SP.1270 A preprocessing layer which maps text features to integer sequences. TextVectorization Layer A preprocessing layer which maps text features to integer sequences. https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization A layer that performs text data preprocessing operations. Text Preprocessing Layer A layer that performs text data preprocessing operations. https://keras.io/guides/preprocessing_layers/ Thresholded Rectified Linear Unit. ThresholdedReLU Layer Thresholded Rectified Linear Unit. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ThresholdedReLU This wrapper allows a layer to be applied to every temporal slice of an input.
Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension. Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3). You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps, independently. TimeDistributed Layer This wrapper allows a layer to be applied to every temporal slice of an input. Every input should be at least 3D, and the dimension of index one of the first input will be considered to be the temporal dimension. Consider a batch of 32 video samples, where each sample is a 128x128 RGB image with channels_last data format, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3). You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps, independently. https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed Methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time Series Analysis Methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. https://en.wikipedia.org/wiki/Time_series Methods that predict future values based on previously observed values. Time Series Forecasting Methods that predict future values based on previously observed values. https://en.wikipedia.org/wiki/Time_series Methods which can reuse or transfer information from previously learned tasks for the learning of new tasks. Transfer Learning Methods which can reuse or transfer information from previously learned tasks for the learning of new tasks. https://en.wikipedia.org/wiki/Transfer_learning A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data.
It is used primarily in the field of natural language processing (NLP) and in computer vision (CV). (https://en.wikipedia.org/wiki/Transformer_(machine_Learning_model)) Transformer Network A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data. It is used primarily in the field of natural language processing (NLP) and in computer vision (CV). (https://en.wikipedia.org/wiki/Transformer_(machine_Learning_model)) https://en.wikipedia.org/wiki/Transformer_(machine_Learning_model) Arises when predictive algorithms favor groups that are better represented in the training data, since there will be less uncertainty associated with those predictions. Uncertainty Bias Arises when predictive algorithms favor groups that are better represented in the training data, since there will be less uncertainty associated with those predictions. https://doi.org/10.6028/NIST.SP.1270 Unit normalization layer. Normalize a batch of inputs so that each input in the batch has an L2 norm equal to 1 (across the axes specified in axis). UnitNormalization Layer Unit normalization layer. Normalize a batch of inputs so that each input in the batch has an L2 norm equal to 1 (across the axes specified in axis). https://www.tensorflow.org/api_docs/python/tf/keras/layers/UnitNormalization Methods that simultaneously cluster the rows and columns of an unlabeled input matrix. Block Clustering Co-clustering Joint Clustering Two-mode Clustering Two-way Clustering Unsupervised Biclustering Methods that simultaneously cluster the rows and columns of an unlabeled input matrix. https://en.wikipedia.org/wiki/Biclustering Methods that group a set of objects in such a way that objects without labels in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
Cluster analysis Unsupervised Clustering Methods that group a set of objects in such a way that objects without labels in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). https://en.wikipedia.org/wiki/Cluster_analysis Algorithms that learn patterns from unlabeled data. Unsupervised Learning Algorithms that learn patterns from unlabeled data. https://en.wikipedia.org/wiki/Unsupervised_learning Unsupervised pre-training initializes a discriminative neural net from one which was trained using an unsupervised criterion, such as a deep belief network or a deep autoencoder. This method can sometimes help with both the optimization and the overfitting issues. UPN Unsupervised Pretrained Network Unsupervised pre-training initializes a discriminative neural net from one which was trained using an unsupervised criterion, such as a deep belief network or a deep autoencoder. This method can sometimes help with both the optimization and the overfitting issues. https://metacademy.org/graphs/concepts/unsupervised_pre_training#:~:text=Unsupervised%20pre%2Dtraining%20initializes%20a,optimization%20and%20the%20overfitting%20issues Upsampling layer for 1D inputs. Repeats each temporal step size times along the time axis. UpSampling1D Layer Upsampling layer for 1D inputs. Repeats each temporal step size times along the time axis. https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling1D Upsampling layer for 2D inputs. Repeats the rows and columns of the data by size[0] and size[1] respectively. UpSampling2D Layer Upsampling layer for 2D inputs. Repeats the rows and columns of the data by size[0] and size[1] respectively. https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling2D Upsampling layer for 3D inputs. UpSampling3D Layer Upsampling layer for 3D inputs.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling3D An information-processing bias, the tendency to inappropriately analyze ambiguous stimuli, scenarios and events. Interpretive Bias Use And Interpretation Bias An information-processing bias, the tendency to inappropriately analyze ambiguous stimuli, scenarios and events. https://en.wikipedia.org/wiki/Interpretive_bias Arises when a user imposes their own self-selected biases and behavior during interaction with data, output, results, etc. User Interaction Bias Arises when a user imposes their own self-selected biases and behavior during interaction with data, output, results, etc. https://doi.org/10.6028/NIST.SP.1270 Variational autoencoders are meant to compress the input information into a constrained multivariate latent distribution (encoding) to reconstruct it as accurately as possible (decoding). (https://en.wikipedia.org/wiki/Variational_autoencoder) VAE Input, Probabilistic Hidden, Matched Output-Input Variational Auto Encoder Abstract wrapper base class. Wrappers take another layer and augment it in various ways. Do not use this class as a layer, it is only an abstract base class. Two usable wrappers are the TimeDistributed and Bidirectional wrappers. Wrapper Layer Abstract wrapper base class. Wrappers take another layer and augment it in various ways. Do not use this class as a layer, it is only an abstract base class. Two usable wrappers are the TimeDistributed and Bidirectional wrappers. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Wrapper Methods where at test time, a learner observes samples from classes, which were not observed during training, and needs to predict the class that they belong to. ZSL Zero-shot Learning Methods where at test time, a learner observes samples from classes, which were not observed during training, and needs to predict the class that they belong to. https://en.wikipedia.org/wiki/Zero-shot_learning Zero-padding layer for 1D input (e.g. 
temporal sequence). ZeroPadding1D Layer Zero-padding layer for 1D input (e.g. temporal sequence). https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding1D Zero-padding layer for 2D input (e.g. picture). This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor. ZeroPadding2D Layer Zero-padding layer for 2D input (e.g. picture). This layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor. https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding2D Zero-padding layer for 3D data (spatial or spatio-temporal). ZeroPadding3D Layer Zero-padding layer for 3D data (spatial or spatio-temporal). https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding3D The exponential linear unit (ELU) with alpha > 0 is: x if x > 0 and alpha * (exp(x) - 1) if x < 0. The ELU hyperparameter alpha controls the value to which an ELU saturates for negative net inputs. ELUs diminish the vanishing gradient effect. ELUs have negative values, which push the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. ELUs saturate to a negative value when the argument gets smaller. Saturation means a small derivative which decreases the variation and the information that is propagated to the next layer. ELU Exponential Linear Unit ELU Function The exponential linear unit (ELU) with alpha > 0 is: x if x > 0 and alpha * (exp(x) - 1) if x < 0. The ELU hyperparameter alpha controls the value to which an ELU saturates for negative net inputs. ELUs diminish the vanishing gradient effect. ELUs have negative values, which push the mean of the activations closer to zero. Mean activations that are closer to zero enable faster learning as they bring the gradient closer to the natural gradient. ELUs saturate to a negative value when the argument gets smaller.
Saturation means a small derivative which decreases the variation and the information that is propagated to the next layer. https://www.tensorflow.org/api_docs/python/tf/keras/activations/elu In the continuous bag-of-words architecture, the model predicts the current node from a window of surrounding context nodes. The order of context nodes does not influence prediction (bag-of-words assumption). N2V-CBOW CBOW Input, Hidden, Output node2vec-CBOW In the continuous bag-of-words architecture, the model predicts the current node from a window of surrounding context nodes. The order of context nodes does not influence prediction (bag-of-words assumption). https://en.wikipedia.org/wiki/Word2vec In the continuous skip-gram architecture, the model uses the current node to predict the surrounding window of context nodes. The skip-gram architecture weighs nearby context nodes more heavily than more distant context nodes. (https://en.wikipedia.org/wiki/Word2vec) N2V-SkipGram SkipGram Input, Hidden, Output node2vec-SkipGram In the continuous skip-gram architecture, the model uses the current node to predict the surrounding window of context nodes. The skip-gram architecture weighs nearby context nodes more heavily than more distant context nodes. (https://en.wikipedia.org/wiki/Word2vec) https://en.wikipedia.org/wiki/Word2vec A statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. t-SNE tSNE t-Distributed Stochastic Neighbor Embedding A statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding In the continuous bag-of-words architecture, the model predicts the current word from a window of surrounding context words. The order of context words does not influence prediction (bag-of-words assumption).
(https://en.wikipedia.org/wiki/Word2vec) W2V-CBOW CBOW Input, Hidden, Output word2vec-CBOW In the continuous bag-of-words architecture, the model predicts the current word from a window of surrounding context words. The order of context words does not influence prediction (bag-of-words assumption). (https://en.wikipedia.org/wiki/Word2vec) https://en.wikipedia.org/wiki/Word2vec In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words. W2V-SkipGram SkipGram Input, Hidden, Output word2vec-SkipGram In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words. https://en.wikipedia.org/wiki/Word2vec A statistical phenomenon where the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables. For example, the statistical association or correlation that has been detected between two variables for an entire population disappears or reverses when the population is divided into subgroups. Simpson's Paradox Simpson's Paradox Bias A statistical phenomenon where the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables. For example, the statistical association or correlation that has been detected between two variables for an entire population disappears or reverses when the population is divided into subgroups. https://doi.org/10.6028/NIST.SP.1270
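The reversal described in the Simpson's Paradox entry can be checked with simple arithmetic. The sketch below is illustrative only: the arm names "X" and "Y", the subgroup names, and the success counts are assumptions chosen for the example and are not part of the ontology or of the NIST source. It compares success rates within each subgroup and then on the pooled data.

```python
from fractions import Fraction

# Illustrative (hypothetical) counts: (successes, trials) for two arms, X and Y,
# observed in two subgroups of a population.
counts = {
    "X": {"subgroup_1": (81, 87), "subgroup_2": (192, 263)},
    "Y": {"subgroup_1": (234, 270), "subgroup_2": (55, 80)},
}


def rate(successes, trials):
    """Exact success rate for one arm within one subgroup."""
    return Fraction(successes, trials)


def pooled_rate(groups):
    """Success rate for one arm after merging all of its subgroups."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return Fraction(total_successes, total_trials)


# Arm with the higher success rate inside each subgroup (partial association).
subgroup_winner = {
    group: max(counts, key=lambda arm: rate(*counts[arm][group]))
    for group in ("subgroup_1", "subgroup_2")
}

# Arm with the higher success rate on the pooled data (marginal association).
overall_winner = max(counts, key=lambda arm: pooled_rate(counts[arm]))

print(subgroup_winner)
print(overall_winner)
```

With these counts, arm X has the higher rate in both subgroups while arm Y has the higher pooled rate, because the two arms are spread very unevenly across the subgroups. Exact fractions are used so the comparison does not depend on floating-point rounding.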