---
name: tensorflow-neural-networks
description: Build and train neural networks with TensorFlow
allowed-tools: [Bash, Read]
---

# TensorFlow Neural Networks

Build and train neural networks using TensorFlow's high-level Keras API and low-level custom implementations. This skill covers everything from simple sequential models to complex custom architectures with multiple outputs, custom layers, and advanced training techniques.

## Sequential Models with Keras

The Sequential API provides the simplest way to build neural networks by stacking layers linearly.

### Basic Image Classification

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess data: scale pixels to [0, 1] and flatten each image to 784 values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)

# Build Sequential model
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display model architecture
model.summary()

# Train model
history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Evaluate model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")

# Make predictions
predictions = model.predict(x_test[:5])
predicted_classes = np.argmax(predictions, axis=1)
print(f"Predicted classes: {predicted_classes}")
print(f"True classes: {y_test[:5]}")

# Save model
model.save('mnist_model.h5')

# Load model
loaded_model = keras.models.load_model('mnist_model.h5')
```

### Convolutional Neural Network

```python
import tensorflow as tf

def create_cnn_model(input_shape=(224, 224, 3), num_classes=1000):
    """Create CNN model for image classification."""
    model = tf.keras.Sequential([
        # Block 1
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),

        # Block 2
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),

        # Block 3
        tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.BatchNormalization(),

        # Classification head
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation='softmax')
    ])
    return model
```

### CIFAR-10 CNN Architecture

```python
import tensorflow as tf

# Load CIFAR-10 so that x_train.shape[1:] is the image shape (32, 32, 3)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

def generate_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), padding='same',
                               input_shape=x_train.shape[1:]),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(32, (3, 3)),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),

        tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Conv2D(64, (3, 3)),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),

        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10),
        tf.keras.layers.Activation('softmax')
    ])

model = generate_model()
```

## Custom Layers

Create reusable custom layers by subclassing `tf.keras.layers.Layer`.
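Before the fuller examples, it may help to see the lifecycle a subclassed layer goes through: `__init__` stores configuration, `build` creates weights once the input shape is known, and `call` runs the forward pass. The following is a minimal sketch of that lifecycle; the `ScaleShift` layer is a hypothetical name introduced only for illustration and is not part of Keras.

```python
import tensorflow as tf


class ScaleShift(tf.keras.layers.Layer):
    """Hypothetical layer: learned per-feature scale and shift."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def build(self, input_shape):
        # Runs automatically on the first call, once the input shape is known.
        n_features = input_shape[-1]
        self.scale = self.add_weight(
            shape=(n_features,), initializer="ones", trainable=True, name="scale")
        self.shift = self.add_weight(
            shape=(n_features,), initializer="zeros", trainable=True, name="shift")

    def call(self, inputs):
        return inputs * self.scale + self.shift


layer = ScaleShift()
weights_before = len(layer.weights)   # build() has not run yet
outputs = layer(tf.ones((2, 3)))      # first call triggers build()
weights_after = len(layer.weights)    # scale and shift now exist
print(weights_before, weights_after, outputs.shape)
```

Creating weights in `build` rather than `__init__` means the layer does not need to know its input size in advance, which is why the examples in this section never pass an input dimension to the constructor.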
### Custom Dense Layer

```python
import tensorflow as tf

class CustomDense(tf.keras.layers.Layer):
    def __init__(self, units=32, activation=None, **kwargs):
        # Forward name, dtype, etc. to the base Layer so from_config() works
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        """Create layer weights."""
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer='glorot_uniform',
            trainable=True,
            name='kernel'
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True,
            name='bias'
        )

    def call(self, inputs):
        """Forward pass."""
        output = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        """Enable serialization."""
        config = super().get_config()
        config.update({
            'units': self.units,
            'activation': tf.keras.activations.serialize(self.activation)
        })
        return config

# Use custom components
custom_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),  # declare the input shape up front
    CustomDense(64, activation='relu'),
    CustomDense(32, activation='relu'),
    CustomDense(1, activation='sigmoid')
])

custom_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

### Residual Block

```python
import tensorflow as tf

class ResidualBlock(tf.keras.layers.Layer):
    """Two conv + batch norm stages with a skip connection.

    The input must already have `filters` channels, otherwise the
    element-wise add in the residual connection fails on a shape mismatch.
    """

    def __init__(self, filters, kernel_size=3):
        super(ResidualBlock, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, padding='same')
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation = tf.keras.layers.Activation('relu')
        self.add = tf.keras.layers.Add()

    def call(self, inputs, training=False):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.activation(x)
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        x = self.add([x, inputs])  # Residual connection
        x = self.activation(x)
        return x
```

### Custom Projection Layer with TF NumPy
```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Enable NumPy-style methods such as .astype() on tensors
tnp.experimental_enable_numpy_behavior()

class ProjectionLayer(tf.keras.layers.Layer):
    """Linear projection layer using TF NumPy."""

    def __init__(self, units):
        super(ProjectionLayer, self).__init__()
        self._units = units

    def build(self, input_shape):
        stddev = tnp.sqrt(self._units).astype(tnp.float32)
        initial_value = tnp.random.randn(input_shape[1], self._units).astype(
            tnp.float32) / stddev
        # Note that TF NumPy can interoperate with tf.Variable.
        self.w = tf.Variable(initial_value, trainable=True)

    def call(self, inputs):
        return tnp.matmul(inputs, self.w)

# Call with ndarray inputs
layer = ProjectionLayer(2)
tnp_inputs = tnp.random.randn(2, 4).astype(tnp.float32)
print("output:", layer(tnp_inputs))

# Call with tf.Tensor inputs
tf_inputs = tf.random.uniform([2, 4])
print("\noutput: ", layer(tf_inputs))
```

## Custom Models

Build complex architectures by subclassing `tf.keras.Model`.

### Multi-Task Model

```python
import tensorflow as tf

class MultiTaskModel(tf.keras.Model):
    def __init__(self, num_classes_task1=10, num_classes_task2=5):
        super(MultiTaskModel, self).__init__()

        # Shared layers
        self.conv1 = tf.keras.layers.Conv2D(32, 3, activation='relu')
        self.pool = tf.keras.layers.MaxPooling2D()
        self.flatten = tf.keras.layers.Flatten()
        self.shared_dense = tf.keras.layers.Dense(128, activation='relu')

        # Task-specific layers
        self.task1_dense = tf.keras.layers.Dense(64, activation='relu')
        self.task1_output = tf.keras.layers.Dense(
            num_classes_task1, activation='softmax', name='task1')
        self.task2_dense = tf.keras.layers.Dense(64, activation='relu')
        self.task2_output = tf.keras.layers.Dense(
            num_classes_task2, activation='softmax', name='task2')

    def call(self, inputs, training=False):
        # Shared feature extraction
        x = self.conv1(inputs)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.shared_dense(x)

        # Task 1 branch
        task1 = self.task1_dense(x)
        task1_output = self.task1_output(task1)

        # Task 2 branch
        task2 = self.task2_dense(x)
        task2_output = self.task2_output(task2)

        return task1_output, task2_output
```

### Three-Layer Neural Network Module

```python
# `Dense` here is a custom layer class that accepts a `use_relu` flag and
# exposes a `params` list; it is not defined in this section. NUM_CLASSES is
# defined under "Training Constants for MNIST".
class Model(tf.Module):
    """A three layer neural network."""

    def __init__(self):
        super().__init__()
        self.layer1 = Dense(128)
        self.layer2 = Dense(32)
        self.layer3 = Dense(NUM_CLASSES, use_relu=False)

    def __call__(self, inputs):
        x = self.layer1(inputs)
        x = self.layer2(x)
        return self.layer3(x)

    @property
    def params(self):
        return self.layer1.params + self.layer2.params + self.layer3.params
```

## Recurrent Neural Networks

### Custom GRU Cell

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Needed for NumPy-style methods such as .astype() on tensors
tnp.experimental_enable_numpy_behavior()

class GRUCell:
    """Builds a traditional GRU cell with dense internal transformations.

    Gated Recurrent Unit paper: https://arxiv.org/abs/1412.3555
    """

    def __init__(self, n_units, forget_bias=0.0):
        self._n_units = n_units
        self._forget_bias = forget_bias
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        x, gru_state = inputs
        # Dense layer on the concatenation of x and h.
        y = tnp.dot(tnp.concatenate([x, gru_state], axis=-1), self.w1) + self.b1
        # Update and reset gates.
        u, r = tnp.split(tf.sigmoid(y), 2, axis=-1)
        # Candidate.
        c = tnp.dot(tnp.concatenate([x, r * gru_state], axis=-1), self.w2) + self.b2
        new_gru_state = u * gru_state + (1 - u) * tnp.tanh(c)
        return new_gru_state

    def build(self, inputs):
        # State last dimension must be n_units.
        assert inputs[1].shape[-1] == self._n_units
        # The dense layer input is the concatenation of the input and the GRU state.
        dense_shape = inputs[0].shape[-1] + self._n_units
        self.w1 = tf.Variable(tnp.random.uniform(
            -0.01, 0.01, (dense_shape, 2 * self._n_units)).astype(tnp.float32))
        self.b1 = tf.Variable((tnp.random.randn(2 * self._n_units) * 1e-6 +
                               self._forget_bias).astype(tnp.float32))
        self.w2 = tf.Variable(tnp.random.uniform(
            -0.01, 0.01, (dense_shape, self._n_units)).astype(tnp.float32))
        self.b2 = tf.Variable(
            (tnp.random.randn(self._n_units) * 1e-6).astype(tnp.float32))
        self._built = True

    @property
    def weights(self):
        return (self.w1, self.b1, self.w2, self.b2)
```

### Custom Dense Layer Implementation

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# Needed for NumPy-style methods such as .astype() on tensors
tnp.experimental_enable_numpy_behavior()

class Dense:
    def __init__(self, n_units, activation=None):
        self._n_units = n_units
        self._activation = activation
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        y = tnp.dot(inputs, self.w) + self.b
        if self._activation is not None:
            y = self._activation(y)
        return y

    def build(self, inputs):
        shape_w = (inputs.shape[-1], self._n_units)
        # Glorot-style uniform limit based on fan-in and fan-out
        lim = tnp.sqrt(6.0 / (shape_w[0] + shape_w[1]))
        self.w = tf.Variable(tnp.random.uniform(-lim, lim, shape_w).astype(tnp.float32))
        self.b = tf.Variable((tnp.random.randn(self._n_units) * 1e-6).astype(tnp.float32))
        self._built = True

    @property
    def weights(self):
        return (self.w, self.b)
```

### Sequential RNN Model

```python
# `Embedding` and `GRU` are companion classes written in the same style as
# `Dense` and `GRUCell` above; they are not defined in this section.
class Model:
    def __init__(self, vocab_size, embedding_dim, rnn_units, forget_bias=0.0,
                 stateful=False, activation=None):
        self._embedding = Embedding(vocab_size, embedding_dim)
        self._gru = GRU(rnn_units, forget_bias=forget_bias, stateful=stateful)
        self._dense = Dense(vocab_size, activation=activation)
        self._layers = [self._embedding, self._gru, self._dense]
        self._built = False

    def __call__(self, inputs):
        if not self._built:
            self.build(inputs)
        xs = inputs
        for layer in self._layers:
            xs = layer(xs)
        return xs

    def build(self, inputs):
        self._embedding.build(inputs)
        self._gru.build(tf.TensorSpec(
            inputs.shape + (self._embedding._embedding_dim,), tf.float32))
        self._dense.build(tf.TensorSpec(
            inputs.shape + (self._gru._cell._n_units,), tf.float32))
        self._built = True

    @property
    def weights(self):
        return [layer.weights for layer in self._layers]

    @property
    def state(self):
        return self._gru.state

    def create_state(self, *args):
        self._gru.create_state(*args)

    def reset_state(self, *args):
        self._gru.reset_state(*args)
```

## Training Configuration

### Model Parameters

```python
# Length of the vocabulary in chars (assumes `vocab` was built from the training text)
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
BUFFER_SIZE = 10000
```

### Training Constants for MNIST

```python
import tensorflow as tf

# Size of each input image, 28 x 28 pixels
IMAGE_SIZE = 28 * 28
# Number of distinct number labels, [0..9]
NUM_CLASSES = 10
# Number of examples in each training batch (step)
TRAIN_BATCH_SIZE = 100
# Number of training steps to run
TRAIN_STEPS = 1000

# Loads MNIST dataset.
train, test = tf.keras.datasets.mnist.load_data()
train_ds = tf.data.Dataset.from_tensor_slices(train).batch(TRAIN_BATCH_SIZE).repeat()

# Casting from raw data to the required datatypes.
def cast(images, labels):
    images = tf.cast(
        tf.reshape(images, [-1, IMAGE_SIZE]), tf.float32)
    labels = tf.cast(labels, tf.int64)
    return (images, labels)

# Apply the cast to every batch
train_ds = train_ds.map(cast)
```

### Post-Training Quantization

```python
import tensorflow as tf
from tensorflow import keras

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
    train_images,
    train_labels,
    epochs=1,
    validation_data=(test_images, test_labels)
)

# Convert to TensorFlow Lite with the default post-training quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
```

## When to Use This Skill

Use the tensorflow-neural-networks skill when you need to:

- Build image classification models with CNNs
- Create text processing models with RNNs or transformers
- Implement custom layer architectures for specific use cases
- Design multi-task learning models with shared representations
- Train sequential models for tabular data
- Implement residual connections or skip connections
- Create embedding layers for discrete inputs
- Build autoencoders or generative models
- Fine-tune pre-trained models with custom heads
- Implement attention mechanisms in custom architectures
- Create time-series prediction models
- Design reinforcement learning policy networks
- Build siamese networks for similarity learning
- Implement custom gradient computation in layers
- Create models with dynamic architectures based on input

## Best Practices

1. **Use Keras Sequential for simple architectures** - Start with the Sequential API for linear layer stacks before moving to the functional or subclassing APIs
2. **Leverage pre-built layers** - Use the built-in tf.keras.layers implementations before creating custom layers
3. **Initialize weights properly** - Match the initializer to the activation function, for example glorot_uniform for tanh/sigmoid and he_normal for ReLU
4. **Add batch normalization** - Place BatchNormalization layers after Conv2D/Dense layers for training stability
5. **Use dropout for regularization** - Apply Dropout layers (0.2-0.5) to prevent overfitting in fully connected layers
6. **Compile before training** - Always call model.compile() with optimizer, loss, and metrics before fit()
7. **Monitor validation metrics** - Use validation_split or validation_data to track overfitting during training
8. **Save model checkpoints** - Use the ModelCheckpoint callback to save the best model during training
9. **Use model.summary()** - Verify architecture and parameter counts before training
10. **Implement early stopping** - Add the EarlyStopping callback to stop training once the monitored metric stops improving
11. **Normalize input data** - Scale pixel values to [0,1] or standardize features to mean=0, std=1
12. **Use appropriate activation functions** - ReLU for hidden layers, softmax for multi-class outputs, sigmoid for binary outputs
13. **Set proper loss functions** - sparse_categorical_crossentropy for integer labels, categorical_crossentropy for one-hot labels
14. **Implement custom get_config()** - Override get_config() in custom layers so that models containing them can be serialized
15. **Use the training parameter in call()** - Pass the training flag through to sublayers so that dropout and batch norm switch between training and inference behavior

## Common Pitfalls

1. **Forgetting to normalize data** - Unnormalized inputs cause slow convergence and poor performance
2. **Wrong loss function for labels** - Using categorical_crossentropy with integer labels causes shape-mismatch errors
3. **Missing input shape** - Without an input shape on the first layer, the model is not built until it is first called, so model.summary() fails and no weights exist yet
4. **Overfitting on small datasets** - Add dropout, augmentation, or reduce model capacity
5. **Learning rate too high** - Causes unstable training and loss divergence
6. **Not shuffling training data** - Leads to biased batch statistics and poor generalization
7. **Batch size too small** - Causes noisy gradients and slow training on large datasets
8. **Too many parameters** - Large models overfit and train slowly on limited data
9. **Vanishing gradients in deep networks** - Use residual connections or batch normalization
10. **Not using validation data** - Without it you cannot detect overfitting or tune hyperparameters properly
11. **Forgetting to set training=False** - When a model is called directly at inference time, dropout and batch norm otherwise keep their training behavior
12. **Incompatible layer dimensions** - The output shape of one layer must match the input shape expected by the next
13. **Accessing weights before build()** - A custom layer's weights do not exist until build() has run, which normally happens on the first call
14. **Using the wrong optimizer** - Adam works well in general, but SGD with momentum can work better on some tasks
15. **Ignoring class imbalance** - Use class weights or resampling for imbalanced datasets

## Resources

- [TensorFlow Keras Guide](https://www.tensorflow.org/guide/keras)
- [Building Custom Layers](https://www.tensorflow.org/guide/keras/custom_layers_and_models)
- [Training and Evaluation](https://www.tensorflow.org/guide/keras/train_and_evaluate)
- [Sequential Model API](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)
- [Model Subclassing Guide](https://www.tensorflow.org/guide/keras/custom_layers_and_models#the_model_class)
- [RNN Guide](https://www.tensorflow.org/guide/keras/rnn)
- [Custom Training Loops](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch)
- [Transfer Learning Guide](https://www.tensorflow.org/tutorials/images/transfer_learning)
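
The class-imbalance item under Common Pitfalls mentions class weights without showing how they might be derived. One common heuristic, sketched here in plain Python, weights each class by `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the toy label list are assumptions made for illustration; the resulting dictionary is the kind of mapping accepted by the `class_weight` argument of `model.fit()`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_index: weight}, weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label set: 8 examples of class 0, 2 of class 1
labels = [0] * 8 + [1] * 2
class_weight = balanced_class_weights(labels)
print(class_weight)  # {0: 0.625, 1: 2.5}

# The dictionary can then be passed to Keras, for example:
# model.fit(x_train, y_train, class_weight=class_weight)
```

With these weights the rare class contributes four times as much per example to the loss as the common class, so both classes carry equal total weight.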