Training models in Flax vs. TensorFlow
Learn about model training in Flax and TensorFlow.
In TensorFlow, we train a model by compiling the network and calling its fit() method. In Flax, by contrast, we create a training state that holds the training information (the parameters and the optimizer state) and then pass data through the network ourselves.
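For contrast, here is a minimal sketch of the compile-and-fit pattern the paragraph refers to. The model and data below are hypothetical placeholders, not part of the lesson's LSTM example:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data, just to make the sketch runnable.
x = np.ones((8, 4), dtype="float32")
y = np.ones((8, 1), dtype="float32")

# In Keras, the optimizer and loss are attached to the model via compile(),
# and the training loop is hidden inside fit().
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.Adam(0.001), loss="mse")
history = model.fit(x, y, epochs=5, verbose=0)
```

In Flax there is no equivalent of compile() or fit(); the pieces that Keras bundles into the model object are instead collected explicitly in a training state, as shown next.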
```python
from flax.training import train_state

def create_train_state(rng):
    """Creates initial `TrainState`."""
    model = LSTMModel()
    params = model.init(rng, jnp.array(X_train_padded[0]))['params']
    tx = optax.adam(0.001, 0.9, 0.999, 1e-07)
    return train_state.TrainState.create(apply_fn=model.apply, params=params, tx=tx)
```
In the code above:
- Line 1: We import the `train_state` module from `flax.training`.
- Lines 3–8: We define the `create_train_state()` function, which creates the initial state for model training. It takes a random number generator key, `rng`, as its argument. Inside this function:
  - Lines 5–6: We create an instance `model` of the `LSTMModel` class and obtain the initial model parameters, `params`, by calling the `init()` method of the `model`. This method takes the random number generator key and a sample input, `X_train_padded[0]`.
  - Line 7: We define the Adam optimizer with the given hyperparameters: the learning rate, beta1, beta2, and epsilon.
  - Line 8: We create and return the training state by calling the `create()` method of the `train_state.TrainState` class. This method takes three arguments: `apply_fn` is the function that applies the model, `params` are the initial model parameters, and `tx` is the optimizer.