Training

Learn about efficient and structured training in TensorFlow.

Chapter Goals:

  • Understand how a MonitoredTrainingSession works
  • Learn about saving checkpoints and tracking scalar values during training
  • Train a machine learning model using a MonitoredTrainingSession

A. Logging values

While tf.summary.scalar lets us track values in an events file for TensorBoard, it is also useful to log values directly to the console during training. For instance, it is customary to log the loss and the iteration count, so that we can stop training early if something goes wrong.

python train_model.py

You’ll notice that each line of output is prefixed with “INFO:tensorflow”. This indicates that the message was emitted by TensorFlow’s logger at the INFO level.
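The INFO-level messages only appear if the logger's verbosity allows them. Depending on the TensorFlow version and environment, INFO messages may be hidden by default, so the sketch below sets the verbosity explicitly. It is a minimal illustration, and the message text is a made-up placeholder rather than output from train_model.py.

```python
import tensorflow as tf

# Let INFO-level messages from the 'tensorflow' logger through.
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)

# Emit one INFO-level message through the same logger the training hooks use.
# The text is a placeholder chosen for illustration.
tf.compat.v1.logging.info('logging is enabled at INFO level')
```

Raising the verbosity threshold again (for example to tf.compat.v1.logging.WARN) hides these messages without changing the training code.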

We log specific values during training using a tf.compat.v1.train.LoggingTensorHook object. The object is initialized with a dictionary that maps labels to scalar-valued tensors. In our example, the labels we used were 'loss' and 'step', for the ...
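To make the hook concrete, here is a minimal sketch of a LoggingTensorHook attached to a MonitoredTrainingSession. It is not the course's train_model.py: the toy regression problem, the function name train_toy_model, the learning rate, and the logging interval are all assumptions chosen for illustration. Only the 'loss' and 'step' labels come from the text above.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.logging.set_verbosity(tf1.logging.INFO)  # make INFO-level hook output visible


def train_toy_model(num_steps=50, log_every=10):
    """Fit y = 2x with a single weight, logging 'loss' and 'step' as it trains.

    Hypothetical helper for illustration only.
    """
    inputs = np.array([[1.0], [2.0], [3.0], [4.0]], dtype=np.float32)
    targets = 2.0 * inputs

    # Build a TF1-style graph; inside this context eager execution is off.
    with tf.Graph().as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 1])
        y = tf1.placeholder(tf.float32, shape=[None, 1])
        w = tf1.get_variable(
            'w', shape=[1, 1], initializer=tf1.zeros_initializer())
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

        global_step = tf1.train.get_or_create_global_step()
        train_op = tf1.train.GradientDescentOptimizer(0.05).minimize(
            loss, global_step=global_step)

        # Dictionary maps labels to scalar-valued tensors.
        log_hook = tf1.train.LoggingTensorHook(
            tensors={'loss': loss, 'step': global_step},
            every_n_iter=log_every)
        # Ends the session loop once global_step reaches num_steps.
        stop_hook = tf1.train.StopAtStepHook(last_step=num_steps)

        final_loss = None
        with tf1.train.MonitoredTrainingSession(
                hooks=[log_hook, stop_hook]) as sess:
            while not sess.should_stop():
                _, final_loss = sess.run(
                    [train_op, loss], feed_dict={x: inputs, y: targets})
    return float(final_loss)


if __name__ == '__main__':
    train_toy_model()
```

The every_n_iter argument controls how often the labeled values are logged. LoggingTensorHook also accepts every_n_secs as an alternative, but exactly one of the two should be set.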