Train the Network

Test backpropagation and forward propagation for the neural network in the given code widget.

In the code below, our classifier trains for 20 iterations; to obtain better accuracy, we must increase the number of iterations. The results of training the classifier for 10,000 iterations are given below this code widget.

We can expect to get different results each time we run this program, because it uses random values for its weights.

Note: To learn why the randomly-initialized weights are a good choice, visit the “Weight initialization done right” lesson.
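
Because of this randomness, two runs of the same script generally won't print identical loss and accuracy values. If you want reproducible runs, one option is to fix NumPy's random seed before training starts; a minimal sketch (the seed value is arbitrary, and the line would go before the call to train() at the bottom of the script):

import numpy as np

# Make initialize_weights() draw the same random values on every run.
# (Any fixed integer works as the seed; 1234 is just an example.)
np.random.seed(1234)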

# A neural network implementation (almost the same as backpropagation.py,
# except for a tiny refactoring in the back() function).

import numpy as np

# Sigmoid (logistic) activation function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Softmax activation function for the output layer
def softmax(logits):
    exponentials = np.exp(logits)
    return exponentials / np.sum(exponentials, axis=1).reshape(-1, 1)

# Derivative of the sigmoid, written in terms of the sigmoid's output
def sigmoid_gradient(sigmoid):
    return np.multiply(sigmoid, (1 - sigmoid))

# Cross-entropy loss, averaged over the training examples
def loss(Y, y_hat):
    return -np.sum(Y * np.log(y_hat)) / Y.shape[0]

# Prepend a column of 1s (the bias column) to a matrix
def prepend_bias(X):
    return np.insert(X, 0, 1, axis=1)

# Forward propagation: compute the hidden layer h and the predictions y_hat
def forward(X, w1, w2):
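    # Two weighted sums: X (with bias) through w1 and the sigmoid gives the
    # hidden layer h; h (with bias) through w2 and the softmax gives y_hat.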
    h = sigmoid(np.matmul(prepend_bias(X), w1))
    y_hat = softmax(np.matmul(prepend_bias(h), w2))
    return (y_hat, h)


# Backpropagation: gradients of the loss with respect to w1 and w2
def back(X, Y, y_hat, w2, h):
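    # (y_hat - Y) is the gradient of the cross-entropy loss with respect to the
    # output layer's weighted sums (softmax and cross-entropy combined).
    # Multiplying by the bias-extended hidden layer, transposed, and dividing by
    # the number of examples gives the average gradient for w2. For w1, the error
    # is propagated back through w2 (skipping its bias row, w2[1:]) and scaled by
    # the sigmoid's derivative on h before the same multiplication with X.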
    w2_gradient = np.matmul(prepend_bias(h).T, (y_hat - Y)) / X.shape[0]
    w1_gradient = np.matmul(prepend_bias(X).T, np.matmul(y_hat - Y, w2[1:].T)
                            * sigmoid_gradient(h)) / X.shape[0]
    return (w1_gradient, w2_gradient)

# Classify: run forward propagation and pick the most likely class for each example
def classify(X, w1, w2):
    y_hat, _ = forward(X, w1, w2)
    labels = np.argmax(y_hat, axis=1)
    return labels.reshape(-1, 1)

# Initialize the weights of both layers with scaled random values
def initialize_weights(n_input_variables, n_hidden_nodes, n_classes):
    w1_rows = n_input_variables + 1
    w1 = np.random.randn(w1_rows, n_hidden_nodes) * np.sqrt(1 / w1_rows)

    w2_rows = n_hidden_nodes + 1
    w2 = np.random.randn(w2_rows, n_classes) * np.sqrt(1 / w2_rows)

    return (w1, w2)

# Printing results to the terminal screen
def report(iteration, X_train, Y_train, X_test, Y_test, w1, w2):
    y_hat, _ = forward(X_train, w1, w2)
    training_loss = loss(Y_train, y_hat)
    classifications = classify(X_test, w1, w2)
    accuracy = np.average(classifications == Y_test) * 100.0
    print("Iteration: %5d, Loss: %.8f, Accuracy: %.2f%%" %
          (iteration, training_loss, accuracy))

# Training loop: forward pass, backward pass, then a gradient descent step
def train(X_train, Y_train, X_test, Y_test, n_hidden_nodes, iterations, lr):
    n_input_variables = X_train.shape[1]
    n_classes = Y_train.shape[1]
    w1, w2 = initialize_weights(n_input_variables, n_hidden_nodes, n_classes)
    for iteration in range(iterations):
        y_hat, h = forward(X_train, w1, w2)
        w1_gradient, w2_gradient = back(X_train, Y_train, y_hat, w2, h)
        w1 = w1 - (w1_gradient * lr)
        w2 = w2 - (w2_gradient * lr)
        report(iteration, X_train, Y_train, X_test, Y_test, w1, w2)
    return (w1, w2)


# Load the MNIST data and train the network
import mnist
w1, w2 = train(mnist.X_train, mnist.Y_train,
               mnist.X_test, mnist.Y_test,
               n_hidden_nodes=200, iterations=20, lr=0.01)
Final neural network for MNIST

To write the very last line here, we have to set values for the hyperparameters. The number of hidden nodes is easy: we already decided to have 200 ...
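
The results for 10,000 iterations mentioned at the top of this lesson come from the same call with a larger iteration count; a minimal sketch, assuming the other hyperparameters keep the values shown in the widget:

import mnist

# Same training call as above, but for 10,000 iterations instead of 20.
w1, w2 = train(mnist.X_train, mnist.Y_train,
               mnist.X_test, mnist.Y_test,
               n_hidden_nodes=200, iterations=10000, lr=0.01)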
