Train the Network
Test forward propagation and backpropagation for the neural network in the code widget below.
In the code below, our classifier undergoes training for 20 iterations, but to obtain better accuracy we must increase the number of iterations. The results of training the classifier for 10,000 iterations are given below this code widget. We can expect to get different results each time we run this program, because it uses random values for its weights.
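If we want identical results on every run, we can seed NumPy's random number generator before training. This is an optional sketch, not part of the lesson's code; the seed value 1234 is an arbitrary choice:

import numpy as np

# Fix the random seed so initialize_weights() produces the same
# starting weights on every run (any integer works here).
np.random.seed(1234)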
Note: To learn why the randomly-initialized weights are a good choice, visit the “Weight initialization done right” lesson.
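As a quick sanity check of the sqrt(1 / n) scaling used by initialize_weights() in the widget below, we can compare the spread of a layer's weighted sums with and without that factor. This is an illustrative sketch with made-up sizes, not part of the course code:

import numpy as np

n_inputs = 785                       # e.g. 784 pixels plus the bias column
x = np.random.randn(1000, n_inputs)  # fake standardized inputs

w_raw = np.random.randn(n_inputs, 200)
w_scaled = w_raw * np.sqrt(1 / n_inputs)

# The scaled weights keep the weighted sums in a range where the
# sigmoid is not saturated; the raw weights do not.
print(np.std(np.matmul(x, w_raw)))     # roughly sqrt(785), about 28
print(np.std(np.matmul(x, w_scaled)))  # roughly 1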
# A neural network implementation (almost the same as backpropagation.py,
# except for a tiny refactoring in the back() function).

import numpy as np


# Sigmoid activation function (the logistic function)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Softmax activation function for the output layer
def softmax(logits):
    exponentials = np.exp(logits)
    return exponentials / np.sum(exponentials, axis=1).reshape(-1, 1)


# Gradient of the sigmoid, expressed in terms of the sigmoid's output
def sigmoid_gradient(sigmoid):
    return np.multiply(sigmoid, (1 - sigmoid))


# Cross-entropy loss, averaged over the examples
def loss(Y, y_hat):
    return -np.sum(Y * np.log(y_hat)) / Y.shape[0]


# Prepend a column of 1s (the bias column) to X
def prepend_bias(X):
    return np.insert(X, 0, 1, axis=1)


# Forward propagation: compute the hidden layer and the predictions
def forward(X, w1, w2):
    h = sigmoid(np.matmul(prepend_bias(X), w1))
    y_hat = softmax(np.matmul(prepend_bias(h), w2))
    return (y_hat, h)


# Backpropagation: compute the gradients of the loss for w1 and w2
def back(X, Y, y_hat, w2, h):
    w2_gradient = np.matmul(prepend_bias(h).T, (y_hat - Y)) / X.shape[0]
    w1_gradient = np.matmul(prepend_bias(X).T,
                            np.matmul(y_hat - Y, w2[1:].T)
                            * sigmoid_gradient(h)) / X.shape[0]
    return (w1_gradient, w2_gradient)


# Run forward propagation and return the most likely class for each example
def classify(X, w1, w2):
    y_hat, _ = forward(X, w1, w2)
    labels = np.argmax(y_hat, axis=1)
    return labels.reshape(-1, 1)


# Initialize the weights of both layers with scaled random values
def initialize_weights(n_input_variables, n_hidden_nodes, n_classes):
    w1_rows = n_input_variables + 1
    w1 = np.random.randn(w1_rows, n_hidden_nodes) * np.sqrt(1 / w1_rows)

    w2_rows = n_hidden_nodes + 1
    w2 = np.random.randn(w2_rows, n_classes) * np.sqrt(1 / w2_rows)

    return (w1, w2)


# Print loss and accuracy for the current iteration
def report(iteration, X_train, Y_train, X_test, Y_test, w1, w2):
    y_hat, _ = forward(X_train, w1, w2)
    training_loss = loss(Y_train, y_hat)
    classifications = classify(X_test, w1, w2)
    accuracy = np.average(classifications == Y_test) * 100.0
    print("Iteration: %5d, Loss: %.8f, Accuracy: %.2f%%" %
          (iteration, training_loss, accuracy))


# Training loop: gradient descent over the given number of iterations
def train(X_train, Y_train, X_test, Y_test, n_hidden_nodes, iterations, lr):
    n_input_variables = X_train.shape[1]
    n_classes = Y_train.shape[1]
    w1, w2 = initialize_weights(n_input_variables, n_hidden_nodes, n_classes)
    for iteration in range(iterations):
        y_hat, h = forward(X_train, w1, w2)
        w1_gradient, w2_gradient = back(X_train, Y_train, y_hat, w2, h)
        w1 = w1 - (w1_gradient * lr)
        w2 = w2 - (w2_gradient * lr)
        report(iteration, X_train, Y_train, X_test, Y_test, w1, w2)
    return (w1, w2)


import mnist

w1, w2 = train(mnist.X_train, mnist.Y_train,
               mnist.X_test, mnist.Y_test,
               n_hidden_nodes=200, iterations=20, lr=0.01)
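To test forward propagation and backpropagation without loading MNIST, we can feed the functions above a small batch of random data and check that the shapes line up. This sketch assumes the widget's functions are in scope; the sizes (784 inputs, 200 hidden nodes, 10 classes) mirror the MNIST configuration, but any consistent values would do:

import numpy as np

np.random.seed(0)
X = np.random.randn(5, 784)                   # 5 fake "images"
Y = np.eye(10)[np.random.randint(0, 10, 5)]   # 5 fake one-hot labels

w1, w2 = initialize_weights(784, 200, 10)
y_hat, h = forward(X, w1, w2)
w1_gradient, w2_gradient = back(X, Y, y_hat, w2, h)

print(y_hat.shape)        # (5, 10)   - one softmax row per example
print(h.shape)            # (5, 200)  - hidden layer activations
print(w1_gradient.shape)  # (785, 200) - same shape as w1
print(w2_gradient.shape)  # (201, 10)  - same shape as w2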
Final neural network for MNIST
To write the very last line here, we have to set values for the hyperparameters. The number of hidden nodes is easy: we already decided to have ...