Develop a Neural Network to Classify Handwritten Digits

In this tutorial, we will build a convolutional neural network that can classify handwritten digits (i.e. 0 through 9). We will train and test our neural network using the MNIST data set, a large collection of handwritten digits that is often used as a first project for people getting started in deep learning for computer vision.

[Image: The MNIST data set. Image Source: Wikipedia]

Real-World Applications

The program we develop in this project is the basis for a number of real-world applications, including: 

  • Converting handwritten notes into text
  • Recognizing road signs
  • Scanning checks
  • And more…

Prerequisites

  • You have TensorFlow 2 installed.
  • Windows 10 users, see this post.
  • If you want GPU support for your TensorFlow installation, you will need to follow these steps. If you have trouble with them, you can follow these steps instead (the steps change fairly often, but the overall process stays roughly the same).
  • This post can also help you get your system set up, including your virtual environment in Anaconda (if you decide to go that route).

Directions

Open up a new Python program (in your favorite text editor or Python IDE), and write the following code. I’m going to name the program mnist_handwritten_digits.py.

Here is the code. Don’t be afraid of how long it is. All you need to do is copy the code and paste it into a file. I included a lot of comments, and I’ll explain each piece of the code in detail in a moment. 

# Project: Detect Handwritten Digits Using the MNIST Data Set
# Author: Addison Sears-Collins
# Date created: January 4, 2021

import tensorflow as tf # Machine learning library
from tensorflow import keras # Library for neural networks
from tensorflow.keras.datasets import mnist # MNIST data set
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import categorical_crossentropy
import numpy as np
from time import time
import matplotlib.pyplot as plt

num_classes = 10 # Number of distinct labels

def generate_neural_network(x_train):
  """
  Create a convolutional neural network.
  :param x_train: Training images of handwritten digits (grayscale)
  :return: Neural network
  """
  model = keras.Sequential()

  # Add a convolutional layer to the neural network
  model.add(Conv2D(filters=6, kernel_size=(5, 5), activation='relu',
        padding='same', input_shape=x_train.shape[1:]))
  model.add(MaxPooling2D()) # subsampling using max pooling

  # Add a convolutional layer to the neural network 
  model.add(Conv2D(filters=16, kernel_size=(5, 5), activation='relu'))
  model.add(MaxPooling2D())

  # Add the dense layers
  model.add(Flatten())
  model.add(Dense(units=120, activation='relu'))
  model.add(Dense(units=84, activation='relu'))

  # Softmax transforms the prediction into a probability
  # The highest probability corresponds to the category of the image (i.e. 0-9)
  model.add(Dense(units=num_classes, activation='softmax'))

  return model

def tf_and_gpu_ok():
  """
  Test if TensorFlow and GPU support are working properly.
  """
  # Test code to see if Tensorflow is installed properly
  print("Tensorflow:", tf.__version__)
  print("Tensorflow Git:", tf.version.GIT_VERSION)	

  # See if you can use the GPU on your computer
  print("CUDA ON" if tf.test.is_built_with_cuda() else "CUDA OFF")
  print("GPU ON" if tf.test.is_gpu_available() else "GPU OFF")

def main():

  # Uncomment to check if everything is working properly
  #tf_and_gpu_ok()

  # Load the MNIST data set and make sure the training and testing
  # data are four dimensions (which is what Keras needs)
  (x_train, y_train), (x_test, y_test) = mnist.load_data()
  x_train = np.expand_dims(x_train, -1) # shape becomes (60000, 28, 28, 1)
  x_test = np.expand_dims(x_test, -1)   # shape becomes (10000, 28, 28, 1)

  # Uncomment to display the dimensions of the training data set and the 
  # testing data set
  #print('x_train', x_train.shape, ' --- x_test', x_test.shape)
  #print('y_train', y_train.shape, ' --- y_test', y_test.shape)  

  # Normalize image intensity values to a range between 0 and 1 (from 0-255)
  x_train = x_train.astype('float32')
  x_test = x_test.astype('float32')
  x_train = x_train / 255
  x_test = x_test / 255

  # Uncomment to display the first 3 labels	
  #print('First 3 labels for train set:', y_train[0], y_train[1], y_train[2])
  #print('First 3 labels for testing set:', y_test[0], y_test[1], y_test[2])
  
  # Perform one-hot encoding
  y_train = keras.utils.to_categorical(y_train, num_classes)
  y_test = keras.utils.to_categorical(y_test, num_classes)

  # Create the neural network
  model = generate_neural_network(x_train)
	
  # Uncomment to print the summary statistics of the neural network
  #print(model.summary())  

  # Configure the neural network
  model.compile(loss=categorical_crossentropy, optimizer=Adam(), metrics=['accuracy'])
	
  start = time()
  history = model.fit(x_train, y_train, batch_size=16, epochs=5, validation_data=(x_test, y_test), shuffle=True)
  training_time = time() - start
  print(f'Training time: {training_time}')
	
  # A measure of how well the neural network learned the training data
  # The lower, the better
  print("Minimum Loss: ", min(history.history['loss']))

  # A measure of how well the neural network did on the validation data set
  # The lower, the better
  print("Minimum Validation Loss: ", min(history.history['val_loss']))

  # Maximum percentage of correct predictions on the training data
  # The higher, the better
  print("Maximum Accuracy: ", max(history.history['accuracy']))

  # Maximum percentage of correct predictions on the validation data
  # The higher, the better
  print("Maximum Validation Accuracy: ", max(history.history['val_accuracy']))
	
  # Plot the key statistics
  plt.plot(history.history['loss'])
  plt.plot(history.history['val_loss'])
  plt.plot(history.history['accuracy'])
  plt.plot(history.history['val_accuracy'])
  plt.title("Mean Squared Error for the Neural Network on the MNIST Data")  
  plt.ylabel("Mean Squared Error")
  plt.xlabel("Epoch")
  plt.legend(['Training Loss', 'Validation Loss', 'Training Accuracy', 
    'Validation Accuracy'], loc='center right') 
  plt.show() # Press Q to close the graph
	
  # Save the neural network in Hierarchical Data Format version 5 (HDF5) format
  model.save('mnist_nnet.h5')
   
  # Import the saved model
  model = keras.models.load_model('mnist_nnet.h5')
  print("\n\nNeural network has loaded successfully...\n")
    	
if __name__ == '__main__':
  main()

Code Walkthrough

The tf_and_gpu_ok() method is used to check whether:

  1. TensorFlow is installed properly.
  2. TensorFlow is able to use the GPU (Graphics Processing Unit) on your computer. The GPU helps speed up the training of your neural network.

If you see the output below when you run the mnist_handwritten_digits.py program…

python mnist_handwritten_digits.py
[Images: terminal output showing the TensorFlow version, CUDA ON, and GPU ON]

…the software is installed correctly.

The next piece of code in main displays the dimensions of x_train and y_train. You can see from the output that we train the neural network on 60,000 grayscale images that are 28×28 pixels in size and test it on 10,000 grayscale images of the same size. Each image contains a handwritten number from 0-9.

[Image: dimensions of the training and testing data sets]

At this stage, the intensity values for all of the images range from 0-255 (i.e. grayscale). To improve the performance of the neural network, it helps to normalize the image intensities to values between 0 and 1. To do this, we take the intensity values for each image and divide by the largest possible value (i.e. 255). 
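
For instance, once the images have been converted to float32 and divided by 255 (as in the code above), a quick sanity check confirms the new range:

  # Sanity check: pixel intensities should now span [0, 1] instead of [0, 255]
  print('Min intensity:', x_train.min())  # 0.0
  print('Max intensity:', x_train.max())  # 1.0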

At this stage, we have 10 different labels: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. These labels represent the 10 different categories (i.e. values) that the handwritten digits could possibly be. To enable the neural network to make better predictions, we need to convert this categorical data into a vector format of just 0s and 1s using a technique known as one-hot encoding.

For each of the 10 possible label values, you get a 10-element vector where only one element is hot (i.e. 1), and the rest of the elements are cold (i.e. 0). 

For example, here is the one-hot encoding for the label possibilities in this project:

0 -> 1 0 0 0 0 0 0 0 0 0

1 -> 0 1 0 0 0 0 0 0 0 0 

2 -> 0 0 1 0 0 0 0 0 0 0 

3 -> 0 0 0 1 0 0 0 0 0 0

4 -> 0 0 0 0 1 0 0 0 0 0

5 -> 0 0 0 0 0 1 0 0 0 0

6 -> 0 0 0 0 0 0 1 0 0 0 

7 -> 0 0 0 0 0 0 0 1 0 0 

8 -> 0 0 0 0 0 0 0 0 1 0 

9 -> 0 0 0 0 0 0 0 0 0 1
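
In the code, keras.utils.to_categorical performs this conversion for us. As a standalone illustration:

from tensorflow import keras
import numpy as np

# One-hot encode a few example labels
labels = np.array([0, 3, 9])
one_hot = keras.utils.to_categorical(labels, num_classes=10)
print(one_hot)
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]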

Now, we need to generate the neural network. I created a method named generate_neural_network to do just that. If there is something you don’t understand in the code, consult the reference for Keras.

If you want to understand how neural networks make predictions, check out this post.

Here is what you will see for the summary statistics:

[Image: summary statistics of the neural network]

You can see that the neural network has 61,706 parameters. That is considerably more than the 4,865 parameters in the deep neural network we used to predict vehicle fuel economy, because here we are dealing with images rather than just numbers. As the saying goes, “a picture is worth a thousand words.”
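
If you want to verify that number yourself, here is a quick arithmetic sketch of how the 61,706 parameters break down layer by layer (each convolutional or dense layer has one weight per input-output connection, plus one bias per output):

conv1 = (5 * 5 * 1) * 6 + 6     # 156 (5x5 kernel, 1 input channel, 6 filters)
conv2 = (5 * 5 * 6) * 16 + 16   # 2,416 (5x5 kernel, 6 input channels, 16 filters)
# The 28x28 input becomes 14x14 after the first pooling layer, 10x10 after
# the second (valid) convolution, and 5x5 after the second pooling layer,
# so Flatten produces 5 * 5 * 16 = 400 values
dense1 = 400 * 120 + 120        # 48,120
dense2 = 120 * 84 + 84          # 10,164
output = 84 * 10 + 10           # 850
print(conv1 + conv2 + dense1 + dense2 + output)  # 61706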

Now that we have constructed our neural network, we need to train it so that it can make predictions (i.e. convert handwritten numbers into the digits 0-9).

We need to feed our network the training samples, which consist of the handwritten digit images and the corresponding ground truth (i.e. the labels 0-9). Then we train the model and store the results in the history object. If you run the program and hit issues with the version of cuDNN, go to the Anaconda terminal and type:

conda install -c anaconda cudnn 

When you run the program, you will see that the accuracy statistics are around 99%.

[Image: terminal output while training the neural network]
You should see output on the terminal similar to this when you train the neural network.
[Images: the accuracy statistics and the plotted loss/accuracy curves]
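
If you also want to report the final loss and accuracy on the test set directly, here is a small optional addition you could place at the end of main, after training:

  # Evaluate the trained network on the test set
  # evaluate() returns the loss plus each metric passed to compile()
  test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
  print(f'Test loss: {test_loss:.4f} --- Test accuracy: {test_accuracy:.4f}')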

At the end of main, I saved our neural network in Hierarchical Data Format version 5 (HDF5) format. I then added some code to show you how to load the model if you want to use it at a later date.

[Image: terminal output after saving and loading the neural network]
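
Once the model has been reloaded, you can use it to classify individual images. Here is a short sketch (it assumes x_test and y_test are still in scope inside main, as in the code above):

  # Classify the first test image
  # predict() returns a 10-element probability vector for each image
  prediction = model.predict(x_test[0:1])
  predicted_digit = np.argmax(prediction)
  actual_digit = np.argmax(y_test[0])
  print(f'Predicted: {predicted_digit} --- Actual: {actual_digit}')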

That’s it! You now know how to build a neural network to accurately classify handwritten digits. 

Keep building!

Predict Vehicle Fuel Economy Using a Deep Neural Network

In this tutorial, we will use TensorFlow 2 with Keras to build a deep neural network that will enable us to predict a vehicle’s fuel economy (in miles per gallon) from seven different attributes: 

  1. Cylinders
  2. Displacement
  3. Horsepower
  4. Weight
  5. Acceleration 
  6. Model year 
  7. Origin

(The data set also contains a car name column, but the import step below discards it, so the network never sees it.)

We will use the Auto MPG Data Set from the UCI Machine Learning Repository.

Prerequisites

  • You have TensorFlow 2 installed.
    • Windows 10 users, see this post.
    • If you want GPU support for your TensorFlow installation, you will need to follow these steps. If you have trouble with them, you can follow these steps instead (the steps change fairly often, but the overall process stays roughly the same).

Directions

Open up a new Python program (in your favorite text editor or Python IDE) and write the following code. I’m going to name the program vehicle_fuel_economy.py. I’ll explain the code later in this tutorial.

# Project: Predict Vehicle Fuel Economy Using a Deep Neural Network
# Author: Addison Sears-Collins
# Date created: November 3, 2020

import pandas as pd # Used for data analysis
import pathlib # An object-oriented interface to the filesystem
import matplotlib.pyplot as plt # Handles the creation of plots
import seaborn as sns # Data visualization library
import tensorflow as tf # Machine learning library
from tensorflow import keras # Library for neural networks
from tensorflow.keras import layers # Handles the layers of the neural network

def main():

  # Set the data path for the Auto-Mpg data set from the UCI Machine Learning Repository
  datasetPath = keras.utils.get_file("auto-mpg.data", "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")

  # Set the column names for the data set
  columnNames = ['MPG', 'Cylinders','Displacement','Horsepower','Weight',
               'Acceleration','Model Year','Origin']

  # Import the data set
  originalData = pd.read_csv(datasetPath, names=columnNames, na_values="?", 
                           comment='\t', sep=" ", skipinitialspace=True)

  # Check the data set
  # print("Original Data Set Excerpt")
  # print(originalData.head())
  # print()

  # Generate a copy of the data set
  data = originalData.copy()

  # Count how many NAs each data attribute has
  # print("Number of NAs in the data set")
  # print(data.isna().sum())
  # print()

  # Now, let's remove the NAs from the data set
  data = data.dropna()

  # Perform one-hot encoding on the Origin attribute 
  # since it is a categorical variable
  origin = data.pop('Origin') # Return item and drop from frame
  data['USA'] = (origin == 1) * 1.0
  data['Europe'] = (origin == 2) * 1.0
  data['Japan'] = (origin == 3) * 1.0

  # Generate a training data set (80% of the data) and a testing set (20% of the data)
  trainingData = data.sample(frac = 0.8, random_state = 0)

  # Generate a testing data set
  testingData = data.drop(trainingData.index)

  # Separate the attributes from the label in both the testing
  # and training data. The label is the thing we are trying
  # to predict (i.e. miles per gallon 'MPG')
  trainingLabelData = trainingData.pop('MPG')
  testingLabelData = testingData.pop('MPG')
  
  # Normalize the data
  normalizedTrainingData = normalize(trainingData)
  normalizedTestingData = normalize(testingData)
  #print(normalizedTrainingData.head()) 
  
  # Generate the neural network
  neuralNet = generateNeuralNetwork(trainingData)
  
  # See a summary of the neural network
  # The first layer has 640 parameters:
  #   (9 input values * 64 neurons) + 64 bias values
  # The second layer has 4160 parameters:
  #   (64 input values * 64 neurons) + 64 bias values
  # The output layer has 65 parameters:
  #   (64 input values * 1 neuron) + 1 bias value
  #print(neuralNet.summary())
  
  EPOCHS = 1000
  
  # Train the model for a fixed number of epochs
  # history.history attribute is returned from the fit() function.
  # history.history is a record of training loss values and 
  # metrics values at successive epochs, as well as validation 
  # loss values and validation metrics values.
  history = neuralNet.fit(
    x = normalizedTrainingData, 
    y = trainingLabelData,
    epochs = EPOCHS, 
    validation_split = 0.2, 
    verbose = 0,
    callbacks = [PrintDot()]
  )   
  
  # Plot the neural network metrics (Training error and validation error)
  # Training error is the error when the trained neural network is 
  #   run on the training data.
  # Validation error is used to minimize overfitting. It indicates how
  #   well the data fits on data it hasn't been trained on.
  #plotNeuralNetMetrics(history)
  
  # Generate another neural network so that we can use early stopping
  neuralNet2 = generateNeuralNetwork(trainingData)
  
  # We want to stop training the model when the 
  # validation error stops improving.
  # monitor indicates the quantity we want to monitor.
  # patience indicates the number of epochs with no improvement after which
  # training will terminate.
  earlyStopping = keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 10)

  history2 = neuralNet2.fit(
    x = normalizedTrainingData, 
    y = trainingLabelData,
    epochs = EPOCHS, 
    validation_split = 0.2, 
    verbose = 0,
    callbacks = [earlyStopping, PrintDot()]
  )    

  # Plot metrics
  #plotNeuralNetMetrics(history2) 
  
  # Return the loss value and metrics values for the model in test mode
  # The mean absolute error for the predictions should 
  # stabilize around 2 miles per gallon  
  loss, meanAbsoluteError, meanSquaredError = neuralNet2.evaluate(
    x = normalizedTestingData,
    y = testingLabelData,
    verbose = 0
  )
  
  #print(f'\nMean Absolute Error on Test Data Set = {meanAbsoluteError} miles per gallon')
  
  # Make fuel economy predictions by deploying the trained neural network on the 
  # test data set (data that is brand new for the trained neural network).
  testingDataPredictions = neuralNet2.predict(normalizedTestingData).flatten()
  
  # Plot the predicted MPG vs. the true MPG
  # testingLabelData are the true MPG values
  # testingDataPredictions are the predicted MPG values
  #plotTestingDataPredictions(testingLabelData, testingDataPredictions)
  
  # Plot the prediction error distribution
  #plotPredictionError(testingLabelData, testingDataPredictions)
  
  # Save the neural network in Hierarchical Data Format version 5 (HDF5) format
  neuralNet2.save('fuel_economy_prediction_nnet.h5')
  
  # Import the saved model
  neuralNet3 = keras.models.load_model('fuel_economy_prediction_nnet.h5')
  print("\n\nNeural network has loaded successfully...\n")
  
  # Show neural network parameters
  print(neuralNet3.summary())
  
  # Make a prediction using the saved model we just imported
  print("\nMaking predictions...")
  testingDataPredictionsNN3 = neuralNet3.predict(normalizedTestingData).flatten()
  
  # Show Predicted MPG vs. Actual MPG
  plotTestingDataPredictions(testingLabelData, testingDataPredictionsNN3) 
  
# Generate the neural network
def generateNeuralNetwork(trainingData):
  # A Sequential model is a stack of layers where each layer is
  # single-input, single-output
  # This network below has 3 layers.
  neuralNet = keras.Sequential([
  
    # Each neuron in a layer receives input from all the 
    # neurons in the previous layer (densely connected).
    # Use the ReLU activation function, which transforms the
    # summed weighted input of a node into its output.
    # The first layer needs to know the number of attributes (keys) in the data set.
    # The first and second layers have 64 nodes each.
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(trainingData.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1) # This output layer is a single, continuous value (i.e. Miles per gallon)
  ])

  # Penalize the update of the neural network parameters that are causing
  # the cost function to have large oscillations by using a moving average
  # of the square of the gradients and dividing the gradient by the root of this
  # average. This reduces the step size for large gradients and increases 
  # the step size for small gradients.
  # The input into this function is the learning rate.
  optimizer = keras.optimizers.RMSprop(0.001)
 
  # Set the configurations for the model to get it ready for training
  neuralNet.compile(loss = 'mean_squared_error',
                optimizer = optimizer,
                metrics = ['mean_absolute_error', 'mean_squared_error'])
  return neuralNet
    
# Normalize the data set using the mean and standard deviation 
def normalize(data):
  statistics = data.describe()
  statistics = statistics.transpose()
  return (data - statistics['mean']) / statistics['std']

# Plot metrics for the neural network  
def plotNeuralNetMetrics(history):
  neuralNetMetrics = pd.DataFrame(history.history)
  neuralNetMetrics['epoch'] = history.epoch
  
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Abs Error [MPG]')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['mean_absolute_error'],
           label='Train Error')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['val_mean_absolute_error'],
           label = 'Val Error')
  plt.ylim([0,5])
  plt.legend()
  
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('Mean Square Error [$MPG^2$]')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['mean_squared_error'],
           label='Train Error')
  plt.plot(neuralNetMetrics['epoch'], 
           neuralNetMetrics['val_mean_squared_error'],
           label = 'Val Error')
  plt.ylim([0,20])
  plt.legend()
  plt.show()
  
# Plot prediction error
def plotPredictionError(testingLabelData, testingDataPredictions):

  # Error = Predicted - Actual
  error = testingDataPredictions - testingLabelData
  
  plt.hist(error, bins = 50)
  plt.xlim([-10,10])
  plt.xlabel("Predicted MPG - Actual MPG")
  _ = plt.ylabel("Count")
  plt.show()

# Plot predictions vs. true values
def plotTestingDataPredictions(testingLabelData, testingDataPredictions):

  # Plot the data points (x, y)
  plt.scatter(testingLabelData, testingDataPredictions)
  
  # Label the axes
  plt.xlabel('True Values (Miles per gallon)')
  plt.ylabel('Predicted Values (Miles per gallon)')

  # Plot a line between (0,0) and (50,50) 
  point1 = [0, 0]
  point2 = [50, 50]
  xValues = [point1[0], point2[0]] 
  yValues = [point1[1], point2[1]]
  plt.plot(xValues, yValues) 
  
  # Set the x and y axes limits
  plt.xlim(0, 50)
  plt.ylim(0, 50)

  # x and y axes are equal in displayed dimensions
  plt.gca().set_aspect('equal', adjustable='box')
  
  # Show the plot
  plt.show()
  
  
# Show the training process by printing a period for each epoch that completes
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    if epoch % 100 == 0: print('')
    print('.', end='')
	
if __name__ == '__main__':
  main()

Save the Python program.

If you run your Python programs using Anaconda, open the Anaconda prompt.

If you like to run your programs in a virtual environment, activate the virtual environment. I have a virtual environment named tf_2.

conda activate tf_2

Navigate to the folder where you saved the Python program.

cd [path to folder]

For example,

cd C:\MyFiles

Install any libraries that you need. I didn’t have some of the libraries in the “import” section of my code installed, so I’ll install them now.

pip install pandas
pip install seaborn

To run the code, type:

python vehicle_fuel_economy.py

If you’re using a GPU with TensorFlow and you get error messages about missing libraries, go to the folder C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin, search the Internet for the missing DLL files, download them, and put them in that bin folder.

Code Output

In this section, I will pull out snippets of the code and show you the resulting output when you uncomment those lines.

  # Check the data set
  print("Original Data Set Excerpt")
  print(originalData.head())
  print()
[Image: excerpt of the original data set]
  # Count how many NAs each data attribute has
  print("Number of NAs in the data set")
  print(data.isna().sum())
  print()
[Image: number of NAs per attribute]
  # See a summary of the neural network
  # The first layer has 640 parameters 
    #(9 input values * 64 neurons) + 64 bias values
  # The second layer has 4160 parameters 
    #(64 input values * 64 neurons) + 64 bias values
  # The output layer has 65 parameters 
    #(64 input values * 1 neuron) + 1 bias value
  print(neuralNet.summary())

[Image: neural network summary output]
  # Plot the neural network metrics (Training error and validation error)
  # Training error is the error when the trained neural network is 
  #   run on the training data.
  # Validation error is used to minimize overfitting. It indicates how
  #   well the data fits on data it hasn't been trained on.
  plotNeuralNetMetrics(history)

[Images: mean absolute error and mean squared error plots]
  # Plot metrics
  plotNeuralNetMetrics(history2) 
[Image: error plots with early stopping]
  print(f'\nMean Absolute Error on Test Data Set = {meanAbsoluteError} miles per gallon') 
[Image: mean absolute error on the test data set]
  # Plot the predicted MPG vs. the true MPG
  # testingLabelData are the true MPG values
  # testingDataPredictions are the predicted MPG values
  plotTestingDataPredictions(testingLabelData, testingDataPredictions)
[Image: predicted MPG vs. true MPG]
  # Plot the prediction error distribution
  plotPredictionError(testingLabelData, testingDataPredictions)
[Image: prediction error distribution]
  # Save the neural network in Hierarchical Data Format version 5 (HDF5) format
  neuralNet2.save('fuel_economy_prediction_nnet.h5')
  
  # Import the saved model
  neuralNet3 = keras.models.load_model('fuel_economy_prediction_nnet.h5')
[Image: terminal output after saving and loading the neural network]
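
As a final illustration, here is a sketch of how you could use the reloaded model to predict the fuel economy of a brand-new car. The attribute values below are made up for illustration, and note that a new sample must be normalized with the training set statistics, not its own:

  # Hypothetical new car: Cylinders, Displacement, Horsepower, Weight,
  # Acceleration, Model Year, and the one-hot encoded Origin (USA, Europe, Japan)
  newCar = pd.DataFrame([[4, 120.0, 95.0, 2500.0, 15.5, 81, 0.0, 0.0, 1.0]],
                        columns=trainingData.keys())

  # Normalize the new sample using the TRAINING data statistics
  trainingStats = trainingData.describe().transpose()
  normalizedNewCar = (newCar - trainingStats['mean']) / trainingStats['std']

  # Predict miles per gallon with the reloaded network
  predictedMpg = neuralNet3.predict(normalizedNewCar).flatten()[0]
  print(f'Predicted fuel economy: {predictedMpg:.1f} MPG')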

References

Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

What is Deep Learning?

In previous posts, I’ve talked a lot about deep learning.

However, I have never actually explained, in a concise way, what deep learning is, so here we go.

Deep learning is a technique for teaching a computer how to make predictions based on a set of inputs.

Input Data —–> Deep Learning Algorithm (i.e. Process) —–> Output Data

To make predictions (i.e. the “Process” part of the line above), deep learning uses deep neural networks. A deep neural network is a computer-based, simplified representation of neurons in the brain. It is computer science’s attempt to get a computer to process information just like real neurons in our brains do.
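
For example, here is a minimal Keras sketch of a deep neural network (purely illustrative; the layer sizes and input shape are arbitrary):

from tensorflow import keras
from tensorflow.keras import layers

# A "deep" neural network simply stacks several layers of artificial neurons
model = keras.Sequential([
  layers.Dense(64, activation='relu', input_shape=(10,)),  # hidden layer 1
  layers.Dense(64, activation='relu'),                     # hidden layer 2
  layers.Dense(1)                                          # output layer
])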

[Image: diagram of a neural network]

Deep neural networks are well suited for complex applications like computer vision, natural language processing, and machine translation where you want to draw useful information from nonlinear and unstructured data such as images, audio, or text.