
Recognize digits with a very simple Neural Network

Inspired by the sheer usefulness and potential of generative AI models (especially ChatGPT and DALL-E), I began to more seriously explore the Machine Learning / Deep Learning landscape last year after moving to a less hectic job. It very soon dawned upon me that Machine Learning is very different from traditional Software Engineering and is far closer to Statistics. To save time and to develop a structured understanding of Machine Learning, I enrolled in a Machine Learning course which gave me a better understanding of many Machine Learning models (Linear Regression, Support Vector Machines, Decision Trees etc.) including Neural Networks.

To improve my understanding of Neural Networks further, I decided to create a very simple Neural Network to solve a simple problem: recognizing handwritten digits in an image. I chose this problem as I felt

  1. That a very simple Neural Network with only 1 dense layer can likely solve it
  2. That I can later possibly modify that Neural Network to recognize captchas, a more practical and harder problem that's effectively impossible to solve with traditional Software Engineering techniques

Here’s the simple Neural Network I arrived at. If you’re struggling to understand the script, worry not. I’ll elaborate below.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Create the neural network
simple_model = models.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

# Configure training
simple_model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='sgd',
    metrics=['accuracy']
)

# Train the neural network
simple_model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)

The first two lines import Keras, a Python library for creating Neural Networks. Keras was originally an independent project but was later merged into TensorFlow.

import tensorflow as tf
from tensorflow.keras import layers, models

The next lines load the MNIST (Modified National Institute of Standards and Technology) dataset, a large and widely used dataset of handwritten digit images.

mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
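
As a quick sanity check (not part of the original script), you can inspect what load_data() returns; the standard MNIST split has 60,000 training images and 10,000 test images, each 28 × 28 pixels with integer values from 0 to 255:

print(train_images.shape, train_labels.shape)  # (60000, 28, 28) (60000,)
print(test_images.shape, test_labels.shape)    # (10000, 28, 28) (10000,)
print(train_images.dtype, train_images.min(), train_images.max())  # uint8 0 255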

Neural Networks often (if not always) require a large number of training samples to achieve acceptable accuracy. While I had the option of generating a training dataset myself (by modifying my captcha generator script to generate one letter captchas), I decided against this approach to save time. Using the MNIST dataset also gave me a chance to compare my neural network model’s accuracy against more complex models available on Kaggle and understand why the latter are more accurate.

The lines after that create a simple neural network that has an input layer which accepts training samples of shape 28 x 28 (as every image in the MNIST dataset is 28 pixels by 28 pixels).

simple_model = models.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

The Flatten layer flattens each 28 × 28 input into a one-dimensional array of 28 × 28 = 784 values. This is necessary because the next layer, a dense layer, expects one-dimensional input. Since there are 10 digits (0 to 9) and we'd like the model to predict the digit in a given image, the output layer has 10 neurons. Each output neuron gives us the probability that the image contains the corresponding digit.
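
A quick way to confirm this structure is model.summary(); the exact formatting depends on your Keras version, but the parameter counts should match the ones below (784 inputs × 10 neurons + 10 biases = 7,850 trainable parameters):

simple_model.summary()
# Layer (type)        Output Shape    Param #
# flatten (Flatten)   (None, 784)     0
# dense (Dense)       (None, 10)      7850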

The Softmax function is a simple (and popular) function that converts a list of real numbers (the raw values of the output neurons, often called logits) into probabilities that sum to 1.
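
As an illustration (not part of the script above), softmax exponentiates each value and divides by the sum of the exponentials; here's a minimal NumPy sketch:

import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability; this doesn't change the result
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.659, 0.242, 0.099]

The next lines configure training: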

simple_model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='sgd',
    metrics=['accuracy']
)

Since the objective here is to predict the digit in a given image, I use the simple and popular Categorical Cross-entropy function to calculate loss (the lower the loss, the more accurate the neural network). The "sparse" variant simply means the labels are plain integers (0 to 9) rather than one-hot vectors.

I configure Keras to use Stochastic Gradient Descent (sgd) to adjust neuron connection weights and neuron biases (most often, though, you'll see the slightly harder to understand but more efficient adam optimizer in use). Explaining Gradient Descent in depth is beyond the scope of this article (though you can always ask ChatGPT to explain it for you :)). In brief, Gradient Descent calculates the partial derivatives of the loss function with respect to the model's parameters (i.e. neuron connection weights and neuron biases) so each parameter can be increased or decreased by an appropriate amount to minimize the loss. For example, if the partial derivative of the loss with respect to a parameter is 7, decreasing that parameter by a very small amount, say 0.001, will decrease the loss by roughly 0.001 * 7 = 0.007. The size of each step is controlled by the learning rate; Keras' SGD optimizer defaults to a learning rate of 0.01.
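
To make that concrete, here's a tiny, self-contained sketch of a single-parameter gradient descent loop on a made-up toy loss (purely illustrative; it isn't part of the Keras script):

# Toy loss: L(w) = (w - 3) ** 2, minimized at w = 3
def loss(w):
    return (w - 3) ** 2

def gradient(w):
    # dL/dw = 2 * (w - 3)
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.01
for step in range(3):
    g = gradient(w)
    w = w - learning_rate * g  # step in the direction opposite to the gradient
    print(step, round(w, 4), round(loss(w), 4))  # the loss shrinks a little each step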

The metrics=['accuracy'] option causes Keras to continuously calculate accuracy during training.

The next line trains the model for 5 epochs:

simple_model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)

In every epoch, the model trains on the entire training dataset. The validation_split option causes Keras to set aside a fraction (0.2 in this case) of the training dataset and calculate accuracy on this validation dataset. This helps detect overfitting (the scenario where the model performs well on training images but poorly on images outside the training dataset). The batch_size=64 option causes Keras to adjust the model's parameters once for every 64 images during training. On my machine, even this very simple model achieved a reasonable accuracy of 87% in just 5 epochs:

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7770 - loss: 456.6692 - val_accuracy: 0.8344 - val_loss: 198.7621
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8630 - loss: 172.9334 - val_accuracy: 0.8457 - val_loss: 213.5807
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8710 - loss: 176.8432 - val_accuracy: 0.8860 - val_loss: 147.9480
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8687 - loss: 167.9312 - val_accuracy: 0.8749 - val_loss: 161.2222
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8718 - loss: 160.2616 - val_accuracy: 0.9109 - val_loss: 115.5114

As can be observed above, in every epoch the model trained on 750 batches. The MNIST dataset has 60,000 training samples, and since we set aside 20% of the images as a validation dataset, 60,000 * 80% = 48,000 images remain for training; with a batch size of 64, that's 48,000 / 64 = 750 batches per epoch.
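
If you'd like to check how the trained model does on images it has never seen, you can evaluate it on the test split and inspect a single prediction (a small, optional addition to the script above; the printed numbers will vary from run to run):

import numpy as np

# Accuracy on the 10,000 held-out test images
test_loss, test_accuracy = simple_model.evaluate(test_images, test_labels)
print(test_accuracy)

# Predict the digit in the first test image and compare with its label
probabilities = simple_model.predict(test_images[:1])
print(np.argmax(probabilities), test_labels[0])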

If you’d like to experiment with the model yourself, you can copy and run my Kaggle notebook https://www.kaggle.com/code/nisanth074/very-simple-digit-recognizer-neural-network

I hope this post helped you improve your understanding of Neural Networks (writing it definitely forced me to improve mine). If you have queries about any of the libraries or mathematical functions I mention above, pose them to ChatGPT. ChatGPT typically provides fantastic, elaborate answers for beginner AI/ML queries. I also highly recommend reading the Dive Into Deep Learning book's chapter on using Neural Networks for Classification.

In future posts, I’ll explore recognizing digits with a simple Convolutional Neural Network (CNN) and then explore creating a neural network for solving captchas. Stay tuned!
