Recognize digits with a very simple Convolutional Neural Network (CNN)

Apr 22, 2024

In my previous blog post, I created a simple Neural Network that can recognize the digit in a given image. In this blog post, let’s try solving the same problem using a Convolutional Neural Network (CNN), a Neural Network that’s particularly powerful for image recognition, and see for ourselves if a CNN has better accuracy.

Before we go ahead, we have to understand the concept of Convolution. A Convolution is a simple mathematical operation that replaces the value of a pixel with a weighted sum of that pixel’s value and the values of its surrounding pixels. A Convolution is typically performed on every pixel of an image. When an appropriate Convolution matrix (also called an image mask, a Convolution filter or a Convolution kernel) is applied to an image, the resulting image can have important features highlighted (edges, corners or other features that may not be obvious to us), or it can be sharper, blurrier, etc. These convolved images help the dense layers in a Neural Network classify the image more accurately. In layman’s terms, a Convolution operation is just like an Instagram or TikTok filter.
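To make the weighted sum concrete, here is a minimal sketch in plain Python. The `convolve2d` helper, the 5 x 5 image and the edge-detection kernel are all made up for illustration and are not part of the model below. Like Keras, the helper slides the kernel over the image without flipping it (strictly speaking a cross-correlation) and without padding, so the output is slightly smaller than the input.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the weighted sums."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the pixel neighbourhood under the kernel
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 5 x 5 image: dark (0) on the left, bright (9) on the right
image = [[0, 0, 0, 9, 9] for _ in range(5)]

# Hypothetical kernel that responds to vertical edges (dark-to-bright, left to right)
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, vertical_edge_kernel)
for row in edges:
    print(row)  # each row prints [0, 27, 27]
```

The output is 0 where the neighbourhood is uniformly dark and large where the kernel straddles the dark-to-bright boundary, which is what "highlighting an edge" means in practice.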

This YouTube video illustrates the Convolution operation: https://www.youtube.com/watch?v=KuXjwB4LzSA&t=512s. Wikipedia’s article on image kernels has a very useful illustrated list of the images that result from applying different Convolution matrices: https://en.wikipedia.org/wiki/Kernel_(image_processing)#Details

Here’s the simple Neural Network I created in my previous blog post with 4 Convolution filters added.

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Reshape the train and test images from shape (28, 28) to shape (28, 28, 1)
# as Keras' CNN layer expects an image to have an extra channel dimension
train_images = train_images.reshape(-1, 28, 28, 1)
test_images = test_images.reshape(-1, 28, 28, 1)

cnn_model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(4, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

cnn_model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

cnn_model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)

In Keras, the Conv2D layer creates one or more Convolution filters. The first argument 4 tells Keras the number of Convolution filters to create and the second argument tells Keras to create Convolution filters of shape 3 x 3.
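As a rough sanity check on what those two arguments imply, the sketch below works out the layer's output shape and the number of trainable parameters by hand. It assumes the Keras `Conv2D` defaults of no padding, a stride of 1 and one bias per filter; the helper name `conv2d_output_size` is made up for this example.

```python
def conv2d_output_size(input_size, kernel_size):
    # No padding, stride 1: the kernel fits in (input - kernel + 1) positions
    return input_size - kernel_size + 1

num_filters = 4
kernel_height, kernel_width = 3, 3
input_height, input_width, input_channels = 28, 28, 1
num_classes = 10

out_height = conv2d_output_size(input_height, kernel_height)
out_width = conv2d_output_size(input_width, kernel_width)

# Each filter has one weight per kernel cell per input channel, plus one bias
conv_params = num_filters * (kernel_height * kernel_width * input_channels + 1)

# Flatten turns the (26, 26, 4) feature maps into one long vector
flattened_size = out_height * out_width * num_filters

# The Dense layer has one weight per (input, class) pair, plus one bias per class
dense_params = flattened_size * num_classes + num_classes

print((out_height, out_width, num_filters))  # (26, 26, 4)
print(conv_params, flattened_size, dense_params)  # 40 2704 27050
```

Almost all of the parameters live in the final Dense layer; the four filters themselves account for only 40 of them.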

The super simple ReLU activation function, f(x) = max(x, 0), introduces non-linearity into the Neural Network (the softmax function is the only other source of non-linearity). Non-linearity improves accuracy because the relationships between pixels tend to be non-linear in nature.
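Both activation functions are small enough to write out by hand. The sketch below is a plain Python illustration with made-up input values, not what Keras runs internally: `relu` clips negative values to zero, and `softmax` turns a list of raw scores into probabilities that sum to 1.

```python
import math

def relu(x):
    # f(x) = max(x, 0): negative inputs become 0, positive inputs pass through
    return max(x, 0)

def softmax(scores):
    # Subtracting the largest score first avoids overflow in exp()
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical inputs, chosen only to show the behaviour
relu_outputs = [relu(v) for v in [-3.0, -0.5, 0.0, 2.0]]
probabilities = softmax([1.0, 2.0, 3.0])

print(relu_outputs)
print(probabilities)
print(sum(probabilities))
```

The largest raw score always receives the largest probability, which is why the output neuron with the highest softmax value is taken as the predicted digit.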

I also changed the optimizer from sgd to adam so the model reaches its maximum accuracy in fewer epochs.

Because of the addition of Convolution filters and ReLU, the model’s training accuracy improved to a whopping 97.8%, with validation accuracy at about 95.4% in the final epoch.

Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.7966 - loss: 6.0928 - val_accuracy: 0.9292 - val_loss: 0.4564
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9431 - loss: 0.2875 - val_accuracy: 0.9485 - val_loss: 0.2563
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9657 - loss: 0.1240 - val_accuracy: 0.9545 - val_loss: 0.2251
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9755 - loss: 0.0836 - val_accuracy: 0.9544 - val_loss: 0.1985
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9782 - loss: 0.0667 - val_accuracy: 0.9539 - val_loss: 0.2162
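The 750/750 at the start of each log line is the number of batches per epoch, and it can be checked with a little arithmetic. The sketch below assumes the standard MNIST training set of 60,000 images; the variable names are made up for this example.

```python
import math

total_train_images = 60_000  # size of the standard MNIST training set
validation_split = 0.2       # fraction held out by fit(..., validation_split=0.2)
batch_size = 64

# Keras holds out the last 20% of the samples for validation
training_samples = int(total_train_images * (1 - validation_split))
validation_samples = total_train_images - training_samples

# A final partial batch would still count as a step, hence the ceiling
steps_per_epoch = math.ceil(training_samples / batch_size)

print(training_samples, validation_samples, steps_per_epoch)  # 48000 12000 750
```

So the accuracy column is measured on the 48,000 images the model trains on, while val_accuracy is measured on the 12,000 images it never sees during training.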

As before, I’ve published this model in a Kaggle notebook: https://www.kaggle.com/code/nisanth074/simple-digit-recognizer-cnn/edit. Copy and edit the Kaggle notebook to experiment.

If you’d like to understand CNNs further, I highly recommend reading the Dive into Deep Learning book’s chapters on CNNs: https://d2l.ai/chapter_convolutional-neural-networks/index.html and https://d2l.ai/chapter_convolutional-modern/index.html

In my next post, I’ll explore solving captchas with a Neural Network. Stay tuned!