A Quick and Simple Dive into Deep Learning with Keras


Deep learning is one of the domains of programming that have seen a huge boost in popularity in recent years. It’s becoming increasingly hard to find a field where this new buzzword is not applied for one purpose or another.

Voice and image recognition. Natural language processing. Big data. All these fields have seen major breakthroughs with the help of deep learning. Something that was a fantasy only a few years ago, like self-driving cars, is now almost a reality owing to the power of deep learning.

So now is an ideal time for developers to break into this field, even if only to experiment. Keras, one of the most powerful deep learning libraries in Python, makes it easier for anyone to take advantage of this technology without worrying about the complex underlying theory.

In this tutorial, we are going to give you a brief introduction to deep learning and then, without focusing too long on the theoretical side, dive right into building a DL model using Keras.

As you will discover by the end of this post, Keras makes it easy even for beginners. But if you have some idea about basic machine learning concepts, certain decisions we make in this tutorial will make more sense to you.

What Is Deep Learning?

Deep learning is a subset of machine learning. It uses artificial neural networks, loosely modeled on the structure of the human brain, to give machines the ability to derive outputs from raw data.

One of the main downsides of traditional machine learning is the need for feature extraction. Since traditional ML models can’t process raw data directly, we have to extract significant features from the data before passing them to the model.

Programmers need a deep understanding of the problem domain to decide which features should be extracted and how to extract them. Moreover, human intervention in this process makes it possible to miss important high-level features in the raw data.

But deep learning eliminates the need for manual feature extraction. Neural networks in deep learning models can learn to identify abstract and implicit patterns in raw data and map their impact on a certain output on their own. In other words, DL combines feature extraction and classification and carries them out in a single model.

This gives deep learning an edge over traditional machine learning in understanding hidden patterns and features in raw data. As a result, DL models often provide better results than traditional ML models, especially as dataset sizes increase.

The Architecture of a Typical Neural Network

Neural Network architecture example, source: oreilly.com


A typical neural network in a DL model is made of several layers. Each layer contains a set of nodes that carry a numeric value (e.g. 0.3, 2.45).

Connections between nodes in two adjacent layers are defined by weights. A weight (e.g. 1.2, 5) defines how much the preceding node’s value contributes to the following node’s value. In other words, a weight decides the impact one node has in determining the value of another node. The values of the weights are calculated during the training of the neural network.
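
To make the role of weights concrete, here is a minimal NumPy sketch of how one node's value could be computed from the nodes in the preceding layer. The values, the weights, and the bias term are made up for illustration (the bias is a common addition not covered in the text), and real networks usually also pass the result through an activation function.

```python
import numpy as np

# Values carried by three nodes in the preceding layer (illustrative numbers).
previous_values = np.array([0.3, 2.45, 1.0])

# One weight per connection into the node being computed (illustrative numbers).
weights = np.array([1.2, 5.0, -0.5])

# Hypothetical bias term added to the weighted sum.
bias = 0.1

# The node's raw value is the weighted sum of its inputs plus the bias.
node_value = float(np.dot(previous_values, weights) + bias)
print(node_value)
```

A larger weight means the corresponding input node has more influence on the result, which is exactly what training adjusts.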

The first layer of a neural network is called the input layer. The number of nodes in the input layer depends on the size of the input vector. If the input vector has a size of 12, the input layer should also have 12 nodes.

The final layer of the neural network is the output layer. When the model performs a classification task, the output layer should contain one node for each possible classification result.

For example, if a DL model is used to identify the number displayed in an input image (from 0-9), it should have 10 nodes in the output layer. When the model makes predictions for a given input image, each output node gives the probability of the number in the image being the number represented by the node. The number with the highest probability is then considered as the final prediction of the model.
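
The idea of picking the most probable digit can be sketched in a few lines of NumPy. The raw scores below are invented for illustration, and softmax is assumed here as the way of turning scores into probabilities, which is a common choice for classification outputs.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores from the 10 output nodes (index = digit).
raw_scores = np.array([0.1, 0.2, 0.0, 0.3, 0.1, 0.4, 0.2, 3.5, 0.1, 0.6])

probabilities = softmax(raw_scores)

# The predicted digit is the index of the node with the highest probability.
predicted_digit = int(np.argmax(probabilities))
print(predicted_digit)  # 7
```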

All the other layers in between input and output layers are called hidden layers of the neural network. These layers and weights connecting their nodes carry out the mathematical operations to output the final predictions of the model. A neural network can have more than one hidden layer depending on the task it’s expected to complete.

What Are Convolutional Neural Networks (CNNs)?

Since we are going to build a simple CNN using Keras in this tutorial, let’s try to understand how CNNs are different from regular neural networks before moving on.

A CNN is a neural network that takes images as inputs. In other words, when we adjust the properties of a neural network, specifically, to work well with image inputs and their inherent qualities, we call it a CNN.

Therefore, compared to a regular neural network, a CNN architecture includes a few specific types of layers. The convolution layer, pooling layer, and fully connected layer are a few such examples.

Convolution Layer

The convolution layer applies a filter to summarize the features within a small area of the image using a mathematical operation named convolution. For example, we can define a convolution layer with a kernel (think of it as a window) of size 3x3. The kernel slides across the image and applies convolution to the 9 elements inside it at a given time. The layer then outputs a new grid of values built from these convolution results.
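
As a rough illustration of the sliding-window idea, the following NumPy sketch applies a single 3x3 kernel to a 28x28 array with a stride of 1 and no padding. The function name, the stand-in image, and the averaging kernel are assumptions made for this example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + kh, col:col + kw]
            # Multiply the window by the kernel element-wise and sum the result.
            output[row, col] = np.sum(window * kernel)
    return output

# Stand-in 28x28 "image" and a simple averaging kernel (illustrative values).
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
kernel = np.ones((3, 3), dtype=np.float32) / 9.0

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

Because the 3x3 window cannot be centered on the border pixels, a 28x28 input shrinks to 26x26.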

Check this giphy to see how the convolution layer acts on an image. https://giphy.com/gifs/blog-daniel-keypoints-i4NjAwytgIRDW

Pooling Layer

The pooling layer reduces the size of its input by discarding redundant data. It also makes the model more flexible, because the model becomes less sensitive to the exact location of a feature in the input image.

Pooling divides the image into a set of non-overlapping areas and pools their values into one value by a simple operation like finding the max, min, or average of values. Max pooling is the most common type of pooling technique used.

Max Pooling example
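
The pooling operation described above can be sketched in NumPy as follows. This is a minimal illustration of 2x2 max pooling over non-overlapping areas; the function name and the input values are made up for the example.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 area."""
    height, width = feature_map.shape
    out_h, out_w = height // 2, width // 2
    # Drop a trailing row or column when the size is odd.
    trimmed = feature_map[:out_h * 2, :out_w * 2]
    # Group the values into 2x2 blocks, then take the max of each block.
    blocks = trimmed.reshape(out_h, 2, out_w, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])

pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each 2x2 area collapses into one value, so the 4x4 input becomes a 2x2 output.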



Flattening Layer

A flattening layer simply flattens a multi-dimensional array into a long one-dimensional vector. If you pass a 13x13 array to the flattening layer, it outputs a vector of size 169.
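
In NumPy terms, flattening amounts to a reshape. The array below is a stand-in created only for this illustration.

```python
import numpy as np

# Stand-in 13x13 array (illustrative values).
feature_map = np.arange(13 * 13).reshape(13, 13)

# Flatten it into a single long vector.
flat = feature_map.reshape(-1)
print(flat.shape)  # (169,)
```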

Fully Connected Layer

The fully connected layer combines the individual features that different nodes in the previous layer have identified to paint a bigger picture of the input. Therefore, each node in the fully connected layer is connected to every node in its preceding layer.
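
Since every node connects to every node in the preceding layer, a fully connected layer can be written as a single matrix multiplication. The sketch below assumes 169 input features and 10 output nodes, with random placeholder weights and a hypothetical bias vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened features from the preceding layer (random placeholder values).
features = rng.random(169)

# One weight for every (input node, output node) pair.
weights = rng.random((169, 10))

# Hypothetical bias, one per output node.
biases = np.zeros(10)

# Each output node combines all 169 input features.
outputs = features @ weights + biases
print(outputs.shape)  # (10,)
```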

What Is Keras?

In this tutorial, we are going to build a deep learning model using Keras. Keras is a deep learning library written in Python. Keras abstracts the complex logic behind deep learning algorithms and simplifies building new models even for beginners. Keras allows the use of several backends, including TensorFlow and Theano, during implementation.

Install & setup Keras

You should have Python (version 3.6-3.8) and pip installed on your system before installing Keras. Then, simply run the following command to start the installation.

pip install keras

Keras also ships with TensorFlow 2, where it can be imported as tensorflow.keras. This tutorial uses that bundled version, so TensorFlow is the backend. You can install it with the following command.

pip install tensorflow

To set up the project, we simply import NumPy and the relevant modules from Keras.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical

Load the Dataset

In this tutorial, we are using the MNIST dataset of handwritten digits. It contains 28x28 grayscale images of handwritten digits from 0-9. Its training set contains 60000 images while its test set contains 10000 images.

The CNN we build is intended to take an image as an input and predict the digit it displays from the 10 possible outputs.

Since Keras ships with a loader for the MNIST dataset, we can load it directly without any extra effort. The data is downloaded the first time the function runs.

from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

The load_data function loads training and test datasets. Each dataset contains a set of images (X_train, X_test) and a set of labels for the digits they display (y_train, y_test).

If we check the shape of image data:

print(X_train.shape)  #(60000, 28, 28)
print(X_test.shape) #(10000, 28, 28)

This output confirms that we have 60000 training images and 10000 test images of size 28x28.

We can also plot an image to get a better idea of the dataset.

from matplotlib import pyplot as plt
plt.imshow(X_train[0], cmap="gray")  # display the first training image

Sample digit from the dataset


Preprocess Image Data

Before feeding the images to train and test our deep learning model, we need to normalize and reshape the images.

Normalizing image pixel values to fall in between 0 and 1 makes training the model easier and faster. To carry out the normalization without a loss of data, first, we should convert their type to 32-bit float.

X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255
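
To see why the conversion to float matters, consider this small NumPy sketch on a made-up row of pixel values. It contrasts integer division, which throws away everything except 0 and 1, with the float conversion used above.

```python
import numpy as np

# Three made-up pixel values stored as 8-bit integers, like raw MNIST pixels.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Integer division loses the in-between intensities.
lossy = pixels // 255

# Converting to 32-bit float first keeps them as fractions between 0 and 1.
normalized = pixels.astype("float32") / 255

print(lossy)
print(normalized.dtype, normalized.min(), normalized.max())
```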

The neural network we build with Keras expects each image to be a 3D array (height, width, channels). As these MNIST images are grayscale, they have no channel dimension, so we have to add one of depth 1 using NumPy’s expand_dims function.

X_train = np.expand_dims(X_train, axis=3)
X_test = np.expand_dims(X_test, axis=3)

If we check the new shape of the image dataset after this reshaping step:

print(X_train.shape) #(60000, 28, 28, 1)
print(X_test.shape) #(10000, 28, 28, 1)

We can see that each image now has a third dimension of size 1.
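
The same reshaping can be tried on a small stand-in array without loading the dataset. The batch of five blank images below is an assumption made for this sketch, which also shows an equivalent indexing form.

```python
import numpy as np

# Stand-in batch of 5 blank 28x28 images.
batch = np.zeros((5, 28, 28), dtype=np.float32)

# Add a channel axis at the end, as done for X_train and X_test.
with_channel = np.expand_dims(batch, axis=3)

# Equivalent alternative using np.newaxis.
same_thing = batch[..., np.newaxis]

print(with_channel.shape)  # (5, 28, 28, 1)
```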

Preprocess Image Labels

If we check the shape of image label variables:

print(y_train.shape) #(60000,)
print(y_test.shape) #(10000,)

You can see that both the training and test labels are stored as one-dimensional arrays.

If we check a value stored in one of the arrays to get a better idea:

print(y_train[1]) #0

As you can see, these label arrays have directly stored the digits in the images. But our neural network has to use 10 nodes in the output layer to identify each digit. Therefore, we have to encode these label data to represent each digit using ten classes. For example, 5 should be encoded into [0, 0, 0, 0, 0, 1, 0, 0, 0, 0].

Keras provides a utility method to easily carry out this task.

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

Now, the shapes of the label arrays are:

print(y_train.shape) #(60000, 10)
print(y_test.shape) #(10000, 10)
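
For intuition, the encoding performed by to_categorical can be approximated in plain NumPy. The helper name one_hot and the three sample labels are assumptions for this sketch, not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label with a 1 at the label's index."""
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([5, 0, 4])  # illustrative digit labels
encoded = one_hot(labels)

print(encoded.shape)  # (3, 10)
print(encoded[0])
```

Taking np.argmax along axis 1 reverses the encoding and recovers the original digits.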

It’s Time to Create the Deep Learning Model

The model we are building is going to consist of 7 layers including input and output layers. In reality, deciding the number and types of layers to add to the network depends on a lot of experimenting, experience, and a good dose of math.

We are not going to delve too deep into the theories or spend time experimenting in this tutorial. Instead, we are going to adopt an architecture that’s commonly used when building CNNs.

However, you have the complete freedom to tweak this architecture and experiment with different layers to understand how they impact the final results at the end.

When building a model with Keras, we have to use either the Sequential class or Model class as its basic foundation. The Sequential class, which we are going to use, allows building a linear stack of layers. Let’s get started with creating an instance of this class.

model = Sequential()

The first layer we add to the model is going to act as the input layer. And the input layer we add is also a 2D convolution layer.

model.add(Convolution2D(32, kernel_size=(3,3), activation="relu", input_shape=(28, 28, 1)))

Here, we are creating a convolution layer that uses 32 3x3 kernels to extract features from the inputs. It uses ReLU as the activation function.

The next layer in our neural network is a pooling layer with pool size of 2x2.

model.add(MaxPooling2D(pool_size=(2, 2)))

Then, we add another convolution layer and a pooling layer to our model. The additional convolution layer can train our model to identify high-level features in the image while the additional pooling layer improves the model’s flexibility.

model.add(Convolution2D(64, kernel_size=(3,3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

The next step is flattening the multi-dimensional output of the previous layer into a one-dimensional vector using a flattening layer.

model.add(Flatten())


We need to add a dropout layer to our model to prevent it from overfitting the training set. The dropout layer deliberately ignores or “drops out” outputs from previous nodes to escape overfitting. Our dropout layer uses a dropout rate of 0.5.

model.add(Dropout(0.5))
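
A rough NumPy sketch of what dropout does during training is shown below. It uses the "inverted dropout" formulation, where the surviving values are scaled up by 1 / (1 - rate) so their expected sum stays the same. The function name and the input are assumptions for illustration, and no dropping happens at prediction time.

```python
import numpy as np

def dropout(values, rate, rng):
    """Zero out roughly a fraction `rate` of the values and rescale the rest."""
    keep_mask = rng.random(values.shape) >= rate
    return values * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)

# Stand-in for 1600 flattened activations, all set to 1 for easy inspection.
activations = np.ones(1600, dtype=np.float32)

dropped = dropout(activations, rate=0.5, rng=rng)

# About half of the values are now 0, and the rest are 2.
print(np.mean(dropped == 0))
```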


Finally, we add a fully connected layer to the neural network as the output layer. It uses the softmax activation function to turn its 10 outputs into probabilities, one per digit.

model.add(Dense(10, activation='softmax'))

That’s it. Our model architecture is now complete. Below, you can see the complete architecture of the model in one place.

model = Sequential()

model.add(Convolution2D(32, kernel_size=(3,3), activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, kernel_size=(3,3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

We can view a summary of the model using the following function.

model.summary()

Model: "sequential_4"
Layer (type)                 Output Shape              Param #
conv2d_7 (Conv2D)            (None, 26, 26, 32)        320
max_pooling2d_6 (MaxPooling2 (None, 13, 13, 32)        0
conv2d_8 (Conv2D)            (None, 11, 11, 64)        18496
max_pooling2d_7 (MaxPooling2 (None, 5, 5, 64)          0
flatten_4 (Flatten)          (None, 1600)              0
dropout_4 (Dropout)          (None, 1600)              0
dense_5 (Dense)              (None, 10)                16010
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
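
The parameter counts in this summary can be reproduced by hand. The sketch below assumes the usual accounting: each convolution filter has one weight per kernel element per input channel plus one bias, and each dense node has one weight per input plus one bias. The helper names are made up for the example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel area times input channels) plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every output unit."""
    return (inputs + 1) * units

conv1 = conv2d_params(3, 3, 1, 32)    # first convolution layer
conv2 = conv2d_params(3, 3, 32, 64)   # second convolution layer
flattened = 5 * 5 * 64                # size of the vector after flattening
dense = dense_params(flattened, 10)   # output layer

total = conv1 + conv2 + dense
print(conv1, conv2, dense, total)  # 320 18496 16010 34826
```

Pooling, flattening, and dropout layers have nothing to learn, which is why they show 0 parameters.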

Compile and Train the Model

Before training the model, we should compile it by passing an optimizer (Adam, SGD, etc.), a loss function, and an evaluation metric.

model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])
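
To give a feel for the loss function named here, this is a simplified NumPy sketch of categorical cross-entropy for a single sample. The function and the two made-up predictions are for illustration only; the Keras implementation handles batches and other details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid taking log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

# One-hot label for the digit 5.
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=np.float32)

# A confident, correct prediction (made-up probabilities).
confident = np.full(10, 0.01)
confident[5] = 0.91

# A prediction that spreads probability evenly over all digits.
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction gets a much smaller loss, which is the signal the optimizer uses to adjust the weights.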

Now, we can train the model using the Keras fit method. It’ll be trained for 10 epochs with a batch size of 128. We also hold out 10% of the training data to validate the model.

model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.1)

Epoch 1/10
422/422 [==============================] - 21s 49ms/step - loss: 0.3711 - accuracy: 0.8857 - val_loss: 0.0863 - val_accuracy: 0.9770
Epoch 2/10
422/422 [==============================] - 22s 53ms/step - loss: 0.1151 - accuracy: 0.9651 - val_loss: 0.0564 - val_accuracy: 0.9840
Epoch 3/10
422/422 [==============================] - 24s 57ms/step - loss: 0.0852 - accuracy: 0.9740 - val_loss: 0.0487 - val_accuracy: 0.9873
Epoch 4/10
422/422 [==============================] - 23s 53ms/step - loss: 0.0735 - accuracy: 0.9779 - val_loss: 0.0428 - val_accuracy: 0.9893
Epoch 5/10
422/422 [==============================] - 23s 54ms/step - loss: 0.0642 - accuracy: 0.9799 - val_loss: 0.0418 - val_accuracy: 0.9885
Epoch 6/10
422/422 [==============================] - 23s 54ms/step - loss: 0.0577 - accuracy: 0.9823 - val_loss: 0.0374 - val_accuracy: 0.9898
Epoch 7/10
422/422 [==============================] - 26s 61ms/step - loss: 0.0538 - accuracy: 0.9831 - val_loss: 0.0354 - val_accuracy: 0.9907
Epoch 8/10
422/422 [==============================] - 23s 54ms/step - loss: 0.0485 - accuracy: 0.9852 - val_loss: 0.0375 - val_accuracy: 0.9897
Epoch 9/10
422/422 [==============================] - 22s 53ms/step - loss: 0.0468 - accuracy: 0.9852 - val_loss: 0.0344 - val_accuracy: 0.9908
Epoch 10/10
422/422 [==============================] - 25s 60ms/step - loss: 0.0434 - accuracy: 0.9865 - val_loss: 0.0304 - val_accuracy: 0.9912
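
The "422/422" in this log is the number of batches per epoch, and it follows from the settings passed to fit. A quick arithmetic check, using variable names chosen for this sketch:

```python
import math

total_training_images = 60000
validation_split = 0.1
batch_size = 128

# 10% of the training data is held out for validation.
validation_images = int(total_training_images * validation_split)
training_images = total_training_images - validation_images

# The last batch is smaller than 128, so we round up.
steps_per_epoch = math.ceil(training_images / batch_size)
print(training_images, validation_images, steps_per_epoch)  # 54000 6000 422
```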

That was incredibly easy. Now we have a trained deep learning model, which has shown over 99% accuracy on validation data, to identify images of handwritten digits.

Evaluate the Model

To confirm the accuracy of our model, we can evaluate it using our test dataset.

score = model.evaluate(X_test, y_test, verbose=0)

Now, if we print the evaluation results:

print("accuracy", score[1]) #accuracy 0.9894999861717224
print("loss", score[0]) #loss 0.028314810246229172

Our model has achieved close to 99% accuracy on the test dataset. Isn’t that amazing?

Make Predictions Using the Trained Model

Finally, we can use this newly trained deep learning model to make predictions. Let’s get the model’s predictions for the first 20 images in the test dataset and view the results against the original image label.

predictions = model.predict(X_test[:20])
print("predictions:", np.argmax(predictions, axis=1))
print("labels     :", np.argmax(y_test[:20], axis=1))

predictions: [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
labels     : [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]

Our model has gotten all of them correct!
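
Rather than comparing the two lists by eye, the match rate can be computed directly. The sketch below copies the printed predictions and labels into arrays; the variable names are chosen for this example.

```python
import numpy as np

# Values copied from the printed output above.
predicted = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4])
labels = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4])

# Fraction of positions where the prediction matches the label.
accuracy = float(np.mean(predicted == labels))
print(accuracy)  # 1.0
```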


Today, we introduced you to a programming domain that has become quite popular in the developer community in recent years. Even though it’s a tough area with a lot of mathematical theory, Keras makes working with neural networks approachable even for complete beginners. We hope this post has inspired you to continue experimenting with deep learning to build fun and useful AI models.

Thank you for reading!


Anjalee Sudasinghe - Author @ Live Code Stream


I’m a software engineering student who loves web development. I also have a habit of automating everyday stuff with code. I’ve discovered I love writing about programming as much as actual programming.

I’m extremely lucky to join the Live Code Stream community as a writer and share my love for programming with others one article at a time.