We already covered the most important deep learning concepts and created different implementations using vanilla Python. Now, we are in a position where we can start building something a bit more elaborate.
We'll use a more hands-on approach by building a deep learning model for classification using production-grade software.
You will learn how to make a classifier for hand-written digits using Keras, running on top of Tensorflow.
Great, let's get started!
Before we proceed, what's Keras?
Keras is a popular library for creating deep learning models.
It's a high-level API that runs on multiple backends that handle all the processing, like Tensorflow, Theano or Microsoft Cognitive Toolkit.
Since Tensorflow 2 was released, Keras became its official high-level API, so it now comes included by default with the Tensorflow installation.
Instead of writing instructions that might go outdated I'll just link to the official resources. You will need to install these:
Good, let's get started!
Import the libraries and get the data
We can get all the libraries we need with the following lines:
# Import TensorFlow and Keras import TensorFlow as tf from tensorflow import keras # Import support libraries for plotting import matplotlib.pyplot as plt
Now, let's get the data. We will use the MNIST set of handwritten digits, one of the most popular datasets for testing ML image classification models. The set includes 60.000 examples for training and 10.000 for testing:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
load_data method returns two tuples:
- (x_train, y_train) are the 60.000 examples we will use to train the network.
- (x_test, y_test) are the examples we will use to evaluate the performance of the network.
Explore the data
x_train.shape Output: (60000, 28, 28) y_train.shape Output: (60000,)
x_train is a collection of 60.000 28x28 matrices, where each entry in the matrix is a number from 0 to 255. Each 28x28 matrix represents a handwritten number, pixel by pixel. You can visualize them using PyPlot.
# Print the first element in x_train as an image plt.figure() plt.imshow(x_train) plt.colorbar() plt.grid(False) plt.show()
y_train is a collection of 60.000 labels, in this case representing the number its respective x_train element represents. If you recall previous articles, these are the expected values.
Let's verify the label for the first element is a five:
y_train Output: 5
Preprocessing the data
Not much is needed for this particular dataset. Neural networks work better if their inputs are between 0 and 1, so let's perform some normalization on the inputs:
x_train = x_train / 255.0 x_test = x_test / 255.0
These two lines did the trick, we can now start building our network.
Build the network
Now we will build and compile our neural network. In Keras, you only need to define the layers in your network and you are almost done. This is the 8-layer network we will use:
model = keras.Sequential([ keras.layers.Flatten(input_shape=(28, 28)), keras.layers.Dense(512, activation='relu'), keras.layers.Dense(216, activation='relu'), keras.layers.Dense(128, activation='relu'), keras.layers.Dense(64, activation='relu'), keras.layers.Dense(32, activation='relu'), keras.layers.Dense(16, activation='relu'), keras.layers.Dense(10, activation='softmax') ])
Keras has many different types of layers, our network is made of two main types: 1 Flatten layer and 7 Dense layers.
A Flatten layer is used to transform higher-dimension tensors into vectors. In our case, it transforms a 28x28 matrix into a vector with 728 entries (28x28=784).
Dense layers need you to specify the shape of the output (the first parameter) and the activation function. All our layers have relu activation except for the last layer. The output layer uses softmax because that's the right choice when building a multiclass classifier.
The next step is compiling the model:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
These parameters are:
- loss: Is the optimization score function, the number the network will try to reduce during training.
- optimizer: Is the algorithm used to update the values of the weights in the network.
- metrics: Measurements we perform during training.
You can check the final shape of the network using the .summary() method:
model.summary() Output: Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= flatten (Flatten) (None, 784) 0 _________________________________________________________________ dense (Dense) (None, 512) 401920 _________________________________________________________________ dense_1 (Dense) (None, 216) 110808 _________________________________________________________________ dense_2 (Dense) (None, 128) 27776 _________________________________________________________________ dense_3 (Dense) (None, 64) 8256 _________________________________________________________________ dense_4 (Dense) (None, 32) 2080 _________________________________________________________________ dense_5 (Dense) (None, 16) 528 _________________________________________________________________ dense_6 (Dense) (None, 10) 170 ================================================================= Total params: 551,538 Trainable params: 551,538 Non-trainable params: 0
Why did we use a network with this shape? Well, to be honest, because I felt like building one with 9 layers. You can try building a smaller network as an exercise and seeing how the performance changes:
model = keras.Sequential([ keras.layers.Flatten(input_shape=(28, 28)), keras.layers.Dense(216, activation='relu'), keras.layers.Dense(10, activation='softmax')
Train the network and evaluate its accuracy
We train the network using the fit() method:
history = model.fit(x_train, y_train, epochs=20, batch_size=15)
The first two arguments are the training examples and their labels, and the third argument is the number of training cycles we will run (in this case 20).
batch_size tells you the number of samples that will be propagated through the network before updating the weights. In our network we will update weights every time we process 15 inputs.
Fit returns a
history object with information about the trainig process. When you call the fit method you will see the following output:
Train on 60000 samples Epoch 1/20 60000/60000 [==============================] - 25s 420us/sample - loss: 0.2670 - accuracy: 0.9205 Epoch 2/20 60000/60000 [==============================] - 25s 412us/sample - loss: 0.1127 - accuracy: 0.9670 Epoch 3/20 60000/60000 [==============================] - 26s 440us/sample - loss: 0.0833 - accuracy: 0.9768 Epoch 4/20 60000/60000 [==============================] - 24s 400us/sample - loss: 0.0668 - accuracy: 0.9816 Epoch 5/20 60000/60000 [==============================] - 25s 418us/sample - loss: 0.0579 - accuracy: 0.9838 Epoch 6/20 60000/60000 [==============================] - 36s 606us/sample - loss: 0.0475 - accuracy: 0.9871 Epoch 7/20 60000/60000 [==============================] - 36s 598us/sample - loss: 0.0421 - accuracy: 0.9887 Epoch 8/20 60000/60000 [==============================] - 35s 580us/sample - loss: 0.0356 - accuracy: 0.9905 Epoch 9/20 60000/60000 [==============================] - 32s 534us/sample - loss: 0.0341 - accuracy: 0.9911 Epoch 10/20 60000/60000 [==============================] - 28s 461us/sample - loss: 0.0296 - accuracy: 0.9925 Epoch 11/20 60000/60000 [==============================] - 28s 463us/sample - loss: 0.0289 - accuracy: 0.9927 Epoch 12/20 60000/60000 [==============================] - 28s 459us/sample - loss: 0.0284 - accuracy: 0.9936 Epoch 13/20 60000/60000 [==============================] - 28s 467us/sample - loss: 0.0259 - accuracy: 0.9938 Epoch 14/20 60000/60000 [==============================] - 27s 455us/sample - loss: 0.0259 - accuracy: 0.9938 Epoch 15/20 60000/60000 [==============================] - 29s 486us/sample - loss: 0.0206 - accuracy: 0.9952 Epoch 16/20 60000/60000 [==============================] - 28s 471us/sample - loss: 0.0241 - accuracy: 0.9945 Epoch 17/20 60000/60000 [==============================] - 28s 469us/sample - loss: 0.0211 - accuracy: 0.9952 Epoch 18/20 60000/60000 [==============================] - 28s 461us/sample - loss: 0.0172 - accuracy: 0.9960 Epoch 19/20 60000/60000 [==============================] - 25s 424us/sample - loss: 0.0209 - accuracy: 0.9953 Epoch 20/20 60000/60000 [==============================] - 31s 510us/sample - loss: 0.0170 - accuracy: 0.9959
It tells you epoch by epoch how the values of loss and accuracy change. As expected, every epoch the results become a bit better. If you prefer, you can plot them using PyPlot.
# Plot training & validation accuracy values plt.plot(history.history['accuracy']) plt.title('Model accuracy') plt.ylabel('Accuracy') plt.xlabel('Epoch') plt.legend(['Train'], loc='upper left') plt.show() # Plot training & validation loss values plt.plot(history.history['loss']) plt.title('Model loss') plt.ylabel('Loss') plt.xlabel('Epoch') plt.legend(['Train'], loc='upper left') plt.show()
So, we get a final accuracy of 99.61, but you might also remember that evaluating performance on the training set doesn't make much sense, so let's try the model on data it hasn't seen yet.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0) print('Test accuracy:', test_acc) print('Test loss:', test_loss) Output: Test accuracy: 0.9822 Test loss: 0.14858887433931195
Nice! Accuracy of 98.61% with a loss of 0.01, not bad for our first try on this dataset!
There are many ways of improving these results, but for now, it's enough to be happy with our hard work.
As we get closer to real-world examples with neural networks, we get closer to the end of this series. In the following article, we will talk about regularization: a series of techniques we can use to fight overfitting in our DL models.
Thank you for reading!
What to do next
- Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
- Try different networks, both bigger and smaller, and see how the size of the network affects performance on both the training and test sets. You can also try training a different amount of epochs.
- You can find the source code for this series in this repo.
- Here is the official Keras documentation
- This article is based on Grokking Deep Learning and on Deep Learning (Goodfellow, Bengio, Courville). These and other very helpful books can be found in the recommended reading list.
- Send me an email with questions, comments or suggestions (it's in the About Me page)