In this notebook, we will walk through how to use the keras R package for a toy deep learning example with the handwritten digit image dataset (MNIST). The notebook’s purpose is to illustrate how to build a Convolutional Neural Network (CNN) using R. Please check the keras R package website for the most recent development: https://keras.rstudio.com/.
We can install keras from CRAN by calling install.packages("keras"). As the package is still under rapid development, we can also get the most recent version, which may not be on CRAN yet, directly from GitHub using devtools::install_github('rstudio/keras'). The following cell may take a few minutes to finish installing all dependencies.
devtools::install_github("rstudio/keras")
As keras is just an interface to popular deep learning frameworks, we have to install a deep learning backend. The default and recommended backend is TensorFlow. Calling install_keras() installs all the needed dependencies for TensorFlow.
library(keras)
# if you haven't, install miniconda
# reticulate::install_miniconda()
# install_keras()
You can run this notebook in the Databricks Community Edition with R as the interface. For an audience with a statistical background, a well-managed cloud environment has the benefit of sparing you from installing and configuring the Python backend locally.
You can also run this notebook on your local machine with R and the required Python packages (keras uses the Python TensorFlow backend engine). Different versions of Python may cause errors when running install_keras(). Here are things you could try when you encounter a Python backend issue on your local machine (a consolidated sketch follows this list):
Run reticulate::py_config() to check the current Python configuration and see if anything needs to be changed;
By default, install_keras() uses the virtual environment ~/.virtualenvs/r-reticulate. If you don’t know how to set the right environment, try setting the installation method to conda (install_keras(method = "conda"));
Reinstall keras and the TensorFlow backend.
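For convenience, here is a minimal sketch collecting those checks in one place; the commented lines are optional fallbacks, and the virtualenv path shown is just the install_keras() default rather than something universal.
# Check which Python installation reticulate is using
reticulate::py_config()
# Optional: point reticulate at the default keras virtualenv explicitly
# reticulate::use_virtualenv("~/.virtualenvs/r-reticulate")
# Optional: fall back to a conda-based installation if the virtualenv fails
# install_keras(method = "conda")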
We will use the popular MNIST handwritten digit image dataset. More information about the dataset and benchmark results from various machine learning methods can be found at http://yann.lecun.com/exdb/mnist/ and https://en.wikipedia.org/wiki/MNIST_database.
This dataset is already included in the keras/TensorFlow installation, and we can simply load it as shown in the following cell. Loading takes less than a minute.
mnist <- dataset_mnist()
## Loaded Tensorflow version 2.8.0
The data structure of the MNIST dataset is straightforward and well prepared for R. It has two pieces:
training set: x (i.e., features): a 60000 x 28 x 28 tensor which corresponds to 60000 28 x 28 pixel greyscale images (i.e., all the values are integers between 0 and 255 in each 28 x 28 matrix), and y (i.e., responses): a length-60000 vector which contains the corresponding digits with integer values between 0 and 9.
testing set: the same structure as the training set, but with only 10000 images and responses.
The detailed structure of the dataset can be seen with str(mnist) below.
str(mnist)
## List of 2
## $ train:List of 2
## ..$ x: int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ y: int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
## $ test :List of 2
## ..$ x: int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ y: int [1:10000(1d)] 7 2 1 0 4 1 4 9 5 9 ...
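As a quick sanity check (not part of the original flow), we can plot the first training image with base R’s image(); the rev/t transform below rotates the matrix so the digit displays upright (the same transform is used at the end of this notebook).
# Plot the first training image; its label is 5 (see str(mnist) above)
digit <- mnist$train$x[1, , ]
image(1:28, 1:28, t(apply(digit, 2, rev)), col = gray.colors(256),
      xlab = paste("Label:", mnist$train$y[1]), ylab = "")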
A CNN leverages the relationship among neighboring pixels in a 2D image for better performance. It also avoids generating thousands or millions of features for high-resolution, full-color images. A CNN requires somewhat different preprocessing steps. Let’s start with a few parameters to be used later.
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
# Define a few parameters to be used in the CNN model
batch_size <- 128
num_classes <- 10
epochs <- 10
# Input image dimensions
img_rows <- 28
img_cols <- 28
For a CNN, the input is an \(n_H \times n_W \times n_C\) 3D array with \(n_C\) channels. For example, a greyscale \(n_H \times n_W\) image has only one channel, and the input is an \(n_H \times n_W \times 1\) tensor. An \(n_H \times n_W\) 8-bit-per-channel RGB image has three channels, i.e., three \(n_H \times n_W\) arrays with values between 0 and 255, so the input is an \(n_H \times n_W \times 3\) tensor. For the problem we have now, the images are greyscale, but we need to explicitly define the single channel by reshaping the 2D arrays into 3D tensors using array_reshape(). The input_shape variable will be used in the CNN model later. For an RGB color image, the number of channels is 3, and we would replace “1” with “3” in the code cell below.
x_train <- array_reshape(x_train, c(nrow(x_train), img_rows, img_cols, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), img_rows, img_cols, 1))
input_shape <- c(img_rows, img_cols, 1)
Here is the structure of the reshaped image: the first dimension is the image index, and dimensions 2-4 form a 3D tensor for each image, even though there is only one channel.
str(x_train)
## int [1:60000, 1:28, 1:28, 1] 0 0 0 0 0 0 0 0 0 0 ...
Scale the input values to be between 0 and 1 for numerical stability in the optimization process.
x_train <- x_train / 255
x_test <- x_test / 255
Encode the response variable to binary vectors.
# Convert class vectors to binary class matrices
y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)
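To see what to_categorical() produced, note that y_train is now a 60000 x 10 indicator matrix: each row contains a single 1 in the column matching the digit. A quick check (the first training label is the digit 5, per str(mnist) above):
dim(y_train)   # 60000 rows, 10 columns
y_train[1, ]   # a single 1 in the sixth column, corresponding to digit 5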
A CNN model contains a series of 3D convolutional layers, each with a few parameters:
the kernel_size, which is typically 3 x 3 or 5 x 5;
the number of filters, which equals the number of channels of the output;
the activation function.
For the first layer, there is also an input_shape parameter, which gives the input image size and number of channels. To prevent overfitting and speed up computation, a pooling layer is usually applied after one or a few convolutional layers. A max pooling layer with pool_size = 2 x 2 halves the size in each spatial dimension. Dropout can be used as well, in addition to pooling neighboring values. After a few 3D convolutional layers, we also need to flatten the 3D tensor output into a 1D tensor, and then add one or a couple of dense layers to connect the output to the target response classes.
Let’s define the CNN model structure: two convolutional layers, two max pooling layers, and two dropout layers to mitigate overfitting, followed by three dense layers after flattening the 3D tensor. The last dense layer connects to the response.
# define model structure
cnn_model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(5,5), activation = 'relu', input_shape = input_shape) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(5,5), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.2) %>%
layer_flatten() %>%
layer_dense(units = 120, activation = 'relu') %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 84, activation = 'relu') %>%
layer_dense(units = num_classes, activation = 'softmax')
summary(cnn_model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_1 (Conv2D) (None, 24, 24, 32) 832
## max_pooling2d_1 (MaxPooling2D) (None, 12, 12, 32) 0
## conv2d (Conv2D) (None, 8, 8, 64) 51264
## max_pooling2d (MaxPooling2D) (None, 4, 4, 64) 0
## dropout_1 (Dropout) (None, 4, 4, 64) 0
## flatten (Flatten) (None, 1024) 0
## dense_2 (Dense) (None, 120) 123000
## dropout (Dropout) (None, 120) 0
## dense_1 (Dense) (None, 84) 10164
## dense (Dense) (None, 10) 850
## ================================================================================
## Total params: 186,110
## Trainable params: 186,110
## Non-trainable params: 0
## ________________________________________________________________________________
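As a sanity check on the summary, the parameter counts follow directly from the layer definitions: the first convolutional layer has \((5 \times 5 \times 1) \times 32 + 32 = 832\) parameters (weights plus biases), the second has \((5 \times 5 \times 32) \times 64 + 64 = 51264\), and the first dense layer has \(1024 \times 120 + 120 = 123000\). Pooling, dropout, and flatten layers have no trainable parameters.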
We need to compile the defined CNN model.
cnn_model %>% compile(
loss = loss_categorical_crossentropy,
optimizer = optimizer_adadelta(),
metrics = c('accuracy')
)
Train the model and save each epoch’s history. Please note that, since we are not using a GPU, it takes a few minutes to finish; please be patient while waiting for the results. The training time can be reduced significantly by running on a GPU.
cnn_history <- cnn_model %>% fit(
x_train, y_train,
batch_size = batch_size,
epochs = epochs,
validation_split = 0.2
)
The trained model’s accuracy can be evaluated on the testing dataset, and it is pretty good.
cnn_model %>% evaluate(x_test, y_test)
## loss accuracy
## 0.02301287 0.99300003
There is some useful information stored in the output object cnn_history, and the details can be shown using str(). We can plot the training and validation accuracy and loss as functions of epoch by simply calling plot(cnn_history).
str(cnn_history)
## List of 2
## $ params :List of 3
## ..$ verbose: int 1
## ..$ epochs : int 10
## ..$ steps : int 375
## $ metrics:List of 4
## ..$ loss : num [1:10] 0.3415 0.0911 0.0648 0.0504 0.0428 ...
## ..$ accuracy : num [1:10] 0.891 0.973 0.981 0.985 0.987 ...
## ..$ val_loss : num [1:10] 0.071 0.0515 0.0417 0.0377 0.0412 ...
## ..$ val_accuracy: num [1:10] 0.978 0.985 0.988 0.99 0.988 ...
## - attr(*, "class")= chr "keras_training_history"
plot(cnn_history)
We can apply the trained model to predict new images.
# model prediction
cnn_pred <- cnn_model %>%
predict(x_test) %>%
k_argmax()
head(cnn_pred)
## tf.Tensor([7 2 1 0 4 1], shape=(6,), dtype=int64)
Now let’s check a few misclassified images to see whether a human can do a better job than this simple CNN model.
## Convert tf.tensor to array
cnn_pred <- as.array(cnn_pred)
## number of misclassified images
sum(cnn_pred != mnist$test$y)
## [1] 70
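Beyond the raw count, a confusion matrix shows which digits the model confuses with which; this is a quick addition using base R’s table(), not part of the original notebook.
# Cross-tabulate predicted vs. actual digits on the 10000 test images
table(predicted = cnn_pred, actual = mnist$test$y)
Now let’s pull out the misclassified images themselves.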
missed_image <- mnist$test$x[cnn_pred != mnist$test$y, , ]
missed_digit <- mnist$test$y[cnn_pred != mnist$test$y]
missed_pred <- cnn_pred[cnn_pred != mnist$test$y]
index_image <- 10 ## change this index to see a different image
input_matrix <- missed_image[index_image, 1:28, 1:28]
## flip and transpose so the digit displays upright
output_matrix <- apply(input_matrix, 2, rev)
output_matrix <- t(output_matrix)
image(1:28, 1:28, output_matrix, col = gray.colors(256),
      xlab = paste('Image for digit ', missed_digit[index_image],
                   ', wrongly predicted as ', missed_pred[index_image]),
      ylab = "")