TUTORIAL 3b

Convolutional neural networks [optional]


Course: Math 535 - Mathematical Methods in Data Science (MMiDS)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Nov 12, 2020
Copyright: © 2020 Sebastien Roch


In this optional notebook, we illustrate the use of automatic differentiation for multiclass classification with convolutional neural networks. We will not expand on the concepts required here. Review [Wri, Section 2.11] first, and then see the following references for background:

  1. Convolutional neural networks: See [Bis, Sections 5.1-2, 5.3.1-2, 5.5.6-7] and this module from Stanford's CS231n.

  2. Flux.jl: See the documentation for the Flux.jl package.

We have already used automatic differentiation and Flux.jl in previous notebooks. We introduce more advanced features here.

We will use the MNIST dataset. Quoting Wikipedia:

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels. The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset.

Here is a sample of the images:

[Figure: a sample of MNIST handwritten digit images] (Source)
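
The cells below assume the setup used in the previous notebooks, which is not reproduced in this excerpt. A rough sketch of the imports involved, based on the Flux v0.11-era API (the exact setup cell is an assumption on our part), is:

using Flux, Statistics                 # Chain, Conv, MaxPool, Dense, ADAM, mean, ...
using Flux: onehot, onehotbatch, onecold, crossentropy, throttle, params, train!
using Flux.Data: DataLoader, MNIST     # this Flux version ships a built-in MNIST loader
using IterTools: ncycle                # used below to repeat the training data over epochs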

In [2]:
imgs = MNIST.images()
labels = MNIST.labels()
length(imgs)
Out[2]:
60000
In [3]:
imgs[1]
Out[3]:
[Image output: the first training image, a handwritten digit rendered as a 28×28 grayscale image]
In [4]:
labels[1]
Out[4]:
5
In [6]:
reshape(Float32.(imgs[1]),:)
Out[6]:
784-element Array{Float32,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 ⋮
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
In [7]:
Xtrain = reduce(hcat, [reshape(Float32.(imgs[i]),:) for i = 1:length(imgs)]);

We also convert the labels into vectors. We use one-hot encoding, that is, we convert the label 0 to the standard basis vector $\mathbf{e}_1 \in \mathbb{R}^{10}$, the label 1 to $\mathbf{e}_2 \in \mathbb{R}^{10}$, and so on. The functions onehot and onehotbatch perform this transformation, while onecold undoes it.

In [11]:
onehot(labels[1], 0:9)
Out[11]:
10-element Flux.OneHotVector:
 0
 0
 0
 0
 0
 1
 0
 0
 0
 0
In [12]:
onecold(ans, 0:9)
Out[12]:
5
In [13]:
ytrain = onehotbatch(labels, 0:9);
In [14]:
test_imgs = MNIST.images(:test)
test_labels = MNIST.labels(:test)
length(test_labels)
Out[14]:
10000
In [15]:
Xtest = reduce(hcat, 
    [reshape(Float32.(test_imgs[i]),:) for i = 1:length(test_imgs)])
ytest = onehotbatch(test_labels, 0:9);

2 Convolutional neural networks

Finally, we consider a class of neural networks tailored to image processing: convolutional neural networks (CNNs). From Wikipedia:

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.

More background can be found in this excellent module from Stanford's CS231n.

We will use CNNs on the MNIST dataset. What follows is based on Flux's model zoo. Our CNN will be a composition of convolutional layers and pooling layers.
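
To make these two building blocks concrete, here is a small hand-rolled illustration (ours, not part of the original notebook) of a "valid" 2D cross-correlation with a 3x3 kernel and of 2x2 max-pooling on a tiny matrix. The names conv2d_valid and maxpool2 are made up for this sketch; Flux's Conv and MaxPool layers perform the same operations, but across channels and batches and with learned kernels.

function conv2d_valid(x::AbstractMatrix, k::AbstractMatrix)
    # Slide the kernel over every position where it fits entirely inside x.
    m, n = size(x) .- size(k) .+ 1
    [sum(x[i:i+size(k,1)-1, j:j+size(k,2)-1] .* k) for i in 1:m, j in 1:n]
end

# Non-overlapping 2x2 max-pooling (assumes even side lengths).
maxpool2(x::AbstractMatrix) =
    [maximum(x[2i-1:2i, 2j-1:2j]) for i in 1:size(x,1)÷2, j in 1:size(x,2)÷2]

x = Float32[1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
k = Float32[0 1 0; 1 -4 1; 0 1 0]   # a fixed kernel; in a CNN the kernel entries are learned
conv2d_valid(x, k)                  # 2x2 feature map
maxpool2(x)                         # 2x2 array of block maxima: [6 8; 14 16]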

In [18]:
m = Chain(
    # First convolution, operating upon a 28x28 image
    Conv((3, 3), 1=>16, pad=(1,1), relu),
    MaxPool((2,2)),

    # Second convolution, operating upon a 14x14 image
    Conv((3, 3), 16=>32, pad=(1,1), relu),
    MaxPool((2,2)),

    # Third convolution, operating upon a 7x7 image
    Conv((3, 3), 32=>32, pad=(1,1), relu),
    MaxPool((2,2)),

    # Reshape the 4d tensor into a 2d one; at this point it should be (3, 3, 32, N),
    # which is where the 288 in the `Dense` layer below comes from:
    x -> reshape(x, :, size(x, 4)),
    Dense(288, 10),

    # Finally, softmax to get nice probabilities
    softmax,
);

One complication is that the convolutional layers take as input a tensor, that is, a multidimensional array. So the first step is to convert the images in the dataset into $4d$-arrays in WHCN order (width, height, #channels, batch size). Here the number of channels is $1$ for grayscale images, and the batch size is $1$ for a single image. We will use DataLoader as before to create larger mini-batches.
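
As a quick sanity check (ours, not part of the original notebook), we can push a dummy WHCN input through the successive layers of m and confirm the shapes announced in the comments of the Chain above:

x0 = randn(Float32, 28, 28, 1, 1)      # dummy 28x28 grayscale image, batch size 1
x1 = m[2](m[1](x0)); size(x1)          # (14, 14, 16, 1) after the first convolution + pooling
x2 = m[4](m[3](x1)); size(x2)          # (7, 7, 32, 1) after the second convolution + pooling
x3 = m[6](m[5](x2)); size(x3)          # (3, 3, 32, 1), i.e. 3*3*32 = 288 features per image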

We use reshape to make a $4d$-array.

In [19]:
reshape(Float32.(imgs[1]), 28, 28, 1, 1)
Out[19]:
28×28×1×1 Array{Float32,4}:
[:, :, 1, 1] =
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.498039  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.25098   0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 ⋮                             ⋮         ⋱                 ⋮         
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.215686  0.67451      0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.533333  0.992157     0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
In [20]:
m(ans)
Out[20]:
10×1 Array{Float32,2}:
 0.09954709
 0.094213165
 0.08629797
 0.10479064
 0.12571757
 0.089546174
 0.09167668
 0.12285014
 0.10734907
 0.07801155

We concatenate the images into a large $4d$ tensor whose last dimension indexes the samples. Here we cannot use hcat, as we are concatenating tensors rather than vectors. Instead, we pre-allocate the tensor and fill it one image at a time along the last dimension.

In [21]:
train_tensor_imgs = zeros(Float32, 28, 28, 1, length(labels))
for i in 1:length(labels)
    train_tensor_imgs[:, :, :, i] = reshape(Float32.(imgs[i]), 28, 28, 1, 1)
end
train_onehot_labels = ytrain;
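
As an aside (an observation of ours, not in the original notebook), the loop above can be bypassed here: Xtrain already stores one flattened image per column, so a single column-major reshape yields the same $4d$ tensor.

# Xtrain is 784 x 60000 with one flattened image per column, so reshaping it
# (column-major) recovers the same 28 x 28 x 1 x 60000 tensor as the loop above.
train_tensor_alt = reshape(Xtrain, 28, 28, 1, :)
train_tensor_alt == train_tensor_imgs   # true
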
In [22]:
train_tensor_imgs[:,:,:,1:1]
Out[22]:
28×28×1×1 Array{Float32,4}:
[:, :, 1, 1] =
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.498039  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.25098   0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 ⋮                             ⋮         ⋱                 ⋮         
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.215686  0.67451      0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.533333  0.992157     0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0       …  0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0       0.0          0.0       0.0  0.0  0.0  0.0
In [23]:
test_tensor_imgs = zeros(Float32, 28, 28, 1, length(test_labels))
for i in 1:length(test_labels)
    test_tensor_imgs[:, :, :, i] = reshape(Float32.(test_imgs[i]), 28, 28, 1, 1)
end
test_onehot_labels = ytest;
In [24]:
loader = DataLoader(train_tensor_imgs, train_onehot_labels; 
    batchsize=128, shuffle=true);
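
Each iteration of the loader yields one mini-batch of images and one-hot labels. A quick way to inspect the shapes (ours, not in the original notebook):

xb, yb = first(loader)     # first mini-batch (shuffled, so its contents vary)
size(xb), size(yb)         # ((28, 28, 1, 128), (10, 128))
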
In [25]:
accuracy(x, y) = mean(onecold(m(x), 0:9) .== onecold(y, 0:9))
loss(x, y) = crossentropy(m(x), y)
ps = params(m)
opt = ADAM()
evalcb = () -> @show(accuracy(test_tensor_imgs, test_onehot_labels));
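
For reference (our own remark, not in the original notebook): since the targets are one-hot, the cross-entropy loss above is the mini-batch average of $-\sum_{i=1}^{10} y_i \log \hat{y}_i$, that is, minus the log-probability that the model assigns to the true class. A hand-rolled version for comparison:

# Illustration only: with one-hot columns y, this matches crossentropy(ŷ, y)
# up to floating-point error.
manual_ce(ŷ, y) = mean(-sum(y .* log.(ŷ); dims=1))
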
In [26]:
accuracy(test_tensor_imgs,test_onehot_labels)
Out[26]:
0.1227
In [27]:
@time train!(loss, ps, ncycle(loader, 1), opt, cb = throttle(evalcb, 60))
accuracy(test_tensor_imgs, test_onehot_labels) = 0.1469
 68.820500 seconds (65.93 M allocations: 30.727 GiB, 6.90% gc time)
In [28]:
@time train!(loss, ps, ncycle(loader, 9), opt, cb = throttle(evalcb, 60))
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9755
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9866
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9886
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9892
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9875
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9897
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9905
373.053069 seconds (18.16 M allocations: 244.312 GiB, 10.95% gc time)
In [29]:
@time accuracy(test_tensor_imgs, test_onehot_labels)
  1.818649 seconds (158.78 k allocations: 1.711 GiB, 12.15% gc time)
Out[29]:
0.9911
In [30]:
new_tensor_img = reshape(Float32.(test_imgs[1]), 28, 28, 1, 1)
onecold(m(new_tensor_img), 0:9)
Out[30]:
1-element Array{Int64,1}:
 7
In [31]:
onecold(ytest[:,1], 0:9)
Out[31]:
7