Course: Math 535 - Mathematical Methods in Data Science (MMiDS)
Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison
Updated: Nov 12, 2020
Copyright: © 2020 Sebastien Roch
In this optional notebook, we illustrate the use of automatic differentiation on multiclass classification with convolutional neural networks. We will not expand on the underlying concepts here. Review [Wri, Section 2.11] first, then see the following references for background:
Convolutional neural networks: See [Bis, Sections 5.1-2, 5.3.1-2, 5.5.6-7] and this module from Stanford's CS231n.
Flux.jl: See the documentation for the Flux.jl package.
We have already used automatic differentiation and Flux.jl in previous notebooks. We introduce more advanced features here.
We will use the MNIST dataset. Quoting Wikipedia:
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels. The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset.
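This notebook relies on Flux.jl together with a few utility packages. A minimal set of imports, under the assumption that we are running Flux v0.11 (where MNIST ships with Flux.Data; in later versions the dataset moved to the separate MLDatasets.jl package), would be:
using Statistics
using Flux
using Flux.Data: MNIST, DataLoader
using Flux: onehot, onehotbatch, onecold, crossentropy, throttle, train!, params
using IterTools: ncycle
We first load the training images and their labels.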
imgs = MNIST.images()
labels = MNIST.labels()
length(imgs)
60000
imgs[1]
labels[1]
5
Each image is a $28 \times 28$ grid of grayscale values. To feed an image to the models we have used so far, we flatten it into a vector of length $784 = 28 \times 28$.
reshape(Float32.(imgs[1]),:)
784-element Array{Float32,1}:
 0.0
 0.0
 ⋮
 0.0
We collect the flattened images into a matrix Xtrain, one column per sample.
Xtrain = reduce(hcat, [reshape(Float32.(imgs[i]),:) for i = 1:length(imgs)]);
We also convert the labels into vectors using one-hot encoding: the label 0 becomes the standard basis vector $\mathbf{e}_1 \in \mathbb{R}^{10}$, the label 1 becomes $\mathbf{e}_2 \in \mathbb{R}^{10}$, and so on. The functions onehot and onehotbatch perform this transformation, while onecold undoes it.
onehot(labels[1], 0:9)
10-element Flux.OneHotVector:
 0
 0
 0
 0
 0
 1
 0
 0
 0
 0
onecold(ans, 0:9)
5
ytrain = onehotbatch(labels, 0:9);
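To make the encoding concrete, here is a hand-rolled version of the same transformation (my_onehot is a hypothetical helper for illustration only; in practice onehot and onehotbatch are preferable, since they return compact OneHotVector/OneHotMatrix types rather than dense arrays):
# sketch: a label becomes the indicator vector of its position among the classes
my_onehot(label, classes) = Float32.(classes .== label)
my_onehot(labels[1], 0:9)   # e_6, i.e. a 1 in position 6, matching onehot(labels[1], 0:9)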
We load the test data similarly.
test_imgs = MNIST.images(:test)
test_labels = MNIST.labels(:test)
length(test_labels)
10000
Xtest = reduce(hcat,
[reshape(Float32.(test_imgs[i]),:) for i = 1:length(test_imgs)])
ytest = onehotbatch(test_labels, 0:9);
Finally, we consider a class of neural networks tailored for image processing, convolutional neural networks (CNN). From Wikipedia:
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.
More background can be found in this excellent module from Stanford's CS231n.
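To make the convolution operation concrete, here is a bare-bones 2d version of the sliding-window computation that a convolutional layer performs (a sketch only: conv2d is a hypothetical helper that ignores multiple channels, bias, padding, stride, and the kernel-flipping convention; Flux's Conv layer handles all of these):
# slide the kernel K over the input X; each output entry is the
# inner product of K with the corresponding patch of X
function conv2d(X::AbstractMatrix, K::AbstractMatrix)
    p, q = size(K)
    Y = zeros(eltype(X), size(X, 1) - p + 1, size(X, 2) - q + 1)
    for i in axes(Y, 1), j in axes(Y, 2)
        Y[i, j] = sum(K .* @view X[i:i+p-1, j:j+q-1])
    end
    return Y
end

conv2d(Float32.(imgs[1]), ones(Float32, 3, 3) ./ 9)   # a 26×26 local-average feature map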
We will use CNNs on the MNIST dataset. What follows is based on Flux's model zoo. Our CNN will be a composition of convolutional layers and pooling layers.
m = Chain(
# First convolution, operating upon a 28x28 image
Conv((3, 3), 1=>16, pad=(1,1), relu),
MaxPool((2,2)),
# Second convolution, operating upon a 14x14 image
Conv((3, 3), 16=>32, pad=(1,1), relu),
MaxPool((2,2)),
# Third convolution, operating upon a 7x7 image
Conv((3, 3), 32=>32, pad=(1,1), relu),
MaxPool((2,2)),
# Reshape the 4d tensor into a 2d one; at this point the shape should be (3, 3, 32, N),
# which is where we get the 288 in the `Dense` layer below:
x -> reshape(x, :, size(x, 4)),
Dense(288, 10),
# Finally, softmax to get nice probabilities
softmax,
);
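To double-check the (3, 3, 32, N) shape claimed in the comments above, we can push a dummy single-image batch through the layers one at a time (a quick sanity check; the expected sizes are noted in the comment):
# sizes: (28,28,16,1), (14,14,16,1), (14,14,32,1), (7,7,32,1),
#        (7,7,32,1), (3,3,32,1), (288,1), (10,1), (10,1)
let x = rand(Float32, 28, 28, 1, 1)   # dummy input in WHCN order
    for layer in m.layers
        x = layer(x)
        println(size(x))
    end
end
Note in particular that the last MaxPool takes the 7×7 feature maps down to 3×3, and $3 \times 3 \times 32 = 288$ is the input dimension of the Dense layer.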
One complication is that the convolutional layers take as input a tensor, that is, a multidimensional array. So the first step is to convert the images in the dataset into $4d$-arrays in WHCN order (width, height, #channels, batch size). Here the number of channels is $1$ for grayscale images, and the batch size is $1$ for a single image. We will use DataLoader as before to create larger mini-batches. We use reshape to make a $4d$-array.
reshape(Float32.(imgs[1]), 28, 28, 1, 1)
28×28×1×1 Array{Float32,4}:
[:, :, 1, 1] =
 (28×28 grid of grayscale values: mostly 0.0, with nonzero entries such as 0.498039 and 0.992157 along the digit's strokes)
Applying the model to this single-image batch returns a column of 10 class probabilities. The model is not trained yet, so they are all close to $1/10$.
m(ans)
10×1 Array{Float32,2}:
 0.09954709
 0.094213165
 0.08629797
 0.10479064
 0.12571757
 0.089546174
 0.09167668
 0.12285014
 0.10734907
 0.07801155
We concatenate the images into a large $4d$ tensor where the last dimension is for the samples. Here we cannot use hcat, as we are concatenating tensors rather than vectors. Instead we pre-allocate the tensor and then assign the images as we scan the last dimension.
train_tensor_imgs = zeros(Float32, 28, 28, 1, length(labels))
for i in 1:length(labels)
train_tensor_imgs[:, :, :, i] = reshape(Float32.(imgs[i]), 28, 28, 1, 1)
end
train_onehot_labels = ytrain;
As a check, the first slice of the tensor matches the reshaped first image.
train_tensor_imgs[:,:,:,1:1]
28×28×1×1 Array{Float32,4}:
[:, :, 1, 1] =
 (same 28×28 grid of grayscale values as above)
test_tensor_imgs = zeros(Float32, 28, 28, 1, length(test_labels))
for i in 1:length(test_labels)
test_tensor_imgs[:, :, :, i] = reshape(Float32.(test_imgs[i]), 28, 28, 1, 1)
end
test_onehot_labels = ytest;
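As an aside, because Julia arrays are column-major and the columns of Xtrain are exactly the flattened images, the same tensor can be built without a loop by reshaping the matrix directly (train_tensor_alt is a hypothetical name, shown for comparison only):
train_tensor_alt = reshape(Xtrain, 28, 28, 1, :)
train_tensor_alt == train_tensor_imgs   # true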
loader = DataLoader(train_tensor_imgs, train_onehot_labels;
batchsize=128, shuffle=true);
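Iterating over the loader yields mini-batches as (input, label) tuples; peeking at the first one confirms the shapes (xb and yb are hypothetical names for this check):
(xb, yb) = first(loader)
size(xb), size(yb)   # ((28, 28, 1, 128), (10, 128))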
We define the accuracy, the cross-entropy loss, the parameters to be optimized, the ADAM optimizer, and a callback that reports the accuracy on the test data.
accuracy(x, y) = mean(onecold(m(x), 0:9) .== onecold(y, 0:9))
loss(x, y) = crossentropy(m(x), y)
ps = params(m)
opt = ADAM()
evalcb = () -> @show(accuracy(test_tensor_imgs, test_onehot_labels));
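As an aside, Flux's documentation recommends a numerically more stable variant: drop the final softmax from the chain and apply logitcrossentropy to the raw scores instead. A sketch of that alternative (m_logits and loss_logit are hypothetical names; we do not use this variant below):
m_logits = Chain(m.layers[1:end-1]...)   # same layers, sharing weights, without the softmax
loss_logit(x, y) = Flux.logitcrossentropy(m_logits(x), y)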
Before training, the accuracy is close to $1/10$, that is, random guessing.
accuracy(test_tensor_imgs, test_onehot_labels)
0.1227
@time train!(loss, ps, ncycle(loader, 1), opt, cb = throttle(evalcb, 60))
accuracy(test_tensor_imgs, test_onehot_labels) = 0.1469
 68.820500 seconds (65.93 M allocations: 30.727 GiB, 6.90% gc time)
Because throttle fires the callback at most once every 60 seconds, the accuracy printed above was evaluated partway through the epoch rather than at its end.
@time train!(loss, ps, ncycle(loader, 9), opt, cb = throttle(evalcb, 60))
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9755
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9866
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9886
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9892
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9875
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9897
accuracy(test_tensor_imgs, test_onehot_labels) = 0.9905
373.053069 seconds (18.16 M allocations: 244.312 GiB, 10.95% gc time)
After ten epochs in total, the test accuracy exceeds $99\%$.
@time accuracy(test_tensor_imgs, test_onehot_labels)
1.818649 seconds (158.78 k allocations: 1.711 GiB, 12.15% gc time)
0.9911
Finally, we predict the label of the first test image.
new_tensor_img = reshape(Float32.(test_imgs[1]), 28, 28, 1, 1)
onecold(m(new_tensor_img), 0:9)
1-element Array{Int64,1}:
 7
onecold(ytest[:,1], 0:9)
7
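The same pattern works on a whole batch at once; for instance, to compare predictions and true labels on the first few test images (preds and truth are hypothetical names for this quick check):
preds = onecold(m(test_tensor_imgs[:, :, :, 1:5]), 0:9)
truth = onecold(ytest[:, 1:5], 0:9)
hcat(preds, truth)   # first column: predictions; second column: true labels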