Deep Learning CNNs in TensorFlow with GPUs

In my last tutorial, you created a complex convolutional neural network from a pre-trained Inception v3 model.

In this tutorial, you’ll learn the architecture of a convolutional neural network (CNN), how to create a CNN in TensorFlow, and how to predict the labels of images. Finally, you’ll learn how to run the model on a GPU so you can spend your time creating better models, not waiting for them to converge.


  • Introduction to CNNs
  • Creating your first CNN and training on CPU
  • Training on a GPU


  • Basic machine learning understanding
  • Basic TensorFlow understanding
  • AWS account (for GPU)

Convolutional Neural Networks

Convolutional neural networks are the current state-of-the-art architecture for image classification. They’re used in practice today in facial recognition, self-driving cars, and detecting whether an object is a hot dog.

Basic Architecture

A basic CNN architecture consists of three components: a convolution layer, a pooling layer, and a fully connected layer. These components work together to learn a dense feature representation of an input.



A convolution consists of a kernel (green square above), also called a filter, that is applied in a sliding-window fashion to extract features from the input. After each operation, the filter is shifted across the input by an amount called the stride. At each position, the kernel is multiplied element-wise with the current region of the input and the products are summed. Multiple filters can be stacked to create high-dimensional representations of the input.
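To make the sliding window concrete, here is a minimal pure-Python sketch of a single-channel convolution with no padding. The function name convolve2d and the small binary image are illustrative assumptions, not code from this tutorial’s project.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply the kernel element-wise with the current region, then sum.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output


# Illustrative 5x5 input and 3x3 kernel (assumed values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

With a stride of 2 the same kernel visits only four positions, so the feature map shrinks to 2x2.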

What happens if the filter doesn’t evenly map to the size of the input?

There are two ways of handling a mismatch between filter size and input size, known as same padding and valid padding. Same padding pads the input border with zeros (as seen above) so that, at a stride of 1, the input width and height are preserved. Valid padding does not pad, so the output is smaller than the input.

Typically, you’ll want to use same padding or you’ll rapidly reduce the dimensionality of your input.
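The difference between the two padding modes can be sketched in a few lines of plain Python. The helpers zero_pad and window_positions are hypothetical names, and the example assumes a stride of 1 and an odd kernel size.

```python
def zero_pad(image, pad):
    """Return a copy of `image` with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def window_positions(size, kernel_size, stride=1):
    """Number of places a kernel fits along one dimension without padding."""
    return (size - kernel_size) // stride + 1


image = [[1] * 5 for _ in range(5)]  # 5x5 input
kernel_size = 3

# Valid padding: no zeros are added, so a 3x3 kernel yields a 3x3 output.
valid_out = window_positions(len(image), kernel_size)

# Same padding at stride 1: pad by (kernel_size - 1) // 2 to keep the output 5x5.
padded = zero_pad(image, (kernel_size - 1) // 2)
same_out = window_positions(len(padded), kernel_size)

print(valid_out, same_out)  # 3 5
```

Stacking several valid-padded convolutions would shrink a 5x5 input to nothing after two layers, which is why same padding is the usual default.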

Finally, an activation function (typically a ReLU) is applied to give the convolution non-linearity. ReLUs are a bit different from other activation functions, such as sigmoid or tanh, because they are one-sided: every negative input maps to zero. This one-sided property allows the network to create sparse representations (zero values for hidden units), increasing computational efficiency.
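As a rough illustration of that sparsity, the sketch below applies ReLU and sigmoid to the same made-up pre-activation values and counts how many outputs are exactly zero. The input values are assumptions chosen for the example.

```python
import math


def relu(x):
    """One-sided activation: negative inputs become exactly zero."""
    return max(0.0, x)


def sigmoid(x):
    """Two-sided activation: outputs lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]  # made-up values

relu_out = [relu(x) for x in pre_activations]
sigmoid_out = [sigmoid(x) for x in pre_activations]

zeros_relu = sum(1 for v in relu_out if v == 0.0)
zeros_sigmoid = sum(1 for v in sigmoid_out if v == 0.0)

print(relu_out)                   # [0.0, 0.0, 0.0, 1.5, 3.0]
print(zeros_relu, zeros_sigmoid)  # 3 0
```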



Pooling is an operation to reduce dimensionality. It applies a function summarizing neighboring information. Two common functions are max pooling and average pooling. By calculating the max of an input region, the output summarizes intensity of surrounding values.
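Here is a small sketch of both pooling functions on a 4x4 input, using a 2x2 window moved with a stride of 2. The pool2d helper and the input values are illustrative assumptions.

```python
def pool2d(image, size=2, stride=2, reduce_fn=max):
    """Summarize each `size` x `size` window of `image` with `reduce_fn`."""
    output = []
    for top in range(0, len(image) - size + 1, stride):
        row = []
        for left in range(0, len(image[0]) - size + 1, stride):
            window = [
                image[top + a][left + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce_fn(window))
        output.append(row)
    return output


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

max_pooled = pool2d(image)                  # [[6, 8], [3, 4]]
avg_pooled = pool2d(image, reduce_fn=mean)  # [[3.75, 5.25], [2.0, 2.0]]
print(max_pooled, avg_pooled)
```

Either way, the 4x4 input is reduced to 2x2: each output value stands in for four neighboring inputs.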

Pooling layers also have a kernel and padding, and are moved in strides. To calculate the output width of a pooling operation, you can use the formula:

 output width = (input width - kernel width + 2 * padding) / stride + 1
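The formula is easy to check by hand. The sketch below wraps it in a hypothetical helper, pool_output_size, and assumes that a fractional result is rounded down. The 28-pixel width matches MNIST images, while the 2-wide kernel with a stride of 2 is an assumed configuration rather than one stated in this tutorial.

```python
def pool_output_size(input_size, kernel_size, padding=0, stride=1):
    """(input - kernel + 2 * padding) / stride + 1, rounded down."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# A 2-wide kernel with stride 2 halves a 28-pixel-wide input.
first = pool_output_size(28, kernel_size=2, padding=0, stride=2)

# Applying the same pooling again halves it once more.
second = pool_output_size(first, kernel_size=2, padding=0, stride=2)

print(first, second)  # 14 7
```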

Fully Connected Layer

You are likely familiar with fully connected layers from standard neural networks. Each neuron in the input is connected to each neuron in the output, hence the name fully connected. Due to this connectivity, every output neuron depends on every input neuron.
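In code, a fully connected layer is a weighted sum of all inputs for each output neuron, plus a bias. The sketch below uses plain Python lists; the function name, weights, and biases are made up for illustration.

```python
def fully_connected(inputs, weights, biases):
    """Compute one output per row of `weights`.

    weights[j][i] is the connection from input neuron i to output neuron j,
    so every output reads every input.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


inputs = [1.0, 2.0, 3.0]       # 3 input neurons
weights = [
    [0.5, 0.0, -1.0],          # connections into output neuron 0
    [1.0, 1.0, 1.0],           # connections into output neuron 1
]
biases = [0.0, 0.5]

outputs = fully_connected(inputs, weights, biases)
print(outputs)  # [-2.5, 6.5]
```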

Fully connected neural network

In a CNN, the input is fed from the pooling layer into the fully connected layer. Depending on the task, a regression or classification algorithm can be applied to create the desired output.


You’ve now learned what makes up a convolutional neural network. By passing input through a convolution, you extract high-dimensional features. Pooling summarizes spatial information and reduces dimensionality. Lastly, this feature representation is passed through fully connected layers to a classifier or regressor.

Full CNN Architecture (source)

Creating a CNN in TensorFlow

Now that you understand the idea behind a convolutional neural network, you’ll code one in TensorFlow.

You’ll be creating a CNN to train against the MNIST dataset (images of handwritten digits). After training, you’ll achieve ~98.0% accuracy at 10k iterations.

Setup Environment

First, you’ll need to set up your environment. Additionally, you’ll create a file. Anaconda environment files for Python 3.5 and Python 2.7 are listed below.

If you do not use Anaconda, you can install TensorFlow via pip:

$ pip install tensorflow


python3 develop

The Data

MNIST Data

Here, you’ll create three separate inputs: a training set, a validation set, and a test set. A validation set allows you to better train your model by providing additional data to tune hyperparameters against, while the test set stays untouched until the final evaluation.
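One common way to get a validation set is to hold out a slice of the training data. The sketch below is an assumption about how that could look, not this tutorial’s data loader: the function name, the 10% fraction, and the placeholder rows are all hypothetical.

```python
import random


def train_validation_split(rows, validation_fraction=0.1, seed=0):
    """Shuffle `rows` and hold out a fraction of them for validation."""
    rows = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_validation = int(len(rows) * validation_fraction)
    return rows[n_validation:], rows[:n_validation]


# Placeholder (label, pixels) rows standing in for the real training data.
examples = [(i % 10, [0] * 4) for i in range(100)]

train_rows, validation_rows = train_validation_split(examples)
print(len(train_rows), len(validation_rows))  # 90 10
```

The test set is deliberately not part of this split, so it remains a true out-of-sample check.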

Download the Data

The data can be retrieved with these commands:

$ curl -o data/mnist_train.csv # 104 MB
$ curl -o data/mnist_test.csv # 17.4 MB


Here, you’ll create a few helper functions for creating the network. These functions are used to create the individual components discussed earlier.

Helper Functions / Model definition:

Helper Functions and Model Definition


TensorFlow Graph of Model

Here’s the code for training the model. The three public functions are explained below:

Code is available here:

Inference. This function produces the model’s prediction of what the input represents. Here, it returns a 1×10 tensor for each input, one score per digit class. The values in this tensor are passed to the loss function to determine how far the prediction is from the ground truth.

As indicated by the batch_size hyperparameter, you are processing 128 images at a time. This technique is known as mini-batch training. By processing inputs in smaller batches, as opposed to the entire dataset at once, each batch fits in memory. Additionally, the model converges more rapidly because the weights are updated after each batch rather than after processing all examples.
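Batching itself needs very little code. This sketch slices a list of placeholder examples into batches of 128; the generator name and the 1,000-item list are illustrative assumptions, and in the real model each batch would be fed to one training step.

```python
def mini_batches(examples, batch_size=128):
    """Yield consecutive slices of `examples`, each at most `batch_size` long."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


examples = list(range(1000))  # placeholder for 1,000 training images

batch_sizes = [len(batch) for batch in mini_batches(examples)]
print(batch_sizes)  # seven batches of 128, then a final batch of 104
```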

Loss. Here, you’ll use the softmax cross entropy function to perform an N-way classification. The softmax function normalizes the output of the inference function so that its values sum to one.

With this normalized tensor, cross entropy is calculated against the one-hot encoded labels. Cross entropy gives a measure of how far off the prediction is from the ground truth. Each iteration, an optimizer is applied to minimize this cross entropy.
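The two steps can be written out by hand for a single example. The sketch below uses three classes instead of ten to keep it short, and the logit values are made up. It illustrates the math only and is not the TensorFlow op used in the model.

```python
import math


def softmax(logits):
    """Normalize raw scores into probabilities that sum to one."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, one_hot_label):
    """Negative log-probability assigned to the true class."""
    return -sum(
        y * math.log(p)
        for y, p in zip(one_hot_label, probabilities)
        if y > 0
    )


logits = [2.0, 1.0, 0.1]  # made-up output of an inference step
label = [1, 0, 0]         # one-hot: the true class is index 0

probabilities = softmax(logits)
loss = cross_entropy(probabilities, label)
wrong_loss = cross_entropy(probabilities, [0, 0, 1])

print(round(sum(probabilities), 6), loss < wrong_loss)  # 1.0 True
```

The loss is small when the model puts most of its probability on the true class and grows as that probability shrinks, which is what the optimizer pushes against.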

cross entropy

Loss after training

Train & Evaluate

Below, you’ll train the model for 10k iterations. Every 1,000 iterations, you’ll test the model against the validation set to get an idea of the accuracy. Finally, you’ll evaluate the trained model against the test dataset to get a measure of out-of-sample accuracy. At 10k iterations, you should see accuracy around 98.0%.
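Accuracy here means the fraction of examples whose highest-scoring class matches the label. The sketch below shows that calculation and the validation schedule in plain Python; the helper names and the tiny three-class score lists are assumptions for illustration.

```python
def predicted_class(scores):
    """Index of the highest score, i.e. the class the model believes in most."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(batch_scores, labels):
    """Fraction of examples whose predicted class equals the true label."""
    correct = sum(
        1 for scores, label in zip(batch_scores, labels)
        if predicted_class(scores) == label
    )
    return correct / len(labels)


# Made-up scores for four examples over three classes.
batch_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 2, 2]

validation_accuracy = accuracy(batch_scores, labels)
print(validation_accuracy)  # 0.75

# Steps at which the validation set would be checked during 10k iterations.
validation_steps = [step for step in range(1, 10001) if step % 1000 == 0]
print(len(validation_steps))  # 10
```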

To execute this code, run this command:

$ python3 mnist_conv2d_medium_tutorial/
(Building the computational graph can take a few seconds depending on hardware)

With the model trained, you’ll now evaluate it on the test set from the last checkpoint.

$ python3 mnist_conv2d_medium_tutorial/

Code up to this point:

You can visualize your results by running:

$ tensorboard --logdir=graphs/ --port=6006
navigate in browser: localhost:6006

Training on a GPU

As you noticed, training a CNN can be quite slow due to the number of computations required for each iteration. You’ll now use GPUs to speed up the computation.

TensorFlow, by default, gives higher priority to GPUs when placing operations if both CPU and GPU are available for the given operation. To keep the tutorial simple, you won’t explicitly define operation placement. You can read more about how to do this here.

Create a GPU Box

For this tutorial, you’ll use a community AMI. Head over to the AWS console and launch a new EC2 instance. At the AMI screen, select community and enter this AMI ID: ami-5e853c48. This AMI comes with TensorFlow and Nvidia drivers with CUDA pre-installed.

AMI ID: ami-5e853c48

For instance type, select g2.2xlarge. After selecting an instance type, be sure to create a key pair. This key pair will allow you to ssh into the instance and copy/execute your code.


Sync Your Code

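Now that your instance is created, you’ll need to copy your code and dataset onto it. The easiest way to do this is with rsync. Rsync is a Unix command that transfers files efficiently and, for remote hosts, runs over ssh. It’s highly flexible, offering many options to alter its behavior. Below, the command copies your project directory to the home directory of the ubuntu user on your GPU instance. The flags preserve modification times (-t), recurse into directories (-r), skip files that are newer on the receiver (-u), compare files by checksum (-c), and print verbose output (-v).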

$ rsync -trucv mnist-conv2d-medium-tutorial ubuntu@ip-address-of-your-gpu-box:/home/ubuntu/

Run the code

Below, you’ll ssh into the instance and install the package. After installation, run the train command. You’ll then see output indicating where the operations are being placed. As shown below, operations are being placed onto the GPU as expected.

$ ssh ubuntu@ip-address-of-your-gpu-box
$ cd mnist-conv2d-medium-tutorial
$ pip3 install .
$ python3 mnist_conv2d_medium_tutorial/

Operations being placed on GPU

After about 20 minutes, training will complete and you can run the evaluate command to test against the test set.

$ python3 mnist_conv2d_medium_tutorial/


In this tutorial, you learned the concepts behind convolutional neural networks. Additionally, you learned the TensorFlow implementation of a basic CNN that achieves ~98.0% accuracy. Finally, you learned how to run your code on a GPU for improved performance.

Complete Code here:

Next Steps:

  • Play with hyperparameters (batch size, learning rate, kernel size, number of iterations) to see how they affect model performance
  • Train and evaluate your model against other datasets (CIFAR-10)
  • Go deeper

Call to Action:

If you enjoyed this tutorial, follow and recommend!

Interested in learning more about Deep Learning / Machine Learning? Check out my other tutorials:

– Deep learning with Keras on Google Compute Engine

– Recommendation Systems with Apache Spark on Google Compute Engine

