Keras Inception V3 on Google Compute Engine

Inception V3 architecture

Inception, a model developed by Google, is a deep convolutional neural network (CNN). On the ImageNet dataset (a common benchmark for measuring image recognition performance), it achieved a top-5 error of 3.47%.

In this tutorial, you’ll use the pre-trained Inception model to provide predictions on images uploaded to a web server.


You’ll use:

  • Keras for image prediction running on Google Compute Engine
  • Google Cloud Storage to store the uploaded images
  • App Engine with Flask for your front-end web server




Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Below, the Inception model is loaded with Keras. Keras keeps a cache directory of the model’s pre-trained weights. On first use, Keras will download these weights into ~/.keras/models/.

You’ll create a predict function which accepts a base64-encoded image file. Inception V3 requires images to be 299 x 299 pixels. After loading, the image is converted to an array, expanded with a leading batch dimension, and pre-processed.
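The expansion and pre-processing step can be sketched with NumPy alone. This is a minimal sketch, not the tutorial's own predict function: the name to_inception_batch is hypothetical, the image is assumed to be already decoded and resized to 299 x 299 RGB, and the scaling to the range -1..1 is an assumption about what the preprocess_input helper in keras.applications.inception_v3 does.

```python
import numpy as np

TARGET_SIZE = (299, 299)  # height and width expected by Inception V3


def to_inception_batch(pixels):
    """Turn one RGB image of shape (299, 299, 3), values 0..255, into a batch of one."""
    pixels = np.asarray(pixels, dtype="float32")
    if pixels.shape != TARGET_SIZE + (3,):
        raise ValueError("expected shape (299, 299, 3), got %r" % (pixels.shape,))
    # Add a leading batch dimension: (299, 299, 3) becomes (1, 299, 299, 3).
    batch = np.expand_dims(pixels, axis=0)
    # Assumed Inception-style scaling: map 0..255 onto -1..1.
    return batch / 127.5 - 1.0
```

The resulting array has the shape a Keras model's predict method expects for a single image.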

First, install the dependencies:

$ pip install -r requirements.txt

Serving Predictions

With the model loaded, create a web server with Flask that accepts base64-encoded images.
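A minimal sketch of such a server might look like the following. The /predict route and the classify function are hypothetical placeholders; in the real service, classify would call the Keras predict function described earlier. The JSON body with a "data" field matches the request used for testing later in this tutorial.

```python
import base64
import binascii

from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(image_bytes):
    """Hypothetical stand-in for the Keras-backed predict function."""
    return [{"description": "placeholder", "label": "n00000000", "probability": 0.0}]


@app.route("/predict", methods=["POST"])  # route name is an assumption
def predict_route():
    payload = request.get_json(silent=True)
    encoded = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(encoded, str) or not encoded:
        return jsonify(error="expected a JSON body with a base64 'data' field"), 400
    try:
        image_bytes = base64.b64decode(encoded)
    except (binascii.Error, ValueError):
        return jsonify(error="'data' is not valid base64"), 400
    if not image_bytes:
        return jsonify(error="'data' decoded to an empty image"), 400
    return jsonify(predictions=classify(image_bytes))
```

Rejecting malformed requests with a 400 keeps decoding errors from surfacing as server errors once the endpoint is public.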

You now have image recognition as a service! Let’s test it out locally before deploying. First, start the server. Then make a POST request to the prediction service with a sample image such as cat.jpg.

$ python
(in another terminal window)
$ (echo -n '{"data": "'; base64 cat.jpg; echo '"}') |
curl -X POST -H "Content-Type: application/json" -d @-
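The same request can be made from Python with only the standard library. This is a sketch under assumptions: build_payload and post_image are hypothetical helper names, and the service URL is left as a parameter because it depends on where your server is listening.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes):
    """Wrap raw image bytes in the {"data": "<base64>"} JSON body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"data": encoded}).encode("utf-8")


def post_image(url, image_path, timeout=30.0):
    """POST an image file to the prediction service and return the parsed JSON reply."""
    with open(image_path, "rb") as f:
        body = build_payload(f.read())
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling post_image with your service URL and "cat.jpg" mirrors the curl pipeline above.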

If everything is working correctly, you’ll see a response like this:

 {
   "predictions": [
     {
       "description": "tiger_cat",
       "label": "n02123159",
       "probability": 55.242210626602173
     },
     {
       "description": "tabby",
       "label": "n02123045",
       "probability": 25.407189130783081
     },
     {
       "description": "Egyptian_cat",
       "label": "n02124075",
       "probability": 10.042409598827362
     }
   ]
 }
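A response of this shape could be assembled roughly as follows. This sketch assumes the model output has already been decoded into (label, description, score) tuples, which is the form Keras's decode_predictions is understood to return, and that scores are fractions converted to percentages. The function name format_predictions is hypothetical.

```python
def format_predictions(decoded, top=3):
    """Build the response body from (label, description, score) tuples."""
    # Highest score first, keeping only the requested number of entries.
    ranked = sorted(decoded, key=lambda item: item[2], reverse=True)[:top]
    return {
        "predictions": [
            {
                "description": description,
                "label": label,
                # Assumption: the service reports probabilities as percentages.
                "probability": float(score) * 100.0,
            }
            for label, description, score in ranked
        ]
    }
```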


You’ve got the model created and generating predictions. Time to deploy the model to Google Compute Engine using Docker. You’ll use a few different technologies: Gunicorn, Nginx, and Supervisor. Below, a Dockerfile and a few configuration files set up and serve the prediction API.
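As a rough illustration of one such configuration file, a Gunicorn config is plain Python. The values below are assumptions chosen for a small single-CPU instance, not the tutorial's own settings: Gunicorn listens on a local port that Nginx would proxy to, and a single worker keeps only one copy of the model in memory.

```python
# gunicorn.conf.py (illustrative values only)

# Listen on localhost; Nginx is assumed to forward port 80 traffic here.
bind = "127.0.0.1:8000"

# Each worker process loads its own copy of the model, so keep the count low.
workers = 1

# Allow extra time for slow requests, in seconds.
timeout = 120
```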

After creating the Docker image, you’ll push it to Google Container Registry.

For Docker to copy the files correctly, structure the directory:

| |
| |

You’ve built and pushed the Docker image to Google Container Registry. From here, create the server and pull down the previously created Docker image. First, you’ll enable the API:

$ gcloud compute firewall-rules create default-allow-http --allow=tcp:80 --target-tags http-server
$ gcloud compute instances create predict-service --machine-type=n1-standard-1 --zone=us-central1-c --tags=http-server
$ gcloud compute ssh predict-service --zone=us-central1-c
$ curl -sSL | sh
$ sudo gcloud docker pull

Finally, run it!

$ sudo docker run -td -p 80:80

If all went well, after a short bit, you’ll have a running prediction service. You can use the curl command above to confirm it’s working. Let’s build a quick front-end to visualize our predictions.

Google App Engine Front End Server

Cloud Storage

Create a bucket to upload images received from the front-end. We modify the permissions of the bucket to give read access to anyone on the internet.

$ gsutil mb gs://my-unique-bucket-identifier
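Once objects in the bucket are publicly readable, each uploaded image can be addressed by a URL on storage.googleapis.com. A small helper for building that URL might look like this; public_url is a hypothetical name and the object path is only an example.

```python
from urllib.parse import quote


def public_url(bucket, object_name):
    """Return the public HTTPS URL for an object in a Cloud Storage bucket."""
    # Percent-encode the object name but keep "/" so folder-like paths stay readable.
    return "https://storage.googleapis.com/{}/{}".format(bucket, quote(object_name, safe="/"))
```

For example, public_url("my-unique-bucket-identifier", "uploads/cat.jpg") gives a link the front-end can embed in an image tag.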

Add this snippet to your project. It is an abstraction around Google Cloud Storage you’ll use when uploading files. First, install the dependencies.

App Engine requires libraries to be installed into a folder for deployment. You’ll download the Google Cloud Storage client library as well.

$ git clone
$ pip install -r requirements.txt -t lib
$ pip install GoogleAppEngineCloudStorageClient -t lib

Upload File & Predict

First, let’s create our config file. Here, you’ll insert your project and the storage bucket you created earlier. You’ll also need to insert your prediction service’s IPv4 address. In the networking tab, reserve a static address for the prediction service instance so the address doesn’t change.

Reserve static address for prediction service

MAX_CONTENT_LENGTH = 8 * 1024 * 1024
ALLOWED_EXTENSIONS = set(['png', 'jpg', 'jpeg', 'gif'])
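These settings are typically enforced in the upload handler. A minimal sketch of the extension check, assuming a helper named allowed_file (a common Flask convention, not necessarily the name used in the original code):

```python
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif"}


def allowed_file(filename):
    """Return True if the filename has one of the allowed image extensions."""
    if "." not in filename:
        return False
    # Compare case-insensitively so "cat.JPG" is accepted too.
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```

Flask reads MAX_CONTENT_LENGTH from the app config and rejects larger request bodies on its own, so only the extension check needs explicit code.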

Below, you’ll create the front-end web server using Flask. This web server will be deployed to Google App Engine.

Copy the below templates into /templates:

Last step: let’s deploy to Google App Engine.

$ gcloud app deploy


That’s it! You’ve created a Flask web server that presents the user with a form to upload photos. These photos are uploaded to Google Cloud Storage and sent to our image prediction API.

Complete Code here:

git clone --recursive
