Play Flappy Bird Using Imitation Learning

In this tutorial, we will learn the fundamentals of imitation learning by playing a video game. We will teach an agent how to play Flappy Bird, one of the most popular video games in the world way back in 2014.


The objective of the game is to get a bird to move through a sequence of pipes without touching them. If the bird touches them, he falls out of the sky.

Table of Contents

You Will Need

Return to Table of Contents

What is Imitation Learning?

In order to understand what imitation learning is, it is first helpful to understand what reinforcement learning is all about.

Reinforcement learning is a machine learning method that involves training a software “agent” (i.e. algorithm) to learn through trial and error. The agent is rewarded by the environment when it does something good and punished when it does something bad. “Good” and “bad” behavior in this case are defined beforehand by the programmer. 

The idea behind reinforcement learning is that the agent learns over time the best actions it must take in different situations (i.e. states of the environment) in order to maximize its rewards and minimize its punishment. That stuff in bold in the previous sentence is formally known as the agent’s optimal policy.

Check out my reinforcement learning post for an example of how reinforcement learning works.

One of the problems with reinforcement learning is that it takes a long time for an agent to learn through trial and error. Knowing the optimal action to take in any given state of the environment means that the agent has to explore a lot of different random actions in different states of the environment in order to fully learn. 

You can imagine how infeasible this gets as the environment gets more complex. Imagine a robot learning how to make a breakfast of bacon and eggs. Assume the robot is in your kitchen and has no clue what bacon and eggs are. It has no clue how to use a stove. It has no clue how to use a spatula. It doesn’t even know how to use a frying pan.


The only thing the robot has is the feedback it receives when it takes an action that is in accordance with the goal of getting it to make bacon and eggs.

You could imagine that it could take decades for a robot to learn how to cook bacon and eggs this way, through trial and error. As humans, we don’t have that much time, so we need to use some other machine learning method that is more efficient.

And this brings us to imitation learning. Imitation learning helps address the shortcomings of reinforcement learning. It is also known as learning from demonstrations.

In imitation learning, during the training phase, the agent first observes the actions of an expert (e.g. a human). The agent keeps a record of what actions the expert took in different states of the environment. The agent uses this data to create a policy that attempts to mimic the expert. 

Diagram of How Imitation Learning Works

Babies and little kids learn this way. They imitate the actions of their parents. For example, I remember learning how to make my bed by watching my mom and then trying to imitate what she did.

Now, at runtime, the environment won’t be exactly similar as it was when the agent was in the training phase observing the expert. The agent will try to approximate the best action to take in any given state based on what it learned while watching the expert during the training phase.

Imitation learning is a powerful technique that works well when:

  • The number of states of the environment is large (e.g. in a kitchen, where you can have near infinite ways a kitchen can be arranged)
  • The rewards are sparse (e.g. there are only a few actions a robot can take that will bring him closer to the goal of making a bacon and egg breakfast) 
  • An expert is available.

Going back to our bacon and eggs example, a robotic agent doing imitation learning would observe a human making bacon and eggs from scratch. The robot would then try to imitate that process.

In the case of reinforcement learning, the robotic agent will take random actions and will receive a reward every time it takes an action that is consistent with the goal of cooking bacon and eggs. 

For example, if the robot cracks an egg into a frying pan, it would receive a reward. If the robot cracks an egg on the floor, it would not receive a reward.

After a considerable amount of time (maybe years!), the robot would finally be smart enough (based on performance metrics established by the programmer) to know how to cook bacon and eggs.

So you see that the runtime behavior of imitation learning and reinforcement learning are the same. The difference is in how the agent learns during the training phase. 

In reinforcement learning, the agent learns through trial and error by receiving varying rewards from the environment in response to its action. 

In imitation learning, the agent learns by watching an expert. It observes the action the expert took in various states.

Because there is no explicit reward in the case of imitation learning, the agent only becomes as good as the teacher (but never better). Thus, if you have a bad expert, you will have a bad agent.

The agent doesn’t know the reward (consequences) of taking a specific action in a certain state. The graphic below further illustrates the difference between reinforcement learning and imitation learning.


So with that background, let’s dive into implementing an imitation learning algorithm to teach an agent to play Flappy Bird.

Return to Table of Contents


Install Flappy Bird

You can find the instructions for installing FlappyBird at this link, but let’s run through the process as it can be a bit tricky. Go slow, so you make sure you get all the necessary libraries.

Open up a terminal window.

Install all the dependencies for Python 3.X listed here. I’ll copy and paste all that below, but if you have any issues, check out the link. You can copy and paste the command below into your terminal.

sudo apt-get install git python3-dev python3-setuptools python3-numpy python3-opengl \
    libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev libsmpeg-dev \
    libsdl1.2-dev libportmidi-dev libswscale-dev libavformat-dev libavcodec-dev \
    libtiff5-dev libx11-6 libx11-dev fluid-soundfont-gm timgm6mb-soundfont \
    xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic fontconfig fonts-freefont-ttf libfreetype6-dev

You don’t have to know what each library does, but in case you’re interested, you can take a look at the Ubuntu packages page to get a complete description.

Make sure you have pip installed.

Now you need to install pygame.

pip3 install pygame

We’re not done yet with all the installations. Let’s install the PyGame Learning Environment (PLE) now by copying the entire repository to our computer. It already includes Flappy Bird.

git clone

You will be asked to login to GitHub. Type in your username and password.

Type the following command to see if it installed properly. You should see the PyGame-Learning-Environment folder.


Now let’s switch to that folder.

cd PyGame-Learning-Environment

Complete the install, by typing the following command (be sure to include the period).

sudo pip3 install -e .

Now Flappy Bird is all set up.

Test to see if everything is working, type the following command to run the Aliens program

python3 -m pygame.examples.aliens

Return to Table of Contents

Launch Flappy Bird

Let’s write a Python script in order to use the game environment. 

Open a new terminal window.

Go to the PyGame-Learning-Environment directory.

cd PyGame-Learning-Environment

We will create the script Open the text editor to create a new file.


Here is the full code. You can find a full explanation of the methods at the PyGameLearning Environment website. The function getScreenRGB retrieves the state of the environment in the form of red/green/blue image pixels:

# Import Flappy Bird from games library in
# Python Learning Environment (PLE)
from import FlappyBird 

# Import PyGame Learning Environment
from ple import PLE 

# Create a game instance
game = FlappyBird() 

# Pass the game instance to the PLE
p = PLE(game, fps=30, display_screen=True, force_fps=False)

# Initialize the environment

actions = p.getActionSet()
print(actions) # 119 to flap wings, or None to do nothing
action_dict = {0: actions[1], 1:actions[0]}

reward = 0.0

for i in range(10000):
  if p.game_over():

  state = p.getScreenRGB()
  action = 1
  reward = p.act(action_dict[action])

Now, run the program using the following command:


Here is what you should see:


Here is The agent takes a random action at each iteration of the game.

# Import Flappy Bird from games library in
# Python Learning Environment (PLE)
from import FlappyBird 

# Import PyGame Learning Environment
from ple import PLE 

# Import Numpy
import numpy as np

class NaiveAgent():
  This is a naive agent that just picks actions at random.
  def __init__(self,actions):
    self.actions = actions

  def pickAction(self, reward, obs):
    return self.actions[np.random.randint(0, len(self.actions))]

# Create a game instance
game = FlappyBird() 

# Pass the game instance to the PLE
p = PLE(game)

# Create the agent
agent = NaiveAgent(p.getActionSet())

# Initialize the environment

actions = p.getActionSet()
action_dict = {0: actions[1], 1:actions[0]}

reward = 0.0

for f in range(15000):

  # If the game is over
  if p.game_over():

  action = agent.pickAction(reward, p.getScreenRGB())
  reward = p.act(action)

  if f > 1000:
    p.display_screen = True
    p.force_fps = False # Slow screen

Return to Table of Contents

Implement the Dataset Aggregation Algorithm (DAgger)

How the DAgger Algorithm Works

In this section, we will implement the Dataset Aggregation Algorithm (DAgger) and apply it to FlappyBird. 

The DAgger algorithm is a type of imitation learning algorithm that is intended to help address the problem of the agent “memorizing” the actions of the expert in different states. It is time-consuming to watch the expert do every combination of possible states of the environment, so we need the expert to give the agent some guidance on what to do in the event the agent makes errors.

In the first phase, we first let the expert take action in the environment and keep track of what action the expert took in a variety of states of the environment. This phase creates an initial “policy”, a collection of state-expert action pairs.

Then for the second phase, we add a twist to the situation. The expert continues to act, and we record the expert’s actions. However, the environment doesn’t change according to the expert’s actions. Rather it changes in response to the policy from phase 1. The environment is “ignoring” the actions of the expert. 

In the meantime, a new policy is created that maps the state to the expert’s actions.

You might now wonder, which policy would the agent choose? The policy from phase 1 or phase 2. One method is to pick one at random. Another method (and the one often preferred in practice) is to hold out a subset of states of the environment and test which one performs the best.

You can find the Dagger paper here to learn more.

Install TensorFlow

We now need to install TensorFlow, the popular machine learning library.

Open a new terminal window, and type the following command (wait a bit as it will take some time to download).

pip3 install tensorflow

Run FlappyBird

Now we need to run FlappyBird with the imitation learning algorithm. Since the game has no real expert, we have to have our algorithm load a special file (included in the GitHub folder that we will clone in a second) that has the expert’s pre-prepared policy. 

Clone the code at this GitHub page here into the file.

git clone

To run the code, migrate to the “Chapter10” file. You want to find the file called


To see FlappyBird in action, change this line:

env = PLE(env, fps=30, display_screen=False)

To this:

env = PLE(env, fps=30, display_screen=True, force_fps=False)

To run the code, type the following command:


Eventually, you should see the results of each game spit out, with the bird’s score on the far right. Notice that the agent’s score gets above 100 after only a few minutes of training.


That’s it for imitation learning. If you want to learn more, I highly suggest you check out Andrea Lonza’s book, Reinforcement Learning Algorithms with Python, that does a great job teaching imitation learning.

Return to Table of Contents


Return to Table of Contents

How Do Neural Networks Make Predictions?

Neural networks are the workhorses of the rapidly growing field known as deep learning. Neural networks are used for all sorts of applications where a prediction of some sort is desired. Here are some examples:

  • Predicting the type of objects in an image or video
  • Sales forecasting
  • Speech recognition
  • Medical diagnosis
  • Risk management
  • and countless other applications… 

In this post, I will explain how neural networks make those predictions by boiling these structures down to their fundamental parts and then building up from there.

You Will Need 

Create Your First Neural Network

Imagine you run a business that provides short online courses for working professionals. Some of your courses are free, but your best courses require the students to pay for a subscription. 

You want to create a neural network to predict if a free student is likely to upgrade to a paid subscription. Let’s create the most basic neural network you can make.


OK, so there is our neural network. To implement this neural network on a computer, we need to translate this diagram into a software program. Let’s do that now using Python, the most popular language for machine learning.

# Declare a variable named weight and 
# initiate it with a value
weight = 0.075

# Create a method called neural_network that 
# takes as inputs, the input data (number of 
free courses a student has taken during the 
# last 30 days) and the weight of the connection. 
# The method returns the prediction.

def neural_network(input, weight):

  # The input data multiplied by the weight 
  # equals the prediction
  prediction = input * weight

  # This is the output
  return prediction

So we currently have five students, all of whom are free students. The number of free courses these users have taken during the last 30 days is 12, 3, 5, 6, and 8. Let’s code this in Python as a list.

number_of_free_courses_taken = [12, 3, 5, 6, 8]

Let’s make a prediction for the first student, the one who has taken 12 free courses over the last 30 days.


Now let’s put the diagram into code.

# Extract the first value of the list...12...
# and store into a variable named input
first_student_input = number_of_free_courses_taken[0]

# Call the neural_network method and store the 
# prediction result into a variable
first_student_prediction = neural_network(
                         first_student_input, weight)

# Print the prediction

OK. We have finished the code. Let’s see how it looks all together.

weight = 0.075

def neural_network(input, weight):

  prediction = input * weight

  return prediction

number_of_free_courses_taken = [
                        12, 3, 5, 6, 8]

first_student_input = number_of_free_courses_taken[0]

first_student_prediction = neural_network(
                       first_student_input, weight)


Open a Jupyter Notebook and run the code above, or run the code inside your favorite Python IDE.

Here is what I got:


What did you get? Did you get 0.9? If so, congratulations!

Let’s see what is happening when we run our code. We called the neural_network method. The first operation performed inside that method is to multiply the input by the weight and return the result. In this case, the input is 12, and the weight is 0.075. The result is 0.9.


0.9 is stored in the first_student_prediction variable.


And this, my friend, is the most basic building block of a neural network. A neural network in its simplest form consists of one or more weights which you can multiply by input data to make a prediction

Let’s take a look at some questions you might have at this stage.

What kind of input data can go into a neural network?

Real numbers that can be measured or calculated somewhere in the real world. Yesterday’s high temperature, a medical patient’s blood pressure reading, previous year’s rainfall, or average annual rainfall are all valid inputs into a neural network. Negative numbers are totally acceptable as well.

A good rule of thumb is, if you can quantify it, you can use it as an input into a neural network. It is best to use input data into a neural network that you think will be relevant for making the prediction you desire.

For example, if you are trying to create a neural network to predict if a patient has breast cancer or not, how many fingers a person has probably not going to be all that relevant. However, how many days per month a patient exercises is likely to be a relevant piece of input data that you would want to feed into your neural network.

What does a neural network predict?

A neural network outputs some real number. In some neural network implementations, we can do some fancy mathematics to limit the output to some real number between 0 and 1. Why would we want to do that? Well in some applications we might want to output probabilities. Let me explain.

Suppose you want to predict the probability that tomorrow will be sunny. The input into a neural network to make such a prediction could be today’s high temperature. 


If the output is some number like 0.30, we can interpret this as a 30% change of the weather being sunny tomorrow given today’s high temperature. Pretty cool huh!

We don’t have to limit the output to between 0 and 1. For example, let’s say we have a neural network designed to predict the price of a house given the house’s area in square feet. Such a network might tell us, “given the house’s area in square feet, the predicted price of the house is $432,000.”

What happens if the neural network’s predictions are incorrect?

The neural network will adjust its weights so that the next time it makes a more accurate prediction. Recall that the weights are multiplied by the input values to make a prediction.

What is a neural network really learning?

A neural network is learning the best possible set of weights. “Best” in the context of neural networks means the weights that minimize the prediction error.

Remember, the core math operation in a neural network is multiplication, where the simplest neural network is:

Input Value * Weight = Prediction

How does the neural network find the best set of weights?

Short answer: Trial and error

Long answer: A neural network starts out with random numbers for weights. It then takes in a single input data point, makes a prediction, and then sees if its prediction was either too high or too low. The neural network then adjusts its weight(s) accordingly so that the next time it sees the same input data point, it makes a more accurate prediction.

Once the weights are adjusted, the neural network is fed the next data point, and so on. A neural network gets better and better each time it makes a prediction. It “learns” from its mistakes one data point at a time.

Do you notice something here?

Standard neural networks have no memory. They are fed an input data point, make a prediction, see how close the prediction was to reality, adjust the weights accordingly, and then move on to the next data point. At each step of the learning process of a neural network, it has no memory of the most recent prediction it made.

Standard neural networks focus on one input data point at a time. For example, in our subscriber prediction neural network we built earlier in this tutorial, if we feed our neural network number_of_free_courses_taken[1], it will have no clue what it predicted when number_of_free_courses_taken[0] was the input value.

There are some networks that have short term memories. These are called Long short-term memory networks (LSTM).

Real-Time Object Recognition Using a Webcam and Deep Learning

In this tutorial, we will develop a program that can recognize objects in a real-time video stream on a built-in laptop webcam using deep learning.


Object recognition involves two main tasks:

  1. Object Detection (Where are the objects?): Locate objects in a photo or video frame
  2. Image Classification (What are the objects?): Predict the type of each object in a photo or video frame

Humans can do both tasks effortlessly, but computers cannot.

Computers require a lot of processing power to take full advantage of the state-of-the-art algorithms that enable object recognition in real time. However, in recent years, the technology has matured, and real-time object recognition is now possible with only a laptop computer and a webcam.

Real-time object recognition systems are currently being used in a number of real-world applications, including the following:

  • Self-driving cars: detection of pedestrians, cars, traffic lights, bicycles, motorcycles, trees, sidewalks, etc.
  • Surveillance: catching thieves, counting people, identifying suspicious behavior, child detection.
  • Traffic monitoring: identifying traffic jams, catching drivers that are breaking the speed limit.
  • Security: face detection, identity verification on a smartphone.
  • Robotics: robotic surgery, agriculture, household chores, warehouses, autonomous delivery.
  • Sports: ball tracking in baseball, golf, and football.
  • Agriculture: disease detection in fruits.
  • Food: food identification.

There are a lot of steps in this tutorial. Have fun, be patient, and be persistent. Don’t give up! If something doesn’t work the first time around, try again. You will learn a lot more by fighting through to the end of this project. Stay relentless!

By the end of this tutorial, you will have the rock-solid confidence to detect and recognize objects in real time on your laptop’s GPU (Graphics Processing Unit) using deep learning.

Let’s get started!

Table of Contents

You Will Need

Install TensorFlow CPU

We need to get all the required software set up on our computer. I will be following this really helpful tutorial.

Open an Anaconda command prompt terminal.


Type the command below to create a virtual environment named tensorflow_cpu that has Python 3.6 installed. 

conda create -n tensorflow_cpu pip python=3.6

Press y and then ENTER.

A virtual environment is like an independent Python workspace which has its own set of libraries and Python version installed. For example, you might have a project that needs to run using an older version of Python, like Python 2.7. You might have another project that requires Python 3.7. You can create separate virtual environments for these projects.

Now, let’s activate the virtual environment by using this command:

conda activate tensorflow_cpu

Type the following command to install TensorFlow CPU.

pip install --ignore-installed --upgrade tensorflow==1.9

Wait for Tensorflow CPU to finish installing. Once it is finished installing, launch Python by typing the following command:



import tensorflow as tf

Here is what my screen looks like now:


Now type the following:

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()

You should see a message that says: “Your CPU supports instructions that this TensorFlow binary….”. Just ignore that. Your TensorFlow will still run fine.

Now run this command to complete the test of the installation:


Press CTRL+Z. Then press ENTER to exit.



That’s it for TensorFlow CPU. Now let’s install TensorFlow GPU.

Return to Table of Contents

Install TensorFlow GPU

Your system must have the following requirements:

  • Nvidia GPU (GTX 650 or newer…I’ll show you later how to find out what Nvidia GPU version is in your computer)
  • CUDA Toolkit v9.0 (we will install this later in this tutorial)
  • CuDNN v7.0.5 (we will install this later in this tutorial)
  • Anaconda with Python 3.7+

Here is a good tutorial that walks through the installation, but I’ll outline all the steps below.

Install CUDA Toolkit v9.0

The first thing we need to do is to install the CUDA Toolkit v9.0. Go to this link.

Select your operating system. In my case, I will select Windows, x86_64, Version 10, and exe (local).


Download the Base Installer as well as all the patches. I downloaded all these files to my Desktop. It will take a while to download, so just wait while your computer downloads everything.


Open the folder where the downloads were saved to.


Double-click on the Base Installer program, the largest of the files that you downloaded from the website.

Click Yes to allow the program to make changes to your device.

Click OK to extract the files to your computer.


I saw this error window. Just click Continue.


Click Agree and Continue.


If you saw that error window earlier… “…you may not be able to run CUDA applications with this driver…,” select the Custom (Advanced) install option and click Next. Otherwise, do the Express installation and follow all the prompts.


Uncheck the Driver components, PhysX, and Visual Studio Integration options. Then click Next.

Click Next.


Wait for everything to install.


Click Close.


Delete  C:\Program Files\NVIDIA Corporation\Installer2.


Double-click on Patch 1.

Click Yes to allow changes to your computer.

Click OK.


Click Agree and Continue.


Go to Custom (Advanced) and click Next.


Click Next.


Click Close.

The process is the same for Patch 2. Double-click on Patch 2 now.

Click Yes to allow changes to your computer.

Click OK.

Click Agree and Continue.

Go to Custom (Advanced) and click Next.

Click Next.


Click Close.


The process is the same for Patch 3. Double-click on Patch 3 now.

Click Yes to allow changes to your computer.

Click OK.

Click Agree and Continue.

Go to Custom (Advanced) and click Next.

Click Next.

Click Close.

The process is the same for Patch 4. Double-click on Patch 4 now.

Click Yes to allow changes to your computer.

Click OK.

Click Agree and Continue.

Go to Custom (Advanced) and click Next.

Click Next.

After you’ve installed Patch 4, your screen should look like this:


Click Close.

To verify your CUDA installation, go to the command terminal on your computer, and type:

nvcc --version

Return to Table of Contents

Install the NVIDIA CUDA Deep Neural Network library (cuDNN)

Now that we installed the CUDA 9.0 base installer and its four patches, we need to install the NVIDIA CUDA Deep Neural Network library (cuDNN). Official instructions for installing are on this page, but I’ll walk you through the process below.

Go to

Create a user profile if needed and log in.


Go to this page:

Agree to the terms of the cuDNN Software License Agreement.


We have CUDA 9.0, so we need to click cuDNN v7.6.4 (September 27, 2019), for CUDA 9.0.


I have Windows 10, so I will download cuDNN Library for Windows 10.


In my case, the zip file downloaded to my Desktop. I will unzip that zip file now, which will create a new folder of the same name…just without the .zip part. These are your cuDNN files. We’ll come back to these in a second.


Before we get going, let’s double check what GPU we have. If you are on a Windows machine, search for the “Device Manager.”


Once you have the Device Manager open, you should see an option near the top for “Display Adapters.” Click the drop-down arrow next to that, and you should see the name of your GPU. Mine is NVIDIA GeForce GTX 1060.


If you are on Windows, you can also check what NVIDIA graphics driver you have by right-clicking on your Desktop and clicking the NVIDIA Control Panel. My version is 430.86. This version fits the requirements for cuDNN.

Ok, now that we have verified that our system meets the requirements, lets navigate to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0, your CUDA Toolkit directory.


Now go to your cuDNN files, that new folder that was created when you did the unzipping. Inside that folder, you should see a folder named cuda. Click on it.


Click bin.


Copy cudnn64_7.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin. Your computer might ask you to allow Administrative Privileges. Just click Continue when you see that prompt.

Now go back to your cuDNN files. Inside the cuda folder, click on include. You should see a file named cudnn.h.


Copy that file to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include. Your computer might ask you to allow Administrative Privileges. Just click Continue when you see that prompt.

Now go back to your cuDNN files. Inside the cuda folder, click on lib -> x64. You should see a file named cudnn.lib. 


Copy that file to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64. Your computer might ask you to allow Administrative Privileges. Just click Continue when you see that prompt.

If you are using Windows, do a search on your computer for Environment Variables. An option should pop up to allow you to edit the Environment Variables on your computer.

Click on Environment Variables.


Make sure you CUDA_PATH variable is set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0.


I recommend restarting your computer now.

Return to Table of Contents

Install TensorFlow GPU

Now we need to install TensorFlow GPU. Open a new Anaconda terminal window. 


Create a new Conda virtual environment named tensorflow_gpu by typing this command:

conda create -n tensorflow_gpu pip python=3.6

Type y and press Enter.


Activate the virtual environment.

conda activate tensorflow_gpu

Install TensorFlow GPU for Python.

pip install --ignore-installed --upgrade tensorflow-gpu==1.9

Wait for TensorFlow GPU to install.

Now let’s test the installation. Launch the Python interpreter.


Type this command.

import tensorflow as tf

If you don’t see an error, TensorFlow GPU is successfully installed.


Now type this:

hello = tf.constant('Hello, TensorFlow!')

And run this command. It might take a few minutes to run, so just wait until it finishes:

sess = tf.Session()

Now type this command to complete the test of the installation:


You can further confirm whether TensorFlow can access the GPU, by typing the following into the Python interpreter (just copy and paste into the terminal window while the Python interpreter is running).


To exit the Python interpreter, type:


And press Enter.

Return to Table of Contents

Install TensorFlow Models

Now that we have everything setup, let’s install some useful libraries. I will show you the steps for doing this in my TensorFlow GPU virtual environment, but the steps are the same for the TensorFlow CPU virtual environment.

Open a new Anaconda terminal window. Let’s take a look at the list of virtual environments that we can activate.

conda env list

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

Install the libraries. Type this command:

conda install pillow lxml jupyter matplotlib opencv cython

Press y to proceed.

Once that is finished, you need to create a folder somewhere that has the TensorFlow Models  (e.g. C:\Users\addis\Documents\TensorFlow). If you have a D drive, you can also save it there as well.

In your Anaconda terminal window, move to the TensorFlow directory you just created. You will use the cd command to change to that directory. For example:

cd C:\Users\addis\Documents\TensorFlow

Go to the TensorFlow models page on GitHub:

Click the button to download the zip file of the repository. It is a large file, so it will take a while to download.


Move the zip folder to the TensorFlow directory you created earlier and extract the contents.

Rename the extracted folder to models instead of models-master. Your TensorFlow directory hierarchy should look like this:


  • models
    • official
    • research
    • samples
    • tutorials

Return to Table of Contents

Install Protobuf

Now we need to install Protobuf, which is used by the TensorFlow Object Detection API to configure the training and model parameters.

Go to this page:

Download the latest * release (assuming you are on a Windows machine).


Create a folder in C:\Program Files named it Google Protobuf.

Extract the contents of the downloaded *, inside C:\Program Files\Google Protobuf


Search for Environment Variables on your system. A window should pop up that says System Properties.


Click Environment Variables.

Go down to the Path variable and click Edit.


Click New.


Add C:\Program Files\Google Protobuf\bin

You can also add it the Path System variable.

Click OK a few times to close out all the windows.

Open a new Anaconda terminal window.

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your \TensorFlow\models\research\ directory and run the following command:

for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.

Now go back to the Environment Variables on your system. Create a New Environment Variable named PYTHONPATH (if you don’t have one already). Replace C:\Python27amd64 if you don’t have Python installed there. Also, replace <your_path> with the path to your TensorFlow folder.


For example:


Now add these two paths to your PYTHONPATH environment variable:


Return to Table of Contents

Install COCO API

Now, we are going to install the COCO API. You don’t need to worry about what this is at this stage. I’ll explain it later.

Download the Visual Studios Build Tools here: Visual C++ 2015 build tools from here:

Choose the default installation.


After it has installed, restart your computer.


Open a new Anaconda terminal window.

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your \TensorFlow\models\research\ directory and run the following command to install pycocotools (everything below goes on one line):

pip install git+

If it doesn’t work, install git:

Follow all the default settings for installing Git. You will have to click Next several times.

Once you have finished installing Git, run this command (everything goes on one line):

pip install git+

Return to Table of Contents

Test the Installation

Open a new Anaconda terminal window.

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your \TensorFlow\models\research\object_detection\builders directory and run the following command to test your installation.


You should see an OK message.


Return to Table of Contents

Install LabelImg

Now we will install LabelImg, a graphical image annotation tool for labeling object bounding boxes in images.

Open a new Anaconda/Command Prompt window.

Create a new virtual environment named labelImg by typing the following command:

conda create -n labelImg

Activate the virtual environment.

conda activate labelImg

Install pyqt.

conda install pyqt=5

Click y to proceed.

Go to your TensorFlow folder, and create a new folder named addons.


Change to that directory using the cd command.

Type the following command to clone the repository:

git clone

Wait while labelImg downloads.

You should now have a folder named addons\labelImg under your TensorFlow folder.

Type exit to exit the terminal.

Open a new terminal window.

Activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your TensorFlow\addons\labelImg directory.

Type the following commands, one right after the other.

conda install pyqt=5
conda install lxml
pyrcc5 -o libs/ resources.qrc

Test the LabelImg Installation

Open a new terminal window.

Activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your TensorFlow\addons\labelImg directory.

Type the following commands:


If you see this window, you have successfully installed LabelImg. Here is a tutorial on how to label your own images. Congratulations!


Return to Table of Contents

Recognize Objects Using Your WebCam


Note: This section gets really technical. If you know the basics of computer vision and deep learning, it will make sense. Otherwise, it will not. You can skip this section and head straight to the Implementation section if you are not interested in what is going on under the hood of the object recognition application we are developing.

In this project, we use OpenCV and TensorFlow to create a system capable of automatically recognizing objects in a webcam. Each detected object is outlined with a bounding box labeled with the predicted object type as well as a detection score.

The detection score is the probability that a bounding box contains the object of a particular type (e.g. the confidence a model has that an object identified as a “backpack” is actually a backpack).

The particular SSD with Inception v2 model used in this project is the ssd_inception_v2_coco model. The ssd_inception_v2_coco model uses the Single Shot MultiBox Detector (SSD) for its architecture and the Inception v2 framework for feature extraction.

Single Shot MultiBox Detector (SSD)

Most state-of-the-art object detection methods involve the following stages:

  1. Hypothesize bounding boxes 
  2. Resample pixels or features for each box
  3. Apply a classifier

The Single Shot MultiBox Detector (SSD) eliminates the multi-stage process above and performs all object detection computations using just a single deep neural network.

Inception v2

Most state-of-the-art object detection methods based on convolutional neural networks at the time of the invention of Inception v2 added increasingly more convolution layers or neurons per layer in order to achieve greater accuracy. The problem with this approach is that it is computationally expensive and prone to overfitting. The Inception v2 architecture (as well as the Inception v3 architecture) was proposed in order to address these shortcomings.

Rather than stacking multiple kernel filter sizes sequentially within a convolutional neural network, the approach of the inception-based model is to perform a convolution on an input with multiple kernels all operating at the same layer of the network. By factorizing convolutions and using aggressive regularization, the authors were able to improve computational efficiency. Inception v2 factorizes the traditional 7 x 7 convolution into 3 x 3 convolutions.

Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, (2015) conducted an empirically-based demonstration in their landmark Inception v2 paper, which showed that factorizing convolutions and using aggressive dimensionality reduction can substantially lower computational cost while maintaining accuracy.

Data Set

The ssd_inception_v2_coco model used in this project is pretrained on the Common Objects in Context (COCO) data set (COCO data set), a large-scale data set that contains 1.5 million object instances and more than 200,000 labeled images. The COCO data required 70,000 crowd worker hours to gather, annotate, and organize images of objects in natural environments.

Software Dependencies

The following libraries form the object recognition backbone of the application implemented in this project:

  • OpenCV, a library of programming functions for computer vision.
  • Pillow, a library for manipulating images.
  • Numpy, a library for scientific computing.
  • Matplotlib, a library for creating graphs and visualizations.
  • TensorFlow Object Detection API, an open source framework developed by Google that enables the development, training, and deployment of pre-trained object detection models.

Return to Table of Contents


Now to the fun part, we will now recognize objects using our computer webcam.

Copy the following program, and save it to your TensorFlow\models\research\object_detection directory as .

# Import all the key libraries
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import cv2

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from utils import label_map_util
from utils import visualization_utils as vis_util

# Define the video stream
cap = cv2.VideoCapture(0)  

# Which model are we downloading?
# The models are listed here:
MODEL_NAME = 'ssd_inception_v2_coco_2018_01_28'

# Path to the frozen detection graph. 
# This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add the correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

# Number of classes to detect

# Download Model
opener = urllib.request.URLopener()
tar_file =
for file in tar_file.getmembers():
    file_name = os.path.basename(
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph =
        tf.import_graph_def(od_graph_def, name='')

# Loading label map
# Label maps map indices to category names, so that when our convolution network 
# predicts `5`, we know that this corresponds to `airplane`.  Here we use internal 
# utility functions, but anything that returns a dictionary mapping integers to 
# appropriate string labels would be fine
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Helper code
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
# Detection
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:

            # Read frame from camera
            ret, image_np =
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Extract image tensor
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Extract detection boxes
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Extract detection scores
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            # Extract detection classes
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            # Extract number of detectionsd
            num_detections = detection_graph.get_tensor_by_name(
            # Actual detection.
            (boxes, scores, classes, num_detections) =
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.

            # Display output
            cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))

            if cv2.waitKey(25) & 0xFF == ord('q'):

print("We are finished! That was fun!")

Open a new terminal window.

Activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your TensorFlow\models\research\object_detection directory.

At the time of this writing, we need to use Numpy version 1.16.4. Type the following command to see what version of Numpy you have on your system.

pip show numpy

If it is not 1.16.4, execute the following commands:

pip uninstall numpy
pip install numpy==1.16.4

Now run, your program:


In about 30 to 90 seconds, you should see your webcam power up and object recognition take action. That’s it! Congratulations for making it to the end of this tutorial!


Keep building!

Return to Table of Contents