Real-Time Object Recognition Using a Webcam and Deep Learning

In this tutorial, we will develop a program that can recognize objects in a real-time video stream from a built-in laptop webcam using deep learning.


Object recognition involves two main tasks:

  1. Object Detection (Where are the objects?): Locate objects in a photo or video frame
  2. Image Classification (What are the objects?): Predict the type of each object in a photo or video frame
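To make the two tasks concrete, here is a minimal sketch of how one recognized object could be represented in Python. The Detection class and the describe helper are hypothetical names used only for illustration. The box uses normalized (ymin, xmin, ymax, xmax) coordinates, which is the order the TensorFlow Object Detection API uses for its detection boxes.

```python
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    """One recognized object (hypothetical structure, for illustration only)."""
    # Task 1, object detection: where the object is, as a normalized
    # (ymin, xmin, ymax, xmax) box with every value between 0 and 1
    box: Tuple[float, float, float, float]
    # Task 2, image classification: what the object is, plus the model's confidence
    label: str
    score: float


def describe(det: Detection) -> str:
    """Return a one-line, human-readable summary of a detection."""
    ymin, xmin, ymax, xmax = det.box
    return "{} ({:.0%}) at x={:.2f}..{:.2f}, y={:.2f}..{:.2f}".format(
        det.label, det.score, xmin, xmax, ymin, ymax)


if __name__ == "__main__":
    example = Detection(box=(0.10, 0.20, 0.60, 0.45), label="backpack", score=0.87)
    print(describe(example))  # backpack (87%) at x=0.20..0.45, y=0.10..0.60
```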

Humans can do both tasks effortlessly, but for computers they are much harder.

Computers require a lot of processing power to take full advantage of the state-of-the-art algorithms that enable object recognition in real time. However, in recent years, the technology has matured, and real-time object recognition is now possible with only a laptop computer and a webcam.

Real-time object recognition systems are currently being used in a number of real-world applications, including the following:

  • Self-driving cars: detection of pedestrians, cars, traffic lights, bicycles, motorcycles, trees, sidewalks, etc.
  • Surveillance: catching thieves, counting people, identifying suspicious behavior, child detection.
  • Traffic monitoring: identifying traffic jams, catching drivers that are breaking the speed limit.
  • Security: face detection, identity verification on a smartphone.
  • Robotics: robotic surgery, agriculture, household chores, warehouses, autonomous delivery.
  • Sports: ball tracking in baseball, golf, and football.
  • Agriculture: disease detection in fruits.
  • Food: food identification.

There are a lot of steps in this tutorial. Have fun, be patient, and be persistent. Don’t give up! If something doesn’t work the first time around, try again. You will learn a lot more by fighting through to the end of this project. Stay relentless!

By the end of this tutorial, you will have the rock-solid confidence to detect and recognize objects in real time on your laptop’s GPU (Graphics Processing Unit) using deep learning.

Let’s get started!

You Will Need

Install TensorFlow CPU

We need to get all the required software set up on our computer. I will be following this really helpful tutorial.

Open an Anaconda command prompt terminal.


Type the command below to create a virtual environment named tensorflow_cpu that has Python 3.6 installed. 

conda create -n tensorflow_cpu pip python=3.6

Press y and then ENTER.

A virtual environment is like an independent Python workspace which has its own set of libraries and Python version installed. For example, you might have a project that needs to run using an older version of Python, like Python 2.7. You might have another project that requires Python 3.7. You can create separate virtual environments for these projects.
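Each environment has its own interpreter, so it helps to confirm which one a Python session is actually using. The sketch below is a hypothetical helper and is not part of conda. It reads CONDA_DEFAULT_ENV, which conda activate normally sets to the active environment's name (treat that as an assumption for your conda version). It also reads the standard sys.prefix and sys.executable values.

```python
import os
import sys


def environment_report():
    """Summarize the interpreter that is running this code."""
    return {
        # Set by `conda activate` in typical setups (assumption); absent otherwise
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "<none>"),
        # e.g. 3.6.x inside the tensorflow_cpu environment created above
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Root folder of the environment this interpreter belongs to
        "prefix": sys.prefix,
        # Full path of the python executable
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{:15} {}".format(key, value))
```

If you run it once in the base environment and once after activating tensorflow_cpu, the two runs should report different prefixes.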

Now, let’s activate the virtual environment by using this command:

conda activate tensorflow_cpu

Type the following command to install TensorFlow CPU.

pip install --ignore-installed --upgrade tensorflow==1.9

Wait for TensorFlow CPU to finish installing. Once it has finished, launch Python by typing the following command:

python

import tensorflow as tf

Here is what my screen looks like now:


Now type the following:

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()

You should see a message that says: “Your CPU supports instructions that this TensorFlow binary…” Just ignore it. TensorFlow will still run fine.

Now run this command to complete the test of the installation:

print(sess.run(hello))

Press CTRL+Z. Then press ENTER to exit.



That’s it for TensorFlow CPU. Now let’s install TensorFlow GPU.

Install TensorFlow GPU

Your system must meet the following requirements:

  • NVIDIA GPU (GTX 650 or newer; I’ll show you later how to find out which NVIDIA GPU your computer has)
  • CUDA Toolkit v9.0 (we will install this later in this tutorial)
  • cuDNN v7 for CUDA 9.0 (we will install v7.6.4 later in this tutorial)
  • Anaconda with Python 3.7+

Here is a good tutorial that walks through the installation, but I’ll outline all the steps below.

Install CUDA Toolkit v9.0

The first thing we need to do is to install the CUDA Toolkit v9.0. Go to this link.

Select your operating system. In my case, I will select Windows, x86_64, Version 10, and exe (local).


Download the Base Installer as well as all the patches. I downloaded all these files to my Desktop. It will take a while to download, so just wait while your computer downloads everything.


Open the folder where the downloads were saved.


Double-click on the Base Installer program, the largest of the files that you downloaded from the website.

Click Yes to allow the program to make changes to your device.

Click OK to extract the files to your computer.


I saw this error window. Just click Continue.


Click Agree and Continue.


If you saw the error window earlier (“…you may not be able to run CUDA applications with this driver…”), select the Custom (Advanced) install option and click Next. Otherwise, do the Express installation and follow all the prompts.


Uncheck the Driver components, PhysX, and Visual Studio Integration options. Then click Next.

Click Next.


Wait for everything to install.


Click Close.


Delete C:\Program Files\NVIDIA Corporation\Installer2.


Double-click on Patch 1.

Click Yes to allow changes to your computer.

Click OK.


Click Agree and Continue.


Go to Custom (Advanced) and click Next.


Click Next.


Click Close.

The process is the same for Patch 2. Double-click on Patch 2 now.

Click Yes to allow changes to your computer.

Click OK.

Click Agree and Continue.

Go to Custom (Advanced) and click Next.

Click Next.


Click Close.


The process is the same for Patch 3. Double-click on Patch 3 now.

Click Yes to allow changes to your computer.

Click OK.

Click Agree and Continue.

Go to Custom (Advanced) and click Next.

Click Next.

Click Close.

The process is the same for Patch 4. Double-click on Patch 4 now.

Click Yes to allow changes to your computer.

Click OK.

Click Agree and Continue.

Go to Custom (Advanced) and click Next.

Click Next.

After you’ve installed Patch 4, your screen should look like this:


Click Close.

To verify your CUDA installation, go to the command terminal on your computer, and type:

nvcc --version
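If you prefer to check the version from Python (for example, in a setup script), the sketch below runs nvcc --version and extracts the release number. It assumes the output contains a line such as "Cuda compilation tools, release 9.0, V9.0.176". The helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Return (major, minor) from text containing 'release X.Y', or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc --version` if nvcc is on the PATH and parse its release number."""
    if shutil.which("nvcc") is None:
        return None  # CUDA Toolkit not installed, or its bin folder is not on the PATH
    result = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print("CUDA release:", release)
    if release is not None and release != (9, 0):
        print("Warning: this tutorial expects CUDA 9.0")
```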

Install the NVIDIA CUDA Deep Neural Network library (cuDNN)

Now that we have installed the CUDA 9.0 base installer and its four patches, we need to install the NVIDIA CUDA Deep Neural Network library (cuDNN). Official installation instructions are on this page, but I’ll walk you through the process below.

Go to

Create a user profile if needed and log in.


Go to this page:

Agree to the terms of the cuDNN Software License Agreement.


We have CUDA 9.0, so we need to click cuDNN v7.6.4 (September 27, 2019), for CUDA 9.0.


I have Windows 10, so I will download cuDNN Library for Windows 10.


In my case, the zip file downloaded to my Desktop. I will unzip it now, which creates a new folder with the same name, just without the .zip extension. These are your cuDNN files. We’ll come back to them in a second.


Before we get going, let’s double-check which GPU we have. If you are on a Windows machine, search for the “Device Manager.”


Once you have the Device Manager open, you should see an option near the top for “Display Adapters.” Click the drop-down arrow next to that, and you should see the name of your GPU. Mine is NVIDIA GeForce GTX 1060.


If you are on Windows, you can also check which NVIDIA graphics driver you have by right-clicking on your Desktop and clicking the NVIDIA Control Panel. My version is 430.86, which meets the requirements for cuDNN.

OK, now that we have verified that our system meets the requirements, let’s navigate to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0, your CUDA Toolkit directory.


Now go to your cuDNN files, the new folder that was created when you unzipped the download. Inside that folder, you should see a folder named cuda. Click on it.


Click bin.


Copy cudnn64_7.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin. Your computer might ask you to allow Administrative Privileges. Just click Continue when you see that prompt.

Now go back to your cuDNN files. Inside the cuda folder, click on include. You should see a file named cudnn.h.


Copy that file to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include. Your computer might ask you to allow Administrative Privileges. Just click Continue when you see that prompt.

Now go back to your cuDNN files. Inside the cuda folder, click on lib -> x64. You should see a file named cudnn.lib. 


Copy that file to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64. Your computer might ask you to allow Administrative Privileges. Just click Continue when you see that prompt.
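The three copy steps above can also be scripted. The sketch below uses only Python's standard library. copy_cudnn_files is a hypothetical helper. Because it writes into C:\Program Files, it would need to run from a terminal with Administrator rights.

```python
import os
import shutil

# Locations of the three cuDNN files described above, relative to the unzipped "cuda" folder
CUDNN_FILES = [
    os.path.join("bin", "cudnn64_7.dll"),
    os.path.join("include", "cudnn.h"),
    os.path.join("lib", "x64", "cudnn.lib"),
]


def copy_cudnn_files(cudnn_cuda_dir, cuda_toolkit_dir):
    """Copy each cuDNN file into the matching subfolder of the CUDA Toolkit directory.

    Returns the destination paths. Raises FileNotFoundError if a source file is missing.
    """
    copied = []
    for rel_path in CUDNN_FILES:
        src = os.path.join(cudnn_cuda_dir, rel_path)
        dst = os.path.join(cuda_toolkit_dir, rel_path)
        if not os.path.isfile(src):
            raise FileNotFoundError(src)
        # The toolkit folders normally exist already; this only guards against typos
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(dst)
    return copied
```

For example, you could call copy_cudnn_files(r"C:\path\to\unzipped\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"). The first path is a placeholder for wherever you unzipped cuDNN.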

If you are using Windows, do a search on your computer for Environment Variables. An option should pop up to allow you to edit the Environment Variables on your computer.

Click on Environment Variables.


Make sure your CUDA_PATH variable is set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0.
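To double-check the variable from code, a small sketch like the one below compares CUDA_PATH with the expected folder. The function name is hypothetical. Keep in mind that an edited environment variable is only visible to terminals opened after the change.

```python
import os

EXPECTED_CUDA_PATH = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"


def cuda_path_ok(environ=None, expected=EXPECTED_CUDA_PATH):
    """Return True if CUDA_PATH points at the expected CUDA Toolkit folder."""
    environ = os.environ if environ is None else environ
    actual = environ.get("CUDA_PATH", "")

    def normalize(path):
        # Windows paths are case-insensitive, and a trailing slash does not matter
        return path.rstrip("\\/").lower()

    return bool(actual) and normalize(actual) == normalize(expected)


if __name__ == "__main__":
    print("CUDA_PATH =", os.environ.get("CUDA_PATH", "<not set>"))
    print("Looks correct:", cuda_path_ok())
```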


I recommend restarting your computer now.

Install TensorFlow GPU

Now we need to install TensorFlow GPU. Open a new Anaconda terminal window. 


Create a new Conda virtual environment named tensorflow_gpu by typing this command:

conda create -n tensorflow_gpu pip python=3.6

Type y and press Enter.


Activate the virtual environment.

conda activate tensorflow_gpu

Install TensorFlow GPU for Python.

pip install --ignore-installed --upgrade tensorflow-gpu==1.9

Wait for TensorFlow GPU to install.

Now let’s test the installation. Launch the Python interpreter.


Type this command.

import tensorflow as tf

If you don’t see an error, TensorFlow GPU is successfully installed.


Now type this:

hello = tf.constant('Hello, TensorFlow!')

And run this command. It might take a few minutes to run, so just wait until it finishes:

sess = tf.Session()

Now type this command to complete the test of the installation:

print(sess.run(hello))

You can further confirm whether TensorFlow can access the GPU by typing the following into the Python interpreter (just copy and paste it into the terminal window while the Python interpreter is running).


To exit the Python interpreter, type:

exit()

And press Enter.

Install TensorFlow Models

Now that we have everything set up, let’s install some useful libraries. I will show you the steps in my TensorFlow GPU virtual environment, but the steps are the same for the TensorFlow CPU virtual environment.

Open a new Anaconda terminal window. Let’s take a look at the list of virtual environments that we can activate.

conda env list

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

Install the libraries. Type this command:

conda install pillow lxml jupyter matplotlib opencv cython

Press y to proceed.

Once that is finished, create a folder to hold the TensorFlow Models (e.g. C:\Users\addis\Documents\TensorFlow). If you have a D drive, you can create it there instead.

In your Anaconda terminal window, move to the TensorFlow directory you just created. You will use the cd command to change to that directory. For example:

cd C:\Users\addis\Documents\TensorFlow

Go to the TensorFlow models page on GitHub:

Click the button to download the zip file of the repository. It is a large file, so it will take a while to download.


Move the zip file to the TensorFlow directory you created earlier and extract its contents.

Rename the extracted folder to models instead of models-master. Your TensorFlow directory hierarchy should look like this:


  • models
    • official
    • research
    • samples
    • tutorials
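You can confirm the folder layout with a short standard-library check like the one below. missing_model_folders is a hypothetical helper. The four expected subfolders come from the list above, and the repository's layout may differ in other versions.

```python
import os

# Subfolders listed above for the extracted (and renamed) models folder
EXPECTED_SUBFOLDERS = ("official", "research", "samples", "tutorials")


def missing_model_folders(tensorflow_dir):
    """Return the expected subfolders that are missing from <tensorflow_dir>/models."""
    models_dir = os.path.join(tensorflow_dir, "models")
    return [name for name in EXPECTED_SUBFOLDERS
            if not os.path.isdir(os.path.join(models_dir, name))]
```

For example, missing_model_folders(r"C:\Users\addis\Documents\TensorFlow") should return an empty list if everything is in place.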

Install Protobuf

Now we need to install Protobuf, which is used by the TensorFlow Object Detection API to configure the training and model parameters.

Go to this page:

Download the latest * release (assuming you are on a Windows machine).


Create a folder in C:\Program Files named Google Protobuf.

Extract the contents of the downloaded * into C:\Program Files\Google Protobuf.


Search for Environment Variables on your system. A window should pop up that says System Properties.


Click Environment Variables.

Go down to the Path variable and click Edit.


Click New.


Add C:\Program Files\Google Protobuf\bin

You can also add it to the Path system variable.

Click OK a few times to close out all the windows.

Open a new Anaconda terminal window.

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your \TensorFlow\models\research\ directory and run the following command:

for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.
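The for /f loop above only works in the Windows Command Prompt. The same idea can be sketched in Python: call protoc once per .proto file in object_detection\protos. The function names are hypothetical, and the sketch assumes protoc is on your PATH as configured above.

```python
import glob
import os
import subprocess


def protoc_commands(research_dir):
    """Build one protoc command per .proto file, mirroring the cmd.exe loop above."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    commands = []
    for proto in sorted(glob.glob(pattern)):
        # Pass paths relative to research_dir, as the original loop does
        rel = os.path.relpath(proto, research_dir)
        commands.append(["protoc", rel, "--python_out=."])
    return commands


def compile_protos(research_dir):
    """Run each protoc command from inside research_dir; stop on the first failure."""
    for cmd in protoc_commands(research_dir):
        subprocess.run(cmd, cwd=research_dir, check=True)
```

From inside TensorFlow\models\research, compile_protos(".") would produce the same *_pb2.py files as the loop.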

Now go back to the Environment Variables on your system. Create a New Environment Variable named PYTHONPATH (if you don’t have one already). Replace C:\Python27amd64 if you don’t have Python installed there. Also, replace <your_path> with the path to your TensorFlow folder.


For example:


Now add these two paths to your PYTHONPATH environment variable:


Install COCO API

Now, we are going to install the COCO API. You don’t need to worry about what this is at this stage. I’ll explain it later.

Download the Visual C++ 2015 Build Tools from here:

Choose the default installation.


After it has installed, restart your computer.


Open a new Anaconda terminal window.

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your \TensorFlow\models\research\ directory and run the following command to install pycocotools (everything below goes on one line):

pip install git+

If it doesn’t work, install git:

Follow all the default settings for installing Git. You will have to click Next several times.

Once you have finished installing Git, run this command (everything goes on one line):

pip install git+

Test the Installation

Open a new Anaconda terminal window.

I’m going to activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your \TensorFlow\models\research\object_detection\builders directory and run the following command to test your installation.


You should see an OK message.


Install LabelImg

Now we will install LabelImg, a graphical image annotation tool for labeling object bounding boxes in images.

Open a new Anaconda/Command Prompt window.

Create a new virtual environment named labelImg by typing the following command:

conda create -n labelImg

Activate the virtual environment.

conda activate labelImg

Install pyqt.

conda install pyqt=5

Press y to proceed.

Go to your TensorFlow folder, and create a new folder named addons.


Change to that directory using the cd command.

Type the following command to clone the repository:

git clone

Wait while labelImg downloads.

You should now have a folder named addons\labelImg under your TensorFlow folder.

Type exit to exit the terminal.

Open a new terminal window.

Activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your TensorFlow\addons\labelImg directory.

Type the following commands, one right after the other.

conda install pyqt=5
conda install lxml
pyrcc5 -o libs/ resources.qrc

Test the LabelImg Installation

Open a new terminal window.

Activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your TensorFlow\addons\labelImg directory.

Type the following commands:


If you see this window, you have successfully installed LabelImg. Here is a tutorial on how to label your own images. Congratulations!


Recognize Objects Using Your Webcam


Note: This section gets really technical. If you know the basics of computer vision and deep learning, it will make sense. Otherwise, it will not. You can skip this section and head straight to the Implementation section if you are not interested in what is going on under the hood of the object recognition application we are developing.

In this project, we use OpenCV and TensorFlow to create a system capable of automatically recognizing objects in a webcam video stream. Each detected object is outlined with a bounding box labeled with the predicted object type as well as a detection score.

The detection score is the probability that a bounding box contains the object of a particular type (e.g. the confidence a model has that an object identified as a “backpack” is actually a backpack).
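In practice, the detection score is used as a cutoff: boxes below a chosen threshold are ignored. The sketch below assumes outputs shaped like those of the TensorFlow Object Detection API: normalized [ymin, xmin, ymax, xmax] boxes, with one score and one class ID per box. It also assumes a category_index dictionary like the one the program later in this tutorial builds. The 0.5 threshold and the function name are illustrative choices.

```python
def confident_detections(boxes, scores, classes, category_index,
                         image_width, image_height, min_score=0.5):
    """Keep detections whose score is at least min_score.

    Returns (label, score, (left, top, right, bottom)) tuples in pixel coordinates.
    """
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue  # not confident enough to report
        ymin, xmin, ymax, xmax = box
        # Scale normalized coordinates to pixels (rounded, not truncated)
        pixel_box = (int(round(xmin * image_width)), int(round(ymin * image_height)),
                     int(round(xmax * image_width)), int(round(ymax * image_height)))
        # Class IDs come back as floats, so convert before the dictionary lookup
        label = category_index.get(int(class_id), {}).get("name", "unknown")
        results.append((label, score, pixel_box))
    return results


if __name__ == "__main__":
    # Sample values for illustration (COCO IDs 1 = person, 27 = backpack)
    category_index = {1: {"id": 1, "name": "person"}, 27: {"id": 27, "name": "backpack"}}
    print(confident_detections(
        boxes=[(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)],
        scores=[0.9, 0.3],
        classes=[1.0, 27.0],
        category_index=category_index,
        image_width=800, image_height=600))
```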

The particular SSD with Inception v2 model used in this project is the ssd_inception_v2_coco model. The ssd_inception_v2_coco model uses the Single Shot MultiBox Detector (SSD) for its architecture and the Inception v2 framework for feature extraction.

Single Shot MultiBox Detector (SSD)

Most state-of-the-art object detection methods involve the following stages:

  1. Hypothesize bounding boxes 
  2. Resample pixels or features for each box
  3. Apply a classifier

The Single Shot MultiBox Detector (SSD) eliminates the multi-stage process above and performs all object detection computations using just a single deep neural network.

Inception v2

At the time Inception v2 was invented, most state-of-the-art object detection methods based on convolutional neural networks achieved greater accuracy by adding more convolution layers or more neurons per layer. The problem with this approach is that it is computationally expensive and prone to overfitting. The Inception v2 architecture (as well as the Inception v3 architecture) was proposed to address these shortcomings.

Rather than stacking multiple kernel filter sizes sequentially within a convolutional neural network, an inception-based model performs convolutions on an input with multiple kernels that all operate at the same layer of the network. By factorizing convolutions and using aggressive regularization, the authors were able to improve computational efficiency. For example, Inception v2 factorizes the traditional 7 x 7 convolution into three 3 x 3 convolutions.
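The arithmetic behind the 7 x 7 factorization is easy to check. The sketch below makes simplifying assumptions that do not come from the paper: stride-1 convolutions, the same number of input and output channels, and no biases. Under those assumptions, three stacked 3 x 3 layers see the same 7 x 7 region but use 27 weights per channel pair instead of 49.

```python
def conv_weights(kernel_size, channels):
    """Weights in one k x k convolution with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 k x k convolutions."""
    return num_layers * (kernel_size - 1) + 1


if __name__ == "__main__":
    channels = 64  # arbitrary example width
    single = conv_weights(7, channels)          # one 7 x 7 layer
    factorized = 3 * conv_weights(3, channels)  # three 3 x 3 layers
    print("receptive field:", stacked_receptive_field(3, 3))  # 7
    print("7x7 weights:", single)                             # 200704
    print("three 3x3 weights:", factorized)                   # 110592
    print("savings: {:.0%}".format(1 - factorized / single))  # 45%
```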

In their landmark Inception v2 paper, Szegedy, Vanhoucke, Ioffe, Shlens, and Wojna (2015) demonstrated empirically that factorizing convolutions and using aggressive dimensionality reduction can substantially lower computational cost while maintaining accuracy.

Data Set

The ssd_inception_v2_coco model used in this project is pretrained on the Common Objects in Context (COCO) data set, a large-scale data set that contains 1.5 million object instances and more than 200,000 labeled images. Gathering, annotating, and organizing the COCO images of objects in natural environments required 70,000 crowd worker hours.

Software Dependencies

The following libraries form the object recognition backbone of the application implemented in this project:

  • OpenCV, a library of programming functions for computer vision.
  • Pillow, a library for manipulating images.
  • Numpy, a library for scientific computing.
  • Matplotlib, a library for creating graphs and visualizations.
  • TensorFlow Object Detection API, an open source framework developed by Google that enables the development, training, and deployment of pre-trained object detection models.

Implementation

Now for the fun part: we will recognize objects using our computer’s webcam.

Copy the following program, and save it to your TensorFlow\models\research\object_detection directory as .

# Import all the key libraries
import os
import tarfile

import cv2
import numpy as np
import six.moves.urllib as urllib
import tensorflow as tf

from utils import label_map_util
from utils import visualization_utils as vis_util

# Define the video stream (0 selects the default webcam)
cap = cv2.VideoCapture(0)

# Which model are we downloading?
# The models are listed here:
MODEL_NAME = 'ssd_inception_v2_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to the frozen detection graph.
# This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add the correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

# Number of classes to detect (COCO category IDs go up to 90)
NUM_CLASSES = 90

# Download the model archive (only if it is not already here) and
# extract the frozen inference graph from it
if not os.path.exists(MODEL_FILE):
    opener = urllib.request.URLopener()
    opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

# Load a (frozen) TensorFlow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

# Loading label map
# Label maps map indices to category names, so that when our convolution network 
# predicts `5`, we know that this corresponds to `airplane`.  Here we use internal 
# utility functions, but anything that returns a dictionary mapping integers to 
# appropriate string labels would be fine
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Helper code: converts a PIL image to a NumPy array (not needed for the webcam loop)
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

# Detection
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:

            # Read frame from camera; stop if no frame could be read
            ret, image_np = cap.read()
            if not ret:
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Extract image tensor
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Extract detection boxes
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Extract detection scores
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            # Extract detection classes
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            # Extract number of detections
            num_detections = detection_graph.get_tensor_by_name(
                'num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            # Display output
            cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))

            # Press q to quit
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break

# Release the webcam and close the display window
cap.release()
cv2.destroyAllWindows()

print("We are finished! That was fun!")

Open a new terminal window.

Activate the TensorFlow GPU virtual environment.

conda activate tensorflow_gpu

cd into your TensorFlow\models\research\object_detection directory.

At the time of this writing, we need to use Numpy version 1.16.4. Type the following command to see what version of Numpy you have on your system.

pip show numpy

If it is not 1.16.4, execute the following commands:

pip uninstall numpy
pip install numpy==1.16.4

Now run your program:


In about 30 to 90 seconds, you should see your webcam power up and object recognition take action. That’s it! Congratulations on making it to the end of this tutorial!


Keep building!

What is the Difference Between Mathematical Morphology Filters and Convolution Filters?

Answer: Linearity

Convolution filters generate output images in which the brightness value at a particular pixel depends on the weighted sum (i.e. linear combination) of the brightness of the neighboring pixels.

Mathematical morphology filters, on the other hand, perform nonlinear processing on images. These filters depend only on the relative ordering of pixel values, not on their numerical values. This property makes them especially well suited to binary images (digital images in which each pixel can take only one of two possible values).
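A tiny one-dimensional experiment shows the difference in linearity. The sketch below uses hypothetical helper names and pure Python. It filters two signals separately and then filters their sum. The weighted-sum filter gives the same answer both ways. A minimum filter does not; the minimum filter is grayscale erosion with a flat three-pixel structuring element.

```python
def weighted_sum_filter(signal, kernel):
    """Slide the kernel over the signal and take weighted sums ('valid' positions only).

    Strictly this is correlation; with the symmetric kernel used below it equals convolution.
    """
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]


def min_filter(signal, size=3):
    """Grayscale erosion with a flat structuring element: the minimum of each window."""
    return [min(signal[i:i + size]) for i in range(len(signal) - size + 1)]


def add(a, b):
    """Element-wise sum of two equal-length signals."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [0, 4, 0, 4, 0]
    b = [4, 0, 4, 0, 4]
    kernel = [1, 2, 1]
    # Convolution is linear: both lines print [16, 16, 16]
    print(weighted_sum_filter(add(a, b), kernel))
    print(add(weighted_sum_filter(a, kernel), weighted_sum_filter(b, kernel)))
    # Erosion is not: [4, 4, 4] versus [0, 0, 0]
    print(min_filter(add(a, b)))
    print(add(min_filter(a), min_filter(b)))
```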

Types of Convolution and Mathematical Morphology Filters

This page has a good overview of the different convolution and morphology filters.

The standard convolution filters are:

  • High Pass
  • Low Pass
  • Laplacian
  • Directional
  • Gaussian Low Pass
  • Gaussian High Pass
  • Median (strictly a nonlinear rank-order filter, though often grouped with the convolution filters)
  • Sobel
  • Roberts 
  • User-Defined Convolution

The standard mathematical morphology filters are:

  • Dilation
  • Erosion
  • Opening
  • Closing

Noise Reduction Using Mathematical Morphology vs. Convolution Filters

Someone asked me this question the other day: What are the benefits and limitations of applying an image processing application such as noise reduction using mathematical morphology vs. convolution?

Before we get into the pros and cons of mathematical morphology and convolution filters for noise reduction in images, let us look at the definitions of these terms.

Mathematical morphology is an image processing technique based on two operations: erosion and dilation. Erosion shrinks objects in an image, while dilation enlarges them.

Convolution filtering involves taking an image as input and generating an output image where each new pixel value is determined by the weighted values of itself and its neighboring pixels.

Noise reduction involves “cleaning up” an image. The goal is to take an image as input and get rid of all the unnecessary elements in that image so that it looks better.

Below are the pros and cons of doing noise reduction using mathematical morphology vs. convolution filters.

Mathematical Morphology


  • Simplicity from a theoretical perspective (it is based on basic set theory) 
  • Simplicity from an operational perspective (can be implemented with a few lines of code)
  • Computationally efficient
  • Useful for removing noise in grayscale images
  • Useful for detaching two objects that are connected together (erosion)
  • Useful for connecting an object that is broken apart in an image (dilation)
  • Can remove noise without substantially altering the underlying shape of an object
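To illustrate the noise-removal and shape-preservation points, here is a minimal pure-Python sketch of a morphological opening (erosion followed by dilation) on a small binary image, using a 3 x 3 square structuring element. The helper names are hypothetical, and pixels outside the image are simply ignored. In practice, OpenCV's cv2.morphologyEx with cv2.MORPH_OPEN is the usual way to do this.

```python
def _neighborhood(img, r, c):
    """Values in the 3 x 3 window centred on (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(r - 1, 0), min(r + 2, rows))
            for j in range(max(c - 1, 0), min(c + 2, cols))]


def erode(img):
    """A pixel stays 1 only if every pixel in its 3 x 3 window is 1."""
    return [[min(_neighborhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3 x 3 window is 1."""
    return [[max(_neighborhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img):
    """Erosion removes specks smaller than the window; dilation regrows what survived."""
    return dilate(erode(img))


def make_image():
    """A 7 x 7 binary image containing one 3 x 3 square object."""
    img = [[0] * 7 for _ in range(7)]
    for r in range(2, 5):
        for c in range(2, 5):
            img[r][c] = 1
    return img


if __name__ == "__main__":
    clean = make_image()
    noisy = make_image()
    noisy[0][6] = 1                 # an isolated noise pixel
    print(opening(noisy) == clean)  # True: speck removed, square unchanged
```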


Convolution Filters



  • More complicated from an operational perspective (so many techniques and kernels to choose from…how does one decide which one is best?)
  • Can remove important image gradients because filter output is proportional to the contrast of a given section of an image
  • Shape of an object can become altered or distorted
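For comparison, the sketch below applies a 3 x 3 mean (box) filter, one of the simplest low-pass convolution filters, to single-row images. The function is hypothetical pure Python, and at the border it averages only the in-bounds pixels. A sharp edge turns into a ramp, and a single noisy pixel is spread over its neighbors rather than removed.

```python
def box_blur(img):
    """3 x 3 mean filter: each output pixel is the average of its in-bounds neighborhood."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [img[i][j]
                      for i in range(max(r - 1, 0), min(r + 2, rows))
                      for j in range(max(c - 1, 0), min(c + 2, cols))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


if __name__ == "__main__":
    print(box_blur([[0, 0, 0, 9, 9, 9]]))  # [[0.0, 0.0, 3.0, 6.0, 9.0, 9.0]]: edge becomes a ramp
    print(box_blur([[0, 0, 9, 0, 0]]))     # [[0.0, 3.0, 3.0, 3.0, 0.0]]: noise is smeared, not removed
```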

How to Apply a Mask to an Image Using OpenCV

In this project, we will learn how to apply a mask to an image using OpenCV. Image masking involves highlighting a specific object within an image by masking it.


  • Develop a program that takes a color image as input and allows the user to apply a mask.
  • When the user presses “r,” the program masks the image and produces an output image which is the image in black and white (i.e. grayscale) with only the masked area in color.
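Before looking at the OpenCV program, it may help to see the core idea on a toy image. The sketch below is hypothetical pure Python, not the program's code. Pixels inside the mask keep their color, and every other pixel is replaced by its gray value. It uses the 0.299/0.587/0.114 luma weights that OpenCV documents for BGR-to-gray conversion. OpenCV's integer arithmetic may round some values differently.

```python
def to_gray(b, g, r):
    """Gray level of one BGR pixel using the standard luma weights."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def highlight(image, mask):
    """Keep color where mask is nonzero; show everything else in grayscale.

    image is a list of rows of (B, G, R) tuples; mask is a list of rows of 0/1 values.
    """
    output = []
    for image_row, mask_row in zip(image, mask):
        row = []
        for (b, g, r), keep in zip(image_row, mask_row):
            if keep:
                row.append((b, g, r))  # masked area stays in color
            else:
                y = to_gray(b, g, r)
                row.append((y, y, y))  # gray value replicated into 3 channels
        output.append(row)
    return output


if __name__ == "__main__":
    image = [[(0, 0, 255), (255, 0, 0)]]  # a red pixel and a blue pixel (BGR order)
    mask = [[1, 0]]                       # highlight only the red pixel
    print(highlight(image, mask))         # [[(0, 0, 255), (29, 29, 29)]]
```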

You Will Need 

  • Python 3.7+


Let’s say you have the following image:


You want to highlight the apple in the image by applying a mask. The desired output is as follows.


You also want to see the process it took to get to that output image above. In other words, you want to have the program output, not only the masked image (as above), but also a table that shows all the steps involved: input image -> mask -> output.


To implement what I’ve described above, you will need two programs: the main driver program and common.py, a helper program that provides the Sketcher class. To run the main driver program, you will type:

python []

For example,

python apple.jpg

Here is the code. I recommend:

  1. Copy and paste both programs into a directory.
  2. Put your input images into that same directory.
  3. Run the program.

#!/usr/bin/env python

'''
Welcome to the Image Masking Program!

This program allows users to highlight a specific
object within an image by masking it.

Usage: [<image>]

  r     - mask the image
  SPACE - reset the inpainting mask
  ESC   - exit
'''

# Python 2/3 compatibility
from __future__ import print_function

import os # Builds the output file names
import sys # Enables the passing of arguments

import cv2 # Import the OpenCV library
import numpy as np # Import Numpy library
from common import Sketcher

# Project: Image Masking Using OpenCV
# Author: Addison Sears-Collins
# Date created: 9/18/2019
# Python version: 3.7
# Description: This program allows users to highlight a specific 
# object within an image by masking it.

# Define the default file name of the image (used if no argument is given)
INPUT_IMAGE = "fruits.jpg"

def main():
    """
    Main method of the program.
    """
    # Pull system arguments
    try:
        fn = sys.argv[1]
    except IndexError:
        fn = INPUT_IMAGE

    # Name the output files after the input image, e.g. apple_output.jpg
    image_name = os.path.splitext(os.path.basename(fn))[0]
    output_image = image_name + "_output.jpg"
    table_image = image_name + "_table.jpg"

    # Load the image and store into a variable
    image = cv2.imread(cv2.samples.findFile(fn))

    if image is None:
        print('Failed to load image file:', fn)
        sys.exit(1)

    # Create an image for sketching the mask
    image_mark = image.copy()
    sketch = Sketcher('Image', [image_mark], lambda : ((255, 255, 255), 255))

    # Sketch a mask
    while True:
        ch = cv2.waitKey()
        if ch == 27: # ESC - exit
            cv2.destroyAllWindows()
            return
        if ch == ord('r'): # r - mask the image
            break
        if ch == ord(' '): # SPACE - reset the inpainting mask
            image_mark[:] = image
            sketch.show()

    # Define the sketch color: pure white in BGR order (the image is BGR, not HSV).
    # Pixels that are already pure white in the photo will also be selected.
    lower_white = np.array([255, 255, 255])
    upper_white = np.array([255, 255, 255])

    # Create the mask
    mask = cv2.inRange(image_mark, lower_white, upper_white)

    # Create the inverted mask
    mask_inv = cv2.bitwise_not(mask)

    # Convert to grayscale image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Extract the dimensions of the original image
    rows, cols, channels = image.shape
    image = image[0:rows, 0:cols]

    # Bitwise-OR mask and original image
    colored_portion = cv2.bitwise_or(image, image, mask = mask)
    colored_portion = colored_portion[0:rows, 0:cols]

    # Bitwise-OR inverse mask and grayscale image
    gray_portion = cv2.bitwise_or(gray, gray, mask = mask_inv)
    gray_portion = np.stack((gray_portion,)*3, axis=-1)

    # Combine the two images (the masks do not overlap, so there is no overflow)
    output = colored_portion + gray_portion

    # Save the image
    cv2.imwrite(output_image, output)

    # Create a table showing input image, mask, and output
    mask = np.stack((mask,)*3, axis=-1)
    table_of_images = np.concatenate((image, mask, output), axis=1)
    cv2.imwrite(table_image, table_of_images)

    # Display images, used for debugging
    #cv2.imshow('Original Image', image)
    #cv2.imshow('Sketched Mask', image_mark)
    #cv2.imshow('Mask', mask)
    #cv2.imshow('Output Image', output)
    cv2.imshow('Table of Images', table_of_images)
    cv2.waitKey(0) # Wait for a keyboard event
    cv2.destroyAllWindows()

if __name__ == '__main__':
    print(__doc__)
    main()

#!/usr/bin/env python

'''
This module contains some common routines used by other samples.
'''

# Python 2/3 compatibility
from __future__ import print_function
import sys
PY3 = sys.version_info[0] == 3

if PY3:
    from functools import reduce

import numpy as np
import cv2 as cv

# built-in modules
import os
import itertools as it
from contextlib import contextmanager

image_extensions = ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.tiff', '.pbm', '.pgm', '.ppm']

class Bunch(object):
    def __init__(self, **kw):
        self.__dict__.update(kw)
    def __str__(self):
        return str(self.__dict__)

def splitfn(fn):
    path, fn = os.path.split(fn)
    name, ext = os.path.splitext(fn)
    return path, name, ext

def anorm2(a):
    return (a*a).sum(-1)
def anorm(a):
    return np.sqrt( anorm2(a) )

def homotrans(H, x, y):
    xs = H[0, 0]*x + H[0, 1]*y + H[0, 2]
    ys = H[1, 0]*x + H[1, 1]*y + H[1, 2]
    s  = H[2, 0]*x + H[2, 1]*y + H[2, 2]
    return xs/s, ys/s

def to_rect(a):
    a = np.ravel(a)
    if len(a) == 2:
        a = (0, 0, a[0], a[1])
    return np.array(a, np.float64).reshape(2, 2)

def rect2rect_mtx(src, dst):
    src, dst = to_rect(src), to_rect(dst)
    cx, cy = (dst[1] - dst[0]) / (src[1] - src[0])
    tx, ty = dst[0] - src[0] * (cx, cy)
    M = np.float64([[ cx,  0, tx],
                    [  0, cy, ty],
                    [  0,  0,  1]])
    return M

def lookat(eye, target, up = (0, 0, 1)):
    fwd = np.asarray(target, np.float64) - eye
    fwd /= anorm(fwd)
    right = np.cross(fwd, up)
    right /= anorm(right)
    down = np.cross(fwd, right)
    R = np.float64([right, down, fwd])
    tvec = -np.dot(R, eye)
    return R, tvec

def mtx2rvec(R):
    w, u, vt = cv.SVDecomp(R - np.eye(3))
    p = vt[0] + u[:,0]*w[0]    # same as np.dot(R, vt[0])
    c = np.dot(vt[0], p)
    s = np.dot(vt[1], p)
    axis = np.cross(vt[0], vt[1])
    return axis * np.arctan2(s, c)

def draw_str(dst, target, s):
    x, y = target
    cv.putText(dst, s, (x+1, y+1), cv.FONT_HERSHEY_PLAIN, 1.0, (0, 0, 0), thickness = 2, lineType=cv.LINE_AA)
    cv.putText(dst, s, (x, y), cv.FONT_HERSHEY_PLAIN, 1.0, (255, 255, 255), lineType=cv.LINE_AA)

class Sketcher:
    def __init__(self, windowname, dests, colors_func):
        self.prev_pt = None
        self.windowname = windowname
        self.dests = dests
        self.colors_func = colors_func
        self.dirty = False
        cv.setMouseCallback(self.windowname, self.on_mouse)

    def show(self):
        cv.imshow(self.windowname, self.dests[0])

    def on_mouse(self, event, x, y, flags, param):
        pt = (x, y)
        if event == cv.EVENT_LBUTTONDOWN:
            self.prev_pt = pt
        elif event == cv.EVENT_LBUTTONUP:
            self.prev_pt = None

        if self.prev_pt and flags & cv.EVENT_FLAG_LBUTTON:
            for dst, color in zip(self.dests, self.colors_func()):
                cv.line(dst, self.prev_pt, pt, color, 5)
            self.dirty = True
            self.prev_pt = pt
            self.show()

# palette data from matplotlib/
_jet_data =   {'red':   ((0., 0, 0), (0.35, 0, 0), (0.66, 1, 1), (0.89,1, 1),
                         (1, 0.5, 0.5)),
               'green': ((0., 0, 0), (0.125,0, 0), (0.375,1, 1), (0.64,1, 1),
                         (0.91,0,0), (1, 0, 0)),
               'blue':  ((0., 0.5, 0.5), (0.11, 1, 1), (0.34, 1, 1), (0.65,0, 0),
                         (1, 0, 0))}

cmap_data = { 'jet' : _jet_data }

def make_cmap(name, n=256):
    data = cmap_data[name]
    xs = np.linspace(0.0, 1.0, n)
    channels = []
    eps = 1e-6
    for ch_name in ['blue', 'green', 'red']:
        ch_data = data[ch_name]
        xp, yp = [], []
        for x, y1, y2 in ch_data:
            xp += [x, x+eps]
            yp += [y1, y2]
        ch = np.interp(xs, xp, yp)
        channels.append(ch)
    return np.uint8(np.array(channels).T*255)

def nothing(*arg, **kw):
    pass

def clock():
    return cv.getTickCount() / cv.getTickFrequency()

@contextmanager
def Timer(msg):
    print(msg, '...',)
    start = clock()
    try:
        yield
    finally:
        print("%.2f ms" % ((clock()-start)*1000))

class StatValue:
    def __init__(self, smooth_coef = 0.5):
        self.value = None
        self.smooth_coef = smooth_coef
    def update(self, v):
        if self.value is None:
            self.value = v
        else:
            c = self.smooth_coef
            self.value = c * self.value + (1.0-c) * v

class RectSelector:
    def __init__(self, win, callback):
        self.win = win
        self.callback = callback
        cv.setMouseCallback(win, self.onmouse)
        self.drag_start = None
        self.drag_rect = None
    def onmouse(self, event, x, y, flags, param):
        x, y = np.int16([x, y]) # BUG
        if event == cv.EVENT_LBUTTONDOWN:
            self.drag_start = (x, y)
            return
        if self.drag_start:
            if flags & cv.EVENT_FLAG_LBUTTON:
                xo, yo = self.drag_start
                x0, y0 = np.minimum([xo, yo], [x, y])
                x1, y1 = np.maximum([xo, yo], [x, y])
                self.drag_rect = None
                if x1-x0 > 0 and y1-y0 > 0:
                    self.drag_rect = (x0, y0, x1, y1)
            else:
                rect = self.drag_rect
                self.drag_start = None
                self.drag_rect = None
                if rect:
                    self.callback(rect)
    def draw(self, vis):
        if not self.drag_rect:
            return False
        x0, y0, x1, y1 = self.drag_rect
        cv.rectangle(vis, (x0, y0), (x1, y1), (0, 255, 0), 2)
        return True
    def dragging(self):
        return self.drag_rect is not None

def grouper(n, iterable, fillvalue=None):
    '''grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx'''
    args = [iter(iterable)] * n
    if PY3:
        output = it.zip_longest(fillvalue=fillvalue, *args)
    else:
        output = it.izip_longest(fillvalue=fillvalue, *args)
    return output

def mosaic(w, imgs):
    '''Make a grid from images.

    w    -- number of grid columns
    imgs -- images (must have same size and format)
    '''
    imgs = iter(imgs)
    if PY3:
        img0 = next(imgs)
    else:
        img0 = imgs.next()
    pad = np.zeros_like(img0)
    imgs = it.chain([img0], imgs)
    rows = grouper(w, imgs, pad)
    return np.vstack(list(map(np.hstack, rows)))

def getsize(img):
    h, w = img.shape[:2]
    return w, h

def mdot(*args):
    return reduce(np.dot, args)

def draw_keypoints(vis, keypoints, color = (0, 255, 255)):
    for kp in keypoints:
        x, y = kp.pt
        cv.circle(vis, (int(x), int(y)), 2, color)

How to Blend Multiple Images Using OpenCV

In this project, we will blend multiple images using OpenCV. “Blending” means that we compute a weighted average of the pixel values for a set of color images which have the same dimensions.

You Will Need 

  • Python 3.7+
  • A bunch of images that you want to blend together.


Let’s say you have a set of images. You would like to create a new image which is the average of all the images.

For example, I have about 10 images which I obtained from this weather forecast website. I’m interested in seeing the areas which will receive the most snow on average over the next 10 days (i.e. the darkest blues). In order to do that I need to create a single image which blends the weather forecast frames for the next 10 days.

Below is a slide show of the images I would like to blend.


Let’s blend all those images so that we create an “average” image. Here is the code:

# Python program for blending multiple images using OpenCV

import glob
import numpy as np
import cv2

# Import all image files with the .jpg extension
files = glob.glob("*.jpg")
image_data = []
for my_file in files:
    this_image = cv2.imread(my_file, 1)
    image_data.append(this_image)

# Calculate the blended image as a running average:
# image i (counting from 0) gets weight 1/(i + 1), the blend so far gets the rest
dst = image_data[0]
for i in range(1, len(image_data)):
    alpha = 1.0/(i + 1)
    beta = 1.0 - alpha
    dst = cv2.addWeighted(image_data[i], alpha, dst, beta, 0.0)

# Save blended image
cv2.imwrite('weather_forecast.png', dst)

What I’m doing above is importing all images in the current directory that have the .jpg extension.

I then put each image into a list.

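The loop then keeps a running average. When image i (counting from 0) is added, it gets weight 1/(i + 1) and the blend so far gets weight i/(i + 1). By the time the last image is added, every image has contributed equally, so if I have 10 images in total, each image ends up multiplied by 1/10.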
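The claim that each of N images ends up weighted by 1/N can be checked with NumPy alone. The sketch below mirrors the loop's update rule on randomly generated stand-in images (the helper name `running_blend`, the image size, and the seed are illustrative assumptions) and compares the result with a direct mean. It works in floating point on purpose: `cv2.addWeighted` on `uint8` images rounds after every step, so the OpenCV result can drift from the exact mean by a few intensity levels.

```python
import numpy as np

def running_blend(images):
    """Blend images with the same update rule as the tutorial's loop.

    At step i (0-based), the new image gets weight alpha = 1/(i+1) and the
    blend so far gets beta = 1 - alpha = i/(i+1). Float math avoids the
    per-step rounding that uint8 arithmetic would introduce.
    """
    dst = images[0].astype(np.float64)
    for i in range(1, len(images)):
        alpha = 1.0 / (i + 1)
        beta = 1.0 - alpha
        dst = alpha * images[i].astype(np.float64) + beta * dst
    return dst

# Ten random 4 x 5 "color images" standing in for the forecast frames
rng = np.random.default_rng(0)
stack = [rng.integers(0, 256, size=(4, 5, 3)).astype(np.uint8) for _ in range(10)]

blended = running_blend(stack)
mean = np.mean(np.stack(stack).astype(np.float64), axis=0)
print(np.allclose(blended, mean))
```

By induction, after step i the blend equals the sum of the first i + 1 images divided by i + 1, which is why the running update and the direct mean agree.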

After computing the “average” image, I save it as weather_forecast.png. Here is the result:


Pretty cool! We can see that the snowiest areas will be in Utah, central Arizona, and southwest portions of Colorado. Now I know where to hit the slopes!

Keep building!

How to Display an Image Using OpenCV

In this project, I will show you how to display an image using OpenCV.

You Will Need 

  • Python 3.7+


Let’s say you have an image like the one below. The file name is 1.jpg.


To display it using OpenCV, go to your favorite IDE or text editor and create the following Python program:

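# Display a color image using OpenCV
import numpy as np
import cv2

# Load a color image (flag 1 = cv2.IMREAD_COLOR)
img = cv2.imread('1.jpg',1)

# Show the image in a window and wait for a key press before closing it
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()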


Save the program into the same directory as 1.jpg.

Run the file.


The image appears in a new window. Press any key to close it. That’s it!


Pros and Cons of Gaussian Smoothing

First, before we get into the pros and cons of Gaussian smoothing, let us take a quick look at what Gaussian smoothing is and why we use it.

What is Gaussian Smoothing?

Have you ever had a photo or portrait of yourself or someone else and wanted to smooth out facial imperfections, pimples, pores, or wrinkles? Gaussian smoothing (also known as Gaussian blur) is one way to do this. It blurs an image by replacing each pixel with a weighted average of its neighbors, where the weights follow a mathematical equation called the Gaussian function. The result is reduced image detail and noise.
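To make "the weights follow the Gaussian function" concrete, here is a minimal NumPy sketch that builds the weight matrix (kernel) a Gaussian blur averages with. The helper name `gaussian_kernel` and the 5 x 5, sigma = 1 choice are illustrative assumptions. Each weight is proportional to exp(-(x^2 + y^2) / (2 sigma^2)), where (x, y) is the offset from the center pixel.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size must be odd).

    Each weight follows G(x, y) = exp(-(x^2 + y^2) / (2 * sigma^2)), with
    (x, y) measured from the kernel center. The usual constant factor
    1 / (2 * pi * sigma^2) is dropped because the kernel is renormalized
    to sum to 1, which keeps the overall image brightness unchanged.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()

k = gaussian_kernel(5, 1.0)
print(k.round(3))
```

The center pixel gets the largest weight and the weights fall off smoothly with distance; a larger sigma spreads the weight further out and produces a stronger blur.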

Below is an example of an image with a small and a large Gaussian blur.


Image Source: Wikimedia

Pros of Gaussian Smoothing

Reduces noise in an image

Noise reduction is one of the main use cases of Gaussian smoothing. 

Easy to implement

No complicated algorithms with multiple nested for loops needed. As you can see in this MATLAB implementation, Gaussian smoothing can be done with just a single line of code.
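In OpenCV, the one-liner is `cv2.GaussianBlur(img, (5, 5), 0)`. The NumPy sketch below (hypothetical helper `gaussian_blur`, grayscale input assumed) spells out what such a call does. It relies on the fact that a 2-D Gaussian is the product of two 1-D Gaussians, so the image can be filtered along rows and then along columns. It also illustrates the noise-reduction point: it blurs a flat gray image corrupted with synthetic Gaussian noise and compares the standard deviation before and after.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=2):
    """Blur a 2-D grayscale array with a separable Gaussian filter.

    Borders are handled by mirror reflection without repeating the edge
    pixel (numpy's "reflect" mode), which matches OpenCV's default
    BORDER_REFLECT_101 behavior.
    """
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()  # normalize so brightness is preserved
    padded = np.pad(img.astype(np.float64), radius, mode="reflect")
    # Filter each row, then each column; 'valid' trims away the padding
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# Flat gray image plus synthetic noise (values chosen only for illustration)
rng = np.random.default_rng(42)
clean = np.full((64, 64), 128.0)
noisy = clean + rng.normal(0.0, 20.0, size=clean.shape)
smoothed = gaussian_blur(noisy, sigma=1.0, radius=2)
print(f"noise std before: {noisy.std():.1f}, after: {smoothed.std():.1f}")
```

Filtering separably costs about 2 x (2r + 1) multiplications per pixel instead of (2r + 1)^2, which is one reason Gaussian blur is cheap in practice.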

Automatic censoring

Some use cases require you to conceal someone’s identity or to censor material that may be inappropriate for certain audiences. Gaussian smoothing works well in these cases.


Rotationally symmetric

The Gaussian filter is rotationally symmetric: it smooths by the same amount in every direction, so it does not favor edges or features with any particular orientation.
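A short NumPy check of this property, assuming an illustrative 11 x 11 kernel with sigma = 1.5: every weight depends only on the distance from the center, so the kernel is unchanged by rotation or mirroring. Offsets at the same distance also receive identical weights even when no 90-degree rotation or mirror maps one onto the other, such as (3, 4) and (5, 0).

```python
import numpy as np

sigma = 1.5
ax = np.arange(-5, 6, dtype=np.float64)   # offsets -5..5, center at index 5
xx, yy = np.meshgrid(ax, ax)              # xx varies along columns, yy along rows
kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
kernel /= kernel.sum()

# Rotating the kernel by 90 degrees or mirroring it leaves it unchanged
print(np.allclose(kernel, np.rot90(kernel)), np.allclose(kernel, kernel[::-1, :]))

# (x, y) = (3, 4) and (5, 0) both lie at distance 5 from the center,
# so they get the same weight (indexing is kernel[row = y + 5, col = x + 5])
print(np.isclose(kernel[4 + 5, 3 + 5], kernel[0 + 5, 5 + 5]))
```

A square box filter, by contrast, gives a corner neighbor the same weight as an edge neighbor even though the corner is farther away.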

Cons of Gaussian Smoothing

Lose fine image detail and contrast

If you have a use case that requires you to examine fine detail, Gaussian smoothing might make that a lot harder. An example where you might want to examine fine detail would be in a medical image or a robot trying to grasp a specific point on an object.

Does not handle “salt and pepper noise” well

Sometimes an image might have what is known as “salt-and-pepper noise.” Salt-and-pepper noise is defined as sparsely occurring white and black pixels. Below is an image showing salt-and-pepper noise.


Image Source: Wikimedia

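Median filters typically do a better job than Gaussian smoothing at handling salt-and-pepper noise. The median of a neighborhood ignores isolated extreme values, whereas a weighted average spreads each of them into the surrounding pixels.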
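A small NumPy experiment illustrates the difference. It corrupts a flat gray image with roughly 10% salt-and-pepper noise, then compares a 3 x 3 median filter with a 3 x 3 Gaussian-like (1-2-1 binomial) filter by mean absolute error against the clean image. The image, noise level, seed, and the helper `filter3x3` are illustrative assumptions.

```python
import numpy as np

def filter3x3(img, reducer):
    """Apply a 3x3 sliding-window filter to a 2-D array.

    `reducer` maps an (H, W, 9) array of neighborhoods to an (H, W) result.
    Borders are handled by reflection.
    """
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1, mode="reflect")
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=-1,
    )
    return reducer(windows)

# 3x3 binomial (1-2-1) approximation of a Gaussian, flattened row by row
gauss_w = np.array([1, 2, 1, 2, 4, 2, 1, 2, 1], dtype=np.float64) / 16.0

# Flat gray image with ~5% black (pepper) and ~5% white (salt) pixels
rng = np.random.default_rng(7)
clean = np.full((50, 50), 128.0)
noisy = clean.copy()
u = rng.random(clean.shape)
noisy[u < 0.05] = 0.0
noisy[u > 0.95] = 255.0

median_out = filter3x3(noisy, lambda win: np.median(win, axis=-1))
gauss_out = filter3x3(noisy, lambda win: win @ gauss_w)

def mae(a):
    """Mean absolute error against the clean image."""
    return float(np.mean(np.abs(a - clean)))

print(f"noisy: {mae(noisy):.2f}  gaussian: {mae(gauss_out):.2f}  median: {mae(median_out):.2f}")
```

The median output is almost identical to the clean image, because a 3 x 3 window rarely contains five or more noisy pixels of the same kind. The Gaussian output, on the other hand, turns each noisy pixel into a faint blotch.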

How to Annotate Images Using OpenCV

In this project, we will learn how to annotate images using OpenCV, a popular and powerful open source library for image processing and computer vision. OpenCV is a cross-platform library designed for real-time image processing, with wrappers for Python, Ruby, C#, JavaScript, and other languages. It has methods for image I/O, filtering, motion tracking, segmentation, and 3D reconstruction, as well as machine learning techniques such as boosting, support vector machines, and deep learning.


  • Design a software application using Python and OpenCV that allows users to click in an image, annotate a number of points within an image, and export the annotated points into a CSV file.
    • Code must be implemented in Python using OpenCV
    • Students should NOT use Jupyter Notebooks for this project
    • The input image and output CSV files will be provided as parameters.
      • Example: python cat_dog.jpg cat_dog.csv

You Will Need 

  • Python 3.7

Input Images



To run the program, open up an Anaconda Prompt terminal.

Go to the directory that contains the program and the input image.

Type python cat_dog.jpg cat_dog.csv to run the program.

Here is the code:

import cv2 # Import the OpenCV library
import numpy as np # Import Numpy library
import pandas as pd # Import Pandas library
import sys # Enables the passing of arguments

# Project: Annotate Images Using OpenCV
# Author: Addison Sears-Collins
# Date created: 9/11/2019
# Python version: 3.7
# Description: This program allows users to click in an image, annotate a 
#   number of points within an image, and export the annotated points into
#   a CSV file.

# Define the file names of the input image, output image, and output CSV
INPUT_IMAGE = sys.argv[1] # "cat_dog.jpg"
IMAGE_NAME = INPUT_IMAGE.rsplit(".", 1)[0] # "cat_dog"
OUTPUT_IMAGE = IMAGE_NAME + "_annotated.jpg"
output_csv_file = sys.argv[2]

# Load the image and store into a variable
# -1 means load unchanged
image = cv2.imread(INPUT_IMAGE, -1)

# Create lists to store all x, y, and annotation values
x_vals = []
y_vals = []
annotation_vals = []

# Dictionary containing some colors
colors = {'blue': (255, 0, 0), 'green': (0, 255, 0), 'red': (0, 0, 255), 
          'yellow': (0, 255, 255),'magenta': (255, 0, 255), 
          'cyan': (255, 255, 0), 'white': (255, 255, 255), 'black': (0, 0, 0), 
          'gray': (125, 125, 125), 
          'rand': np.random.randint(0, high=256, size=(3,)).tolist(), 
          'dark_gray': (50, 50, 50), 'light_gray': (220, 220, 220)}

def draw_circle(event, x, y, flags, param):
    """Draws dots on double clicking of the left mouse button"""
    # Store the height and width of the image
    height = image.shape[0]
    width = image.shape[1]

    if event == cv2.EVENT_LBUTTONDBLCLK:
        # Draw the dot
        cv2.circle(image, (x, y), 5, colors['magenta'], -1)

        # Annotate the image
        txt = input("Describe this pixel using one word (e.g. dog) and press ENTER: ")

        # Append values to the lists
        x_vals.append(x)
        y_vals.append(y)
        annotation_vals.append(txt)

        # Print the coordinates and the annotation to the console
        print("x = " + str(x) + "  y = " + str(y) + "  Annotation = " + txt + "\n")

        # Set the position of the text part of the annotation
        text_x_pos = None
        text_y_pos = y

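        # Put the label to the right of points in the left half of the image,
        # and to the left of points in the right half
        if x < (width/2):
            text_x_pos = int(x + (width * 0.075))
        else:
            text_x_pos = int(x - (width * 0.075))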
        # Write text on the image
        cv2.putText(image, txt, (text_x_pos,text_y_pos), cv2.FONT_HERSHEY_SIMPLEX, 1, colors['magenta'], 2)

        # Save the annotated image
        cv2.imwrite(OUTPUT_IMAGE, image)

        # Prompt user for another annotation
        print("Double click another pixel or press 'q' to quit...\n")

print("Welcome to the Image Annotation Program!\n")
print("Double click anywhere inside the image to annotate that point...\n")

# We create a named window where the mouse callback will be established
cv2.namedWindow('Image mouse')

# We set the mouse callback function to 'draw_circle':
cv2.setMouseCallback('Image mouse', draw_circle)

while True:
    # Show image 'Image mouse':
    cv2.imshow('Image mouse', image)

    # Continue until 'q' is pressed:
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break

# Create a dictionary using lists
data = {'X':x_vals,'Y':y_vals,'Annotation':annotation_vals}

# Create the Pandas DataFrame
df = pd.DataFrame(data)

# Export the dataframe to a csv file
df.to_csv(path_or_buf = output_csv_file, index = None, header=True) 

# Destroy all generated windows:
cv2.destroyAllWindows()
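If you would rather not depend on pandas just to write three columns, Python's built-in `csv` module can produce the same X, Y, Annotation layout (a header row and no index column). This is an alternative sketch, not the program's own method; the function name `annotations_to_csv` and the sample coordinates are made up for illustration.

```python
import csv
import io

def annotations_to_csv(x_vals, y_vals, annotation_vals, fileobj):
    """Write X, Y, Annotation rows with a header and no index column.

    Hypothetical stdlib alternative to the pandas DataFrame.to_csv call.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["X", "Y", "Annotation"])
    for row in zip(x_vals, y_vals, annotation_vals):
        writer.writerow(row)

# Usage with in-memory data; for a real file, pass
# open(output_csv_file, "w", newline="") instead of the StringIO buffer
buf = io.StringIO()
annotations_to_csv([120, 340], [85, 200], ["dog", "cat"], buf)
print(buf.getvalue())
```

Opening the file with `newline=""` lets the `csv` module control line endings itself, which avoids blank lines between rows on Windows.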

Output Images


CSV Output

Here is the CSV output for the baby photo above:


How to Create an Image Histogram Using OpenCV

Given an image as input, how do we get the corresponding histogram using OpenCV? First, let us take a look at what a histogram is, then let us take a look at how to create one given an image. 

What is a Histogram?

A histogram is another way of looking at an image. It is a graph that shows pixel brightness values on the x-axis (e.g. 0 [black] to 255 [white] for grayscale images) and the corresponding number (i.e. frequency) of pixels (for each brightness value) on the y-axis. 
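The definition translates directly into code: a histogram is a count of pixels per brightness value. The minimal NumPy sketch below uses a made-up 3 x 4 grayscale image and `np.bincount` rather than `cv2.calcHist`, so the counting is explicit. `cv2.calcHist` returns the same counts, as a 256 x 1 `float32` array.

```python
import numpy as np

# A tiny 3 x 4 "grayscale image" (values 0-255), made up for illustration
img = np.array([[0, 0, 255, 128],
                [128, 128, 64, 0],
                [255, 64, 128, 0]], dtype=np.uint8)

# Count how many pixels have each brightness value 0..255
hist = np.bincount(img.ravel(), minlength=256)

print(hist[0], hist[64], hist[128], hist[255])
```

The counts over all 256 bins always add up to the number of pixels in the image, which is a quick sanity check for any histogram.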

How to Create an Image Histogram Using OpenCV

There are two links I particularly like that show how to create the image histogram given an input image.

  1. Geeks for Geeks
  2. OpenCV Python Tutorials

I like these tutorials because they lead the reader through all the essentials of how to find and analyze image histograms, step-by-step. This process boils down to the following code:

# Import the required libraries
import cv2  # Open CV
from matplotlib import pyplot as plt  #Matplotlib for plotting  
# Read the input image as grayscale (flag 0 = cv2.IMREAD_GRAYSCALE)
img = cv2.imread('example.jpg',0) 
# Calculate the frequency of pixels in the brightness range 0 - 255
histr = cv2.calcHist([img],[0],None,[256],[0,256]) 
# Plot the histogram and display it
plt.plot(histr)
plt.show()

Why Use CMOS Instead of CCD Sensors in Mobile Phones

If I were designing a new state-of-the-art mobile phone, I would choose Complementary metal–oxide–semiconductor (CMOS). CMOS has several advantages over a Charge-coupled device (CCD), which I will explain below.

Processing Speed

In a CCD, photosites are passive, whereas in CMOS they are active. This makes processing and information transfer slower in a CCD.

A photosite is the light-sensitive element that corresponds to a single pixel in a CCD or CMOS sensor. In a CCD sensor, light is captured and converted into a charge. The charge accumulates in the photosites, is transferred to a voltage converter, and is then amplified. This whole process happens one row at a time. The video below demonstrates this process.

However, with a CMOS sensor, the charge-to-voltage conversion and the amplification of the voltage occur inside each photosite. Because this work happens locally, processing and information transfer are faster than in a CCD sensor.

Space Requirements

CMOS enables integration of timers and analog-to-digital converters, which conserves space.

CCD is an older technology, and it is not possible to integrate peripheral components like analog-to-digital converters and timers on a single chip. CMOS does enable integration of these components onto a single chip, which conserves space. 

For a mobile phone that needs to be limited to a certain size, space must be conserved, which gives CMOS an advantage for use in mobile phones.

Power Consumption

CCD consumes more power than CMOS. CCD needs a variety of power supplies for the timing clocks. Also, it requires a voltage of 7V to 10V.

A CMOS sensor requires just one power supply and requires a voltage of 3.3V to 5V, roughly 50% less than a CCD sensor. This lower power consumption means extended battery life.

CMOS Prevents Blooming

In a CCD sensor, when an image is overexposed, electrons pile up in the areas of the brightest part of the image and overflow to other photosites, which creates unwanted light streaks. The structure of CMOS sensors prevents this problem.


Production Cost

CMOS chips can be produced on virtually any standard silicon production line, whereas this is not the case for CCD chips. As a result, production cost is lower for CMOS chips. These cost savings result in better profit margins for mobile phone companies.