Real-Time Object Tracking Using OpenCV and a Webcam

In this tutorial, we will create a program to track a moving object in real-time using the built-in webcam of a laptop computer. We will use Python and the OpenCV computer vision library for the code.


A real-world application of this is in robotics. Imagine you have a robot arm that needs to continuously pick up moving items from a conveyor belt inside a warehouse. In order for the robot to pick up an object it needs to know the exact coordinates of the object. The program we will create below will give you the basic building block to do just that. It will locate the coordinates of the center of the moving object (often called the “centroid“) in real-time using an ordinary webcam.

Let’s get started!


  • Python 3.7 or higher


Using real-time streaming video from your built-in webcam, create a program that:

  • Draws a bounding box around a moving object
  • Calculates the coordinates of the centroid of the object
  • Tracks the centroid of the object


Open up your favorite IDE or code editor.

Make sure you have the OpenCV and Numpy libraries installed. There are a number of ways to install both libraries. The most common way is to use pip, which is the standard package manager for Python.

pip install opencv-python
pip install numpy

Copy and paste the code below. This is all you need to run the program.

I put detailed comments inside the code so that you know what is going on. The technique used here is background subtraction, one of the most common ways to detect moving objects in a video stream:

#!/usr/bin/env python

Welcome to the Object Tracking Program!

Using real-time streaming video from your built-in webcam, this program:
  - Creates a bounding box around a moving object
  - Calculates the coordinates of the centroid of the object
  - Tracks the centroid of the object

  - Addison Sears-Collins

from __future__ import print_function # Python 2/3 compatibility
import cv2 # Import the OpenCV library
import numpy as np # Import Numpy library

# Project: Object Tracking
# Author: Addison Sears-Collins 
# Website:
# Date created: 06/13/2020
# Python version: 3.7

def main():
    Main method of the program.

    # Create a VideoCapture object
    cap = cv2.VideoCapture(0)

    # Create the background subtractor object
    # Use the last 700 video frames to build the background
    back_sub = cv2.createBackgroundSubtractorMOG2(history=700, 
        varThreshold=25, detectShadows=True)

    # Create kernel for morphological operation
    # You can tweak the dimensions of the kernel
    # e.g. instead of 20,20 you can try 30,30.
    kernel = np.ones((20,20),np.uint8)


        # Capture frame-by-frame
        # This method returns True/False as well
        # as the video frame.
        ret, frame =

        # Use every frame to calculate the foreground mask and update
        # the background
        fg_mask = back_sub.apply(frame)

        # Close dark gaps in foreground object using closing
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE, kernel)

        # Remove salt and pepper noise with a median filter
        fg_mask = cv2.medianBlur(fg_mask, 5) 
        # Threshold the image to make it either black or white
        _, fg_mask = cv2.threshold(fg_mask,127,255,cv2.THRESH_BINARY)

        # Find the index of the largest contour and draw bounding box
        fg_mask_bb = fg_mask
        contours, hierarchy = cv2.findContours(fg_mask_bb,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[-2:]
        areas = [cv2.contourArea(c) for c in contours]

        # If there are no countours
        if len(areas) < 1:

            # Display the resulting frame

            # If "q" is pressed on the keyboard, 
            # exit this loop
            if cv2.waitKey(1) & 0xFF == ord('q'):

            # Go to the top of the while loop

            # Find the largest moving object in the image
            max_index = np.argmax(areas)

        # Draw the bounding box
        cnt = contours[max_index]
        x,y,w,h = cv2.boundingRect(cnt)

        # Draw circle in the center of the bounding box
        x2 = x + int(w/2)
        y2 = y + int(h/2),(x2,y2),4,(0,255,0),-1)

        # Print the centroid coordinates (we'll use the center of the
        # bounding box) on the image
        text = "x: " + str(x2) + ", y: " + str(y2)
        cv2.putText(frame, text, (x2 - 10, y2 - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        # Display the resulting frame

        # If "q" is pressed on the keyboard, 
        # exit this loop
        if cv2.waitKey(1) & 0xFF == ord('q'):

    # Close down the video stream

if __name__ == '__main__':

How the Canny Edge Detector Works

In this post, I will explain how the Canny Edge Detector works. The Canny Edge Detector is a popular edge detection algorithm developed by John F. Canny in 1986. The goal of the Canny Edge Detector is to:

  • Minimize Error: Edges that are detected by the algorithm as edges should be real edges and not noise.
  • Good Localization: Minimize the distance between detected edge pixels and real edge pixels.
  • Minimal Responses to Single Edges: In other words, areas of the image that are not marked as edges should not be edges.

How the Canny Edge Detector Works

The Canny Edge Detector Process is as follows:

  1. Gaussian Filter: Smooth the input image with a Gaussian filter to remove noise (using a discrete Gaussian kernel).
  2. Calculate Intensity Gradients: Identify the areas in the image with the strongest intensity gradients (using a Sobel, Prewitt, or Roberts kernel).
  3. Non-maximum Suppression: Apply non-maximum suppression to thin out the edges. We want to remove unwanted pixels that might not be part of an edge.
  4. Thresholding with Hysteresis:  Hysteresis or double thresholding involves:
    • Accepting pixels as edges if the intensity gradient value exceeds an upper threshold.
    • Rejecting pixels as edges if the intensity gradient value is below a lower threshold.
    • If a pixel is between the two thresholds, accept it only if it is adjacent to a pixel that is above the upper threshold.

Mathematical Formulation of the Canny Edge Detector

More formally, in step 1 of the Canny Edge Detector, we smooth an image by convolving the image with a Gaussian kernel. An example calculation showing the convolving mathematical operation is shown in the Sobel Operator discussion. Below is an example 5×5 Gaussian kernel that can be used.


We must go through each 5×5 region in the image and apply the convolving operation between a 5×5 portion of the input image (with the pixel of interest as the center cell, or anchor) and the 5×5 kernel above. The result is then summed to give us the new intensity value for that pixel.

After smoothing the image using the Gaussian kernel, we then calculate the intensity gradients. A common method is to use the Sobel Operator.

Here are the two kernels used in the Sobel algorithm:


The gradient approximations at pixel (x,y) given a 3×3 portion of the source image Ii are calculated as follows:

Gx = x-direction kernel * (3x3 portion of image A with (x,y) as the center cell)
Gy = y-direction kernel * (3x3 portion of image A with (x,y) as the center cell)

* above is not normal matrix multiplication. * denotes the convolution operation.

We then combine the values above to calculate the magnitude of the gradient:

magnitude(G) = square_root(Gx2 + Gy2)

The direction of the gradient Ɵ is:

Ɵ = atan(Gy / Gx)

where atan is the arctangent operator.

Once we have the gradient magnitude and direction, we perform non-maximum suppression by scanning the entire image to get rid of pixels that might not be part of an edge. Non-maximum suppression works by finding pixels that are local maxima in the direction of the gradient (gradient direction is perpendicular to edges).

If, for example, we have three pixels that are next to each other: pixels a, b, and then c. Pixel b is larger in intensity than both a and c where pixels a and c are in the gradient direction of b. Therefore, pixel b is marked as an edge. Otherwise, if pixel b was not a local maximum, it would be set to 0 (i.e. black), meaning it would not be an edge pixel.

a ——> b <edge> ——> c

Non-maximum suppression is not perfect because some edges might actually be noise and not real edges. To solve this, Canny Edge Detector goes one step further and applies thresholding to remove the weakest edges and keep the strongest ones. Edge pixels that are borderline weak or strong are only considered strong if they are connected to strong edge pixels.

Canny Edge Detector Code

This tutorial has the Python code for the Canny Edge Detector.


In this discussion, we covered the Canny Edge Detector. The Canny Edge Detector is just one of many edge detection algorithms.

The most common edge detection algorithms fall into the following categories:

  • Gradient Operators
    • Roberts Cross Operator
    • Sobel Operator
    • Prewitt Operator
  • Canny Edge Detector
  • Laplacian of Gaussian
  • Haralick Operator

Which edge detection algorithm you choose depends on what you are trying to achieve with your application.

Keep building!

How the Laplacian of Gaussian Filter Works

In this post, I will explain how the Laplacian of Gaussian (LoG) filter works. Laplacian of Gaussian is a popular edge detection algorithm.

Edge detection is an important part of image processing and computer vision applications. It is used to detect objects, locate boundaries, and extract features. Edge detection is about identifying sudden, local changes in the intensity values of the pixels in an image.


How LoG Works

Edge detection algorithms like the Sobel Operator work on the first derivative of an image. In other words, if we have a graph of the intensity values for each pixel in an image, the Sobel Operator takes a look at where the slope of the graph of the intensity reaches a peak, and that peak is marked as an edge.

For our 10×1 pixel image, the blue curve below is a plot of the intensity values, and the orange curve is the plot of the first derivative of the blue curve. In layman’s terms, the orange curve is a plot of the slope.


The orange curve peaks in the middle, so we know that is likely an edge. When we look at the original source image, we confirm that yes, it is an edge.

One limitation with the approach above is that the first derivative of an image might be subject to a lot of noise. Local peaks in the slope of the intensity values might be due to shadows or tiny color changes that are not edges at all.

An alternative to using the first derivative of an image is to use the second derivative, which is the slope of the first derivative curve (i.e. that orange curve above). Such a curve looks something like this (see the gray curve below):


An edge occurs where the graph of the second derivative crosses zero. This second derivative-based method is called the Laplacian algorithm.

The Laplacian algorithm is also subject to noise. For example, consider a photo of a cat.

A cat hair or whisker might register as an edge because it is an area of a sharp change in intensity. However, it is not an edge. It is just noise. To solve this problem, a Gaussian smoothing filter is commonly applied to an image to reduce noise before the Laplacian is applied. This method is called the Laplacian of Gaussian (LoG).

We also set a threshold value to distinguish noise from edges. If the second derivative magnitude at a pixel exceeds this threshold, the pixel is part of an edge.

Mathematical Formulation of LoG

More formally, given a pixel (x, y), the Laplacian L(x,y) of an image with intensity values Ii can be written mathematically as follows:


Just like in the case of the Sobel Operator, we cannot calculate the second derivative directly because pixels in an image are discrete. We need to approximate it using the convolution operator. The two most common kernels are:


Calculating just the Laplacian will result in a lot of noise, so we need to convolve a Gaussian smoothing filter with the Laplacian filter to reduce noise prior to computing the second derivatives. The equation that combines both of these filters is called the Laplacian of Gaussian and is as follows:


The above equation is continuous, so we need to discretize it so that we can use it on discrete pixels in an image.

Here is an example of a LoG approximation kernel where σ = 1.4. This is just an example of one convolution kernel that can be used. There are others that would work as well.


This LoG kernel is convolved with a grayscale input image to detect the zero crossings of the second derivative. We set a threshold for these zero crossings and retain only those zero crossings that exceed the threshold. Strong zero crossings are ones that have a big difference between the positive maximum and the negative minimum on either size of the zero crossing. Weak zero crossings are most likely noise, so they are ignored due to the thresholding we apply.

Laplacian of Gaussian Code

This tutorial has the Python code for the Laplacian of Gaussian.