Image Feature Detection, Description, and Matching in OpenCV

In this tutorial, we will implement various image feature detection (a.k.a. feature extraction) and description algorithms using OpenCV, the computer vision library for Python. I’ll explain what a feature is later in this post.

We will also look at an example of how to match features between two images. This process is called feature matching.

Real-World Applications

  • Object Detection
  • Object Tracking
  • Object Classification

Let’s get started!

Prerequisites

What is a Feature?

Do you remember when you were a kid, and you played with puzzles? The objective was to put the puzzle pieces together. When the puzzle was all assembled, you would be able to see the big picture, which was usually some person, place, thing, or combination of all three.

jigsaw_puzzle_piece_grandmother

What enabled you to successfully complete the puzzle? Each puzzle piece contained some clues…perhaps an edge, a corner, a particular color pattern, etc. You used these clues to assemble the puzzle.

The “clues” in the example I gave above are image features. A feature in computer vision is a region of interest in an image that is unique and easy to recognize. Features include things like points, edges, blobs, and corners.

For example, suppose you saw this feature:

Capture

You see some shapes, edges, and corners. These features are clues to what this object might be.

Now, let’s say we also have this feature.

2

Can you recognize what this object is?

Many Americans and people who have traveled to New York City would guess that this is the Statue of Liberty. And in fact, it is.

3_statue_of_liberty

With just two features, you were able to identify this object. Computers follow a similar process when you run a feature detection algorithm to perform object recognition.

The Python computer vision library OpenCV has a number of algorithms to detect features in an image. We will explore these algorithms in this tutorial.

Installation and Setup

Before we get started, let’s make sure we have all the software packages installed. Check to see if you have OpenCV installed on your machine. If you are using Anaconda, you can type:

conda install -c conda-forge opencv

Alternatively, you can type:

pip install opencv-python

Install Numpy, the scientific computing library.

pip install numpy

Install Matplotlib, the plotting library.

pip install matplotlib
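
To confirm that everything installed correctly, you can print the version numbers from a short Python script:

import cv2
import numpy as np
import matplotlib

# Print the version of each package
print(cv2.__version__)
print(np.__version__)
print(matplotlib.__version__)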

Find an Image File

Find an image of any size. Here is mine:

random-shapes-small

Difference Between a Feature Detector and a Feature Descriptor

Before we get started developing our program, let’s take a look at some definitions. 

The algorithms for features fall into two categories: feature detectors and feature descriptors. 

A feature detector finds regions of interest in an image. The input into a feature detector is an image, and the output is the pixel coordinates of the significant regions in the image.

A feature descriptor encodes that feature into a numerical “fingerprint”. Feature description makes a feature uniquely identifiable from other features in the image. 

We can then use the numerical fingerprint to identify the feature even if the image undergoes some type of distortion.
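
To make the distinction concrete, here is a minimal sketch using ORB (an algorithm we will cover later in this post), assuming the sample image from earlier. The detect step finds keypoints; the compute step turns those keypoints into descriptors:

import cv2

# Load the sample image in grayscale
img = cv2.imread('random-shapes-small.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()

# Feature detection: find the pixel coordinates of interesting regions
keypoints = orb.detect(img, None)
print(keypoints[0].pt) # Coordinates of the first keypoint

# Feature description: compute a numerical "fingerprint" per keypoint
keypoints, descriptors = orb.compute(img, keypoints)
print(descriptors.shape) # One 32-byte row per keypoint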

Feature Detection Algorithms

Harris Corner Detection

A corner is an area of an image that has a large variation in pixel color intensity values in all directions. One popular algorithm for detecting corners in an image is called the Harris Corner Detector.

Here is some basic code for the Harris Corner Detector. I named my file harris_corner_detector.py.

# Code Source: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html

import cv2
import numpy as np

filename = 'random-shapes-small.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

gray = np.float32(gray)

# Arguments: image, neighborhood block size, Sobel aperture size, Harris k
dst = cv2.cornerHarris(gray, 2, 3, 0.04)

#result is dilated for marking the corners, not important
dst = cv2.dilate(dst,None)

# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]

cv2.imshow('dst', img)

# Press the ESC key to close the window
if cv2.waitKey(0) & 0xff == 27:
    cv2.destroyAllWindows()

Here is my image before:

random-shapes-small-1

Here is my image after:

1-harris-corner-detection

For a more detailed example, check out my post “Detect the Corners of Objects Using Harris Corner Detector.”

Shi-Tomasi Corner Detector and Good Features to Track

Another corner detection algorithm is called Shi-Tomasi. Let’s run this algorithm on the same image and see what we get. Here is the code. I named the file shi_tomasi_corner_detect.py.

# Code Source: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_shi_tomasi/py_shi_tomasi.html

import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('random-shapes-small.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Find the top 20 corners
corners = cv2.goodFeaturesToTrack(gray, 20, 0.01, 10)
corners = np.intp(corners) # np.int0 was removed in NumPy 2.0

for i in corners:
    x,y = i.ravel()
    cv2.circle(img,(x,y),3,255,-1)

cv2.imshow('Shi-Tomasi', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Here is the image after running the program:

2-shi-tomasi

Scale-Invariant Feature Transform (SIFT)

When we rotate an image or change its size, how can we make sure the features don’t change? The methods I’ve used above aren’t good at handling this scenario.

For example, consider these three images below of the Statue of Liberty in New York City. You know that this is the Statue of Liberty regardless of changes in the angle, color, or rotation of the statue in the photo. However, computers have a tough time with this task.

3_statue_of_liberty-1
4_statue_of_liberty
5_statue_of_liberty

OpenCV has an algorithm called SIFT that is able to detect features in an image regardless of changes to its size or orientation. This property of SIFT gives it an advantage over other feature detection algorithms which fail when you make transformations to an image.

Here is an example of code that uses SIFT:

# Code source: https://docs.opencv.org/master/da/df5/tutorial_py_sift_intro.html
import numpy as np
import cv2 as cv

# Read the image
img = cv.imread('chessboard.jpg')

# Convert to grayscale
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

# Find the features (i.e. keypoints) and feature descriptors in the image
# (SIFT moved into the main OpenCV module in version 4.4.0)
sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray,None)

# Draw circles to indicate the location of features and the feature's orientation
img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

# Save the image
cv.imwrite('sift_with_features_chessboard.jpg',img)

Here is the before:

chessboard

Here is the after. Each of those circles indicates the size of that feature. The line inside the circle indicates the orientation of the feature:

sift_with_features_chessboard

Speeded-Up Robust Features (SURF)

SURF is a faster version of SIFT. It is another way to find features in an image. Note that SURF is patented, so recent OpenCV builds only include it when the contrib package is compiled with the nonfree modules enabled.

Here is the code:

# Code Source: https://docs.opencv.org/master/df/dd2/tutorial_py_surf_intro.html

import numpy as np
import cv2 as cv

# Read the image
img = cv.imread('chessboard.jpg')

# Convert to grayscale
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Find the features (i.e. keypoints) and feature descriptors in the image
surf = cv.xfeatures2d.SURF_create(400)
kp, des = surf.detectAndCompute(gray, None)

# Draw circles to indicate the location of features and the feature's orientation
img = cv.drawKeypoints(gray, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

# Save the image
cv.imwrite('surf_with_features_chessboard.jpg',img)

Features from Accelerated Segment Test (FAST)

A lot of the feature detection algorithms we have looked at so far work well in different applications. However, they aren’t fast enough for some robotics use cases (e.g. SLAM). 

The FAST algorithm, implemented in OpenCV as FastFeatureDetector, is a really fast algorithm for detecting corners in an image.
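
Here is a minimal sketch, again assuming the sample image from earlier (the threshold value is illustrative and should be tuned per image):

import cv2 as cv

# Read the sample image in grayscale
img = cv.imread('random-shapes-small.jpg', cv.IMREAD_GRAYSCALE)

# Create the FAST detector (the threshold value is illustrative)
fast = cv.FastFeatureDetector_create(threshold=25)

# Detect the corners
kp = fast.detect(img, None)

# Draw the detected corners and save the result
img_out = cv.drawKeypoints(img, kp, None, color=(0, 255, 0))
cv.imwrite('fast_keypoints.jpg', img_out)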

Blob Detectors With LoG, DoG, and DoH

A blob is another type of feature in an image. A blob is a region of an image with similar pixel intensity values. Another definition you will hear is that a blob is a light-on-dark or dark-on-light area of an image.

Three popular blob detection algorithms are Laplacian of Gaussian (LoG), Difference of Gaussian (DoG), and Determinant of Hessian (DoH). Basic implementations of these blob detectors are available on the scikit-image website. Scikit-image is an image processing library for Python.
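
Here is a minimal sketch of all three detectors using scikit-image (install it with pip install scikit-image; the parameter values below are illustrative):

from skimage import color, io
from skimage.feature import blob_dog, blob_doh, blob_log

# Read the sample image and convert it to grayscale
image = io.imread('random-shapes-small.jpg')
gray = color.rgb2gray(image)

# Run each blob detector (the parameter values are illustrative)
blobs_log = blob_log(gray, max_sigma=30, num_sigma=10, threshold=0.1)
blobs_dog = blob_dog(gray, max_sigma=30, threshold=0.1)
blobs_doh = blob_doh(gray, max_sigma=30, threshold=0.01)

# Each row is (y, x, sigma); for LoG and DoG the blob radius is
# approximately sigma * sqrt(2), and for DoH it is sigma
print(len(blobs_log), len(blobs_dog), len(blobs_doh))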

Feature Descriptor Algorithms

Histogram of Oriented Gradients

The HoG algorithm breaks an image down into small sections and calculates the gradient magnitude and orientation in each section. This information is then pooled into bins to compute histograms. Concatenated together, these histograms give the image a numerical “fingerprint” that makes it uniquely identifiable.

You can find a basic implementation of HoG in the scikit-image library (skimage.feature.hog).
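
Here is a minimal sketch using OpenCV's built-in HOGDescriptor with its default parameters, again assuming the sample image from earlier:

import cv2

# Read the sample image in grayscale and resize it to the
# descriptor's default 64x128 detection window
img = cv2.imread('random-shapes-small.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 128))

# Compute the HoG feature vector (the numerical "fingerprint");
# the defaults use 8x8 cells, 16x16 blocks, and 9 orientation bins
hog = cv2.HOGDescriptor()
descriptor = hog.compute(img)

print(descriptor.shape) # 3780 values with the default parameters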

Binary Robust Independent Elementary Features (BRIEF)

BRIEF is a fast, efficient alternative to SIFT. A sample implementation of BRIEF is available at the OpenCV website.
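
Here is a minimal sketch based on that tutorial. Because BRIEF computes descriptors but does not detect keypoints, it is paired with a detector (STAR here); both require the opencv-contrib-python package:

import cv2 as cv

# Read the sample image in grayscale
img = cv.imread('random-shapes-small.jpg', cv.IMREAD_GRAYSCALE)

# BRIEF is a descriptor only, so pair it with a detector such as STAR
star = cv.xfeatures2d.StarDetector_create()
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()

# Detect the keypoints, then compute their BRIEF descriptors
kp = star.detect(img, None)
kp, des = brief.compute(img, kp)

# Each descriptor is a compact binary string (32 bytes by default)
print(brief.descriptorSize(), des.shape)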

Oriented FAST and Rotated BRIEF (ORB)

SIFT was patented for many years, and SURF is still a patented algorithm. ORB was created in 2011 as a free alternative to these algorithms. It combines the FAST keypoint detector and the BRIEF descriptor. You can find a basic example of ORB at the OpenCV website, and the feature matching example in the next section uses ORB as well.

Feature Matching Example

You can use ORB to locate features in an image and then match them with features in another image.

For example, consider this Whole Foods logo. This logo will be our training image.

training_image

I want to locate this Whole Foods logo inside the image below. This image below is our query image.

query_image

Here is the code you need to run. My file is called feature_matching_orb.py.

import numpy as np 
import cv2 
from matplotlib import pyplot as plt
	
# Read the training and query images
query_img = cv2.imread('query_image.jpg') 
train_img = cv2.imread('training_image.jpg') 

# Convert the images to grayscale 
query_img_gray = cv2.cvtColor(query_img,cv2.COLOR_BGR2GRAY) 
train_img_gray = cv2.cvtColor(train_img, cv2.COLOR_BGR2GRAY) 

# Initialize the ORB detector algorithm 
orb = cv2.ORB_create() 

# Detect keypoints (features) and calculate the descriptors
query_keypoints, query_descriptors = orb.detectAndCompute(query_img_gray,None) 
train_keypoints, train_descriptors = orb.detectAndCompute(train_img_gray,None) 

# Match the keypoints. ORB produces binary descriptors, so we use
# the Hamming distance, and we sort the matches by distance so the
# best matches come first.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(query_descriptors, train_descriptors)
matches = sorted(matches, key=lambda m: m.distance)

# Draw the 20 best keypoint matches on the output image
output_img = cv2.drawMatches(query_img, query_keypoints, 
  train_img, train_keypoints, matches[:20], None)

output_img = cv2.resize(output_img, (1200,650)) 

# Save the final image 
cv2.imwrite("feature_matching_result.jpg", output_img) 

# Display the result, and close the window upon a keypress
cv2.imshow("Feature Matching", output_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Here is the result:

feature_matching_result_small

If you want to dive deeper into feature matching algorithms (homography, RANSAC, the Brute-Force Matcher, FLANN, etc.), check out the official tutorials on the OpenCV website, which include some basic examples.
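
As a taste of what those tutorials cover, here is a hedged sketch of estimating a homography with RANSAC, continuing directly from the feature matching code above (keeping 50 matches is an arbitrary choice):

# Continuing from the feature matching code above, where `matches`
# has already been sorted by distance. Keep the best matches.
good = matches[:50]

# Gather the coordinates of the matched keypoints in each image
src_pts = np.float32(
  [query_keypoints[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst_pts = np.float32(
  [train_keypoints[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Estimate the homography; RANSAC rejects outlier matches, and `mask`
# flags which matches were treated as inliers
M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
print(M) # 3x3 matrix mapping query image points to training image points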

That’s it. Keep building!

How to Do Multiple Object Tracking Using OpenCV

In this tutorial, we will learn how to track multiple objects in a video using OpenCV, the computer vision library for Python. By the end of this tutorial, you will be able to generate the following output:

object_tracking_gif

Real-World Applications

Object tracking has a number of real-world use cases. Here are a few:

  • Drone Surveillance 
  • Aerial Cinematography
  • Tracking Customers to Understand In-Store Consumer Behavior
  • Any Kind of Robot That You Want to Follow You as You Move Around

Let’s get started!

Prerequisites

Installation and Setup

We first need to make sure we have all the software packages installed. Check to see if you have OpenCV installed on your machine. If you are using Anaconda, you can type:

conda install -c conda-forge opencv

Alternatively, you can type:

pip install opencv-python

Find Video Files

Find a video file that contains objects you would like to detect. I suggest finding a file that is 1920 x 1080 pixels in dimensions and is in mp4 format. You can find some good videos at sites like Pixabay.com and Pexels.com. 

Save the video file in a folder somewhere on your computer.

Write the Code

Navigate to the directory where you saved your video, and open a new Python program named multi_object_tracking.py.

Here is the full code for the program. You can copy and paste this code directly into your program. The only line you will need to change is line 10. The video file I’m using is named fish.mp4, so I will write ‘fish’ as the file prefix. 

If your video is not 1920 x 1080 dimensions, you will need to modify line 12 accordingly.

# Project: How to Do Multiple Object Tracking Using OpenCV
# Author: Addison Sears-Collins
# Date created: March 2, 2021
# Description: Tracking multiple objects in a video using OpenCV

import cv2 # Computer vision library
from random import randint # Handles the creation of random integers

# Make sure the video file is in the same directory as your code
file_prefix = 'fish'
filename = file_prefix + '.mp4'
file_size = (1920,1080) # Assumes 1920x1080 mp4

# We want to save the output to a video file
output_filename = file_prefix + '_object_tracking.mp4'
output_frames_per_second = 20.0 

# OpenCV has a bunch of object tracking algorithms. We list them here.
type_of_trackers = ['BOOSTING', 'MIL', 'KCF','TLD', 'MEDIANFLOW', 'GOTURN', 
                     'MOSSE', 'CSRT']

# CSRT is accurate but slow. You can try others and see what results you get.			 
desired_tracker = 'CSRT'

# Generate a MultiTracker object
# (in recent OpenCV versions, this is cv2.legacy.MultiTracker_create())
multi_tracker = cv2.MultiTracker_create()

# Set bounding box drawing parameters
from_center = False # Draw bounding box from upper left
show_cross_hair = False # Don't show the cross hair

def generate_tracker(type_of_tracker):
  """
  Create object tracker.
	
  :param type_of_tracker string: OpenCV tracking algorithm 
  """
  if type_of_tracker == type_of_trackers[0]:
    tracker = cv2.TrackerBoosting_create()
  elif type_of_tracker == type_of_trackers[1]:
    tracker = cv2.TrackerMIL_create()
  elif type_of_tracker == type_of_trackers[2]:
    tracker = cv2.TrackerKCF_create()
  elif type_of_tracker == type_of_trackers[3]:
    tracker = cv2.TrackerTLD_create()
  elif type_of_tracker == type_of_trackers[4]:
    tracker = cv2.TrackerMedianFlow_create()
  elif type_of_tracker == type_of_trackers[5]:
    tracker = cv2.TrackerGOTURN_create()
  elif type_of_tracker == type_of_trackers[6]:
    tracker = cv2.TrackerMOSSE_create()
  elif type_of_tracker == type_of_trackers[7]:
    tracker = cv2.TrackerCSRT_create()
  else:
    tracker = None
    print('The name of the tracker is incorrect')
    print('Here are the possible trackers:')
    for track_type in type_of_trackers:
      print(track_type)
  return tracker

def main():

  # Load a video
  cap = cv2.VideoCapture(filename)

  # Create a VideoWriter object so we can save the video output
  fourcc = cv2.VideoWriter_fourcc(*'mp4v')
  result = cv2.VideoWriter(output_filename,  
                           fourcc, 
                           output_frames_per_second, 
                           file_size) 

  # Capture the first video frame
  success, frame = cap.read() 

  bounding_box_list = []
  color_list = []	

  # Do we have a video frame? If true, proceed.
  if success:

    while True:
    
      # Draw a bounding box over all the objects that you want to track
      # Press ENTER or SPACE after you've drawn the bounding box
      bounding_box = cv2.selectROI('Multi-Object Tracker', frame, from_center, 
        show_cross_hair) 

      # Add a bounding box
      bounding_box_list.append(bounding_box)

      # Add a color for this object's bounding box (swap in the
      # commented-out randint calls for random colors)
      blue = 255 # randint(127, 255)
      green = 0 # randint(127, 255)
      red = 255 #randint(127, 255)
      color_list.append((blue, green, red))

      # Press 'q' (make sure you click on the video frame so that it is the
      # active window) to start object tracking. You can press another key
      # if you want to draw another bounding box.			
      print("\nPress q to begin tracking objects or press " + 
        "another key to draw the next bounding box\n")

      # Wait for keypress
      k = cv2.waitKey() & 0xFF

      # Start tracking objects if 'q' is pressed			
      if k == ord('q'):
        break

    cv2.destroyAllWindows()
		
    print("\nTracking objects. Please wait...")
		
    # Set the tracker
    type_of_tracker = desired_tracker	
			
    for bbox in bounding_box_list:
		
      # Add tracker to the multi-object tracker
      multi_tracker.add(generate_tracker(type_of_tracker), frame, bbox)
      
    # Process the video
    while cap.isOpened():
		
      # Capture one frame at a time
      success, frame = cap.read() 
		
      # Do we have a video frame? If true, proceed.
      if success:

        # Update the location of the bounding boxes
        success, bboxes = multi_tracker.update(frame)

        # Draw the bounding boxes on the video frame
        for i, bbox in enumerate(bboxes):
          point_1 = (int(bbox[0]), int(bbox[1]))
          point_2 = (int(bbox[0] + bbox[2]), 
            int(bbox[1] + bbox[3]))
          cv2.rectangle(frame, point_1, point_2, color_list[i], 5)
				
        # Write the frame to the output video file
        result.write(frame)

      # No more video frames left
      else:
        break
		
  # Stop when the video is finished
  cap.release()
	
  # Release the video recording
  result.release()

main()

Run the Code

Launch the program.

python multi_object_tracking.py

The first frame of the video will pop up. With this frame selected, grab your mouse, and draw a rectangle around the object you would like to track. 

When you’re done drawing the rectangle, press Enter or Space.

If you just want to track this object, press ‘q’ to run the program. Otherwise, if you want to track more objects, press any other key to draw more rectangles around the other objects you want to track.

After you press ‘q’, the program will run. Once the program is finished doing its job, you will have a new video file. It will be named something like fish_object_tracking.mp4. This file is your final video output.

console-instructions

Video Output

Here is my final video:

How the Code Works

OpenCV has a number of object trackers: ‘BOOSTING’, ‘MIL’, ‘KCF’,’TLD’, ‘MEDIANFLOW’, ‘GOTURN’, ‘MOSSE’, ‘CSRT’. In our implementation, we used CSRT which is slow but accurate.

When we run the program, the first video frame is captured. We identify the object(s) we want to track by drawing a rectangle around each one. After that, the algorithm tracks each object through every frame of the video, as the minimal sketch below illustrates.

Once the program has finished processing all video frames, the annotated video is saved to your computer.
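
If you want to see the core init-then-update mechanics in isolation, here is a minimal single-object sketch (depending on your OpenCV version, the tracker constructors may live under cv2.legacy, e.g. cv2.legacy.TrackerCSRT_create()):

import cv2

# Open the video and grab the first frame
cap = cv2.VideoCapture('fish.mp4')
success, frame = cap.read()

# Create a single CSRT tracker and initialize it with a user-drawn box
tracker = cv2.TrackerCSRT_create()
bbox = cv2.selectROI('Select Object', frame, False)
tracker.init(frame, bbox)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # Ask the tracker for the object's new bounding box in this frame
    ok, bbox = tracker.update(frame)
    if ok:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 255), 2)

    cv2.imshow('Tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()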

That’s it. Keep building!

How to Detect Objects in Video Using MobileNet SSD in OpenCV

In this tutorial, we will go through how to detect objects in a video stream using OpenCV. We will use MobileNet SSD, a special type of convolutional neural network architecture.

Our output will look like this:

Real-World Applications

  • Object Detection
  • Object Tracking
  • Object Classification
  • Autonomous Vehicles
  • Self-Driving Cars

Let’s get started!

Prerequisites

Installation and Setup

We now need to make sure we have all the software packages installed. Check to see if you have OpenCV installed on your machine. If you are using Anaconda, you can type:

conda install -c conda-forge opencv

Alternatively, you can type:

pip install opencv-python

Download the Required Files

Download all the video files and other neural network-related files at this link. Place the files inside a directory on your computer.

Code

In the same folder where your downloaded files are, open a new Python file called object_detection_mobile_ssd.py.

Here is the full code for the system. The only things you’ll need to change in this code are the name of your desired input video file on line 10 and the name of your desired output file on line 14.

# Project: How to Detect Objects in Video Using MobileNet SSD in OpenCV
# Author: Addison Sears-Collins
# Date created: March 1, 2021
# Description: Object detection using OpenCV

import cv2 # Computer vision library
import numpy as np # Scientific computing library 

# Make sure the video file is in the same directory as your code
filename = 'edmonton_canada.mp4'
file_size = (1920,1080) # Assumes 1920x1080 mp4

# We want to save the output to a video file
output_filename = 'edmonton_canada_obj_detect_mobssd.mp4'
output_frames_per_second = 20.0 

RESIZED_DIMENSIONS = (300, 300) # Dimensions that SSD was trained on. 
IMG_NORM_RATIO = 0.007843 # 1/127.5; with the mean subtraction below, this scales pixel values to roughly [-1, 1]

# Load the pre-trained neural network
neural_network = cv2.dnn.readNetFromCaffe('MobileNetSSD_deploy.prototxt.txt', 
        'MobileNetSSD_deploy.caffemodel')

# List of categories and classes
categories = { 0: 'background', 1: 'aeroplane', 2: 'bicycle', 3: 'bird', 
               4: 'boat', 5: 'bottle', 6: 'bus', 7: 'car', 8: 'cat', 
               9: 'chair', 10: 'cow', 11: 'diningtable', 12: 'dog', 
              13: 'horse', 14: 'motorbike', 15: 'person', 
              16: 'pottedplant', 17: 'sheep', 18: 'sofa', 
              19: 'train', 20: 'tvmonitor'}

classes =  ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", 
            "bus", "car", "cat", "chair", "cow", 
           "diningtable",  "dog", "horse", "motorbike", "person", 
           "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
					 
# Generate a random color for each object class's bounding boxes
bbox_colors = np.random.uniform(255, 0, size=(len(categories), 3))
	
def main():

  # Load a video
  cap = cv2.VideoCapture(filename)

  # Create a VideoWriter object so we can save the video output
  fourcc = cv2.VideoWriter_fourcc(*'mp4v')
  result = cv2.VideoWriter(output_filename,  
                           fourcc, 
                           output_frames_per_second, 
                           file_size) 
	
  # Process the video
  while cap.isOpened():
		
    # Capture one frame at a time
    success, frame = cap.read() 

    # Do we have a video frame? If true, proceed.
    if success:
		
      # Capture the frame's height and width
      (h, w) = frame.shape[:2]

      # Create a blob. In the dnn module, a blob is the preprocessed
      # 4D tensor (batch, channels, height, width) that gets fed into
      # the neural network: here we resize the frame, scale its pixel
      # values, and subtract the mean of 127.5 from each channel
      frame_blob = cv2.dnn.blobFromImage(cv2.resize(frame, RESIZED_DIMENSIONS), 
                     IMG_NORM_RATIO, RESIZED_DIMENSIONS, 127.5)
	
      # Set the input for the neural network
      neural_network.setInput(frame_blob)

      # Predict the objects in the image
      neural_network_output = neural_network.forward()

      # The output has shape [1, 1, N, 7]: each of the N detections is
      # [image_id, class_id, confidence, left, top, right, bottom],
      # with the box coordinates normalized to [0, 1].
      # Put the bounding boxes around the detected objects
      for i in np.arange(0, neural_network_output.shape[2]):

        confidence = neural_network_output[0, 0, i, 2]
    
        # Confidence must be at least 30%		
        if confidence > 0.30:
				
          idx = int(neural_network_output[0, 0, i, 1])

          bounding_box = neural_network_output[0, 0, i, 3:7] * np.array(
            [w, h, w, h])

          (startX, startY, endX, endY) = bounding_box.astype("int")

          label = "{}: {:.2f}%".format(classes[idx], confidence * 100) 
        
          cv2.rectangle(frame, (startX, startY), (
            endX, endY), bbox_colors[idx], 2)     
						
          y = startY - 15 if startY - 15 > 15 else startY + 15     

          cv2.putText(frame, label, (startX, y),cv2.FONT_HERSHEY_SIMPLEX, 
            0.5, bbox_colors[idx], 2)
		
      # We now need to resize the frame so its dimensions
      # are equivalent to the dimensions of the original frame
      frame = cv2.resize(frame, file_size, interpolation=cv2.INTER_NEAREST)

      # Write the frame to the output video file
      result.write(frame)
		
    # No more video frames left
    else:
      break
			
  # Stop when the video is finished
  cap.release()
	
  # Release the video recording
  result.release()

main()

To run the code, type the following command:

python object_detection_mobile_ssd.py

You will see the video output that is at the top of this tutorial.

That’s it. Keep building!