How to Convert Camera Pixels to Real-World Coordinates

In this tutorial, I’ll show you how to convert camera pixels to real-world coordinates (in centimeters). A common use case for this is in robotics (e.g. along a conveyor belt in a factory) where you want to pick up an object from one location and place it in another location using nothing but a robotic arm and an overhead camera.

Prerequisites

To complete this tutorial, it is helpful if you have completed the following prerequisites. If you haven’t that is fine. You can still follow the process I will explain below.

You have set up Raspberry Pi with the Raspbian Operating System.
You have OpenCV and a Raspberry Camera Module Installed.
You know how to determine the pixel location of an object in a real-time video stream.
You completed this tutorial to build a two degree of freedom robotic arm (Optional). My eventual goal is to be able to move this robotic arm to the location of an object that is placed somewhere in its workspace. The location of the object will be determined by the overhead Raspberry Pi camera.

You Will Need

Here are some extra components you’ll need if you want to follow along with the physical setup we put together in the prerequisites (above).

1 x VELCRO Brand Thin Clear Mounting Squares 12-pack | 7/8 Inch (you can also use Scotch tape).
1 x Overhead Video Stand Phone Holder
10 x 1cm Grid Write N Wipe Boards (check eBay….also search Amazon for ‘Centimeter Grid Dry Erase Board’)
1 x Ruler that can measure in centimeters (or measuring tape)

Mount the Camera Module on the Overhead Video Stand Phone Holder (Optional)

Grab the Overhead Video Stand Phone Holder and place it above the grid like this.

Using some Velcro adhesives or some tape, attach the camera to the holder’s end effector so that it is pointing downward towards the center of the grid.

Here is how my video feed looks.

I am running the program on this page (test_video_capture.py). I’ll retype the code here:

# Credit: Adrian Rosebrock
# https://www.pyimagesearch.com/2015/03/30/accessing-the-raspberry-pi-camera-with-opencv-and-python/

# import the necessary packages
from picamera.array import PiRGBArray # Generates a 3D RGB array
from picamera import PiCamera # Provides a Python interface for the RPi Camera Module
import time # Provides time-related functions
import cv2 # OpenCV library

# Initialize the camera
camera = PiCamera()

# Set the camera resolution
camera.resolution = (640, 480)

# Set the number of frames per second
camera.framerate = 32

# Generates a 3D RGB array and stores it in rawCapture
raw_capture = PiRGBArray(camera, size=(640, 480))

# Wait a certain number of seconds to allow the camera time to warmup
time.sleep(0.1)

# Capture frames continuously from the camera
for frame in camera.capture_continuous(raw_capture, format="bgr", use_video_port=True):
    
    # Grab the raw NumPy array representing the image
    image = frame.array
    
    # Display the frame using OpenCV
    cv2.imshow("Frame", image)
    
    # Wait for keyPress for 1 millisecond
    key = cv2.waitKey(1) & 0xFF
    
    # Clear the stream in preparation for the next frame
    raw_capture.truncate(0)
    
    # If the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

What is Our Goal?

Assuming you’ve completed the prerequisites, you know how to find the location of an object in the field of view of a camera, and you know how to express that location in terms of the pixel location along both the x-axis (width) and y-axis (height) of the video frame.

In a real use case, if we want a robotic arm to automatically pick up an object that enters its workspace, we need some way to tell the robotic arm where the object is. In order to do that, we have to convert the object’s position in the camera reference frame to a position that is relative to the robotic arm’s base frame.

Once we know the object’s position relative to the robotic arm’s base frame, all we need to do is to calculate the inverse kinematics to set the servo motors to the angles that will enable the end effector of the robotic arm to reach the object.

What is the Field of View?

Before we get started, let’s take a look at what field of view means.

The field of view for our Raspberry Pi camera is the extent of the observable world that it can see at a given point in time.

In the figure below, you can see a schematic of the setup I have with the Raspberry Pi. In this perspective, we are in front of the Raspberry Pi camera.

In the Python code, we set the size of the video frame to be 640 pixels in width and 480 pixels in height. Thus, the matrix that describes the field of view of our camera has 480 rows and 640 columns.

From the perspective of the camera (i.e. camera reference frame), the first pixel in an image is at (x=0, y=0), which is in the far upper-left. The last pixel (x = 640, y = 480) is in the far lower-right.

Calculate the Centimeter to Pixel Conversion Factor

The first thing you need to do is to run test_video_capture.py.

Now, grab a ruler and measure the width of the frame in centimeters. It is hard to see in the image below, but my video frame is about 32 cm in width.

We know that in pixel units, the frame is 640 pixels in width.

Therefore, we have the following conversion factor from centimeters to pixels:

32 cm / 640 pixels = 0.05 cm / pixel

We will assume the pixels are square-shaped and the camera lens is parallel to the underlying surface so we can use the same conversion factor for both the x (width) and y (height) axes of the camera frame.

When you’re done, you can close down test_video_capture.py.

Test Your Conversion Factor

Now, let’s test this conversion factor of 0.05 cm / pixel.

Write the following code in your favorite Python IDE or text editor (I’m using Gedit).

This program is the absolute_difference_method.py code we wrote on this post with some small changes. This code detects an object and then prints its center to the video frame. I called this program absolute_difference_method_cm.py.

# Author: Addison Sears-Collins
# Description: This algorithm detects objects in a video stream
#   using the Absolute Difference Method. The idea behind this 
#   algorithm is that we first take a snapshot of the background.
#   We then identify changes by taking the absolute difference 
#   between the current video frame and that original 
#   snapshot of the background (i.e. first frame). 

# import the necessary packages
from picamera.array import PiRGBArray # Generates a 3D RGB array
from picamera import PiCamera # Provides a Python interface for the RPi Camera Module
import time # Provides time-related functions
import cv2 # OpenCV library
import numpy as np # Import NumPy library

# Initialize the camera
camera = PiCamera()

# Set the camera resolution
camera.resolution = (640, 480)

# Set the number of frames per second
camera.framerate = 30

# Generates a 3D RGB array and stores it in rawCapture
raw_capture = PiRGBArray(camera, size=(640, 480))

# Wait a certain number of seconds to allow the camera time to warmup
time.sleep(0.1)

# Initialize the first frame of the video stream
first_frame = None

# Create kernel for morphological operation. You can tweak
# the dimensions of the kernel.
# e.g. instead of 20, 20, you can try 30, 30
kernel = np.ones((20,20),np.uint8)

# Centimeter to pixel conversion factor
# I measured 32.0 cm across the width of the field of view of the camera.
CM_TO_PIXEL = 32.0 / 640

# Capture frames continuously from the camera
for frame in camera.capture_continuous(raw_capture, format="bgr", use_video_port=True):
    
    # Grab the raw NumPy array representing the image
    image = frame.array

    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Close gaps using closing
    gray = cv2.morphologyEx(gray,cv2.MORPH_CLOSE,kernel)
      
    # Remove salt and pepper noise with a median filter
    gray = cv2.medianBlur(gray,5)
    
    # If first frame, we need to initialize it.
    if first_frame is None:
        
      first_frame = gray
      
      # Clear the stream in preparation for the next frame
      raw_capture.truncate(0)
      
      # Go to top of for loop
      continue
      
    # Calculate the absolute difference between the current frame
    # and the first frame
    absolute_difference = cv2.absdiff(first_frame, gray)

    # If a pixel is less than ##, it is considered black (background). 
    # Otherwise, it is white (foreground). 255 is upper limit.
    # Modify the number after absolute_difference as you see fit.
    _, absolute_difference = cv2.threshold(absolute_difference, 50, 255, cv2.THRESH_BINARY)

    # Find the contours of the object inside the binary image
    contours, hierarchy = cv2.findContours(absolute_difference,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[-2:]
    areas = [cv2.contourArea(c) for c in contours]
 
    # If there are no countours
    if len(areas) < 1:
 
      # Display the resulting frame
      cv2.imshow('Frame',image)
 
      # Wait for keyPress for 1 millisecond
      key = cv2.waitKey(1) & 0xFF
 
      # Clear the stream in preparation for the next frame
      raw_capture.truncate(0)
    
      # If "q" is pressed on the keyboard, 
      # exit this loop
      if key == ord("q"):
        break
    
      # Go to the top of the for loop
      continue
 
    else:
        
      # Find the largest moving object in the image
      max_index = np.argmax(areas)
      
    # Draw the bounding box
    cnt = contours[max_index]
    x,y,w,h = cv2.boundingRect(cnt)
    cv2.rectangle(image,(x,y),(x+w,y+h),(0,255,0),3)
 
    # Draw circle in the center of the bounding box
    x2 = x + int(w/2)
    y2 = y + int(h/2)
    cv2.circle(image,(x2,y2),4,(0,255,0),-1)
	
    # Calculate the center of the bounding box in centimeter coordinates
    # instead of pixel coordinates
    x2_cm = x2 * CM_TO_PIXEL
    y2_cm = y2 * CM_TO_PIXEL
 
    # Print the centroid coordinates (we'll use the center of the
    # bounding box) on the image
    text = "x: " + str(x2_cm) + ", y: " + str(y2_cm)
    cv2.putText(image, text, (x2 - 10, y2 - 10),
      cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
         
    # Display the resulting frame
    cv2.imshow("Frame",image)
    
    # Wait for keyPress for 1 millisecond
    key = cv2.waitKey(1) & 0xFF
 
    # Clear the stream in preparation for the next frame
    raw_capture.truncate(0)
    
    # If "q" is pressed on the keyboard, 
    # exit this loop
    if key == ord("q"):
      break

# Close down windows
cv2.destroyAllWindows()

To get the object’s center in centimeter coordinates rather than pixel coordinates, we had to add the cm-to-pixel conversion factor to our code.

When you first launch the code, be sure there are no objects in the field of view and that the camera is not moving. Also, make sure that the level of light is fairly uniform across the board with no moving shadows (e.g. such as from the sun shining through a nearby window). Then place an object in the field of view and record the object’s x and y coordinate.

Here is the camera output when I first run the code:

Here is the output after I place my wallet in the field of view:

x-coordinate of the wallet in centimeters: 12.1 cm
y-coordinate of the wallet in centimeters: 12.75 cm

Get a ruler, and measure the object’s x coordinate (measure from the left-side of the camera frame) in centimeters, and see if that matches up with the x-value printed to the camera frame.

Get a ruler, and measure the object’s y coordinate (measure from the top of the camera frame) in centimeters, and see if that matches up with the y-value printed to the camera frame.

The measurements should match up pretty well.

That’s it. Keep building!

References

Credit to Professor Angela Sodemann for teaching me this stuff. Dr. Sodemann is an excellent teacher (She runs a course on RoboGrok.com).

Motion Detection Using OpenCV on Raspberry Pi 4

In this tutorial, I will show you how to use background subtraction to detect moving objects. We will use the OpenCV computer vision library on a Raspberry Pi 4.

Prerequisites

What is Background Subtraction?

Background subtraction is a technique that is commonly used to identify moving objects in a video stream. A real-world use case would be video surveillance or in a factory to detect moving objects (i.e. object detection) on a conveyor belt using a stationary video camera.

The idea behind background subtraction is that once you have a model of the background, you can detect objects by examining the difference between the current video frame and the background frame.

Let’s see background subtraction in action using a couple (there are many more than two) of OpenCV’s background subtraction algorithms. I won’t go into the detail and math behind each algorithm, but if you want to learn how each one works, check out this page.

If you are building a product like a robot, you don’t need to get bogged down in the details. You just need to be able to know how to use the algorithm to detect objects.

Absolute Difference Method

The idea behind this algorithm is that we first take a snapshot of the background. We then identify changes by taking the absolute difference between the current video frame and that original snapshot of the background.

This algorithm runs really fast, but it is sensitive to noise, like shadows and even the smallest changes in lighting.

Start your Raspberry Pi.

Go to the Python IDE in your Raspberry Pi by clicking the logo -> Programming -> Thonny Python IDE.

Write the following code. I’ll name the file absolute_difference_method.py.

# Author: Addison Sears-Collins
# Description: This algorithm detects objects in a video stream
#   using the Absolute Difference Method. The idea behind this 
#   algorithm is that we first take a snapshot of the background.
#   We then identify changes by taking the absolute difference 
#   between the current video frame and that original 
#   snapshot of the background (i.e. first frame). 

# import the necessary packages
from picamera.array import PiRGBArray # Generates a 3D RGB array
from picamera import PiCamera # Provides a Python interface for the RPi Camera Module
import time # Provides time-related functions
import cv2 # OpenCV library
import numpy as np # Import NumPy library

# Initialize the camera
camera = PiCamera()

# Set the camera resolution
camera.resolution = (640, 480)

# Set the number of frames per second
camera.framerate = 30

# Generates a 3D RGB array and stores it in rawCapture
raw_capture = PiRGBArray(camera, size=(640, 480))

# Wait a certain number of seconds to allow the camera time to warmup
time.sleep(0.1)

# Initialize the first frame of the video stream
first_frame = None

# Create kernel for morphological operation. You can tweak
# the dimensions of the kernel.
# e.g. instead of 20, 20, you can try 30, 30
kernel = np.ones((20,20),np.uint8)

# Capture frames continuously from the camera
for frame in camera.capture_continuous(raw_capture, format="bgr", use_video_port=True):
    
    # Grab the raw NumPy array representing the image
    image = frame.array

    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Close gaps using closing
    gray = cv2.morphologyEx(gray,cv2.MORPH_CLOSE,kernel)
      
    # Remove salt and pepper noise with a median filter
    gray = cv2.medianBlur(gray,5)
    
    # If first frame, we need to initialize it.
    if first_frame is None:
        
      first_frame = gray
      
      # Clear the stream in preparation for the next frame
      raw_capture.truncate(0)
      
      # Go to top of for loop
      continue
      
    # Calculate the absolute difference between the current frame
    # and the first frame
    absolute_difference = cv2.absdiff(first_frame, gray)

    # If a pixel is less than ##, it is considered black (background). 
    # Otherwise, it is white (foreground). 255 is upper limit.
    # Modify the number after absolute_difference as you see fit.
    _, absolute_difference = cv2.threshold(absolute_difference, 100, 255, cv2.THRESH_BINARY)

    # Find the contours of the object inside the binary image
    contours, hierarchy = cv2.findContours(absolute_difference,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[-2:]
    areas = [cv2.contourArea(c) for c in contours]
 
    # If there are no countours
    if len(areas) < 1:
 
      # Display the resulting frame
      cv2.imshow('Frame',image)
 
      # Wait for keyPress for 1 millisecond
      key = cv2.waitKey(1) & 0xFF
 
      # Clear the stream in preparation for the next frame
      raw_capture.truncate(0)
    
      # If "q" is pressed on the keyboard, 
      # exit this loop
      if key == ord("q"):
        break
    
      # Go to the top of the for loop
      continue
 
    else:
        
      # Find the largest moving object in the image
      max_index = np.argmax(areas)
      
    # Draw the bounding box
    cnt = contours[max_index]
    x,y,w,h = cv2.boundingRect(cnt)
    cv2.rectangle(image,(x,y),(x+w,y+h),(0,255,0),3)
 
    # Draw circle in the center of the bounding box
    x2 = x + int(w/2)
    y2 = y + int(h/2)
    cv2.circle(image,(x2,y2),4,(0,255,0),-1)
 
    # Print the centroid coordinates (we'll use the center of the
    # bounding box) on the image
    text = "x: " + str(x2) + ", y: " + str(y2)
    cv2.putText(image, text, (x2 - 10, y2 - 10),
      cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
         
    # Display the resulting frame
    cv2.imshow("Frame",image)
    
    # Wait for keyPress for 1 millisecond
    key = cv2.waitKey(1) & 0xFF
 
    # Clear the stream in preparation for the next frame
    raw_capture.truncate(0)
    
    # If "q" is pressed on the keyboard, 
    # exit this loop
    if key == ord("q"):
      break

# Close down windows
cv2.destroyAllWindows()

Run the code.

Here is the background:

Here is what things look like after we place an object in the field of view:

You notice that we’ve drawn a bounding box. We have also labeled the center of the object with the pixel coordinates (i.e. centroid).

Feel free to tweak the lower threshold on the _, absolute_difference = cv2.threshold… line to your liking.

BackgroundSubtractorMOG2

Here is another method. I named the file background_subtractor_mog2_method.py.

# Author: Addison Sears-Collins
# Description: This algorithm detects objects in a video stream
#   using the Gaussian Mixture Model background subtraction method. 

# import the necessary packages
from picamera.array import PiRGBArray # Generates a 3D RGB array
from picamera import PiCamera # Provides a Python interface for the RPi Camera Module
import time # Provides time-related functions
import cv2 # OpenCV library
import numpy as np # Import NumPy library

# Initialize the camera
camera = PiCamera()

# Set the camera resolution
camera.resolution = (640, 480)

# Set the number of frames per second
camera.framerate = 30

# Generates a 3D RGB array and stores it in rawCapture
raw_capture = PiRGBArray(camera, size=(640, 480))

# Create the background subtractor object
# Feel free to modify the history as you see fit.
back_sub = cv2.createBackgroundSubtractorMOG2(history=150,
  varThreshold=25, detectShadows=True)

# Wait a certain number of seconds to allow the camera time to warmup
time.sleep(0.1)

# Create kernel for morphological operation. You can tweak
# the dimensions of the kernel.
# e.g. instead of 20, 20, you can try 30, 30
kernel = np.ones((20,20),np.uint8)

# Capture frames continuously from the camera
for frame in camera.capture_continuous(raw_capture, format="bgr", use_video_port=True):
    
    # Grab the raw NumPy array representing the image
    image = frame.array

    # Convert to foreground mask
    fg_mask = back_sub.apply(image)
    
    # Close gaps using closing
    fg_mask = cv2.morphologyEx(fg_mask,cv2.MORPH_CLOSE,kernel)
      
    # Remove salt and pepper noise with a median filter
    fg_mask = cv2.medianBlur(fg_mask,5)
      
    # If a pixel is less than ##, it is considered black (background). 
    # Otherwise, it is white (foreground). 255 is upper limit.
    # Modify the number after fg_mask as you see fit.
    _, fg_mask = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)

    # Find the contours of the object inside the binary image
    contours, hierarchy = cv2.findContours(fg_mask,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[-2:]
    areas = [cv2.contourArea(c) for c in contours]
 
    # If there are no countours
    if len(areas) < 1:
 
      # Display the resulting frame
      cv2.imshow('Frame',image)
 
      # Wait for keyPress for 1 millisecond
      key = cv2.waitKey(1) & 0xFF
 
      # Clear the stream in preparation for the next frame
      raw_capture.truncate(0)
    
      # If "q" is pressed on the keyboard, 
      # exit this loop
      if key == ord("q"):
        break
    
      # Go to the top of the for loop
      continue
 
    else:
        
      # Find the largest moving object in the image
      max_index = np.argmax(areas)
      
    # Draw the bounding box
    cnt = contours[max_index]
    x,y,w,h = cv2.boundingRect(cnt)
    cv2.rectangle(image,(x,y),(x+w,y+h),(0,255,0),3)
 
    # Draw circle in the center of the bounding box
    x2 = x + int(w/2)
    y2 = y + int(h/2)
    cv2.circle(image,(x2,y2),4,(0,255,0),-1)
 
    # Print the centroid coordinates (we'll use the center of the
    # bounding box) on the image
    text = "x: " + str(x2) + ", y: " + str(y2)
    cv2.putText(image, text, (x2 - 10, y2 - 10),
      cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
         
    # Display the resulting frame
    cv2.imshow("Frame",image)
    
    # Wait for keyPress for 1 millisecond
    key = cv2.waitKey(1) & 0xFF
 
    # Clear the stream in preparation for the next frame
    raw_capture.truncate(0)
    
    # If "q" is pressed on the keyboard, 
    # exit this loop
    if key == ord("q"):
      break

# Close down windows
cv2.destroyAllWindows()

This method is more computationally-intensive than the previous method, but it handles shadows better. If you want to detect objects that are moving, this is a good method. If you want to detect objects that enter the field of view and then stay there, use the absolute difference method.

Here is the before:

Here is the after:

You can see that the algorithm detected that pen pretty well.

Unlike the absolute difference method which uses the same initial frame as the background until the program stops execution, with the background subtractor MOG2 method, the background image continually updates based on a certain number of previous frames (i.e. history) that you specify in the code.

That’s it. Keep building!

How to Set Up Real-Time Video Using OpenCV on Raspberry Pi 4

In this tutorial, I will show you how to install OpenCV on Raspberry Pi 4 and then get a real-time video stream going. OpenCV is a library that has a bunch of programming functions that enable us to do real-time computer vision.

Prerequisites

You have set up Raspberry Pi with the Raspbian Operating System.

You Will Need

Install the Raspberry Pi Camera Module

Let’s install the Raspberry Pi Camera Module. Here are the official instructions, but I’ll walk through the whole process below.

Grab the Raspberry Pi Camera’s plastic clip. (optional)

Remove the ribbon cable that is currently in there. (optional)

Replace that small cable with the longer ribbon cable that came with the Flex Extension Cable Set. (optional)

Open the Camera Serial Interface on the Raspberry Pi by taking your fingers, pinching either side, and pulling up. The Camera Serial Interface is labeled “CAMERA”.

Push the long piece of ribbon into the interface. The silver pieces of the ribbon should be facing towards the CAMERA label on the Raspberry Pi board.

Hold the ribbon in place with one finger while you push down on the door. The door should snap into place.

This is how it should look at this stage (Ignore the other stuff in the photo).

Lightly pull on the ribbon to make sure that it is in their snugly. It shouldn’t come out when you pull on it.

Configure the Raspberry Pi

We now need to make sure the Raspberry Pi is configured properly to use the camera.

Start the Raspberry Pi.

Open a fresh terminal window, and type the following command:

sudo raspi-config

Go to Interfacing Options and press Enter.

Select Camera and press Enter to enable the camera.

Press Enter.

Go to Advanced Options and press Enter.

Select Resolution and press Enter.

Select a screen resolution. I’m using 1920 x 1080.

Press Enter.

Go to Finish and press Enter.

Press Enter on <Yes> to reboot the Raspberry Pi.

Test the Raspberry Pi Camera Module

Open up a new terminal window. Let’s take a test photo by typing the following command:

raspistill -o Desktop/image.jpg

Your photo should be on your Desktop.

Here is how the camera looks when it is right side up. The black bar needs to be on top above the camera lens.

Install OpenCV

Let’s install OpenCV on our Raspberry Pi. I will largely be following this excellent tutorial. The process has a lot of steps, so go slow.

Update the packages by typing this command:

sudo apt-get update

sudo apt-get upgrade

Install these packages that assist in the compilation of the OpenCV code:

sudo apt install cmake build-essential pkg-config git

These packages provide the support that enable OpenCV to use different formats for both images and videos.

sudo apt install libjpeg-dev libtiff-dev libjasper-dev libpng-dev libwebp-dev libopenexr-dev

sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libxvidcore-dev libx264-dev libdc1394-22-dev libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev

Install some more OpenCV dependencies: You can learn about each of these packages by doing a search here at the Debian website:

sudo apt install libgtk-3-dev libqtgui4 libqtwebkit4 libqt4-test python3-pyqt5

Install these packages that help OpenCV run quickly on your Raspberry Pi.

sudo apt install libatlas-base-dev liblapacke-dev gfortran

Install the HDF5 packages that OpenCV will use to manage the data.

sudo apt install libhdf5-dev libhdf5-103

Install Python support packages.

sudo apt install python3-dev python3-pip python3-numpy

Increase the swap space. The swap space is the space that your operating system uses when RAM has reached its limit.

sudo nano /etc/dphys-swapfile

Locate this line:

CONF_SWAPSIZE=100

Change that to:

CONF_SWAPSIZE=2048

Save the file and close out using the following keystrokes, one after the other:

CTRL+X

Enter

Regenerate the swap file:

sudo systemctl restart dphys-swapfile

Get the latest version of OpenCV from GitHub.

git clone https://github.com/opencv/opencv.git

git clone https://github.com/opencv/opencv_contrib.git

Both of these commands above will take a while to finish executing, so be patient.

Now, we need to compile OpenCV. We’ll create a new directory for this purpose.

mkdir ~/opencv/build

cd ~/opencv/build

Generate the makefile. Copy and paste this entire command below into your terminal and then press Enter.

cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D ENABLE_NEON=ON \
    -D ENABLE_VFPV3=ON \
    -D BUILD_TESTS=OFF \
    -D INSTALL_PYTHON_EXAMPLES=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D CMAKE_SHARED_LINKER_FLAGS=-latomic \
    -D BUILD_EXAMPLES=OFF ..

Compile OpenCV using the command below. -j$(nproc) makes sure that all of the processors that are available to us are used for the compilation. This speeds things up:

make -j$(nproc)

Wait for OpenCV to compile. It will take a while, so you can go to lunch or do something else, and then return. My Raspberry Pi took about an hour to finish everything.

Once that compilation process has completed, you need to make sure the files get installed.

sudo make install

Run this command so that Raspberry Pi can find OpenCV.

sudo ldconfig

Now, let’s go back to the swapfile and reset the size to something smaller.

sudo nano /etc/dphys-swapfile

Locate this line:

CONF_SWAPSIZE=2048

Change that to:

CONF_SWAPSIZE=100

Save the file and close out using the following keystrokes:

CTRL+X

Enter

Restart swap.

sudo systemctl restart dphys-swapfile

Now, make sure that picamera is installed. picamera is a Python package that enables your camera to interface with your Python code. This second array module that we’re installing enables us to use OpenCV.

pip3 install picamera

pip3 install "picamera[array]"

Test OpenCV

Launch python.

python3

Import OpenCV

import cv2

Check to see which version of OpenCV you have installed.

cv2.__version__

Here is what you should see:

To close out the window, type:

exit()

Capture Real-Time Video

Now, go to the Python IDE in your Raspberry Pi by clicking the logo -> Programming -> Thonny Python IDE.

Write the following code to capture a live video feed. Credit to Dr. Adrian Rosebrock for this source code. I’ll name the file test_video_capture.py.

Here is the code:

# Credit: Adrian Rosebrock
# https://www.pyimagesearch.com/2015/03/30/accessing-the-raspberry-pi-camera-with-opencv-and-python/

# import the necessary packages
from picamera.array import PiRGBArray # Generates a 3D RGB array
from picamera import PiCamera # Provides a Python interface for the RPi Camera Module
import time # Provides time-related functions
import cv2 # OpenCV library

# Initialize the camera
camera = PiCamera()

# Set the camera resolution
camera.resolution = (640, 480)

# Set the number of frames per second
camera.framerate = 32

# Generates a 3D RGB array and stores it in rawCapture
raw_capture = PiRGBArray(camera, size=(640, 480))

# Wait a certain number of seconds to allow the camera time to warmup
time.sleep(0.1)

# Capture frames continuously from the camera
for frame in camera.capture_continuous(raw_capture, format="bgr", use_video_port=True):
    
    # Grab the raw NumPy array representing the image
    image = frame.array
    
    # Display the frame using OpenCV
    cv2.imshow("Frame", image)
    
    # Wait for keyPress for 1 millisecond
    key = cv2.waitKey(1) & 0xFF
    
    # Clear the stream in preparation for the next frame
    raw_capture.truncate(0)
    
    # If the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

When you’re ready, click Run. Here is what my output was:

That’s it for this tutorial. Keep building!