How to Detect Pedestrians in Images and Video Using OpenCV

In this tutorial, we will write a program to detect pedestrians in a photo and a video using a technique called the Histogram of Oriented Gradients (HOG). We will use the OpenCV computer vision library, which has a built-in pedestrian detection method based on the original research paper on HOG.

I won’t go into the details and the math behind HOG (you don’t need to know the math in order to implement the algorithm), but if you’re interested in learning about what goes on under the hood, check out that research paper.


Real-World Applications

  • Self-Driving Cars


Install OpenCV

The first thing you need to do is to make sure you have OpenCV installed on your machine. If you’re using Anaconda, you can type this command into the terminal:

conda install -c conda-forge opencv

Alternatively, you can type:

pip install opencv-python

Detect Pedestrians in an Image

Get an image that contains pedestrians and put it inside a directory somewhere on your computer. 

Here are a couple of images that I will use as test cases:


Inside that same directory, create a new Python file and write the following code:

import cv2 # Import the OpenCV library to enable computer vision

# Author: Addison Sears-Collins
# Description: Detect pedestrians in an image using the 
#   Histogram of Oriented Gradients (HOG) method

# Make sure the image file is in the same directory as your code
filename = 'pedestrians_1.jpg'

def main():

  # Create a HOGDescriptor object
  hog = cv2.HOGDescriptor()
  # Initialize the People Detector
  hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
  # Load an image
  image = cv2.imread(filename)
  # Detect people
  # image: Source image
  # winStride: step size in x and y direction of the sliding window
  # padding: no. of pixels in x and y direction for padding of sliding window
  # scale: Detection window size increase coefficient	
  # bounding_boxes: Location of detected people
  # weights: Weight scores of detected people
  (bounding_boxes, weights) = hog.detectMultiScale(image, 
                                                   winStride=(4, 4),
                                                   padding=(8, 8), 
                                                   scale=1.05)  # assumed value; 1.05 is OpenCV's default

  # Draw bounding boxes on the image
  for (x, y, w, h) in bounding_boxes: 
    cv2.rectangle(image, 
                  (x, y),  
                  (x + w, y + h),  
                  (0, 0, 255), 
                  4)  # line thickness in pixels (assumed value)
  # Create the output file name by removing the '.jpg' part
  size = len(filename)
  new_filename = filename[:size - 4]
  new_filename = new_filename + '_detect.jpg'
  # Save the new image in the working directory
  cv2.imwrite(new_filename, image)
  # Display the image 
  cv2.imshow("Image", image) 
  # Display the window until any key is pressed
  cv2.waitKey(0)
  # Close all windows
  cv2.destroyAllWindows()

main()


Run the code.
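
One small thing to watch: the script builds the output name by chopping off the last four characters, which only works for a three-letter extension like '.jpg'. Here is a minimal sketch of a more forgiving approach using os.path.splitext. The helper name detect_filename is my own invention for illustration, and unlike the script above it keeps the original extension instead of always writing '.jpg'.

```python
import os

def detect_filename(path, suffix='_detect'):
    """Hypothetical helper: insert a suffix before the file extension."""
    # splitext separates 'name' from '.ext' no matter how long the extension is
    root, ext = os.path.splitext(path)
    return root + suffix + ext

print(detect_filename('pedestrians_1.jpg'))
print(detect_filename('crosswalk.jpeg'))
```

You could swap this in for the three lines that build new_filename if your test images use other extensions.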


Image Results

Here is the output for the first image:


Here is the output for the second image:


Detect Pedestrians in a Video

Now that we know how to detect pedestrians in an image, let’s detect pedestrians in a video.

Find a video file that has pedestrians in it. You can check free video sites like Pixabay if you don’t have any videos.

The video should have dimensions of 1920 x 1080 for this implementation. If you have a video of another size, you will have to tweak the parameters in the code.
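
To get a feel for why the parameters matter at this resolution, here is a rough sketch that counts how many sliding-window positions fit in one frame for a given winStride. It assumes the default 64 x 128 detection window of OpenCV's HOGDescriptor, and it ignores padding and the image pyramid controlled by scale, so treat the numbers as a ballpark estimate only. The function name window_positions is hypothetical.

```python
def window_positions(image_size, win_stride, window_size=(64, 128)):
    """Estimate sliding-window positions at a single scale (no padding)."""
    img_w, img_h = image_size
    win_w, win_h = window_size
    stride_x, stride_y = win_stride
    if img_w < win_w or img_h < win_h:
        return 0  # the window does not fit in the image at all
    cols = (img_w - win_w) // stride_x + 1
    rows = (img_h - win_h) // stride_y + 1
    return cols * rows

for stride in [(4, 4), (16, 16)]:
    print(stride, window_positions((1920, 1080), stride))
```

A larger stride means far fewer windows to classify per frame, which is faster but can miss people who fall between window positions.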

Place the video inside a directory.

Now let’s download a library that will apply a fancy mathematical technique called non-maxima suppression to take multiple overlapping bounding boxes and compress them into just one bounding box.

pip install --upgrade imutils
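
If you are curious what non-maxima suppression does, here is a simplified pure-Python sketch of the general idea: keep the highest-scoring box, throw away any box that overlaps it too much, and repeat. This is not the imutils implementation (which works on NumPy arrays and measures overlap a little differently). The function names, the scores, and the 0.5 threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def simple_nms(boxes, scores, iou_thresh=0.5):
    """Greedy suppression: visit boxes from best score to worst."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep this box only if it does not overlap a box we already kept
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

# Two near-duplicate detections of one person, plus one separate person
boxes = [(10, 10, 110, 210), (12, 14, 112, 214), (300, 50, 380, 220)]
scores = [0.9, 0.8, 0.7]
print(simple_nms(boxes, scores))
```

The two overlapping boxes collapse into one, and the box that stands alone is left untouched.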

Also, make sure you have NumPy installed, a scientific computing library for Python.

If you’re using Anaconda, you can type:

conda install numpy

Alternatively, you can type:

pip install numpy

Inside that same directory, create a new Python file and write the following code. We will save the output as an .mp4 video file:

# Author: Addison Sears-Collins
# Description: Detect pedestrians in a video using the 
#   Histogram of Oriented Gradients (HOG) method

import cv2 # Import the OpenCV library to enable computer vision
import numpy as np # Import the NumPy scientific computing library
from imutils.object_detection import non_max_suppression # Handle overlapping

# Make sure the video file is in the same directory as your code
filename = 'pedestrians_on_street_1.mp4'
file_size = (1920,1080) # Assumes 1920x1080 mp4
scale_ratio = 1 # Option to scale to fraction of original size. 

# We want to save the output to a video file
output_filename = 'pedestrians_on_street.mp4'
output_frames_per_second = 20.0 

def main():

  # Create a HOGDescriptor object
  hog = cv2.HOGDescriptor()
  # Initialize the People Detector
  hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
  # Load a video
  cap = cv2.VideoCapture(filename)

  # Create a VideoWriter object so we can save the video output
  fourcc = cv2.VideoWriter_fourcc(*'mp4v')
  result = cv2.VideoWriter(output_filename,  
                           fourcc, 
                           output_frames_per_second, 
                           file_size) 
  # Process the video
  while cap.isOpened():
    # Capture one frame at a time
    success, frame = cap.read() 
    # Do we have a video frame? If true, proceed.
    if success:
      # Resize the frame
      width = int(frame.shape[1] * scale_ratio)
      height = int(frame.shape[0] * scale_ratio)
      frame = cv2.resize(frame, (width, height))
      # Store the original frame
      orig_frame = frame.copy()
      # Detect people
      # image: a single frame from the video
      # winStride: step size in x and y direction of the sliding window
      # padding: no. of pixels in x and y direction for padding of 
      #	sliding window
      # scale: Detection window size increase coefficient	
      # bounding_boxes: Location of detected people
      # weights: Weight scores of detected people
      # Tweak these parameters for better results
      (bounding_boxes, weights) = hog.detectMultiScale(frame, 
                                                       winStride=(16, 16),
                                                       padding=(4, 4), 
                                                       scale=1.05)  # assumed value; 1.05 is OpenCV's default

      # Draw bounding boxes on the frame
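      # (drawn on the copy so only the merged boxes appear in the output)
      for (x, y, w, h) in bounding_boxes: 
        cv2.rectangle(orig_frame, 
                      (x, y),  
                      (x + w, y + h),  
                      (0, 0, 255), 
                      2)  # line thickness in pixels (assumed value)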
      # Get rid of overlapping bounding boxes
      # You can tweak the overlapThresh value for better results
      bounding_boxes = np.array([[x, y, x + w, y + h] for (
                                x, y, w, h) in bounding_boxes])
      selection = non_max_suppression(bounding_boxes, 
                                      probs=None, 
                                      overlapThresh=0.45)  # assumed starting value
      # draw the final bounding boxes
      for (x1, y1, x2, y2) in selection:
        cv2.rectangle(frame, 
                     (x1, y1), 
                     (x2, y2), 
                     (0, 255, 0), 
                     4)  # line thickness in pixels (assumed value)
      # Write the frame to the output video file
      result.write(frame)
      # Display the frame 
      cv2.imshow("Frame", frame) 	

      # Display frame for X milliseconds and check if q key is pressed
      # q == quit
      if cv2.waitKey(25) & 0xFF == ord('q'):
        break
    # No more video frames left
    else:
      break
  # Stop when the video is finished
  cap.release()
  # Release the video recording
  result.release()
  # Close all windows
  cv2.destroyAllWindows()

main()


Run the code.
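
A note if you change scale_ratio: the script resizes every frame, but the VideoWriter is created with file_size. As far as I know, OpenCV's VideoWriter expects each frame to match the size it was created with, so a mismatch can leave you with an empty or unplayable output file. Here is a minimal sketch of computing a matching size. The helper name scaled_size is hypothetical.

```python
def scaled_size(original_size, scale_ratio):
    """Return the (width, height) a frame will have after resizing."""
    width, height = original_size
    # Same arithmetic the script applies to frame.shape before cv2.resize
    return (int(width * scale_ratio), int(height * scale_ratio))

file_size = (1920, 1080)   # size of the input video, as in the script
scale_ratio = 0.5          # example value; the script uses 1
output_size = scaled_size(file_size, scale_ratio)
print(output_size)
```

You would then pass output_size (instead of file_size) as the last argument when creating the VideoWriter.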


Video Results

Here is a video of the output:

How To Convert a Quaternion Into Euler Angles in Python

Given a quaternion of the form (x, y, z, w) where w is the scalar (real) part and x, y, and z are the vector parts, how do we convert this quaternion into the three Euler angles:

  • Rotation about the x-axis = roll angle = α
  • Rotation about the y-axis = pitch angle = β
  • Rotation about the z-axis = yaw angle = γ

Doing this operation is important because ROS2 (and ROS) uses quaternions as the default representation for the orientation of a robot in 3D space. Roll, pitch, and yaw angles are a lot easier to understand and visualize than quaternions.

Here is the Python code:

import math

def euler_from_quaternion(x, y, z, w):
    """
    Convert a quaternion into euler angles (roll, pitch, yaw)
    roll is rotation around x in radians (counterclockwise)
    pitch is rotation around y in radians (counterclockwise)
    yaw is rotation around z in radians (counterclockwise)
    """
    t0 = +2.0 * (w * x + y * z)
    t1 = +1.0 - 2.0 * (x * x + y * y)
    roll_x = math.atan2(t0, t1)

    # Clamp t2 to [-1, 1] so rounding error cannot break math.asin
    t2 = +2.0 * (w * y - z * x)
    t2 = +1.0 if t2 > +1.0 else t2
    t2 = -1.0 if t2 < -1.0 else t2
    pitch_y = math.asin(t2)

    t3 = +2.0 * (w * z + x * y)
    t4 = +1.0 - 2.0 * (y * y + z * z)
    yaw_z = math.atan2(t3, t4)

    return roll_x, pitch_y, yaw_z # in radians


Suppose a robot is on a flat surface. It has the following quaternion:

Quaternion [x,y,z,w] = [0, 0, 0.7072, 0.7072]

What is the robot’s orientation in Euler Angle representation in radians?
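
Here is a minimal, self-contained way to check this. It repeats the same formulas in compact form so the snippet runs on its own. The clamp helper is my own addition for illustration.

```python
import math

def clamp(value, low=-1.0, high=1.0):
    """Hypothetical helper: keep value inside [low, high]."""
    return max(low, min(high, value))

def euler_from_quaternion(x, y, z, w):
    """Return (roll, pitch, yaw) in radians for the quaternion (x, y, z, w)."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    pitch = math.asin(clamp(2.0 * (w * y - z * x)))
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw

roll, pitch, yaw = euler_from_quaternion(0, 0, 0.7072, 0.7072)
print(roll, pitch, yaw)
```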


The program shows that the roll, pitch, and yaw angles in radians are (0.0, 0.0, 1.5710599372799763).


Which is approximately the same as:

Euler Angle (roll, pitch, yaw) = (0.0, 0.0, π/2)

And in Axis-Angle Representation, the angle is:

Axis-Angle {[x, y, z], angle} = { [ 0, 0, 1 ], 1.571 }

So we see that the robot is rotated π/2 radians (90 degrees) around the z axis (going counterclockwise). 
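
If you want to double-check the axis-angle numbers, here is a short sketch using the standard relationship angle = 2 * acos(w) and axis = (x, y, z) / sin(angle / 2). It normalizes the quaternion first because [0, 0, 0.7072, 0.7072] is only approximately unit length. The function name axis_angle_from_quaternion is hypothetical.

```python
import math

def axis_angle_from_quaternion(x, y, z, w):
    """Return ((ax, ay, az), angle_in_radians) for the quaternion (x, y, z, w)."""
    # Normalize so rounding in the input does not skew the result
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    x, y, z, w = x / norm, y / norm, z / norm, w / norm
    angle = 2.0 * math.acos(max(-1.0, min(1.0, w)))
    s = math.sqrt(max(0.0, 1.0 - w * w))  # equals sin(angle / 2)
    if s < 1e-9:
        # No rotation: any axis works, so return the x-axis by convention
        return (1.0, 0.0, 0.0), 0.0
    return (x / s, y / s, z / s), angle

axis, angle = axis_angle_from_quaternion(0, 0, 0.7072, 0.7072)
print(axis, round(angle, 3))
```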

And that’s all there is to it folks. That’s how you convert a quaternion into Euler angles.

You can use the code in this tutorial for your work in ROS2 since, as of this writing, the tf.transformations.euler_from_quaternion method isn’t available for ROS2 yet. 

How to Make a Mobile Robot in Gazebo (ROS2 Foxy)

In this tutorial, we will learn how to make a model of a mobile robot in Gazebo from scratch. Our simulated robot will be a wheeled mobile robot. It will have two big wheels on each side and a caster wheel in the middle. Here is what you will build. In this case, I have the robot going in reverse (i.e. the big wheels are on the front of the vehicle):



Set Up the Model Directory

Here are the official instructions, but we’ll walk through all the steps below. It is important to go slow and build your robotic models in small steps. No need to hurry. 

Create a folder for the model.

mkdir -p ~/.gazebo/models/my_robot

Create a model config file. This file will contain a description of the model.

gedit ~/.gazebo/models/my_robot/model.config

Add the following lines to the file, and then Save. You can see this file contains fields for the name of the robot, the version, the author (that’s you), your e-mail address, and a description of the robot.

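<?xml version="1.0"?>
<model>
  <name>My Robot</name>
  <!-- version number and e-mail below are placeholders; use your own -->
  <version>1.0</version>
  <sdf version='1.4'>model.sdf</sdf>

  <author>
   <name>My Name</name>
   <email>me@my.email</email>
  </author>

  <description>
    My awesome robot.
  </description>
</model>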

Close the config file.

Now, let’s create an SDF (Simulation Description Format) file. This file will contain the tags that are needed to create an instance of the my_robot model. 

gedit ~/.gazebo/models/my_robot/model.sdf

Copy and paste the following lines inside the sdf file.

<?xml version='1.0'?>
<sdf version='1.4'>
  <model name="my_robot">
  </model>
</sdf>

Save the file, but don’t close it yet.

Create the Structure of the Model

Now we need to create the structure of the robot. We will start out by adding basic shapes. While we are creating the robot, we want Gazebo’s physics engine to ignore the robot. Otherwise the robot will move around the environment as we add more stuff on it.

To get the physics engine to ignore the robot, add this line underneath the <model name="my_robot"> tag.

<static>true</static>

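This is what your sdf file should look like at this stage:

<?xml version='1.0'?>
<sdf version='1.4'>
  <model name="my_robot">
    <static>true</static>
  </model>
</sdf>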

Now underneath the <static>true</static> line, add these lines:

          <link name='chassis'>
            <pose>0 0 .1 0 0 0</pose>

            <collision name='collision'>
              <geometry>
                <box><size>.4 .2 .1</size></box>
              </geometry>
            </collision>

            <visual name='visual'>
              <geometry>
                <box><size>.4 .2 .1</size></box>
              </geometry>
            </visual>
          </link>

Here is how your sdf file should look now. 


Click Save and close the file.

What you have done is create a box. The name of this structure (i.e. link) is ‘chassis’. A chassis for a mobile robot is the frame (i.e. skeleton) of the vehicle.

The pose (i.e. position and orientation) of the geometric center of this box will be a position of (x = 0 meters, y = 0 meters, z = 0.1 meters) and an orientation of (roll = 0 radians, pitch = 0 radians, yaw = 0 radians). The pose of the chassis (as well as the pose of any link in an sdf file) is defined relative to the model coordinate frame (which is ‘my_robot’).


  • Roll (rotation about the x-axis)
  • Pitch (rotation about the y-axis)
  • Yaw (rotation about the z-axis)  

Inside the collision element of the sdf file, you specify the shape (i.e. geometry) that Gazebo’s collision detection engine will use. In this case, we want the collision detection engine to represent our vehicle as a box that is 0.4 meters in length, 0.2 meters in width, and 0.1 meters in height. 

Inside the visual tag, we place the shape that Gazebo’s rendering engine will use to display the robot. 

In most cases, the visual tag will be the same as the collision tag, but in some cases it is not.

For example, if your robot car has some complex geometry that looks like the toy Ferrari car below, you might model the collision physics of that body as a box but would use a custom mesh to make the robot model look more realistic, like the figure below.


Now let’s run Gazebo so that we can see our model. Type the following command:

gazebo

On the left-hand side, click the “Insert” tab.

On the left panel, click “My Robot”. You should see a white box. You can place it wherever you want.


Go back to the terminal window, and type CTRL + C to close Gazebo.

Let’s add a caster wheel to the robot. The caster wheel will be modeled as a sphere with a radius of 0.05 meters and no friction.

Relative to the geometric center of the white box (i.e. the robot chassis), the center of the spherical caster wheel will be located at x = -0.15 meters, y = 0 meters, and z = -0.05 meters.
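
Here is a quick arithmetic check of why those numbers make sense, written as a small Python sketch. It only uses the values from this tutorial (chassis center at z = 0.1 meters, chassis height 0.1 meters, caster offset z = -0.05 meters, caster radius 0.05 meters). The variable names are my own.

```python
chassis_center_z = 0.1    # from the chassis <pose>
chassis_height = 0.1      # from the chassis <size>
caster_offset_z = -0.05   # caster pose, relative to the chassis center
caster_radius = 0.05

# Height of the underside of the chassis box above the ground
chassis_bottom_z = chassis_center_z - chassis_height / 2

# Height of the caster's center and of its lowest point
caster_center_z = chassis_center_z + caster_offset_z
caster_bottom_z = caster_center_z - caster_radius

print("chassis underside:", chassis_bottom_z)
print("caster center:    ", caster_center_z)
print("caster lowest pt: ", caster_bottom_z)
```

The center of the sphere sits level with the underside of the chassis, and the bottom of the sphere just touches the ground at z = 0.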

Note that when we create a box shape, the positive x-axis points towards the front of the vehicle (i.e. the direction of travel). The x-axis is that red line below.

The positive y-axis points out the left side of the chassis. It is the green line below. The positive z-axis points straight upwards towards the sky.

gedit ~/.gazebo/models/my_robot/model.sdf

Add these lines after the first </visual> tag but before the </link> tag.

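          <collision name='caster_collision'>
            <pose>-0.15 0 -0.05 0 0 0</pose>
            <geometry><sphere><radius>.05</radius></sphere></geometry>
            <surface><friction><ode><mu>0</mu><mu2>0</mu2></ode></friction></surface>
          </collision>

          <visual name='caster_visual'>
            <pose>-0.15 0 -0.05 0 0 0</pose>
            <geometry><sphere><radius>.05</radius></sphere></geometry>
          </visual>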

Click Save and close the file.

Now relaunch Gazebo, and insert the model.


Insert -> My Robot

Go back to the terminal window, and close Gazebo by typing CTRL + C.

You can now see our robot has a brand new caster wheel located towards the rear of the vehicle.


Now let’s add the left wheel. We will model this as a cylinder with a radius of 0.1 meters and a length of 0.05 meters.

gedit ~/.gazebo/models/my_robot/model.sdf

Right after the </link> tag (but before the </model> tag), add the following lines. You can see that the wheel is placed at the front of the vehicle (x=0.1m), the left side of the vehicle (y=0.13m), and 0.1m above the surface (z=0.1m). It is a cylinder that is rotated 90 degrees (i.e. 1.5707 radians) around the model’s y axis and 90 degrees around the model’s z axis.

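     <link name="left_wheel">
        <pose>0.1 0.13 0.1 0 1.5707 1.5707</pose>
        <collision name="collision">
          <geometry><cylinder><radius>.1</radius><length>.05</length></cylinder></geometry>
        </collision>
        <visual name="visual">
          <geometry><cylinder><radius>.1</radius><length>.05</length></cylinder></geometry>
        </visual>
      </link>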
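
To see why those two 90 degree rotations are needed, here is a small sketch that rotates the cylinder's axis. It assumes a cylinder starts out with its axis along z, and that the roll, pitch, and yaw in <pose> are applied about the fixed x, y, and z axes in that order, which is my understanding of the SDF convention. The function name rotate_rpy is hypothetical.

```python
import math

def rotate_rpy(vector, roll, pitch, yaw):
    """Rotate a 3D vector about fixed x (roll), then y (pitch), then z (yaw)."""
    x, y, z = vector
    # Rotation about the x-axis
    y, z = (y * math.cos(roll) - z * math.sin(roll),
            y * math.sin(roll) + z * math.cos(roll))
    # Rotation about the y-axis
    x, z = (x * math.cos(pitch) + z * math.sin(pitch),
            -x * math.sin(pitch) + z * math.cos(pitch))
    # Rotation about the z-axis
    x, y = (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))
    return (x, y, z)

# Cylinder axis before rotation (assumed to be the z-axis)
wheel_axis = rotate_rpy((0.0, 0.0, 1.0), 0.0, 1.5707, 1.5707)
print([round(c, 3) for c in wheel_axis])
```

Under those assumptions the cylinder's axis ends up pointing along the model's y-axis, so the wheel stands upright and rolls in the x direction.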

Launch Gazebo, and see how it looks. You can click the box at the top of the panel to see the robot from different perspectives.


Now, let’s add a right wheel. Insert these lines into your sdf file right before the </model> tag. 

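    <link name="right_wheel">
        <pose>0.1 -0.13 0.1 0 1.5707 1.5707</pose>
        <collision name="collision">
          <geometry><cylinder><radius>.1</radius><length>.05</length></cylinder></geometry>
        </collision>
        <visual name="visual">
          <geometry><cylinder><radius>.1</radius><length>.05</length></cylinder></geometry>
        </visual>
      </link>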

Save it, close it, and open Gazebo to see your robot. Your robot has a body, a spherical caster wheel, and two big wheels on either side. 


Now let’s change the robot from static to dynamic. We need to add two revolute joints, one for the left wheel and one for the right wheel. Revolute joints (also known as “hinge joints”) act as your wheels’ motors. These motors generate rotational motion to make the wheels move.

In the sdf file, first change this:

<static>true</static>

To this:

<static>false</static>

Then add these lines right before the closing </model> tag.

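      <joint type="revolute" name="left_wheel_hinge">
        <pose>0 0 -0.03 0 0 0</pose>
        <child>left_wheel</child>
        <parent>chassis</parent>
        <axis><xyz>0 1 0</xyz></axis>
      </joint>

      <joint type="revolute" name="right_wheel_hinge">
        <pose>0 0 0.03 0 0 0</pose>
        <child>right_wheel</child>
        <parent>chassis</parent>
        <axis><xyz>0 1 0</xyz></axis>
      </joint>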

<child> defines the name of this joint’s child link.

<parent> defines the parent link.

<pose> describes the offset from the origin of the child link in the frame of the child link.

<axis> defines the joint’s axis specified in the parent model frame. This axis is the axis of rotation. In this case, the two joints rotate about the y axis.

<xyz> defines the x, y, and z components of the normalized axis vector. 
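
Since a mistyped link name in <child> or <parent> is an easy mistake to make, here is a small sketch that uses Python's built-in XML parser to check that every joint points at a link that actually exists. The embedded model is a stripped-down stand-in for this tutorial's file (links without geometry), and the function name check_joints is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for model.sdf: link contents are omitted on purpose
SDF_TEXT = """<?xml version='1.0'?>
<sdf version='1.4'>
  <model name="my_robot">
    <link name="chassis"/>
    <link name="left_wheel"/>
    <link name="right_wheel"/>
    <joint type="revolute" name="left_wheel_hinge">
      <child>left_wheel</child>
      <parent>chassis</parent>
      <axis><xyz>0 1 0</xyz></axis>
    </joint>
    <joint type="revolute" name="right_wheel_hinge">
      <child>right_wheel</child>
      <parent>chassis</parent>
      <axis><xyz>0 1 0</xyz></axis>
    </joint>
  </model>
</sdf>"""

def check_joints(sdf_text):
    """Return a list of problems found in the joints' child/parent names."""
    model = ET.fromstring(sdf_text).find('model')
    link_names = {link.get('name') for link in model.findall('link')}
    problems = []
    for joint in model.findall('joint'):
        for tag in ('child', 'parent'):
            target = joint.findtext(tag)
            if target not in link_names:
                problems.append(f"{joint.get('name')}: <{tag}> '{target}' is not a link")
    return problems

print(check_joints(SDF_TEXT) or "All joints reference existing links")
```

To run it against your real file, you could read ~/.gazebo/models/my_robot/model.sdf into a string and pass that to check_joints instead.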

You can learn more about sdf tags at the official website.

Save the sdf file, open Gazebo, and insert your model.

Click on your model to select it.

You will see six tiny dots on the right side of the screen. Click on those and drag your mouse to the left. It is a bit tricky to click on these as Gazebo can be quite sensitive. Just keep trying to drag those dots to the left using your mouse.

Under the “World -> Models” tab on the left, select the my_robot model. You should see two joints appear on the Joints tab on the right.


Under the Force tab, increase the force applied to each joint to 0.1 N-m. You should see your robot move around in the environment.

Congratulations! You have built a mobile robot in Gazebo.